Inferencing in the Large: Semantic Integration of Open-Data Tables

Inferencing in the Large (ITL) is a research problem encompassing knowledge extraction, knowledge organisation and knowledge retrieval from open structured data, in particular Indian Open Government Data.

With vast amounts of tabular data freely available under several open-data initiatives, how information is consumed depends on the perspective of the consumer. These perspectives can be viewed as the contexts in which the data can be placed and analysed. Extracting and organising these contexts is non-trivial, and we address the problem through semantic integration of open structured data. A collection of open datasets can map to similar contexts (themes), and a single table can map to several themes. ITL presents a model that semantically integrates and aggregates open data into a data mesh of applicable, inter-related contexts. Sandesh 1.0 and Sandesh-RDF (v 2.0) are implementations of ITL that use open government data from the Indian Open Government Data portal. We use Linked Open Data (LOD) to associate semantics with the data. In Sandesh-RDF (v 2.0), the MWF (Many Worlds on a Frame) knowledge framework is implemented using RDF N-Quads to represent the extracted knowledge. Sandesh-RDF queries the knowledge graph built from these N-Quads, which form the semantic representation of data from data.gov.in. The previous version, Sandesh 1.0, used the default SQLite implementation of the MWF framework.
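
To make the many-to-many mapping between tables and themes concrete, the following is a minimal sketch of how table cells could be written as N-Quads, with the fourth term (the graph label) standing in for a context. It is not the Sandesh-RDF schema: the `http://example.org/` namespace, the function names, the use of one named graph per theme, and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical namespace; the real Sandesh-RDF IRIs are not given in the text.
EX = "http://example.org/"


def iri(local):
    """Wrap a local name in the illustrative namespace as an N-Quads IRI term."""
    return f"<{EX}{local}>"


def literal(value):
    """Render a value as a plain N-Quads string literal (no datatype or language tag)."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def table_to_quads(table_id, rows, themes):
    """Emit one (subject, predicate, object, graph) quad per cell per theme.

    Assumption: each theme (context) is modelled as a named graph, so a table
    that maps to several themes contributes the same statements to each graph.
    """
    quads = []
    for i, row in enumerate(rows):
        subject = iri(f"{table_id}/row/{i}")
        for column, value in row.items():
            for theme in themes:
                quads.append(
                    (subject, iri(f"column/{column}"), literal(value), iri(f"theme/{theme}"))
                )
    return quads


def to_nquad(s, p, o, g):
    """Serialise one quad as a single N-Quads line."""
    return f"{s} {p} {o} {g} ."


def index_by_context(quads):
    """Group triples by graph label so that each context can be inspected on its own."""
    index = defaultdict(list)
    for s, p, o, g in quads:
        index[g].append((s, p, o))
    return index


# Made-up sample tables: one maps to two themes, the other shares one of them.
quads_a = table_to_quads("literacy", [{"state": "StateA", "rate": 10}], ["education", "demographics"])
quads_b = table_to_quads("schools", [{"state": "StateA"}], ["education"])
index = index_by_context(quads_a + quads_b)

for quad in quads_a + quads_b:
    print(to_nquad(*quad))
```

In this sketch, looking up `index[iri("theme/education")]` returns statements from both tables, while `index[iri("theme/demographics")]` returns only those from the first, which mirrors the idea that several datasets can share a theme and one table can belong to several. A production system would more likely load the serialised lines into an RDF store and query them with SPARQL rather than use an in-memory dictionary.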