Inferencing in the Large: Semantic Integration of Open-Data Tables

Inferencing in the Large (ITL) is a research problem encompassing knowledge extraction, knowledge organisation and knowledge retrieval from open structured data, especially Indian Open Government Data.

With vast amounts of tabular data freely available under several Open-Data initiatives, how information is consumed depends on the perspective of the consumer. These perspectives can be viewed as the various contexts in which the data can be placed and analysed. Extracting and organising these contexts is non-trivial, and we address the problem using semantic integration of open structured data. A collection of open datasets can map to similar contexts (themes), and a single table can map to different themes. ITL presents a model that semantically integrates and aggregates open data in a data mesh of applicable inter-related contexts. Sandesh 1.0 and Sandesh-RDF (v 2.0) are implementations of ITL using open government data from the Indian Open Government Data portal. We use Linked Open Data (LOD) to associate semantics with the data. In Sandesh-RDF (v 2.0), the MWF (Many Worlds on a Frame) knowledge framework is implemented using RDF N-Quads to represent the extracted knowledge. Sandesh-RDF queries the knowledge graph created from these N-Quads, which is the semantic representation of data from data.gov.in. The previous version of Sandesh used the default SQLite implementation of the MWF framework.
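
As a rough illustration of how the fourth term of a quad lets one table take part in several themes, the sketch below stores quads as plain Python tuples and filters them by context. The IRIs, the theme names and the helper functions are hypothetical and are not taken from Sandesh-RDF; an actual deployment would more likely load the N-Quads into an RDF store and query it there.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Each quad is (subject, predicate, object, context). The same statement
# about table42 is asserted in two different contexts (themes).
QUADS = [
    (EX + "table42", EX + "reports", EX + "LiteracyRate", EX + "ctx/Education"),
    (EX + "table42", EX + "reports", EX + "LiteracyRate", EX + "ctx/Gender"),
    (EX + "table17", EX + "reports", EX + "SchoolEnrolment", EX + "ctx/Education"),
]


def quads_in_context(quads, context):
    """Return the (s, p, o) triples asserted in the given context."""
    return [(s, p, o) for s, p, o, c in quads if c == context]


def contexts_of(quads, subject):
    """Return the sorted list of contexts in which a subject appears."""
    return sorted({c for s, _, _, c in quads if s == subject})


def to_nquads_line(quad):
    """Serialise one quad in N-Quads style (assumes every term is an IRI)."""
    return " ".join(f"<{term}>" for term in quad) + " ."
```

Calling `contexts_of(QUADS, EX + "table42")` lists both themes for the single table, which mirrors the many-to-many mapping between tables and contexts described above.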

Reach

Sponsors and Collaborators: Horizon 2020, European Commission

Time Frame: Jan 2016 – Dec 2020

Status – Active

The REACH project aims to develop a solution for providing high-speed Internet access in rural India using unlicensed TV white space spectrum, and to design the geolocation database for it. With the growth of India's population and Internet use, efficient utilization and management of spectrum is needed. TV white space spectrum is emerging as a promising alternative to meet this need, since many channels in the TV spectrum are unused following the migration from analog to digital transmission technology.

At IIIT-B, we are working on Distributed Algorithms for Spectrum Assignment for White Space Devices. Spectrum assignment for devices in white-space spectrum is challenging because the spectrum varies temporally and spatially and is most often fragmented. We have created an autonomous agent model for spectrum assignment of white space devices at a given location. Each white space device (WSD) acts autonomously out of self-interest, choosing a strategy from its bag of strategies. It obtains a payoff based on its own choice and the choices made by all other agents. WSDs interact with each other using a central shared memory located at a “Master” device. Based on the payoffs received by different strategies, WSDs evolve their strategic profile over time. This has the effect of “demographic changes” in the population. The system is said to have reached a state of equilibrium (or to be in a state of evolutionary best response) when the demographic profile stabilises. The system is trained on different load profiles to compute their respective evolutionary best responses.
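
The agent model above can be pictured with a toy simulation. This is only a sketch under stated assumptions, not the published algorithm: a strategy is taken to be a choice of channel, the payoff is assumed to be an equal share of the chosen channel, and the update rule is simple reinforcement of whatever paid off. The function name `simulate` and all of its parameters are hypothetical.

```python
import random


def simulate(num_devices=6, num_channels=3, rounds=200, rate=0.1, seed=0):
    """Toy evolution of strategy profiles for self-interested devices."""
    rng = random.Random(seed)
    # Each device starts with equal propensity for every strategy (channel).
    weights = [[1.0] * num_channels for _ in range(num_devices)]
    occupancy = [0] * num_channels
    for _ in range(rounds):
        # Every device picks a channel in proportion to its current weights.
        choices = [
            rng.choices(range(num_channels), weights=w)[0] for w in weights
        ]
        # Stand-in for the shared memory at the "Master" device:
        # how many devices chose each channel this round.
        occupancy = [choices.count(c) for c in range(num_channels)]
        for w, c in zip(weights, choices):
            payoff = 1.0 / occupancy[c]  # assumed: channel is shared equally
            w[c] += rate * payoff        # reinforce the strategy that paid off
    # Normalise the weights into a strategy profile per device.
    profiles = [[x / sum(w) for x in w] for w in weights]
    return profiles, occupancy
```

Tracking how `profiles` changes from round to round would give one simple way to judge when the demographic profile has stabilised.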

Project Outcomes

  1. Chaitali Diwan. Autonomous Spectrum Assignment of White Space devices. MTech thesis. June 2016.
  2. Chaitali Diwan, Srinath Srinivasa, Bala Murali Krishna. Autonomous Spectrum Assignment of White Space Devices. Proceedings of the 12th EAI International Conference on Cognitive Radio Oriented Wireless Networks. Lisbon, Portugal. September 2017.
  3. Simulation dashboard for autonomous spectrum allocation algorithm

Other Relevant Links

Sandesh

Sandesh is a Semantic Data Mesh for publishing knowledge aggregated from Indian Open Data. Open structured data is published by several agencies such as the World Health Organization (WHO), the United Nations Organization (UNO), private firms, NGOs and governmental bodies. The Government of India publishes open data on its data portal, data.gov.in. To aggregate and integrate data from disparate datasets, a framework called Many Worlds on a Frame (MWF) is proposed. The framework is partially implemented in software called RootSet, on top of which the Sandesh module is implemented.

Center of Excellence in Big Data Engineering

This project aims to establish a collaboration between the International Institute of Information Technology (IIIT-B), City University London as the UK partner, and Siemens Research, India, as the industry partner, to set up a centre of excellence in Big Data Engineering. With emerging trends like Web Science and the Internet of Things, expertise in Big Data is expected to be in high demand.

As part of our initiatives to create a talent pool of research and engineering expertise, IIIT-B has collaborated with several partners in this area on specific projects. This project aims to consolidate our disparate activities in this area and create a Centre of Excellence in Big Data Engineering. The term “Big Data” is defined here to mean any kind of data management problem for which conventional RDBMS-based solutions are inadequate. The “Big” refers not just to the volume of data, but also to challenges concerning its variety, veracity and velocity.

This centre is hosted by the Web Science Lab at IIIT-B.

Members

  • Prof. Srinath Srinivasa
  • Prof. Vinu E. Venugopal
  • Apurva Kulkarni, Postdoc
  • Praseeda, Research Scholar
  • Raksha, Research Scholar
  • Anish, MTech. Thesis Student

Collaborators

  • Prof Muttukrishnan Rajarajan, City University, Northampton Square, London,
    United Kingdom
  • Dr. Amarnath Bose, Siemens Technology and Service, Bangalore

Activities

The centre focuses on integrating open datasets, especially Open Government Data (OGD), and building AI models that can help explain causal dependencies between several variables and indicators pertaining to Sustainable Development Goals (SDGs).

This project involves the creation of Big Data processing pipelines to process different kinds of datasets and create case files for one or more SDG indicators, showing factors that are highly correlated with them. Based on this case file, we build AI models that can potentially identify causal dependencies between these factors and the indicator.
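
A minimal sketch of the case-file idea follows, assuming that "highly correlated" is measured with the Pearson coefficient and a fixed cut-off. The function names, the 0.7 threshold and the shape of the input are illustrative assumptions and do not describe the project's actual pipeline.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / (spread_x * spread_y)


def build_case_file(indicator, factors, threshold=0.7):
    """Keep the factors whose |correlation| with the indicator is high.

    `factors` maps a factor name to its observations, aligned with
    `indicator`. The result is ordered by decreasing |correlation|.
    """
    scores = {name: pearson(vals, indicator) for name, vals in factors.items()}
    ranked = sorted(scores.items(), key=lambda kv: -abs(kv[1]))
    return {name: r for name, r in ranked if abs(r) >= threshold}
```

Correlation only shortlists candidate factors; whether a shortlisted factor has a causal effect on the indicator is left to the models built on top of the case file.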

Based on these models, we perform two kinds of analysis: predictive or “what if” analysis, and prescriptive analysis. The former is an exploratory exercise that predicts the expected impact of a policy change on SDG indicators in different geographical regions. The latter is another form of exploratory exercise that prescribes values of affecting factors for bringing a given indicator towards its intended target.
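
The difference between the two analyses can be shown with a deliberately simple stand-in model. The sketch assumes a single factor and a straight-line relationship fitted by least squares. That is an illustrative simplification, not the kind of AI model the project builds, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def what_if(model, factor_value):
    """Predictive: expected indicator value if the factor is set to a value."""
    slope, intercept = model
    return slope * factor_value + intercept


def prescribe(model, target):
    """Prescriptive: factor value needed to bring the indicator to a target."""
    slope, intercept = model
    if slope == 0:
        raise ValueError("the factor has no effect on the indicator")
    return (target - intercept) / slope
```

With a model fitted on observations `[1, 2, 3]` and `[3, 5, 7]`, `what_if(model, 4)` asks what the indicator would become, while `prescribe(model, 11)` asks what factor value would reach the target.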

We have also developed models for assessing the stability of policy interventions, asking whether a given outcome of an intervention will be sustained over time or will revert to its earlier state due to disparity in outcomes.

This project also has matching funding from the Planning Dept of the Govt of Karnataka, which supports project staff who develop interactive dashboards based on the models generated, for use by policy makers. All research activities carried out under this project are supported by the BDE centre.

Events

Associated Projects

  • Open City: The project looks at managing large-scale access control of IoT device data in a secure fashion.
  • Cogno Web Observatory

Reports

Publications

  • Aniket Mitra and Vinu Venugopal. Enhancing Region-Based Geometric Embedding for Gene-Disease Associations. 7th International Conference on Data Science and Management of Data (CODS-COMAD 2024), Bangalore, India, Jan 2024
  • Apurva Kulkarni, Pooja Bassin, Niharika Sri Parasa, Srinath Srinivasa, Vinu EV, Chandrashekar Ramanathan. Ontology Augmented Data Lake System for Policy Support. 10th International Conference on Big Data Analytics in Astronomy, Science and Engineering (BASE) December 05 – 07, 2022
  • Srinivasa S., Pavagada Subbanarasimha R. (2018) Design of the Cogno Web Observatory for Characterizing Online Social Cognition. In: Anirban Mondal, Himanshu Gupta, Jaideep Srivastava, P.Krishna Reddy, D.V.L.N. Somayajulu. (eds) Big Data Analytics. BDA 2018. Lecture Notes in Computer Science. Springer, Cham.
  • Raksha Pavagada Subbanarasimha, Lokesh Todwal, Mamillapalli Rachana, Aditya Naidu, and Srinath Srinivasa. 2018. Mithya: A Framework For Identifying Opinion Drivers On Social Media. Demo at ACM IKDD Conference on Data Science and International Conference on Management of Data, Goa, India, Jan 2018 (CODS-COMAD 2018).
  • Anish Bhanushali, Raksha Pavagada Subbanarasimha, and Srinath Srinivasa. 2017. Identifying Opinion Drivers on Social Media. In On the Move to Meaningful Internet Systems. OTM 2017 Conferences: Confederated International Conferences: CoopIS, C&TC, and ODBASE 2017, Rhodes, Greece, October 23-27, 2017, Proceedings, Part II. Springer International Publishing, Cham, 242–253.

Tweet Summarization

The project involves extracting and collating important information from a large volume of short reports.

  • Characterize important entities and actions
  • Mine and associate semantics with entities and actions
  • Semi-automate the summary generation process by generating a set of candidate sentences based on key entities and key actions of interest
  • Use user feedback to refine the sentences
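
One way to picture the candidate-generation step is a keyword-overlap filter: keep the short reports that mention at least one key entity and one key action, ranked by how many they mention. This is a hedged sketch; the tokenisation, the scoring and the function name are assumptions made for illustration, and the project's actual entity and action mining is not described here.

```python
def candidate_sentences(reports, key_entities, key_actions, top_k=3):
    """Rank short reports by their overlap with key entities and actions.

    `key_entities` and `key_actions` are sets of lowercase words.
    A report qualifies only if it mentions at least one of each.
    """
    scored = []
    for text in reports:
        tokens = {word.strip(".,!?").lower() for word in text.split()}
        entities = tokens & key_entities
        actions = tokens & key_actions
        if entities and actions:
            scored.append((len(entities) + len(actions), text))
    # Highest overlap first; ties keep their original order (stable sort).
    scored.sort(key=lambda pair: -pair[0])
    return [text for _, text in scored[:top_k]]
```

User feedback could then be folded in by adjusting the key entity and action sets and regenerating the candidates.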

Cogno Web Observatory

It is important to occasionally remember that the World Wide Web (WWW) is the largest information network the world has ever seen. Just about every sphere of human activity has been altered in some way, due to the web. Our understanding of the web has been evolving over the past few decades ever since it was born. In its early days, the web was seen just as an unstructured hypertext document collection. However, over time, we have come to model the web as a global, participatory, socio-cognitive space. One of the consequences of modeling the web as a space rather than as a tool, is the emergence of the concept of Web observatories. These are application programs that are meant to observe and curate data about online phenomena. This paper details the design of a Web observatory called Cogno, that is meant to observe online social cognition. Social cognition refers to the way social discourses lead to the formation of collective worldviews. As part of the design of Cogno, we also propose a computational model for characterizing social cognition. Social media is modeled as a “marketplace of opinions” where different opinions come together to form “narratives” that not only drive the discourse, but may also bring some form of returns to the opinion holders. The problem of characterizing social cognition is defined as breaking down a social discourse into its constituent narratives, and for each narrative, its key opinions, and the key people driving the narrative.

  • Demonstration:
  • Current Project Members
  • Previous Project Members
    • Nimisha Garg
    • Kavish Agnihotri
    • Vaishnavi Jerry
    • Komal Popli
    • Kashish Jain
    • Aadhithya Ramesh
    • Shreyas Iyer
    • Mamillapalli Rachana
    • Meghana Kotagiri
    • Aditya Naidu
    • Lokesh Todwal
    • Anish Bhanushali
    • Pulkit Aneja
    • Pushp Ranjan
  • Publications
    • Raksha Pavagada Subbanarasimha, Srinath Srinivasa and Sridhar Mandyam, “Invisible Stories That Drive Online Social Cognition,” in IEEE Transactions on Computational Social Systems, vol. 7, no. 5, pp. 1264-1277, Oct. 2020, doi: 10.1109/TCSS.2020.3009474.
    • Raksha Pavagada Subbanarasimha. 2019. Designing the Cogno-Web Observatory: To Characterize the Dynamics of Online Social Cognition. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM ’19). ACM, New York, NY, USA, 814-815. DOI: https://doi.org/10.1145/3289600.3291600.
    • Srinivasa S., Pavagada Subbanarasimha R. (2018) Design of the Cogno Web Observatory for Characterizing Online Social Cognition. In: Anirban Mondal, Himanshu Gupta, Jaideep Srivastava, P.Krishna Reddy, D.V.L.N. Somayajulu. (eds) Big Data Analytics. BDA 2018. Lecture Notes in Computer Science. Springer, Cham.
    • Raksha Pavagada Subbanarasimha, Lokesh Todwal, Mamillapalli Rachana, Aditya Naidu, and Srinath Srinivasa. 2018. Mithya: A Framework For Identifying Opinion Drivers On Social Media. Demo at ACM IKDD Conference on Data Science and International Conference on Management of Data, Goa, India, Jan 2018 (CODS-COMAD 2018).
    • Anish Bhanushali, Raksha Pavagada Subbanarasimha, and Srinath Srinivasa. 2017. Identifying Opinion Drivers on Social Media. In On the Move to Meaningful Internet Systems. OTM 2017 Conferences: Confederated International Conferences: CoopIS, C&TC, and ODBASE 2017, Rhodes, Greece, October 23-27, 2017, Proceedings, Part II. Springer International Publishing, Cham, 242–253.

Narrative Arc for Effective Learning

Narrative Arc is one of the research projects under the umbrella of the Navigated Learning project at Gooru Labs, IIITB.

The Narrative Arc refers to presenting the sequence of learning activities as a narrative to the learner, to make learning interesting and to help the learner navigate seamlessly through the learning space. The project has two parts. The first is creating learning pathways automatically from a corpus of learning resources, such that the generated pathways are semantically coherent and pedagogically progressive. The second is modelling an AI-based automatic conversational agent that makes the learning pathway interesting and adapts it according to the user's knowledge and preferences. Here, the learning pathway is first presented to the user according to her learning goal; the conversational agent then interacts with the learner to keep her interested in the learning pathway and to augment her knowledge. The agent also gauges the knowledge of the learner, supports the learner by providing knowledge and, if required, re-routes the learner through a different learning pathway.

The following link has the presentation for the project from the RISE 2019 workshop, held at IIITB on 14-16 Feb 2019. The title of the presentation is “Narrative Arc Computation towards Digital Empowerment”. Narrative Arc Computation

  • Lead Researcher
  • Current Project Members
    • Mirambika Sikdar (Summer Intern)
  • Previous Project Members
    • Nikhil Bukka Sai
    • Sai Sri Harsha Vallabhuni
    • Rochan Avlur (Intern)
    • Niharika Chaudhari (Intern)
    • Vibhav Agarwal
    • Abhiramon R
    • Sanket Kutumbe
    • Karan Kumar Gupta
    • Srinivasan P.S
  • Publications

Navigated Learning

Figure 1: Navigated Learning

Navigated Learning is a new paradigm of learning that aims to balance three independent requirements: Scale, Personalization and Social Interactions. Figure 1 shows the parallels between the technological solutions and these three requirements of learning.

This is achieved by representing learning as situated within an abstract “competency space,” and computing semantic embeddings of learning objects and learners into the competency map. The competency map is organized as a progression space, which is a metric space with a partial order. Here, not only is there a notion of “distance” between any two points, but also an element of “progress”. These embeddings can be computed for any semantic object, such as learning resources, activities and learners. Each point in the space represents a “competency” or a demonstrable skill that can be acquired by the learner.
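
To make the idea of a metric space with a partial order concrete, the sketch below treats a competency as a tuple of skill levels. That coordinate representation, the Euclidean distance and the component-wise ordering are all assumptions chosen for illustration; the text above only requires that some distance and some notion of progress exist.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def precedes(a, b):
    """Partial order: a comes before b if b is at least as advanced on every axis."""
    return all(x <= y for x, y in zip(a, b))


def comparable(a, b):
    """Two competencies are comparable if one precedes the other."""
    return precedes(a, b) or precedes(b, a)


def progress(a, b):
    """Distance from a to b if b is ahead of a, otherwise None."""
    return dist(a, b) if precedes(a, b) else None
```

Every pair of points has a distance, but only some pairs are ordered: `(2, 1)` and `(1, 2)` are a measurable distance apart, yet neither counts as progress over the other.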

A primary element of research into Navigated Learning is to construct a competency map for a given subject area of study and to build semantic embedding models for different kinds of objects relevant to the learning process. Semantic embeddings may take different forms depending on the nature of the object. While some objects can be neatly represented as points in the logical space, other objects may be represented by regions, pathways or other contours in the space. In an organizational setting, objects that are embedded onto this space include not just learning resources and learners, but also departments, projects and other organizational elements that require or work with relevant skill sets represented in the competency map.

Navigated learning is managed by a “Learning Navigator” with which every learner interacts. The Learning Navigator (or just navigator) continuously interacts with the learning map and the learner to perform the following:

Locate: Based on data about their activities and outcomes from formal assessments, the “Locate” module of the navigator embeds learners in the space, and continuously updates their location. Unlike a location in a geographical space, a learner's location may span several acquired competencies in the competency space. Thus, their location is not identified by a point, but by a data structure called a Skyline, which is detailed in a later section.
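
Since the Skyline is only named here, the sketch below borrows the usual meaning of the term from skyline queries: the set of acquired points that no other acquired point dominates. Whether the project's Skyline matches this definition is an assumption, as is the representation of competencies as tuples of levels.

```python
def dominates(a, b):
    """a dominates b if a differs from b and is at least as advanced on every axis."""
    return a != b and all(x >= y for x, y in zip(a, b))


def skyline(acquired):
    """Return the acquired competencies that no other acquired one dominates."""
    points = set(acquired)  # duplicates carry no extra information
    return sorted(
        p for p in points if not any(dominates(q, p) for q in points)
    )
```

For a learner who has acquired `(1, 1)`, `(2, 1)` and `(1, 3)`, the skyline is `[(1, 3), (2, 1)]`: the two frontier points together describe the location, and `(1, 1)` is implied by either of them.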

Curate: Once a learner’s location is known, a set of further candidate competencies is identified based on their stated goals or recently acquired competencies. Curation is based on competency modeling principles that identify complementary, supplementary and conflicting competencies.

Mediate: This is the logic by which the navigator navigates the learner by making suggestions. Mediation is based on computing an underlying “Narrative Arc” that computes a semantically coherent and meaningful learning sequence individualized for each learner. Mediation also involves suggesting connections with other learners as well as group learning activities.

This project is sponsored by Gooru Learning.

Team Members:

Dr Aparna Lalingkar (PostDoctorate Research Fellow)

Ms Chaitali Diwan (PhD Research Scholar)

Ms Praseeda Kalkur (PhD Research Scholar)

Mr Naman Churiwala (Research Associate)

Mr Prakhar Mishra (MS Research Scholar)

Mr Shyam Kumar VN (MS Research Scholar)

Publications:

Chaitali Diwan, Srinath Srinivasa, and Prasad Ram. Automatic Generation of Coherent Learning Pathways for Open Educational Resources. In Proceedings of the Fourteenth European Conference on Technology Enhanced Learning (EC-TEL 2019), Springer LNCS, Delft, Netherlands, 16-19 September 2019 (to appear)

Aparna Lalingkar, Srinath Srinivasa, Prasad Ram (2019), Characterization of Technology-based Mediations for Navigated Learning, Advanced Computing and Communications, Vol 3 (2), June 2019, ACCS Publications, pp. 33-47. (Paper Link)

Praseeda, Srinath Srinivasa and Prasad Ram “Validating the Myth of Average through Evidences” In: The 12th International Conference on Educational Data Mining, Michel Desmarais, Collin F. Lynch, Agathe Merceron, & Roger Nkambou (eds.) 2019, pp. 631 – 634

Chaitali Diwan, Srinath Srinivasa, and Prasad Ram. Computing Exposition Coherence of Learning Resources, In Proceedings of The 17th International Conference on Ontologies, Databases and Applications of Semantics (ODBASE 2018), Valletta, Malta, October 22-26, 2018, Springer International Publishing.

Lalingkar, A.; Srinivasa, S. & Ram, P. (2018). Deriving Semantics of Learning Mediations, In Proceedings of the 18th IEEE International Conference on Advanced Learning Technologies (ICALT), IEEE, pp. 54-55. (Cited by 1 as per GoogleScholar citation index) (Paper Link)

Sub Projects

Details of Project Hosted

SDG Map showing various states in 2 dimensions is here

The competency map with polylines is hosted in the link here

The corresponding learning map for the learner is hosted in the link here