Workshop on Big Data Engineering (BDE 2017)

This workshop is part of the project "Co-creation of a Center of Excellence in Big Data Engineering", a collaboration between the International Institute of Information Technology (IIIT-B) and City University London to set up a Centre of Excellence in Big Data Engineering. The objectives of the centre are to co-create research agendas, curricula and outreach programmes in Big Data.

The abstracts of the talks can be accessed here.

The main workshop web page may be accessed here.

Workshop agenda:

Web Sciences Lab Workshop – 19th December 2016

Date: 19th December 2016

Venue: Room no 226, IIIT Bangalore

Time: 9:30 AM to 3:30 PM

We are conducting a one-day workshop to collate and present research work by research scholars at the Web Sciences Lab, IIIT Bangalore. Research scholars will present their work, discuss ideas, share problems encountered, reflect on their work so far and provide updates on their progress.

The schedule for the workshop is as follows:

Time Task
9:30 – 9:45 Overview of the work done by the lab in the past 6 months – Prof. Srinath Srinivasa
10:30 – 11:00 Inferencing in the Large: Towards Automation of Semantic Integration and Knowledge Representation of Open Data – Presenter: Asha Subramanian
11:00 – 11:30 A talk on Trust and Mediation – Presenter: Praseeda
11:30 – 12:00 Narratives Plot Comparison – Presenter: Sharath Srivatsa
12:00 – 12:30 Framework for Mediation Driven Learning – Presenter: Chaitali Diwan
12:30 – 1:30 Break for lunch
Afternoon
1:30 – 2:00 A talk on The Marketplace of Opinions – Presenter: Raksha
2:00 – 2:30 Semantic Summarization from User Generated Short Reports – Presenter: Jaya
2:30 – 3:30 Open discussion with all the participants on “Research and Me”

The abstracts of various talks are given below.

Title: Inferencing in the Large: Towards Automation of Semantic Integration and Knowledge Representation of Open Data

Abstract: Data available in the public domain, especially through open data initiatives such as data.gov, data.gov.in and data.gov.uk, provides useful information on various aspects of government policies and administration. One could derive immense insights by semantically integrating such datasets across various domains. Semantic integration involves extracting the common domains or themes that explain a collection of datasets, by identifying unique resources for data values and relations amongst rows of data across these datasets using known or custom vocabularies and knowledge bases. The natural taxonomy and classification of the entities, instances and properties in the vocabularies allow for the extraction of themes relevant to the datasets. Multiple research efforts have addressed the problem of semantic annotation of web tables and CSV tables, which mainly involves interpreting tabular data by linking it to relevant vocabularies; however, they have not focussed on the problem of semantic integration of tables. Linking government data is an active research interest. The current process for semantically linking such datasets is largely manual: vocabularies, classes and properties are identified by hand for each dataset, and templates are then created to automate the mapping of the data to the identified vocabularies.
Our work presents two models: 1) the generation of semantically linked data for the open datasets using vocabularies from the LOD cloud such as DBpedia, YAGO, Schema.org and UMBEL, and 2) the representation of the data in an intuitive home-grown knowledge representation framework called MWF (Many Worlds on a Frame), which is loosely modelled on Kripke semantics. MWF allows for a rich representation of data across two aspects: the type hierarchy (is-a) relationship and the containment hierarchy (is-in) relationship. Supported by roles and associations, these transform the open datasets into a web of semantically interlinked themes and their associations.

Title: Understanding Trust in Mediation

Abstract: Intermediaries have always been a part of society. It was individuals who played the role of broker to orchestrate and facilitate transactions between various parties. Click here for more

Title: Narratives Plot Comparison

Abstract: Narratives are an extremely versatile way of telling imaginary or fictional and true or empirical incidents, whereas expositions are simple and concise documentation based on true and well-researched content. Writing narratives is not bound by any style; it is limited only by the author’s intention to entertain, experience and effort to compose. A similar message can be conveyed in varying grades of style and illustrative cases, and hence comparing two narratives and scoring their similarity is non-trivial. Narratives have two aspects: the flow of events, called the Fabula, and the expression style, called Discourse. Both aspects affect the reading experience and the impact of the intention or message the author wants to convey. Our hypothesis is that two narratives can be compared by matching the verbs and nouns of the events of each subject. Click here for more
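
The hypothesis above can be illustrated with a minimal sketch. It assumes that events have already been extracted from each narrative as (subject, verb, noun) triples; the extraction step, the use of Jaccard overlap as the matching score, and all function names are illustrative assumptions rather than the presenter's actual method.

```python
from collections import defaultdict


def events_by_subject(triples):
    """Group the (verb, noun) pair of every event under its subject."""
    grouped = defaultdict(set)
    for subject, verb, noun in triples:
        grouped[subject].add((verb, noun))
    return grouped


def jaccard(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def narrative_similarity(triples_a, triples_b):
    """Average per-subject Jaccard overlap of (verb, noun) events.

    Subjects that appear in only one narrative contribute a score of 0.
    """
    a = events_by_subject(triples_a)
    b = events_by_subject(triples_b)
    subjects = set(a) | set(b)
    if not subjects:
        return 0.0
    total = sum(jaccard(a.get(s, set()), b.get(s, set())) for s in subjects)
    return total / len(subjects)


# Hypothetical hand-written triples for two short stories.
story_a = [("hero", "find", "sword"), ("hero", "fight", "dragon"),
           ("king", "reward", "hero")]
story_b = [("hero", "find", "sword"), ("hero", "flee", "dragon"),
           ("king", "reward", "hero")]
score = narrative_similarity(story_a, story_b)
```

A score near 1 means the two narratives attribute largely the same events to the same subjects. This sketch only touches the Fabula aspect and ignores Discourse entirely.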

Title: Framework for Mediation Driven Learning

Abstract: Learning is a complex process in which the learner experiences permanent and lasting changes in knowledge, behaviour, or ways of processing the world. Every learner is unique, and learns and perceives things differently and at a different pace. In a classroom environment, which is designed for an average student, the same content is delivered to all the students in the same way. There is a fundamental flaw in designing the curriculum for an average student in this way, since there are virtually no students who fit into this category of average [1]. Hence, there is a need to address the individuality of the student for effective learning. A learning theory called “Independent Learning” addresses this. Independent learning encourages and enables students to become self-directed in their learning experiences and to have more autonomy and control over their learning. In addition, learning has been found to be very effective when there is collaboration with other learners. In our work, we propose the concept of “mediation driven learning”, which builds upon the theories of independent learning and collaborative learning and uses the power of the Web to mediate or facilitate learning. We create a framework for mediation driven learning in which we bring learners and tutors together on one platform and provide a mediation algorithm that finds an optimal matching between the learners and tutors for a particular learning concept. Click here for more
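
To make the idea of an optimal learner-tutor matching concrete, here is a minimal sketch. The abstract does not describe the mediation algorithm itself, so everything below is an assumption: a hypothetical suitability score per (learner, tutor) pair for one learning concept, at most one learner per tutor, and a brute-force search that is only practical for a handful of participants.

```python
from itertools import permutations


def best_matching(learners, tutors, score):
    """Return (total, pairs) maximising the summed suitability score.

    Each learner gets exactly one tutor and no tutor is shared.
    `score` maps (learner, tutor) to a number; higher is better.
    Brute force over all assignments, so keep the inputs small.
    """
    if len(learners) > len(tutors):
        raise ValueError("this sketch needs at least one tutor per learner")
    best_total, best_pairs = float("-inf"), []
    for chosen in permutations(tutors, len(learners)):
        total = sum(score[(l, t)] for l, t in zip(learners, chosen))
        if total > best_total:
            best_total, best_pairs = total, list(zip(learners, chosen))
    return best_total, best_pairs


# Hypothetical suitability scores for a single learning concept.
learners = ["ann", "bob"]
tutors = ["t1", "t2"]
score = {("ann", "t1"): 9, ("ann", "t2"): 8,
         ("bob", "t1"): 7, ("bob", "t2"): 1}
total, pairs = best_matching(learners, tutors, score)
```

Note that the greedy choice (giving "ann" her top-scoring tutor "t1") is not optimal overall here, which is why the sketch searches over whole assignments. For larger groups, a polynomial-time assignment method would replace the brute-force loop.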

Title: Understanding the Marketplace of Opinions

Abstract: Our understanding of the web has been evolving from that of a passive repository to that of a participatory socio-cognitive space, where human beings are participants rather than users. Beyond affecting daily transactions, this space has had a huge impact on how thoughts are shaped at the individual level and also within a community. To be able to interpret how society is transforming, it is very important to understand how the web is impacting social cognition… Click here for more

Title: Semantic Summarization from User Generated Text Reports

Abstract: Text summarization is an active research area in the Natural Language Processing research community. The community has developed diverse paradigms for generating summaries from long documents, although there has been minimal effort on creating summaries from large collections of short and noisy documents. Here, short documents are user-generated social media activity messages or any short reports generated within a closed domain. The proposed research aims to (semi-)automate the process of summary generation from a given set of short documents, with more emphasis on the semantics of the document content. The research starts with completely unsupervised techniques. The entire document collection is represented as an undirected graph of key phrases; graph clustering, graph centrality based measures and Markov Random Field based factor computation techniques are then used to glean the important information. Finally, simple natural language generation techniques and language-specific heuristics are applied to generate the candidate sentences for the final summary.
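
The graph step described above can be sketched as follows. This is a simplified illustration under stated assumptions: key phrases are taken as already extracted per document, an edge means two phrases co-occur in the same document, and plain degree centrality stands in for the centrality measures. Graph clustering and the Markov Random Field computation are left out, and the sample reports are made up.

```python
from itertools import combinations


def build_graph(docs):
    """Build an undirected graph as an adjacency dict.

    Nodes are key phrases; an edge links two phrases that co-occur
    in at least one document.
    """
    graph = {}
    for phrases in docs:
        unique = set(phrases)
        for phrase in unique:
            graph.setdefault(phrase, set())  # keep isolated phrases as nodes
        for a, b in combinations(sorted(unique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes that each node is connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


# Hypothetical key phrases from three short user reports.
docs = [["road", "flood"],
        ["flood", "power cut"],
        ["flood", "road", "traffic"]]
centrality = degree_centrality(build_graph(docs))
top_phrase = max(centrality, key=centrality.get)
```

Phrases with high centrality are candidates for the information a summary should cover; in this toy input, `top_phrase` is the phrase that co-occurs with every other phrase.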

Open Discussion:

During the open discussion, all the participants will briefly share their individual views and comments on whether research pursuits have changed their approach in life towards achieving their passions or goals, and if yes, share their experiences.

 

Talk on “Web Annotation, Community Narratives and Familiarizing Stories” by Dr. Dinesh from Servelots

The speaker will visit the idea of the Renarration Web with examples from the Bio Diversity Protocol and the Intangible Heritage of Hampi. He will then look at the ongoing Web Annotation standards work at the W3C Web Annotation Working Group. We will then spend some time discussing how the work of the Web Sciences Lab can help in finding Similar Stories.

Date: 24th August 2016
Time: 3:00 PM
Venue: IIIT-Bangalore

About the speaker: Dinesh is the technical director at Janastu (janastu.org, 2002) and Servelots (servelots.com, 1999) in Bangalore, India, which have been providing free and open source (FOSS) solutions and support, including R&D, to SMEs and NPOs/NGOs. They have introduced the concept of the SWeeT Web architecture and used it with platforms such as the “re-narration web” to address the contextualisation needs of web content, in particular for low-literate web users who need a Web capable of multi-lingual re-narration. He is a member of the W3C Working Group on Web Annotations as an Invited Expert.

Their work in recent years can be captured by these subject tags:
web annotations, social semantic web, location intelligence interpretation, 3d augmenting real spaces, re-narration, community radio, wifi-mesh and anthillhacks

Click here for more information about the speaker