Talk on “Bridging big data and qualitative methods in the social sciences”

Date: Jan 3rd 2018

Time: 2:15 PM

Location: TBD

Abstract: With the rise of social media, a vast amount of new primary research material has become available to social scientists, but its sheer volume and variety make it difficult to access through the traditional approaches: close reading and the nuanced interpretation of manual qualitative coding and analysis. This work sets out to bridge the gap by developing semi-automated replacements for manual coding through a mixture of crowdsourcing and machine learning, seeded by a careful manual coding scheme developed from a small sample of data. To show the promise of this approach, we attempt to create a nuanced categorisation of responses on Twitter to several cases of extreme circumstances.


Bio of speaker:

Dima is a Senior Data Scientist at Skyscanner, where his focus is on developing and optimizing Skyscanner’s travel search engine. Prior to Skyscanner, Dima was with King’s College London, where he worked on analysis of BBC iPlayer (a joint project with the BBC) and various social media platforms (Twitter, Pinterest, Foursquare, etc.). He contributes to the data mining (KDD, WWW, etc.) and computer networks (Infocom, ComMag, etc.) communities, and his work has been featured by New Scientist, BBC News and other media outlets. Dima has also co-founded and was a former CEO of


Gooru is a learning environment modeled as a social machine. It acts as a learning navigator: any learning experience takes the form of a navigation across the “learning map” using “competencies”. Competencies are made up of two dimensions, namely pedagogic depth and topic. To learn a particular topic or to reach a target competency, Gooru provides the learner with an optimal learning route. The skyline of a learner is the set of the highest achieved competencies in every topic. Every time the learner learns a new topic or achieves a new competency, their skyline is updated; this gives the learner their position on the learning map. The learner’s skyline is used to compute the route the learner has to take to reach a target competency.
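The skyline idea can be made concrete with a small sketch. This is an illustrative model only: the function names and the encoding of a competency as a (topic, depth) pair are assumptions for the example, not Gooru’s actual data structures.

```python
# Hypothetical sketch of a learner "skyline": the highest achieved
# pedagogic depth per topic, updated as new competencies are earned.

def update_skyline(skyline, topic, depth):
    """Record a newly achieved competency; keep only the max depth per topic."""
    if depth > skyline.get(topic, 0):
        skyline[topic] = depth
    return skyline

def gap_to_target(skyline, target):
    """Topics where the learner is still below the target competency depth."""
    return {t: d - skyline.get(t, 0)
            for t, d in target.items()
            if skyline.get(t, 0) < d}

skyline = {}
for topic, depth in [("fractions", 2), ("algebra", 1), ("fractions", 3)]:
    update_skyline(skyline, topic, depth)

print(skyline)                                                  # {'fractions': 3, 'algebra': 1}
print(gap_to_target(skyline, {"fractions": 3, "algebra": 4}))   # {'algebra': 3}
```

The gap between the skyline and a target competency is what a route-computation step would then have to cover.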

  • Project Members


Gooru honors the human right to education by creating technology that enables educators and researchers to “open-source” effective practices and content to improve learning outcomes for all.

The Gooru model of learning is a Computer Aided Instruction (CAI) environment that aims to provide effective, individualized learning experiences at scale, by utilizing both adaptive tutoring as well as community interactions. User interaction is modeled around a “Navigation” paradigm, where the subject matter is represented as learning activities scattered across a logical space called the “learning map.” Lessons are modeled as learning pathways in this logical space. The logical space metaphor enables the student to obtain different levels of overview of the subject matter by perusing the landscape, and understand semantic relatedness of topics based on their visual proximity. The learning map is also a social space, where students interact with other learners (other students, human tutors and teachers), rather than only an algorithmic agent.

The underlying paradigm of such a system is called a social machine. The social machine approach is characteristically different from both scalable classroom models like MOOCs and personalized learning environments like Intelligent Tutoring Systems.

Sub Projects

Open City

The Open City project aims at large-scale access control management. A large amount of data is generated by IoT devices installed in the city, but its limited usage and difficult accessibility have led to under-utilization of resources and huge expenditure for the government. A system where such sensitive data is uploaded and the owner of the data decides who should be granted access would be of great help: with the cameras installed outside buildings we could get real-time updates on traffic and manage it better, and automated surveillance mechanisms could be built.

Building such a system brings great advantages, but equally large challenges. Misuse of open-ended data could lead to people losing trust in the system.

Open City aims at building a system that shares relevant data based on events and triggers, and only with the types of people defined by the owner of the data.
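The owner-defined sharing rule described above can be sketched as a simple policy check. All names, roles and events here are illustrative assumptions, not part of the Open City design.

```python
# Minimal sketch of owner-defined access control: the data owner attaches
# a policy stating which requester roles may see a data stream, and under
# which trigger events sharing is permitted.

from dataclasses import dataclass

@dataclass
class Policy:
    allowed_roles: set      # roles the owner is willing to share with
    trigger_events: set     # events under which sharing is permitted

def may_access(policy, requester_role, current_event):
    """Grant access only if both the role and the triggering event match."""
    return requester_role in policy.allowed_roles and \
           current_event in policy.trigger_events

# A building owner's policy for an outdoor camera feed (illustrative).
camera_policy = Policy(allowed_roles={"traffic_dept", "police"},
                       trigger_events={"congestion", "accident"})

print(may_access(camera_policy, "police", "accident"))      # True
print(may_access(camera_policy, "advertiser", "accident"))  # False
```

A real deployment would of course need authentication, auditing and revocation on top of such a check; the sketch only shows the role-plus-trigger condition the text describes.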

Big Data Engineering Workshop – 18 – 19 April 2017



This workshop is part of the project “Co-creation of a Center of Excellence in Big Data Engineering”, a collaboration between the International Institute of Information Technology Bangalore (IIIT-B) and City University London, as the UK partner, to set up a Centre of Excellence in Big Data Engineering. The objectives of the centre are to co-create research agendas, curriculum and outreach programmes in Big Data.


Please Join Us April 18 – 19

Big Data Workshop — IIIT-Bangalore

2017 International Workshop on Big Data Engineering: Education, Research to Innovation

Big Data is changing the world of retail, banking, transport, healthcare and cyber security. With the increasing adoption of cloud computing across many sectors, big data has become a topic in education, research and innovation. Research institutions and industries are building tools and technologies based on big data. Business players and technology providers are working on new products and services, and are deploying novel business models that can aggregate and analyse these data. Research has shown that Big Data directly influences business efficiency and decision-making. In addition, Big Data from social media platforms is providing governments with sufficient intelligence to identify criminals and prosecute them based on the collected evidence. On the other hand, social media data is also helping attackers identify the most influential nodes in a network.

To address some of the challenges highlighted above, this workshop will discuss the following topics.

Acquisition and Preparation
of Big Data

Identifying the source, data cleaning, and the life cycle of Big Data. To address the distinct requirements for performing analysis on Big Data, a step-by-step methodology is needed to organize the activities and tasks involved in acquiring, processing and analyzing data.

by Prof. Chandrashekar Ramanathan
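As a flavour of the “preparation” stage of that life cycle, here is a tiny cleaning sketch. The field names and validity rules are assumptions made for the example only.

```python
# Illustrative data-preparation step: normalise raw records, drop
# incomplete ones, and remove duplicates before analysis.

def clean(records):
    seen, out = set(), []
    for r in records:
        city = r.get("city", "").strip().title()   # normalise whitespace/case
        if not city or r.get("temp") is None:
            continue                               # drop incomplete records
        key = (city, r["temp"])
        if key in seen:
            continue                               # drop exact duplicates
        seen.add(key)
        out.append({"city": city, "temp": float(r["temp"])})
    return out

raw = [{"city": " bangalore ", "temp": 24},
       {"city": "bangalore", "temp": 24},   # duplicate after normalisation
       {"city": "", "temp": 30}]            # incomplete record
print(clean(raw))  # [{'city': 'Bangalore', 'temp': 24.0}]
```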


Managing Large Tabular Data

MapReduce, Hadoop, BigTable/HBase, columnar databases. MapReduce is a software framework for distributed processing of large data sets on compute clusters of commodity hardware. It is a sub-project of the Apache Hadoop project, and the framework takes care of scheduling tasks.

by Prof. G Srinivasaraghavan
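The MapReduce programming model can be illustrated with the classic word-count example, simulated here in plain Python as a single process; a real Hadoop job would run the map and reduce phases on many machines, with the framework handling scheduling and the shuffle.

```python
# Word count in the MapReduce style: map emits (key, value) pairs,
# the shuffle groups values by key, and reduce aggregates each group.
from collections import defaultdict

def map_phase(doc):
    # Emit (word, 1) for every word in an input split.
    return [(w, 1) for w in doc.split()]

def shuffle(pairs):
    # Group all values by key (done by the framework in real MapReduce).
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    # Sum the counts for each word.
    return {k: sum(vs) for k, vs in groups.items()}

docs = ["big data big clusters", "big table"]
pairs = [p for d in docs for p in map_phase(d)]
print(reduce_phase(shuffle(pairs)))
# {'big': 3, 'data': 1, 'clusters': 1, 'table': 1}
```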

description description

Managing Data Streams

Standing queries, incremental query answering, complex event processing, event stream processing.

by Prof. Chandrashekar Ramanathan
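A standing query with incremental answering can be sketched as a sliding-window aggregate that updates in constant time per event instead of recomputing from scratch. The window size and readings below are illustrative.

```python
# Sliding-window average over a stream, maintained incrementally:
# each arriving event adjusts a running total rather than triggering
# a full recomputation over the window.
from collections import deque

class WindowAverage:
    def __init__(self, size):
        self.size = size
        self.window = deque()
        self.total = 0.0

    def push(self, value):
        """Add an event; evict the oldest if the window overflows."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)

q = WindowAverage(size=3)
for reading in [10, 20, 30, 40]:
    avg = q.push(reading)
print(avg)  # (20 + 30 + 40) / 3 = 30.0
```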


Managing Large Graphs

NoSQL and graph databases, Google Pregel, Titan.

by Prof. G Srinivasaraghavan
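Pregel’s “think like a vertex” model can be illustrated with unweighted single-source shortest paths, simulated below in one process. In a real Pregel or Titan deployment, the vertices would be partitioned across workers and the messages exchanged between supersteps; this sketch only shows the message-passing logic.

```python
# Pregel-style computation: in each superstep, vertices process incoming
# messages, update local state, and send messages to neighbours; the
# computation halts when no messages remain.

def pregel_sssp(graph, source):
    """Unweighted single-source shortest paths via vertex messages."""
    dist = {v: float("inf") for v in graph}
    messages = {source: 0}          # superstep 0: source receives distance 0
    while messages:
        next_messages = {}
        for v, d in messages.items():
            if d < dist[v]:         # vertex compute(): accept a shorter distance
                dist[v] = d
                for n in graph[v]:  # send candidate distances to neighbours
                    cand = d + 1
                    if cand < next_messages.get(n, float("inf")):
                        next_messages[n] = cand
        messages = next_messages
    return dist

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(pregel_sssp(g, "a"))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```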

description description

Managing Large Text Corpora

Document vector model, information retrieval, subspace partitioning and generative clustering.

by Prof. Srinath Srinivasa
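The document vector model behind such retrieval can be sketched in a few lines: documents become term-count vectors and are ranked by cosine similarity to a query. The corpus below is illustrative, and raw term counts are used for simplicity (a real system would typically apply tf-idf weighting).

```python
# Document vector model: represent texts as term-count vectors and rank
# documents by cosine similarity to a query vector.
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

docs = ["big data engineering", "graph databases", "data cleaning for big data"]
query = vectorize("big data")
ranked = sorted(docs, key=lambda d: cosine(query, vectorize(d)), reverse=True)
print(ranked[0])
```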


Infrastructure and Visualisation

Industry Standards, Visualisation Methods, Visualisation Techniques for Graph Data Structures, Levels-of-Detail and Focus & Context Methods for Large Datasets, Seriation Methods for Table Visualisation, Visual Analytics: Using Visualisation for Data Analytics

by Prof. Jaya Sreevalsan Nair



IIIT Bangalore, #26/C, Opposite Infosys Gate 1, Electronic City, Bangalore

Time & Date

April 18-19, 2017
9:30 AM to 5 PM

Event Registration link