Workshop Venue: Teaching and Learning Building (M208/M209) at the University of Birmingham.
The schedule below is in UK time (UTC+1).
Session 1: 9.00-10.30
Introduction & Initial announcements: 9.00-9.30
Lessons from the age of user-generated content for the age of AI-generated content (Prof. Nishanth Sastry: 9.30-10.30)
Refreshment Break: 10.30-11.00
Session 2: 11.00-12.30
Enhancing Enterprise Knowledge Base Construction with Fine-Tuned Generative Language Models (Liana Mikaelyan: 11.00-11.45)
Research Session 1: 11.45-12.30
1. Related Table Search for Numeric data using Large Language Models and Enterprise Knowledge Graphs (Pranav Subramaniam, Udayan Khurana, Kavitha Srinivas and Horst Samulowitz)
2. Cognitive Retrieve: Empowering Document Retrieval with Semantics and Domain Specific Knowledge Graph (Apurva Kulkarni, Chandrashekar Ramanathan and Vinu E Venugopal)
Lunch: 12.30-14.00
Session 3: 14.00-15.30
Building Knowledge Graph for Products at Scale and infusing it into LLMs (Dr. Manoj Agarwal: 14.00-14.45)
Research Session 2: 14.45-15.30
1. EduEmbedd – A Knowledge Graph Embedding for Education (Anurag Mohanty)
2. CRUSH: Cybersecurity Research using Universal LLMs and Semantic Hypernetworks (Mohit Sewak, Vamsi Emani and Annam Naresh)
Refreshment Break: 15.30-16.00
Session 4: 16.00-17.00
LLMs for Social Networks: Applications, Challenges and Solutions (Bojan Babic: 16.00-17.00)
Lessons from the age of user-generated content for the age of AI-generated content
Prof. Nishanth Sastry, Director of Research of the Department of Computer Science, University of Surrey.
Abstract:
The past decade and more has been defined by the rise and near universal adoption of user-generated content (UGC) on social media. Initial excitement about the promise of UGC has since become tempered by concerns about misinformation, hate speech and other online harms. We are now witnessing a similar enthusiasm for content generated by Large Language Models. This talk will draw parallels between the two, and extract lessons about the perils, potentials and pitfalls awaiting us in the future age of AI-generated content.
Biography:
Prof. Nishanth Sastry is the Director of Research of the Department of Computer Science, University of Surrey. His research spans a number of topics relating to social media, content delivery and networking, and online safety and privacy. He is joint Head of the Distributed and Networked Systems Group and co-leads the Pan University Surrey Security Network. He is also a Surrey AI Fellow and a Visiting Researcher at the Alan Turing Institute, where he is a co-lead of the Social Data Science Special Interest Group.
Can machines discover new knowledge?
Dr. Fabio Petroni, Co-Founder & CTO at Samaya AI
Abstract:
For many years, the quest to determine the most efficacious representations of knowledge for machines has been at the forefront of research. Historically, this focus has centered on knowledge retrieval, whether from unstructured text corpora, structured collections (e.g., knowledge graphs, key-value memories), or the parameters of a neural model. How can we evolve these representations to not just retrieve, but actively discover new knowledge?
Biography:
Dr. Fabio Petroni is the Co-Founder & CTO at Samaya AI, building an AI-powered knowledge-discovery platform. Before that, he was a Researcher at FAIR and Thomson Reuters, focusing on representing, gathering, extracting, using, reasoning on and creating world knowledge using AI.
Enhancing Enterprise Knowledge Base Construction with Fine-Tuned Generative Language Models
Ms. Liana Mikaelyan, Research Software Development Engineer in the Alexandria team, Microsoft Research Cambridge, UK.
Abstract:
In this talk, we will present our latest work on leveraging the power of generative language models for knowledge base construction. We have fine-tuned a generative LLM to extract entities and their relevant properties from text passages and represent them in a structured JSON format. This task was accomplished by creating a dataset of short passages and corresponding JSON outputs using GPT-4, which was then used to fine-tune the OpenLlama 3B model on a single A100 GPU. Our approach has demonstrated superior performance compared to the existing template matching algorithm in Alexandria, both in terms of precision and coverage, and it extracts a richer set of properties from the text. Furthermore, the addition of new properties to the knowledge base has been significantly simplified. Future work involves exploring ways to improve the generation time as well as investigating other models to further enhance our system’s performance.
Biography:
Ms. Liana Mikaelyan is a Research Software Development Engineer in the Alexandria team at Microsoft Research Cambridge, UK. Before joining Microsoft Research Cambridge, she worked on various machine learning projects, mainly in speech synthesis and recognition. She completed her MSc in Machine Learning at UCL with a background in mathematics.
LLMs for Social Networks: Applications, Challenges and Solutions
Bojan Babic, Nextdoor.
Abstract:
Over the last couple of years, we have witnessed an explosion of Generative AI research and applications that are simultaneously transforming how companies operate internally and how they communicate with their customers.
In this talk, we will present the work of the Nextdoor GenAI team and its LLM applications in social networks, in areas such as Knowledge tasks, Engagement tasks and Governance. We will cover what we have tried, what works and what does not work. We will also present a framework that helped us iterate fast and systematically improve each of the product areas.
Biography:
Bojan Babic is currently working on various Generative AI problems at the social media platform Nextdoor. Before this position, he worked on Search/Information Retrieval, Ads and Recommendations, and their applications, spanning from e-commerce to the social media space.
Building Knowledge Graph for Products at Scale and infusing it into LLMs
Dr. Manoj Agarwal, Senior Staff Engineer in the Discovery Intelligence team at Uber AI.
Abstract:
A knowledge graph is the key to entity search, as it can store factual entity-related information in a structured manner without the rigidity of a fixed schema. Both Google and Bing have web-scale knowledge graphs, and the knowledge graph is invoked for a large fraction of user queries. E-commerce search is primarily an entity search. Therefore, building a knowledge graph is the key to improving e-commerce search in many ways. However, building it at web scale is a highly challenging problem, and building a knowledge graph for products is equally or even more challenging. In this talk, we present our methodology to build the knowledge graph for products at web scale. With the recent success of LLMs, can we infuse such semantic understanding of the world, encoded in the form of a knowledge graph, into LLMs? There are some advances in this direction; however, it remains an open question whether knowledge graphs can be replaced by LLMs.
Biography:
Dr. Manoj Agarwal is a Senior Staff Engineer in the Discovery Intelligence team at Uber AI. Before Uber, he was a Principal Applied Scientist at Microsoft – AI and Research and a senior researcher at IBM Research. Manoj was the chief architect for building a web-scale product knowledge graph for Microsoft – Shopping, comprising a few hundred million products and a few billion facts with high accuracy. Currently, he is engaged in efforts to build a scalable knowledge graph and to discover the taxonomy needed to improve semantic search and recommendations for Uber Delivery. His research interests are in the areas of web mining, graph mining, pattern recognition, data mining, knowledge graphs, LLMs and information retrieval. He has more than 30 patents and over 25 research papers in reputed journals and conferences.
Knowledge graphs can integrate diverse data sources and provide a holistic view to the downstream applications. By virtue of being structured, knowledge graphs offer transparency and interpretability to the search and recommendations applications. Combining Knowledge Graphs with current-day advances in LLMs can create several opportunities.
The EKG-LLM workshop, part of CIKM 2023, addresses how large language models can help with the construction and usage of these enterprise knowledge graphs. This involves improving all aspects of the EKG workflow using large language models: entity extraction, entity enrichment, EKG construction, querying EKGs for search and recommendations, scenario-specific EKGs, etc. Through this workshop, we would like to highlight research issues specific to the integration of enterprise knowledge graphs with large language models and associated applications.
Topics of interest include, but are not limited to, the following:
Designing Enterprise Knowledge Graph (EKG)
EKG Implementation
Scalable extraction of enterprise entities using LLMs
Building EKGs for specific domains or applications
Natural Language Processing (NLP) algorithms to build EKGs
Relationship extraction using large language models
Federated graph learning with LLMs
Privacy in graph algorithms
Privacy preserving graph construction and mining
Semantic reasoning based on deep learning on graphs
Industrial applications of EKGs: banking, financing, retail, healthcare, medicine, etc.
Explainable AI based on EKG
Use of EKG and LLMs for search and recommendations
Submission
Manuscripts should be submitted in PDF format with 6 pages of content, plus references. Please follow the two-column CEUR style template (https://ceur-ws.org/Vol-XXX/) for paper submissions.
Authors of accepted papers should prepare a camera-ready (final) version of their paper and submit it using the EasyChair system no later than Sunday, October 1, 2023. Please email the camera-ready version (PDF as well as editable versions (doc/latex)) to rajeev.gupta@microsoft.com, sri@iiitb.ac.in, aparna.m@iiitb.ac.in and bhoomika.ap@iiitb.ac.in.
Each accepted paper requires at least one author to register in person using the link https://uobevents.eventsair.com/cikm2023/cikmauthpreandmain and to present the paper in person at the workshop for it to be included and published in the workshop proceedings.
Preparation of Camera Ready Paper
Authors are advised to suitably address the reviewers’ comments in the camera-ready version.
Policymakers are people who are responsible for formulating policies and making policy decisions. The UN has defined 17 Sustainable Development Goals (SDGs) across domains, with 169 targets, to transform the world. It falls to policymakers, government officials, researchers, and data scientists to help achieve these targets by identifying key problem areas and their contributing factors, and by collecting, processing, and analyzing relevant historical and current data to provide the insights needed to make informed decisions and design policies for sustainable development.
This activity brings you an exciting opportunity to be a Policymaker for a day.
If you were made a Policymaker for a day 🤔, how would you approach, understand and resolve the problem? 🧐 What policy and budget decisions would you make to build a sustainable society and achieve an SDG target? 🤓
The Task:
The activity requires participants to be divided into groups.
Each group is presented with a problem statement for a region. Considering the data and analysis provided by the KDL site or any other sources, the team should design policies to help achieve the UN’s Sustainable Development target. The questions below are a guide; teams may also draw on their own expertise.
By the end of the activity, each group should present a case study by performing the tasks below ✅
Explain the problem statement, its relevant SDG Target, and the Karnataka context for the given target.
Investigate the factors leading to the problem. Observe whether there is a pattern with respect to the district’s neighboring regions.
If you are given a budget of 20 crores / 2 Million to reach the SDG target for the district, how will you allocate the budget across the different factors leading to the problem?
Based on your analysis so far, please suggest policies/schemes/action items that help improve the factors affecting the problem sustainably.
Present your case study (ppt or doc) with relevant references, visualizations (optional), dashboards (optional), and datasets (optional) and justify your budget and policy decisions.
(A sample example is provided for reference purposes.)
References:
Karnataka Data Lake Dashboard Gallery: Analysts can identify and observe the trends/patterns in the data or join datasets from different departments to get a bigger picture in solving a problem. Karnataka Data Lake’s pre-module on Exploratory Data Analytics provides initial reports, observations, and correlations from user-selected datasets and has the ability to perform real-time analysis using AI/ML models.
Karnataka Data Stories Page: Users can utilize Karnataka Data Lake with a focus on intervention modeling and counterfactual analysis to understand potential causes that provide clues on how, when, and where to intervene with policy instruments so as to achieve targets/goals in the best possible way.
Open Source Karnataka Data can be found in KODI, OpenCity.
Please form groups based on your familiarity with the SDG problem statements below. Through this activity, we would love to hear your group’s narratives on solving the problems and achieving the SDG targets, drawing not just on the dashboards but also on research papers, reports, news articles, and personal or professional expertise.
(*Please note that our volunteers will be around in case of queries or if any help is required to solve/present the problem.)
Bidar district reported lower rice production than the state’s average.
Koppal district reported lower wheat production than the state’s average.
Vijayanagar district reported a high IMR compared to the state’s average.
Haveri district reported a high MMR compared to the state’s average.
Kalburgi district reported a high U5MR compared to the state’s average.
Vijayanagar district reported a high secondary dropout rate compared to the state’s average.
Shivmoga district reported a high girls’ dropout rate compared to the state’s average.
One post-doctoral position for 1 year is open, starting from August 1, 2023, for the Karnataka Data Lake project: Policy Research using Big Data Analytics (https://kdl.iiitb.ac.in). The position may be extended based on requirement and performance.
The key responsibilities include the following:
Be the technical interface between the university research group and the industry sponsor.
Work on designing and implementing data pipelines automating ingestion, modeling and visualization tasks.
Build conversational AI for KDL considering both structured and unstructured data using Large Language Models.
Publish/present findings in research publications and at professional meetings.
Interested candidates may send their CV along with 2-3 publications to sri@iiitb.ac.in.
Jan 2023: Closed
3-month internship in visual analytics and predictive modelling
Two positions for 3-month paid internships are open, running from January 15, 2023 to April 14, 2023. The positions may be extended for 3 more months based on requirement and performance.
Internship applicants should have a BTech or equivalent in Computer Science or related disciplines like Information Technology, Data Science, etc. Programming proficiency in Python is highly desirable, and experience with visual analytics tools like Tableau or Kibana is a plus.
Interns will get an opportunity to work on a major project involving planning and resource allocation, and pick up skills like Bayesian modelling, building data stories and providing actionable insights.
The internships will come with a stipend of INR 15,000 per month. Applications may be sent to sri@iiitb.ac.in.
Dec 2022: Closed
Project Elective applications are open to work on various WSL projects for the upcoming Jan-Apr 2023 semester at IIIT-Bangalore. Students can apply for 4-credit, 12-credit or 20-credit projects as applicable. This call is open to students currently enrolled at IIIT-B.
Applications will close on 2nd Dec 2022. If you apply after 2nd Dec, we will reach out to you only if there are any further open positions.
Use of ML/AI to find the type of event (touching/groping/sexual invites/commenting/etc.) from the reports; a study on the diverse forms of sexual harassment
Street violence
Gender-based violence in public transport
Women’s strategies to address assault and violence
Study of crowdsourced data
Challenge themes:
The following themes involve processing and carefully analyzing the data, and using the resulting knowledge to create a compelling visualization as a narrative/summary (preferably) or a tool. The visualization (tool) must be shareable on social media to spread awareness and to inspire action against gender-based and other forms of violence.
Theme-Mythbusters: Time-related clustering/visualization or integration of time (time of day, evolution over time) with spatial data and categories of crime (http://maps.safecity.in/): This will help us debunk the myths of where and when different kinds of sexual violence tend to take place. Hence, the challenge starts with picking/identifying a myth as a hypothesis, and demonstrating whether or not the data confirm it.
Theme-MirrorMirrorOnTheWall: Comparison of Indian cities with others in the world where data is available: this will give us a sense of India’s position in sexual violence across different parameters captured in the existing datasets. For example, do we see a concentration of specific kinds of violence in India? Such data help make us aware of the specific social structures within which sexual crime takes place.
Theme-Mash-up: Integration with other relevant datasets — police data, sex ratio, etc. available for a specific city. This will help us understand the overall situation of the safety and status of women in a city. Such data will be crucial in shaping institutional strategies for coping with the incidence of sexual violence.
For Theme-MythBusters, relevant myths (as a sample):
Gender-based violence of all forms is highly prevalent in Delhi.
Gender-based violence occurs in dimly lit streets and at night.
Sexual violence and harassment occur only in very crowded or very deserted regions.
Not many women get distressed by non-physical forms of violence.
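As an illustration of how a myth can be framed as a testable hypothesis, the myth that violence occurs at night could be checked by computing the share of reports that fall in night hours, overall and per category. The sketch below is only indicative: the sample records, the field names (`hour`, `category`), and the definition of night (20:00 to 05:59) are assumptions made for this example and do not reflect the actual Safecity data schema.

```python
from collections import Counter

# Hypothetical incident reports, invented for illustration only.
# Real records would come from the crowdsourced dataset.
reports = [
    {"hour": 9, "category": "commenting"},
    {"hour": 14, "category": "touching"},
    {"hour": 18, "category": "commenting"},
    {"hour": 22, "category": "groping"},
    {"hour": 8, "category": "touching"},
    {"hour": 23, "category": "commenting"},
]


def is_night(hour: int) -> bool:
    """Assumed definition of night: 20:00 through 05:59."""
    return hour >= 20 or hour < 6


def night_share(records) -> float:
    """Fraction of reports whose hour falls in the assumed night window."""
    if not records:
        return 0.0
    return sum(is_night(r["hour"]) for r in records) / len(records)


def share_by_category(records) -> dict:
    """Night-time share computed separately for each incident category."""
    totals, nights = Counter(), Counter()
    for r in records:
        totals[r["category"]] += 1
        nights[r["category"]] += is_night(r["hour"])
    return {c: nights[c] / totals[c] for c in totals}


print(f"night share: {night_share(reports):.2f}")
print(share_by_category(reports))
```

If the night-time share is not clearly above what the length of the night window alone would predict (10 of 24 hours under the assumption above), the data would not support the myth. A real analysis would also need to account for reporting bias and for how many people are outdoors at each hour.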
For Theme-MirrorMirrorOnTheWall, relevant datasets and sources:
Social indicators: the general status of women in a specific city, for example, sex ratio, gender-segregated literacy rates, rate of female workforce participation.
Districtwise Education Data 2015-16 based on sex ratio, male/female literacy, schools by category, boys/girls schools by category, male/female teachers by category, etc.
The goal of this session is to hold research discussions among PhD research scholars from multiple institutes who are working in areas related to Web Science. We hope these discussions will be useful and will foster research collaborations in the future!
Participants:
Moderators: Faculty
Panelists: PhD Research Scholars
Audience: Research Scholars
Agenda of each Theme Discussion
Theme Introduction by Moderator
Short introduction by panelists (5 panelists, 5 mins each)
Q&A (30 mins)
PhD Colloquium Themes
Empowerment: In this theme, we discuss how the WWW and digital technologies in general can be used for education and upskilling of the population at scale. As mobile phones and high-speed data connections become ubiquitous, a huge opportunity has emerged for disseminating knowledge and skills to a vast population efficiently. However, the lack of a sound understanding of how this can be achieved is still an impediment. We can also discuss how digital empowerment is essential and how access to resources can help in that context.
Inclusion & Accessibility: In this theme, we discuss how inclusion is necessary, and not just preferable, to build models or solutions that are useful, relevant and applicable to all. In this context, inclusion might be in terms of gender, race, color, etc. It will also be relevant to discuss how the web and digitization can be conducive to building solutions that are designed with accessibility in mind. Topics like renarration, multi-language support, transcriptions, alternate text for images, etc. might be relevant.
Digital Governance + Privacy + Security: In this theme, we discuss how different forms of data management processes can be woven into the fabric of administrative decision-making. These include structured data generated by different government departments, corporates and other organisations, as well as the so-called Big Data generated from several sources like sensors, social media posts, etc. that often contain useful inputs for decision-making. We also discuss topics like privacy and security in this context.
Social Cognition: In this theme, we address questions about how the web, and particularly social media and open online knowledge portals like Wikipedia, is affecting collective opinion and worldview. Social cognition is playing a central role in the making and breaking of reputations of individuals, businesses, and countries. There is a pressing need to understand social cognition in the post-web world. We also discuss topics like opinions, campaigns in networks, marketing and recommendation, and discourse modeling.