CIKM 2023 Workshop on Enterprise Knowledge Graphs Using Large Language Models

21st October 2023, University of Birmingham, UK

Contact Us

Rajeev Gupta (Principal scientist, Microsoft, India)
Microsoft R&D India Pvt. Ltd., Hyderabad, India

Programme Committee

  • Manoj Agarwal (Senior Researcher, Discovery Intelligence, Uber Research)
  • Manish Bhide (CTO, AI Governance, IBM)
  • Mukesh Mohania (Professor, CSE, IIIT Delhi)
  • Prasad Deshpande (Senior Staff Software Engineer, Databricks)
  • Qi He (Head of AI, Nextdoor)
  • Ranganath Kondapally (Principal Applied Scientist, Microsoft)
  • Rushi Bhatt (Partner, ML Systems and Services, Microsoft)
  • Sauvik Ghosh (Director of AI, LinkedIn)

Workshop Chairs

Website Chairs

Workshop Schedule (Tentative)

09:30-10:00 Tea and Registration
Event 1
11:15-11:30 Tea Break
Event 2
12:30-14:00 Lunch Break
Event 3
15:30-15:45 Tea Break
Event 4
17:00-17:30 High Tea and Closing

Important Dates

Abstract submission: 8th September 2023 (Closed)
Paper submission: 10th September 2023 (Closed)
Notification of paper acceptance: 25th September 2023
Camera-ready paper submission: 1st October 2023
Workshop: 21st October 2023

Invited Speakers

Prof. Nishanth Sastry is the Director of Research of the Department of Computer Science, University of Surrey. His research spans a number of topics relating to social media, content delivery and networking, and online safety and privacy. He is joint Head of the Distributed and Networked Systems Group and co-leads the Pan University Surrey Security Network. He is also a Surrey AI Fellow and a Visiting Researcher at the Alan Turing Institute, where he is a co-lead of the Social Data Science Special Interest Group.

Dr. Fabio Petroni is the Co-Founder & CTO at Samaya AI, building an AI-powered knowledge-discovery platform. He obtained his PhD from Sapienza University, where he was part of the Research Center on Cyber Intelligence and Information Security. He was a Researcher in the FAIR (Fundamental AI Research) team of Meta AI (formerly known as Facebook AI Research) from 2018 to 2022, where he focused on Knowledge-Intensive Natural Language Processing. Before that, he worked as a Research Scientist in the R&D department at Thomson Reuters and as a Data Engineer at KPMG.

Ms. Liana Mikaelyan is a Research Software Development Engineer in the Alexandria team at Microsoft Research Cambridge, UK. Before joining Microsoft Research Cambridge, she worked on various machine learning projects, mainly in speech synthesis and recognition. She completed her MSc in Machine Learning at UCL, with a background in mathematics.

Title: Enhancing Enterprise Knowledge Base Construction with Fine-Tuned Generative Language Models

Abstract: In this talk, we will present our latest work on leveraging the power of generative language models for knowledge base construction. We have fine-tuned a generative LLM to extract entities and their relevant properties from text passages and represent them in a structured JSON format. This task was accomplished by creating a dataset of short passages and corresponding JSON outputs using GPT-4, which was then used to fine-tune the OpenLLaMA 3B model on a single A100 GPU. Our approach has demonstrated superior performance compared to the existing template matching algorithm in Alexandria, both in precision and coverage, and it extracts a richer set of properties from the text. Furthermore, the addition of new properties to the knowledge base has been significantly simplified. Future work involves exploring ways to improve the generation time as well as investigating other models to further enhance our system’s performance.

Call For Papers

Knowledge graphs can integrate diverse data sources and provide a holistic view to downstream applications. Because they are structured, knowledge graphs offer transparency and interpretability to search and recommendation applications. Combining knowledge graphs with recent advances in LLMs creates several opportunities.

The EKG-LLM workshop, part of CIKM 2023, addresses how large language models can help with the construction and usage of enterprise knowledge graphs (EKGs). This involves improving all aspects of the EKG workflow using large language models: entity extraction, entity enrichment, EKG construction, querying EKGs for search and recommendations, scenario-specific EKGs, etc. Through this workshop, we would like to highlight research issues specific to the integration of enterprise knowledge graphs with large language models and associated applications.

Topics of interest include, but are not limited to, the following:

  • Designing Enterprise Knowledge Graphs (EKGs)
  • EKG Implementation
  • Scalable extraction of enterprise entities using LLMs
  • Building EKGs for specific domains or applications
  • Natural Language Processing (NLP) algorithms to build EKGs
  • Relationship extraction using large language models
  • Federated graph learning with LLMs
  • Privacy in graph algorithms
  • Privacy preserving graph construction and mining
  • Semantic reasoning based on deep learning on graphs
  • Industrial applications of EKGs: banking, financing, retail, healthcare, medicine, etc.
  • Explainable AI based on EKG
  • Use of EKG and LLMs for search and recommendations


Manuscripts should be submitted in PDF format with 6 pages of content, plus references. Please follow the two-column CEUR style template for paper submissions.

Link for paper submission:


1-Day PM Activity Details

Prime Minister PolicyMaker 😁

Time ⏳: 45 mins

Background :

Policymakers are people who are responsible for formulating policies and making policy decisions. The UN has set 17 Sustainable Development Goals (SDGs) across domains, with 169 targets, to transform the world. It is up to policymakers, government officials, researchers, and data scientists to help achieve these targets by identifying key problem areas and their contributing factors, and by collecting, processing, and analyzing relevant historical and current data to provide the insights needed to make informed decisions and design policies for sustainable development.

Karnataka Data Lake (KDL) is an ongoing project serving as the Data Analytics partner for the Department of Planning and Statistics, Karnataka.

This activity brings you an exciting opportunity to be a policymaker for a day.

If you were made a policymaker for a day 🤔, how would you approach, understand, and resolve the problem? 🧐 What policy and budget decisions would you make to build a sustainable society and achieve an SDG target? 🤓

The Task :

The activity requires participants to be divided into groups.

Each group is presented with a problem statement for a region. Using the data and analysis provided by the KDL site or any other sources, the team should design policies that help achieve the UN’s Sustainable Development target. The questions below are a starting point; groups are free to go beyond them based on their own expertise.

By the end of the activity, each group should present a case study by performing the tasks below ✅

  1. Explain the problem statement, its relevant SDG Target, and the Karnataka context for the given target.
  2. Investigate the factors leading to the problem. Observe whether there is a pattern with respect to the district’s neighboring regions.
  3. If you are given a budget of 20 crores / 2 Million to reach the SDG target for the district, how will you allocate it across the different factors leading to the problem?
  4. Based on your analysis so far, please suggest policies/schemes/action items that help improve the factors affecting the problem sustainably.
  5. Present your case study (ppt or doc) with relevant references, visualizations (optional), dashboards (optional), and datasets (optional) and justify your budget and policy decisions.

(A sample example is provided for reference.)

References :

Please form groups based on your familiarity with the SDGs below. Through this activity, we would love to hear your group’s narrative on achieving the SDG targets, drawing not only on the dashboards but also on research papers, reports, news articles, and personal or professional expertise.

(*Please note that our volunteers will be around in case of queries or if any help is required to solve/present the problem)

  1. Bidar district reported lower rice production compared to the state’s average.
  2. Koppal district reported lower wheat production compared to the state’s average.
  1. Vijayanagar district reported a high IMR compared to the state’s average.
  2. Haveri district reported a high MMR compared to the state’s average.
  3. Kalburgi district reported a high U5MR compared to the state’s average.
  1. Vijayanagar district reported a high secondary dropout rate compared to the state’s average.
  2. Shivamogga district reported a high girls’ dropout rate compared to the state’s average.

Opportunities to work with WSL

Aug 2023 : Open

One post-doctoral position for 1 year is open, starting from August 1, 2023, for the Karnataka Data Lake project: Policy Research using Big Data Analytics. The position may be extended based on requirement and performance.

The key responsibilities include the following:

  • Be the technical interface between the university research group and the industry sponsor.
  • Work on designing and implementing data pipelines automating ingestion, modeling and visualization tasks.
  • Build conversational AI for KDL considering both structured and unstructured data using Large Language Models.
  • Publish/present findings in research publications and at professional meetings.

Interested candidates may send their CV along with 2-3 publications to

Jan 2023: Closed

3-month internship in visual analytics and predictive modelling 

Two positions for 3-month paid internships are open, starting January 15, 2023 till April 14, 2023. The positions may be extended for 3 more months based on requirement and performance. 

Internship applicants should have a BTech or equivalent in Computer Science or related disciplines like Information Technology, Data Science, etc. Programming proficiency in Python is highly desirable, and experience with visual analytics tools like Tableau or Kibana is a plus.

Interns will get an opportunity to work on a major project involving planning and resource allocation, and pick up skills like Bayesian modelling, building data stories and providing actionable insights. 

The internships will come with a stipend of INR 15,000 per month. Applications may be sent to

Dec 2022: Closed

Project Elective applications are open to work on various WSL projects for the upcoming Jan-Apr 2023 semester at IIIT-Bangalore. Students can apply for 4-credit, 12-credit or 20-credit projects as applicable. This call is open to students currently enrolled at IIIT-B.

Applications will close on 2nd Dec 2022. If you apply after 2nd Dec, we will reach out to you only if there are any further open positions.

Project details available here:

Form to apply:

WS4D Datathon: Concept and Details

Concept Note for the SafeCity Data Visualisation Challenge

WS4D Datathon


The key dataset(s) pertain to information gathered from India, and provided by the Red Dot Foundation.

  1. Reports: time, place, type of event, report
  2. MobileApp: time, place, type of event

Reference articles pertain to the following topics:

  1. Use of ML/AI to find the type of event (touching/groping/sexual invites/commenting/etc.) from the reports; a study on the diverse forms of sexual harassment
  2. Street violence
  3. Gender-based violence in public transport
  4. Women’s strategies to address assault and violence
  5. Study of crowdsourced data

Challenge themes:

The themes below call for processing and analyzing the data carefully, and using the resulting knowledge to create a compelling visualization, preferably as a narrative/summary, or else as a tool. The visualization (tool) must be shareable on social media to spread awareness and to inspire action against gender-based and other violence.

  1. Theme-MythBusters: Time-related clustering/visualization, or integration of time (time of day, evolution over time) with spatial data and categories of crime. This will help us debunk the myths of where and when different kinds of sexual violence tend to take place. Hence, the challenge starts with picking a myth as a hypothesis and demonstrating whether the data confirm it or not.
  2. Theme-MirrorMirrorOnTheWall: Comparison of Indian cities with others in the world where data is available. This will give us a sense of India’s position in sexual violence across the different parameters captured in the existing datasets. For example, do we see a concentration of specific kinds of violence in India? Such data make us aware of the specific social structures within which sexual crime takes place.
  3. Theme-Mash-up: Integration with other relevant datasets (police data, sex ratio, etc.) available for a specific city. This will help us understand the overall situation of the safety and status of women in a city. Such data will be crucial in shaping institutional strategies for coping with the incidence of sexual violence.

For Theme-MythBusters, relevant myths (as a sample):

  1. Gender-based violence of all forms is highly prevalent in Delhi.
  2. Gender-based violence occurs in dimly lit streets and at night.
  3. Sexual violence and harassment occur only in very crowded or very deserted regions.
  4. Not many women get distressed with non-physical forms of violence.
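
As an illustration of how a myth can be framed as a testable hypothesis, the sketch below checks the second sample myth (violence occurs at night) by computing the share of reports that fall in a night-time window. It is only a minimal sketch: the records, field names, timestamp format, and the 20:00 to 06:00 definition of "night" are all assumptions made for illustration, and the actual datasets may be structured differently.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records. The real dataset may use different
# field names, time formats, and event categories.
reports = [
    {"time": "2019-03-02 08:45", "place": "Delhi", "type": "commenting"},
    {"time": "2019-03-02 14:10", "place": "Delhi", "type": "touching/groping"},
    {"time": "2019-03-05 18:30", "place": "Mumbai", "type": "commenting"},
    {"time": "2019-03-07 22:15", "place": "Delhi", "type": "sexual invites"},
    {"time": "2019-03-09 13:00", "place": "Pune", "type": "commenting"},
]

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed timestamp format


def report_hour(record):
    """Extract the hour of day (0-23) from a report's timestamp."""
    return datetime.strptime(record["time"], TIME_FORMAT).hour


def is_night(hour, start=20, end=6):
    """True if the hour lies in the assumed night window (20:00 to 06:00)."""
    return hour >= start or hour < end


def night_share(records):
    """Fraction of reports whose timestamp falls in the night window."""
    if not records:
        return 0.0
    hours = [report_hour(r) for r in records]
    return sum(is_night(h) for h in hours) / len(hours)


def counts_by_hour(records):
    """Number of reports per hour of day, e.g. as input for a bar chart."""
    return Counter(report_hour(r) for r in records)


if __name__ == "__main__":
    print(f"Share of reports at night: {night_share(reports):.0%}")
    for hour, count in sorted(counts_by_hour(reports).items()):
        print(f"{hour:02d}:00  {'#' * count}")
```

A low night-time share in the real data would argue against the myth, while a high share would support it; either way, the per-hour counts can feed directly into a time-of-day visualization. Reporting bias in crowdsourced data should be kept in mind when interpreting the result.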

For Theme-MirrorMirrorOnTheWall, relevant datasets and sources:

  2. New York City crime:
  3. Country and World data: consolidated as an Excel sheet by Red Dot Foundation using multiple sources:

For Theme-Mash-up, relevant datasets and sources:

  1. social indicators: the general status of women in a specific city, for example, sex ratio, gender-segregated literacy rates, rate of female workforce participation. 
    1. Demographics data with gender segregation – raw data:
    2. Report: Women and Men in India:
      1. 2017:
      2. 2018:
    5. District-wise education data 2015-16 based on sex ratio, male/female literacy, schools by category, boys/girls schools by category, male/female teachers by category, etc.
    6. Rural female broad employment status
    7. Urban female broad employment status
    8. Women prisoners with children
    9. State-wise schools with female teachers
    10. State-wise registered cases against stalking, rape, acid attacks
    11. Financial assistance provided to OBC women
    12. Budgetary allocation for women's safety
    13. State-level literacy rate
  2. infrastructure indicators: the general state of law and order, safety in public spaces, gender-based crime, street lights, CCTV cameras, etc.
    1. Street lighting:
    2. Crime against women:
      3. Crime against Women in Metropolitan Cities — tables from a book chapter. [provided separately as a pdf].


A compelling visual narrative to be shared on social media:

  1. Appropriate fonts and color palettes
  2. Situation-sensitive text, e.g. without victim shaming
  3. Use of popular NLP tools in Python, visualization tools like D3.js, Tableau, etc.

For further queries:

WS4D PhD Colloquium

Feb 14, 2020 | 10AM to 4PM | IIIT-Bangalore

The goal of this session is to hold research discussions among PhD research scholars from multiple institutes who are working in areas related to Web Science. We hope these discussions will be useful and will foster research collaborations in the future!


Moderators: Faculty
Panelists: PhD Research Scholars
Audience: Research Scholars

Agenda of each Theme Discussion

  • Theme Introduction by Moderator
  • Short introduction by panelists (5 panelists 5 mins each)
  • Q&A (30 mins)

PhD Colloquium Themes

  1. Empowerment
    In this theme, we discuss how the WWW and digital technologies in general can be used for education and upskilling of the population at scale. As mobile phones and high-speed data connections become ubiquitous, a huge opportunity has opened up for disseminating knowledge and skills to a vast population efficiently. However, the lack of a sound understanding of how this can be achieved is still an impediment. We can also discuss why digital empowerment is essential and how access to resources can help in that context.
  2. Inclusion & Accessibility
    In this theme, we discuss how inclusion is necessary, not just preferable, for building models or solutions that are useful, relevant and applicable to all. In this context, inclusion might be in terms of gender, race, color, etc. It is also relevant to discuss how the web and digitization can be conducive to building solutions designed with accessibility in mind. Topics like renarration, multi-language support, transcriptions, alternative text for images, etc. might be relevant.
  3. Digital Governance + Privacy + Security
    In this theme, we discuss how different forms of data management processes can be woven into the fabric of administrative decision-making. These include structured data generated by different government departments, corporates and other organisations; as well as the so-called Big Data, generated from several sources like sensors, social media posts, etc. that often contain useful inputs for decision-making. We also discuss topics like privacy and security in this context.
  4. Social Cognition
    In this theme we address questions about how the web, and particularly social media and open online knowledge portals like Wikipedia, is affecting collective opinion and worldview. Social cognition is playing a central role in the making and breaking of reputations of individuals, businesses, and countries. There is a pressing need to understand social cognition in the post-web world. We also discuss topics like opinions, campaigns in networks, marketing and recommendation and discourse modeling.