Bharatiya Bhasha Diwas 2024

Bharatiya Bhasha Diwas will be observed on 11th December, 2024 at IIIT Bangalore. The focus will be on technologies in, for and through Indian languages. The event will include several technical talks and demonstrations focusing on Indian language technologies by eminent speakers from industry and academia.

Bharatiya Bhasha Diwas honours the Janma Jayanti (birth anniversary) of Mahakavi Subramania Bharati, celebrates India’s rich linguistic heritage, and nurtures multilingualism. With the growth of Artificial Intelligence (AI) across the globe, India is on the verge of a digital revolution that aims to bridge linguistic and regional gaps through tasks such as text generation, machine translation, question answering, voice recognition, and conversational AI. This one-day event aims to bring together enthusiasts from diverse backgrounds, including AI practitioners, linguists, and social scientists. The goal is to promote research and innovation, build an inclusive and collaborative community, and foster student engagement. We welcome students, researchers, practitioners, and anyone keen to contribute to the growth of technologies in Indian languages to join the event by registering here.

Time : 10:00 – 16:30 IST

Venue : In-person – International Institute of Information Technology Bangalore (IIITB) 26/C, Hosur Rd, Electronics City Phase 1, Electronic City, Bengaluru, Karnataka 560100

Online – TBA

Prof. Pushpak Bhattacharyya (Keynote talk)

Title: Low Resource Machine Translation of Indic Languages

Abstract: Indic languages provide a diverse and exciting panorama of linguistic phenomena. Translation among these languages (and English) involves several linguistic and resource challenges. In this talk, we discuss techniques for handling the challenges of low resources in Indic MT, along with an analysis of their performance. The techniques include subwording, pivoting, phrase table injection, use of translationese, multilingual training, and post-editing, among others. The discussions are based on our work reported in top-quality conferences and journals.

Speaker Bio: Prof. Pushpak Bhattacharyya (http://www.cse.iitb.ac.in/~pb) is Bhagat Singh Rekhi Chair Professor of Computer Science and Engineering at IIT Bombay. He has done extensive research in Natural Language Processing and Machine Learning. Some of his noteworthy contributions are Sarcasm Metaphor Hyperbole Detection, IndoWordnet, Cognitive NLP, Low Resource MT, and Knowledge Graph-Deep Learning Synergy in Information Extraction and Question Answering. He has published more than 450 research papers (https://scholar.google.co.in/citations?user=vvg-pAkAAAAJ&hl=en, 17K+ citations and h-index 62 as on Oct 24), has authored/co-authored 8 books including a textbook on machine translation (2015) and one on NLP (2023), and has guided close to 400 students for their Ph.D., Master's, and undergraduate theses. Prof. Bhattacharyya has been a visiting researcher at MIT and a visiting faculty at Stanford. He is a Fellow of the National Academy of Engineering, an Abdul Kalam National Fellow, a Distinguished Alumnus of IIT Kharagpur, an ex-director of IIT Patna, and a past President of ACL (Association for Computational Linguistics).

Dr. Shakira Jabeen

Title: Multiple Languages of India, Multilingualism, Constitutional Guarantees and Preventing Language Death

Abstract: The talk aims to explain crucial concepts pertaining to social behaviour towards languages. The age-old language scene of India is used as a diving board to focus on existing language issues. An effort is made to draw a distinction between the multilingualism of the West and Indian multilinguality. Analyzing the issue of language death, the talk focuses on ways to prevent the loss of languages. This framework is presented with the aim of addressing graduate and postgraduate students of technical and managerial streams.

IndicNLP

The IndicNLP project focuses on building a knowledge management framework for oral community knowledge in low-resource and colloquial Kannada.

Background

Knowledge in rural communities is largely created, preserved, and transferred verbally, and it is limited. This information is valuable to these communities, and managing it and making it available digitally with state-of-the-art approaches enriches the awareness and collective knowledge of their people. The large amounts of data and information produced on the Internet are inaccessible to the population in these rural communities due to factors such as lack of infrastructure, poor connectivity, and limited literacy. Knowledge internal to rural communities is also not conserved or made available in any global Big Data information system. Artificial Intelligence (AI) technologies such as Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) provide substantial assistance when vast quantities of data are available to build solutions. For low-resource languages like Kannada and its rural colloquial dialects, far fewer publicly available corpora exist. Building state-of-the-art AI solutions is challenging in this context, and we address this problem in this work. Knowledge management in rural communities requires a low-cost and efficient approach that social workers can use. Organizations such as Namma Halli Radio have collected an audio corpus of a few hours containing community interactions spoken in colloquial language. We propose an architecture for oral knowledge management for rural communities speaking colloquial Kannada using audio recordings.
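To make the idea of searching oral knowledge more concrete, the following is a minimal illustrative sketch, not the architecture proposed in this project. It assumes that transcripts of audio segments are already available (for example, from some ASR system) and builds a simple keyword index over them. The `Segment` structure, the function names, and the English placeholder transcripts are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A transcribed slice of one audio recording (hypothetical structure)."""
    audio_id: str
    start_sec: float
    end_sec: float
    text: str


def build_index(segments):
    """Map each lowercase token to the segments whose transcript contains it."""
    index = defaultdict(list)
    for seg in segments:
        for token in set(seg.text.lower().split()):
            index[token].append(seg)
    return index


def search(index, query):
    """Return segments containing every query token, ordered by recording and start time."""
    tokens = query.lower().split()
    if not tokens:
        return []
    candidate_sets = [set(index.get(tok, [])) for tok in tokens]
    hits = set.intersection(*candidate_sets)
    return sorted(hits, key=lambda s: (s.audio_id, s.start_sec))


# Placeholder transcripts (made up, in English) standing in for ASR output.
segments = [
    Segment("rec01", 0.0, 12.5, "water tank repair meeting"),
    Segment("rec01", 12.5, 30.0, "crop prices this season"),
    Segment("rec02", 0.0, 8.0, "water supply for crop fields"),
]
index = build_index(segments)

for hit in search(index, "water crop"):
    print(hit.audio_id, hit.start_sec, hit.end_sec, hit.text)
```

Keeping the start and end times with each segment means a search result can point a listener to the relevant part of a recording rather than to the whole file. A real system for colloquial Kannada would likely need far more than whitespace tokenization and exact matching.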

Funding Agency

Mphasis F1 Foundation

Publications

Aparna, M., Srivatsa, S., Sai Madhavan, G., Dinesh, T.B., Srinivasa, S. (2024). AI-Based Assistance for Management of Oral Community Knowledge in Low-Resource and Colloquial Kannada Language. In: Sachdeva, S., Watanobe, Y. (eds) Big Data Analytics in Astronomy, Science, and Engineering. BDA 2023. Lecture Notes in Computer Science, vol 14516. Springer, Cham. https://doi.org/10.1007/978-3-031-58502-9_1

Sharath Srivatsa, Aparna M, Sai Madhavan G, and Srinath Srinivasa. 2024. Knowledge Management Framework Over Low Resource Indian Colloquial Language Audio Contents. In Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD) (CODS-COMAD ’24). Association for Computing Machinery, New York, NY, USA, 553–557. https://doi.org/10.1145/3632410.3632483 

Aparna M and Srinath Srinivasa. 2023. Active learning for Named Entity Recognition in Kannada. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.24580582.v1

Media Mentions

Demo

Graama-Kannada Audio Search webapp: http://103.156.19.244:33035/
(username : guest, password : guest123)

Graama-Kannada demo video:

People

Research Scholars

Project Students

  • Goutham U R
  • Ram Sai Koushik Polisetti
  • Sai Madhavan G
  • Kappagantula Lakshmi Abhigna
  • Manuj Malik
  • Debmalya Sen
  • Vikram Adithya C P
  • Venumula Sai Sumanth Reddy