Imphal
Deceptive Humor: A Synthetic Multilingual Benchmark Dataset for Bridging Fabricated Claims with Humorous Content
Kasu, Sai Kartheek Reddy, Biradar, Shankar, Saumya, Sunil
This paper presents the Deceptive Humor Dataset (DHD), a novel resource for studying humor derived from fabricated claims and misinformation. In an era of rampant misinformation, understanding how humor intertwines with deception is essential. DHD consists of humor-infused comments generated from false narratives, incorporating fabricated claims and manipulated information using the ChatGPT-4o model. Each instance is labeled with a Satire Level, ranging from 1 for subtle satire to 3 for high-level satire, and is classified into one of five distinct Humor Categories: Dark Humor, Irony, Social Commentary, Wordplay, and Absurdity. The dataset spans multiple languages, including English, Telugu, Hindi, Kannada, Tamil, and their code-mixed variants (Te-En, Hi-En, Ka-En, Ta-En), making it a valuable multilingual benchmark. By introducing DHD, we establish a structured foundation for analyzing humor in deceptive contexts, paving the way for a new research direction that explores how humor not only interacts with misinformation but also influences its perception and spread. We establish strong baselines for the proposed dataset, providing a foundation for future research to benchmark and advance deceptive humor detection models.
EzSQL: An SQL intermediate representation for improving SQL-to-text Generation
Bhardwaj, Meher, Ethari, Hrishikesh, Moirangthem, Dennis Singh
The SQL-to-text generation task traditionally uses template-based, Seq2Seq, tree-to-sequence, and graph-to-sequence models. Recent models take advantage of pre-trained generative language models for this task in the Seq2Seq framework. However, treating SQL as a sequence of inputs to the pre-trained models is not optimal. In this work, we put forward a new SQL intermediate representation called EzSQL to align SQL with the natural language text sequence. EzSQL simplifies SQL queries and brings them closer to natural language text by modifying operators and keywords, which can usually be described in natural language. EzSQL also removes the need for set operators. Our proposed SQL-to-text generation model uses EzSQL as the input to a pre-trained generative language model for generating the text descriptions. We demonstrate that our model is an effective state-of-the-art method for generating text narrations from SQL queries on the WikiSQL and Spider datasets. We also show that by generating pretraining data using our SQL-to-text generation model, we can enhance the performance of Text-to-SQL parsers.
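To make the idea of "modifying operators and keywords" concrete, here is a minimal sketch in Python. It is not the EzSQL representation itself: the abstract does not list the actual rewrite rules, so the mapping `OPERATOR_WORDS`, the function `verbalize_operators`, and the example table `singer` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical rewrite rules for illustration only; the published EzSQL
# rules are not given in the abstract and may differ.
OPERATOR_WORDS = [
    (">=", "greater than or equal to"),
    ("<=", "less than or equal to"),
    ("!=", "not equal to"),
    (">", "greater than"),
    ("<", "less than"),
    ("=", "equal to"),
]

# Two-character operators come first in the alternation so that ">=" is
# matched as a whole rather than as ">" followed by "=".
_PATTERN = re.compile("|".join(re.escape(op) for op, _ in OPERATOR_WORDS))
_WORDS = dict(OPERATOR_WORDS)


def verbalize_operators(sql: str) -> str:
    """Replace comparison operators with English phrases and tidy whitespace."""
    spaced = _PATTERN.sub(lambda m: f" {_WORDS[m.group(0)]} ", sql)
    return " ".join(spaced.split())


if __name__ == "__main__":
    print(verbalize_operators("SELECT name FROM singer WHERE age >= 30"))
```

This toy version rewrites operators wherever they appear, including inside string literals, so a realistic implementation would presumably work on a parsed query rather than on raw text.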
Navigating Text-to-Image Generative Bias across Indic Languages
Mittal, Surbhi, Sudan, Arnav, Vatsa, Mayank, Singh, Richa, Glaser, Tamar, Hassner, Tal
This research investigates biases in text-to-image (TTI) models for the Indic languages widely spoken across India. It evaluates and compares the generative performance and cultural relevance of leading TTI models in these languages against their performance in English. Using the proposed IndicTTI benchmark, we comprehensively assess the performance of 30 Indic languages with two open-source diffusion models and two commercial generation APIs. The primary objective of this benchmark is to evaluate the support for Indic languages in these models and identify areas needing improvement. Given the linguistic diversity of 30 languages spoken by over 1.4 billion people, this benchmark aims to provide a detailed and insightful analysis of TTI models' effectiveness within the Indic linguistic landscape.
Bridging or Breaking: Impact of Intergroup Interactions on Religious Polarization
Chaturvedi, Rochana, Chaturvedi, Sugat, Zheleva, Elena
While exposure to diverse viewpoints may reduce polarization, it can also have a backfire effect and exacerbate polarization when the discussion is adversarial. Here, we examine whether intergroup interactions around important events affect polarization between majority and minority groups in social networks. We compile data on the religious identity of nearly 700,000 Indian Twitter users engaging in COVID-19-related discourse during 2020. We introduce a new measure for an individual's group conformity based on contextualized embeddings of tweet text, which helps us assess polarization between religious groups. We then use a meta-learning framework to examine heterogeneous treatment effects of intergroup interactions on an individual's group conformity in the light of communal, political, and socio-economic events. We find that for political and social events, intergroup interactions reduce polarization. This decline is weaker for individuals at the extreme who already exhibit high conformity to their group. In contrast, during communal events, intergroup interactions can increase group conformity. Finally, we decompose the differential effects across religious groups in terms of emotions and topics of discussion. The results show that the dynamics of religious polarization are sensitive to the context and have important implications for understanding the role of intergroup interactions.
IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages
Javed, Tahir, Nawale, Janki Atul, George, Eldho Ittan, Joshi, Sakshi, Bhogale, Kaushal Santosh, Mehendale, Deovrat, Sethi, Ishvinder Virender, Ananthanarayanan, Aparna, Faquih, Hafsah, Palit, Pratiti, Ravishankar, Sneha, Sukumaran, Saranya, Panchagnula, Tripura, Murali, Sunjay, Gandhi, Kunal Sharad, R, Ambujavalli, M, Manickam K, Vaijayanthi, C Venkata, Karunganni, Krishnan Srinivasa Raghavan, Kumar, Pratyush, Khapra, Mitesh M
We present INDICVOICES, a dataset of natural and spontaneous speech containing a total of 7348 hours of read (9%), extempore (74%) and conversational (17%) audio from 16237 speakers covering 145 Indian districts and 22 languages. Of these 7348 hours, 1639 hours have already been transcribed, with a median of 73 hours per language. Through this paper, we share our journey of capturing the cultural, linguistic and demographic diversity of India to create a one-of-its-kind inclusive and representative dataset. More specifically, we share an open-source blueprint for data collection at scale comprising standardised protocols, centralised tools, a repository of engaging questions, prompts and conversation scenarios spanning multiple domains and topics of interest, quality control mechanisms, comprehensive transcription guidelines and transcription tools. We hope that this open-source blueprint will serve as a comprehensive starter kit for data collection efforts in other multilingual regions of the world. Using INDICVOICES, we build IndicASR, the first ASR model to support all 22 languages listed in the 8th schedule of the Constitution of India. All the data, tools, guidelines, models and other materials developed as a part of this work will be made publicly available.
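The percentages quoted above can be turned into approximate hour counts with a few lines of Python. This is only a back-of-the-envelope sketch: the shares in the abstract are rounded to whole percentages, so the resulting hours are approximate, and the names `SHARES` and `hours_by_type` are introduced here for illustration.

```python
# Figures quoted in the abstract above.
TOTAL_HOURS = 7348
TRANSCRIBED_HOURS = 1639
# Shares are rounded percentages from the abstract, so results are approximate.
SHARES = {"read": 0.09, "extempore": 0.74, "conversational": 0.17}


def hours_by_type(total: float, shares: dict) -> dict:
    """Split a total number of hours according to fractional shares."""
    return {kind: round(total * share, 1) for kind, share in shares.items()}


breakdown = hours_by_type(TOTAL_HOURS, SHARES)
transcribed_pct = round(100 * TRANSCRIBED_HOURS / TOTAL_HOURS, 1)

if __name__ == "__main__":
    for kind, hours in breakdown.items():
        print(f"{kind}: about {hours} hours")
    print(f"transcribed so far: about {transcribed_pct}% of all audio")
```

Under these rounded shares, extempore speech accounts for roughly 5,400 hours, and a little over a fifth of the collected audio has been transcribed.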
Unleashing the Power of Dynamic Mode Decomposition and Deep Learning for Rainfall Prediction in North-East India
Chowdary, Paleti Nikhil, P, Sathvika, U, Pranav, S, Rohan, V, Sowmya, A, Gopalakrishnan E, M, Dhanya
Accurate rainfall forecasting is crucial for effective disaster preparedness and mitigation in the North-East region of India, which is prone to extreme weather events such as floods and landslides. In this study, we investigated the use of two data-driven methods, Dynamic Mode Decomposition (DMD) and Long Short-Term Memory (LSTM), for rainfall forecasting using daily rainfall data collected by the India Meteorological Department in the North-East region over a period of 118 years. We conducted a comparative analysis of these methods to determine their relative effectiveness in predicting rainfall patterns. Using historical rainfall data from multiple weather stations, we trained and validated our models to forecast future rainfall patterns. Our results indicate that both DMD and LSTM are effective in forecasting rainfall, with LSTM outperforming DMD in terms of accuracy. This suggests that LSTM can capture complex nonlinear relationships in the data, making it a powerful tool for rainfall forecasting. Our findings suggest that data-driven methods such as DMD and deep learning approaches like LSTM can significantly improve rainfall forecasting accuracy in the North-East region of India, helping to mitigate the impact of extreme weather events and enhance the region's resilience to climate change.
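For readers unfamiliar with DMD, the standard "exact DMD" formulation is sketched below. The abstract does not say which variant the authors use, so this is the textbook version rather than their method, and all symbols are assumptions introduced here: $x_k$ is the vector of rainfall readings across stations at time step $k$, $m$ is the number of snapshots, and $r$ is the truncation rank.

```latex
% Snapshot matrices built from consecutive observations
X  = \begin{bmatrix} x_1 & x_2 & \cdots & x_{m-1} \end{bmatrix}, \qquad
X' = \begin{bmatrix} x_2 & x_3 & \cdots & x_m \end{bmatrix}, \qquad
X' \approx A X

% Rank-r truncated SVD of X and the reduced linear operator
X \approx U_r \Sigma_r V_r^{*}, \qquad
\tilde{A} = U_r^{*} X' V_r \Sigma_r^{-1}

% Eigendecomposition of the reduced operator and the DMD modes
\tilde{A} W = W \Lambda, \qquad
\Phi = X' V_r \Sigma_r^{-1} W

% Forecast from the first snapshot
x_k \approx \Phi \Lambda^{k-1} b, \qquad b = \Phi^{\dagger} x_1
```

Because the forecast is a linear combination of fixed modes, DMD models the dynamics as linear, which is consistent with the reported result that the nonlinear LSTM is more accurate.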
Now, Indian Railways to use Artificial Intelligence to control air circulation
Indian Railways already uses artificial intelligence (AI) to detect flaws or problems in its signalling system. The national transporter will now reportedly use AI to control air circulation, signages, supervision and maintenance work in the 10.28 km-long tunnel in Imphal, the capital of Manipur. The tunnel is part of Indian Railways' 110-km Jiribam to Imphal line project. The AI system will control air circulation and alert passengers in case of fire. The tunnel reportedly has a safety spot every 500 meters. "In case of an accident or any mishap, passengers have to go 500 meters into the safety tunnel and through signages find their way out," Yogesh Verma, deputy chief, construction, Northeast Frontier Railway told PTI.
In a first, Railways to use artificial intelligence to control air circulation in Manipur tunnel
For the first time, the railways will use artificial intelligence to control air circulation, signages and even supervision and maintenance work in the 10.28 km-long tunnel in Imphal, part of its 110 km railway line from Jiribam to Manipur's capital city. While the national transporter already uses this technique to detect flaws or problems in the signalling system in real time and rectify them to avoid possible delays and mishaps, this is the first time that such technology will be used in a tunnel in the country. "The system will control air circulation in the system, along with other aspects. It will alert the passenger in case of fire and help us in quick evacuation in case of any issue. This tunnel is specially unique as it also has a safety tunnel at every 500 meters," said Yogesh Verma, deputy chief, construction, Northeast Frontier Railway.