language barrier


ChatGPT as Linguistic Equalizer? Quantifying LLM-Driven Lexical Shifts in Academic Writing

Lin, Dingkang, Zhao, Naixuan, Tian, Dan, Li, Jiang

arXiv.org Artificial Intelligence

The advent of ChatGPT has profoundly reshaped scientific research practices, particularly in academic writing, where non-native English speakers (NNES) have historically faced linguistic barriers. This study investigates whether ChatGPT mitigates these barriers and fosters equity by analyzing lexical complexity shifts across 2.8 million articles from OpenAlex (2020-2024). Using the Measure of Textual Lexical Diversity (MTLD) to quantify vocabulary sophistication and a difference-in-differences (DID) design to identify causal effects, we demonstrate that ChatGPT significantly enhances lexical complexity in NNES-authored abstracts, even after controlling for article-level characteristics, authorship patterns, and venue norms. Notably, the impact is most pronounced in preprints, in technology- and biology-related fields, and in lower-tier journals. These findings provide causal evidence that ChatGPT reduces linguistic disparities and promotes equity in global academia.
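The abstract names MTLD and a difference-in-differences design but gives no implementation details. As an illustration only, the sketch below implements MTLD as described by McCarthy and Jarvis (2010), using the commonly cited type-token ratio threshold of 0.72, together with the basic 2x2 DID contrast of group means. Whitespace tokenization, the threshold value, the convention for texts that never reach the threshold, and the helper names `mtld` and `did_estimate` are assumptions, not the authors' code. The study's own estimates also include article, authorship, and venue controls that this sketch omits.

```python
from statistics import mean


def _mtld_pass(tokens, threshold=0.72):
    """One directional MTLD pass: text length divided by the number of factors."""
    factors = 0.0
    types = set()
    count = 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            # The running type-token ratio has fallen to the threshold:
            # close this segment as one full factor and start a new one.
            factors += 1
            types = set()
            count = 0
    if count > 0:
        # The leftover segment counts as a partial factor, proportional to
        # how far its TTR has dropped from 1.0 toward the threshold.
        factors += (1 - len(types) / count) / (1 - threshold)
    # Assumed convention: a text that never drops to the threshold scores
    # its own length instead of dividing by zero.
    return len(tokens) / factors if factors > 0 else float(len(tokens))


def mtld(tokens, threshold=0.72):
    """MTLD as the mean of a forward and a backward pass over the tokens."""
    return (_mtld_pass(tokens, threshold) + _mtld_pass(tokens[::-1], threshold)) / 2


def did_estimate(rows):
    """Basic 2x2 difference-in-differences on group means (no controls).

    rows: iterable of (score, treated, post), where `treated` marks
    NNES-authored papers and `post` marks papers published after
    ChatGPT's release (hypothetical encoding).
    """
    cells = {(t, p): [] for t in (True, False) for p in (True, False)}
    for score, treated, post in rows:
        cells[(treated, post)].append(score)
    treated_change = mean(cells[(True, True)]) - mean(cells[(True, False)])
    control_change = mean(cells[(False, True)]) - mean(cells[(False, False)])
    return treated_change - control_change


if __name__ == "__main__":
    print(mtld("the cat the cat the cat".split()))  # 3.0
    toy = [(70.0, True, False), (80.0, True, True),
           (90.0, False, False), (92.0, False, True)]
    print(did_estimate(toy))  # 8.0
```

In the toy DID data, the treated group rises by 10 points and the control group by 2, so the estimated effect is the 8-point difference between those changes.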


Team A at SemEval-2025 Task 11: Breaking Language Barriers in Emotion Detection with Multilingual Models

Sahil, P Sam, Jamatia, Anupam

arXiv.org Artificial Intelligence

This paper describes the system submitted by Team A to SemEval 2025 Task 11, "Bridging the Gap in Text-Based Emotion Detection." The task involved identifying the perceived emotion of a speaker from text snippets, with each instance annotated with one of six emotions: joy, sadness, fear, anger, surprise, or disgust. A dataset provided by the task organizers served as the foundation for training and evaluating our models. Among the various approaches explored, the best performance was achieved using multilingual embeddings combined with a fully connected layer. This paper details the system architecture, discusses experimental results, and highlights the advantages of leveraging multilingual representations for robust emotion detection in text.
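The abstract describes the best system only as multilingual embeddings feeding a fully connected layer. The sketch below shows such a classification head. It assumes a sentence embedding has already been produced by some multilingual encoder, which the abstract does not name, and treats each instance as single-label over the six emotions. The class name `EmotionHead`, the random initialization, and the plain SGD update are illustrative choices, not the team's implementation.

```python
import math
import random

EMOTIONS = ["joy", "sadness", "fear", "anger", "surprise", "disgust"]


class EmotionHead:
    """Hypothetical fully connected layer + softmax over six emotion labels."""

    def __init__(self, embed_dim, seed=0):
        rng = random.Random(seed)
        # Small random weights; in practice these would be learned.
        self.weights = [[rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
                        for _ in EMOTIONS]
        self.bias = [0.0] * len(EMOTIONS)

    def probabilities(self, embedding):
        # Affine layer: one logit per emotion.
        logits = [sum(w * x for w, x in zip(row, embedding)) + b
                  for row, b in zip(self.weights, self.bias)]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predict(self, embedding):
        probs = self.probabilities(embedding)
        return EMOTIONS[probs.index(max(probs))]

    def train_step(self, embedding, label, lr=0.1):
        """One SGD step on softmax cross-entropy loss for a single example."""
        probs = self.probabilities(embedding)
        target = EMOTIONS.index(label)
        for k, p in enumerate(probs):
            # d(loss)/d(logit_k) = p_k - 1[k == target]
            grad = p - (1.0 if k == target else 0.0)
            self.weights[k] = [w - lr * grad * x
                               for w, x in zip(self.weights[k], embedding)]
            self.bias[k] -= lr * grad


if __name__ == "__main__":
    head = EmotionHead(embed_dim=4)
    emb = [0.2, -0.1, 0.7, 0.4]  # stand-in for a multilingual encoder's output
    for _ in range(50):
        head.train_step(emb, "fear")
    print(head.predict(emb))  # fear
```

Keeping the encoder frozen and training only this head is one plausible reading of "embeddings combined with a fully connected layer". The abstract does not say whether the encoder was fine-tuned.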


Breaking Language Barriers: A Question Answering Dataset for Hindi and Marathi

Sabane, Maithili, Litake, Onkar, Chadha, Aman

arXiv.org Artificial Intelligence

Recent advances in deep learning have led to the development of highly sophisticated systems with an unquenchable appetite for data. On the other hand, building good deep learning models for low-resource languages remains a challenging task. This paper focuses on developing a Question Answering dataset for two such languages: Hindi and Marathi. Despite Hindi being the 3rd most spoken language worldwide, with 345 million speakers, and Marathi being the 11th most spoken language globally, with 83.2 million speakers, both languages have limited resources for building efficient Question Answering systems. To tackle the challenge of data scarcity, we have developed a novel approach for translating the SQuAD 2.0 dataset into Hindi and Marathi. We release the largest Question Answering dataset available for these languages, with each dataset containing 28,000 samples. We evaluate the dataset on various architectures and release the best-performing models for both Hindi and Marathi, which will facilitate further research in these languages. Leveraging similarity tools, our method has the potential to create datasets in diverse languages, thereby enhancing natural language understanding across varied linguistic contexts. Our fine-tuned models, code, and dataset will be made publicly available.
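The abstract credits the translation approach to "similarity tools" but gives no further detail. A known difficulty when translating extractive QA data is that the separately translated answer often no longer appears verbatim in the translated context, so its character offset (SQuAD's `answer_start`) must be recovered. The sketch below shows one hypothetical way to do this with Python's standard-library `difflib.SequenceMatcher`. It scores every whitespace-token window of roughly the answer's length and keeps the most similar one. The helper name `align_answer`, the `slack` parameter, and the choice of `difflib` are assumptions, not the authors' method.

```python
from difflib import SequenceMatcher


def align_answer(context, answer, slack=2):
    """Return (score, answer_start, span) for the context span closest to answer.

    Hypothetical helper: compares `answer` against every window of whitespace
    tokens whose length is within `slack` tokens of the answer's length.
    """
    # Character offsets (start, end) of each whitespace-separated token.
    offsets = []
    pos = 0
    for tok in context.split():
        start = context.index(tok, pos)
        offsets.append((start, start + len(tok)))
        pos = start + len(tok)

    n_answer = max(1, len(answer.split()))
    best = (0.0, -1, "")
    for i in range(len(offsets)):
        for length in range(max(1, n_answer - slack), n_answer + slack + 1):
            j = i + length - 1
            if j >= len(offsets):
                break  # longer windows would also run past the end
            start, end = offsets[i][0], offsets[j][1]
            span = context[start:end]
            score = SequenceMatcher(None, span, answer).ratio()
            if score > best[0]:
                best = (score, start, span)
    return best


if __name__ == "__main__":
    # Toy Hindi example: "The capital of India is New Delhi."
    context = "भारत की राजधानी नई दिल्ली है।"
    print(align_answer(context, "नई दिल्ली"))
```

In practice, pairs whose best score falls below some cutoff would probably be discarded instead of kept with a poor span. Unanswerable SQuAD 2.0 questions have no answer span and need no alignment.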


How Artificial Intelligence Can Bring People Together

#artificialintelligence

Artificial intelligence (AI) enables people to spend more time with those who matter most. AI is bringing people together, whether it's helping plan a family vacation, getting somewhere safely, allowing everyone to see each other on a video call, or making gift shopping a little easier. The continuous evolution of AI has the potential to revolutionize the way we live, work, and interact with each other. While AI is often portrayed as a divisive force that could lead to job losses and social inequality, it also has the power to bring people together and create new opportunities for collaboration and cooperation. In this article, we will explore some of the ways in which AI can be used to bridge social, cultural, and linguistic barriers and foster greater understanding and empathy among diverse communities. One of the most significant benefits of AI is its ability to facilitate communication across different languages.


Introduction to No Language Left Behind (NLLB-200)

#artificialintelligence

Meta AI recently open-sourced its massive translation model, No Language Left Behind (NLLB-200), with the aim of eliminating language barriers across the globe. Machine translation has become a key area of research, and the release is good news for the many researchers and organisations who can use the model in their own research and work. Here are the key points about NLLB-200. It is part of Meta AI's series of massive machine translation models. As the newest member of the series, it can translate between 200 languages, demonstrating Meta's capabilities in AI research. The development aims to let people access, share, and use online content in their native languages and communicate across the world regardless of language.


Meta's AI translation breaks 200 language barrier

#artificialintelligence

Meta's quest to translate underserved languages is marking its first victory with the open source release of a language model able to decipher 202 languages. Named after Meta's No Language Left Behind initiative and dubbed NLLB-200, the model is the first able to translate so many languages, according to its makers, all with the goal of improving translation for languages overlooked by similar projects. "The vast majority of improvements made in machine translation in the last decades have been for high-resource languages," Meta researchers wrote in a paper. "While machine translation continues to grow, the fruits it bears are unevenly distributed," they said. According to the announcement of NLLB-200, the model can translate 55 African languages "with high-quality results."


Break through language barriers with Amazon Transcribe, Amazon Translate, and Amazon Polly

#artificialintelligence

Imagine a surgeon taking video calls with patients across the globe without the need for a human translator. What if a fledgling startup could easily expand its product across borders and into new geographical markets by offering fluid, accurate, multilingual customer support and sales, all without the need for a live human translator? What happens to your business when you're no longer bound by language? It's common today to have virtual meetings with international teams and customers who speak many different languages. Whether they're internal or external meetings, meaning often gets lost in complex discussions, and you may encounter language barriers that prevent you from being as effective as you could be.


Transcription & Data Collection - CCC

#artificialintelligence

Artificial intelligence is one of the most advanced areas of computer programming: machines are fed data that enables them to learn and to initiate commands. This helps automate urgent tasks and scheduling, such as transcription and speech and audio data collection. Artificial intelligence is also used in home automation systems, from detecting visitors and opening door locks to turning on lights and entertainment systems, and this is now a trend. Many modern homes have some sort of utility or entertainment management system, such as Siri, Alexa, or Google Assistant. Modern tasks and processes are translated into computer commands so that the machine learns a certain function and follows a certain trigger or timeline to initiate those commands. These commands can be hard-coded into the system with minimal need for user input, or they can come from a continuous feed-and-learning structure in which input commands are fed to a command recorder.


Cracking the Language Barrier for a Multilingual Africa, 2021

VideoLectures.NET

This webinar series, hosted by the International Research Centre in Artificial Intelligence (IRCAI) and supported by UNESCO and the Knowledge 4 All Foundation, presents the project "Fellowship to develop datasets and strengthen capacities and innovation potential for Low Resource African Languages". The project delivers three main components, namely research in natural language processing, open dataset creation and publishing, and an interface between the policy and technology spheres, and covers the following:

1. A fellowship for African AI researchers focused on African languages, building on work on language datasets previously funded by IDRC and the Knowledge 4 All Foundation. This work contributes to a roadmap for better integration of African languages on digital platforms, helping to lower the barrier to African participation in the digital economy.
2. Improving the representation of AI research on African languages by creating resources for a variety of NLP tasks in a variety of African languages, enabling good, data-driven results in AI research.
3. Attracting an African community of native speakers who contribute language resources and language technology tools and who adopt and support Masakhane NLP, a platform for sharing, maintaining, and using language resources and tools; establishing widely agreed benchmarks for NLP tasks; and stimulating competition between methods and systems.
4. Serving as a model case to inform African evidence-based policymaking on artificial intelligence; the project will be included in UNESCO’s AI Decision maker’s Essential to inform policymakers.

Find more information at IRCAI Webinar Series.


Zoom to acquire German startup to bring real-time translation to meetings – TechCrunch

#artificialintelligence

As companies expand worldwide and meet online in tools like Zoom, the language barrier can be a real impediment to getting work done. Zoom announced that it intends to acquire German startup Karlsruhe Information Technology Solutions, or Kites for short, to bring real-time machine-learning-based translation to the platform. The companies did not share the terms of the deal, but with Kites, Zoom gets a team of top researchers who can help strengthen its machine translation expertise. "Kites' talented team of 12 research scientists will help Zoom's engineering team advance the field of [machine translation] to improve meeting productivity and efficiency by providing multilanguage translation capabilities for Zoom users," the company said in a statement. The deal appears to be an acqui-hire, as the company adds those 12 researchers to the Zoom engineering group.