AITopics

This paper describes Elyadata & LIA's joint submission to the NADI multi-dialectal Arabic Speech Processing 2025. We participated in the Spoken Arabic Dialect Identification (ADI) and multi-dialectal Arabic ASR subtasks. Our submission ranked first for the ADI subtask and second for the multi-dialectal Arabic ASR subtask among all participants. Our ADI system is a fine-tuned Whisper-large-v3 encoder with data augmentation. This system obtained the highest ADI accuracy score of 79.83% on the official test set. For multi-dialectal Arabic ASR, we fine-tuned SeamlessM4T-v2 Large (Egyptian variant) separately for each of the eight considered dialects. Overall, we obtained an average WER and CER of 38.54% and 14.53%, respectively, on the test set. Our results demonstrate the effectiveness of large pre-trained speech models with targeted fine-tuning for Arabic speech processing.

artificial intelligence, machine learning, natural language, (15 more...)

doi: 10.18653/v1/2025.arabicnlp-sharedtasks.105

2511.1009

Country:

Europe (1.00)
Africa > Middle East > Morocco (0.15)
Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.50)
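
The WER and CER figures quoted in the abstract above are edit-distance metrics. The following is a minimal sketch of how such scores are commonly computed, not the shared task's official scorer: the function names are hypothetical, text normalization is omitted, and counting spaces as characters in CER is an assumption that varies between evaluation tools.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate (assumption: spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For the pair "the cat sat" and "the cat sit", one of three words is substituted, so the WER is 1/3, while only one of eleven characters differs, so the CER is 1/11. This gap between the two metrics is the same pattern as the 38.54% WER versus 14.53% CER reported above.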

ADI-20: Arabic Dialect Identification dataset and models

Elleuch, Haroun, Mdhaffar, Salima, Estève, Yannick, Bougares, Fethi

We present ADI-20, an extension of the previously published ADI-17 Arabic Dialect Identification (ADI) dataset. ADI-20 covers all Arabic-speaking countries' dialects. It comprises 3,556 hours from 19 Arabic dialects in addition to Modern Standard Arabic (MSA). We used this dataset to train and evaluate various state-of-the-art ADI systems. We explored fine-tuning pre-trained ECAPA-TDNN-based models, as well as Whisper encoder blocks coupled with an attention pooling layer and a classification dense layer. We investigated the effect of (i) training data size and (ii) the model's number of parameters on identification performance. Our results show a small decrease in F1 score while using only 30% of the original training data. We open-source our collected data and trained models to enable the reproduction of our work, as well as support further research in ADI.

artificial intelligence, machine learning, natural language, (16 more...)

doi: 10.21437/Interspeech.2025-884

2511.1007

Country: Africa > Middle East (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
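
The ADI-20 abstract above mentions encoder blocks followed by an attention pooling layer and a dense classification layer. The sketch below illustrates that general pattern in plain Python. It is an assumption-laden illustration, not the authors' architecture: the single learned query vector, dot-product scoring, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse T frame vectors (each of size D) into one vector of size D.

    Each frame is scored by its dot product with a learned query vector
    (assumed form), and the scores are normalized into attention weights.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)]


def classify(pooled, weight_rows, biases):
    """Dense layer plus softmax: one weight row and one bias per dialect."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weight_rows, biases)]
    return softmax(logits)
```

With an all-zero query every frame receives the same weight, so the pooled vector reduces to the plain mean of the frames. A trained query instead lets the model emphasize the frames that carry the most dialect information.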

EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models

Wu, Jialin, Li, Kecen, Huang, Zhicong, Li, Xinfeng, Wang, Xiaofeng, Hong, Cheng

Many machine learning models are fine-tuned from large language models (LLMs) to achieve high performance in specialized domains like code generation, biomedical analysis, and mathematical problem solving. However, this fine-tuning process often introduces a critical vulnerability: the systematic degradation of safety alignment, undermining ethical guidelines and increasing the risk of harmful outputs. Addressing this challenge, we introduce EnchTable, a novel framework designed to transfer and maintain safety alignment in downstream LLMs without requiring extensive retraining. EnchTable leverages a Neural Tangent Kernel (NTK)-based safety vector distillation method to decouple safety constraints from task-specific reasoning, ensuring compatibility across diverse model architectures and sizes. Additionally, our interference-aware merging technique effectively balances safety and utility, minimizing performance compromises across various task domains. We implemented a fully functional prototype of EnchTable on three different task domains and three distinct LLM architectures, and evaluated its performance through extensive experiments on eleven diverse datasets, assessing both utility and model safety. Our evaluations include LLMs from different vendors, demonstrating EnchTable's generalization capability. Furthermore, EnchTable exhibits robust resistance to static and dynamic jailbreaking attacks, outperforming vendor-released safety models in mitigating adversarial prompts. Comparative analyses with six parameter modification methods and two inference-time alignment baselines reveal that EnchTable achieves a significantly lower unsafe rate, higher utility score, and universal applicability across different task domains. Additionally, we validate that EnchTable can be seamlessly integrated into various deployment pipelines without significant overhead.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

2511.0988

Country:

Asia (0.93)
North America (0.68)
Africa (0.67)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine (0.67)
Law (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Omnilingual ASR team, Keren, Gil, Kozhevnikov, Artyom, Meng, Yen, Ropers, Christophe, Setzler, Matthew, Wang, Skyler, Adebara, Ife, Auli, Michael, Balioglu, Can, Chan, Kevin, Cheng, Chierh, Chuang, Joe, Droof, Caley, Duppenthaler, Mark, Duquenne, Paul-Ambroise, Erben, Alexander, Gao, Cynthia, Gonzalez, Gabriel Mejia, Lyu, Kehan, Miglani, Sagar, Pratap, Vineel, Sadagopan, Kaushik Ram, Saleem, Safiyyah, Turkatenko, Arina, Ventayol-Boada, Albert, Yong, Zheng-Xin, Chung, Yu-An, Maillard, Jean, Moritz, Rashel, Mourachko, Alexandre, Williamson, Mary, Yates, Shireen

Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expanding ASR coverage has been costly and limited by architectures that restrict language support, making extension inaccessible to most--all while entangled with ethical concerns when pursued without community collaboration. To transcend these limitations, we introduce Omnilingual ASR, the first large-scale ASR system designed for extensibility. Omnilingual ASR enables communities to introduce unserved languages with only a handful of data samples. It scales self-supervised pre-training to 7B parameters to learn robust speech representations and introduces an encoder-decoder architecture designed for zero-shot generalization, leveraging an LLM-inspired decoder. This capability is grounded in a massive and diverse training corpus; by combining breadth of coverage with linguistic variety, the model learns representations robust enough to adapt to unseen languages. Incorporating public resources with community-sourced recordings gathered through compensated local partnerships, Omnilingual ASR expands coverage to over 1,600 languages, the largest such effort to date--including over 500 never before served by ASR. Automatic evaluations show substantial gains over prior systems, especially in low-resource conditions, and strong generalization. We release Omnilingual ASR as a family of models, from 300M variants for low-power devices to 7B for maximum accuracy. We reflect on the ethical considerations shaping this design and conclude by discussing its societal impact. In particular, we highlight how open-sourcing models and tools can lower barriers for researchers and communities, inviting new forms of participation. Open-source artifacts are available at https://github.com/facebookresearch/omnilingual-asr.

large language model, machine learning, natural language, (18 more...)

2511.0969

Country:

Africa (1.00)
North America > United States (0.67)
Asia > Indonesia (0.67)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine (1.00)
Education (0.67)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

BATIS: Bayesian Approaches for Targeted Improvement of Species Distribution Models

Villeneuve, Catherine, Akera, Benjamin, Teng, Mélisande, Rolnick, David

Species distribution models (SDMs), which aim to predict species occurrence based on environmental variables, are widely used to monitor and respond to biodiversity change. Recent deep learning advances for SDMs have been shown to perform well on complex and heterogeneous datasets, but their effectiveness remains limited by spatial biases in the data. In this paper, we revisit deep SDMs from a Bayesian perspective and introduce BATIS, a novel and practical framework wherein prior predictions are updated iteratively using limited observational data. Models must appropriately capture both aleatoric and epistemic uncertainty to effectively combine fine-grained local insights with broader ecological patterns. We benchmark an extensive set of uncertainty quantification approaches on a novel dataset including citizen science observations from the eBird platform. Our empirical study shows how Bayesian deep learning approaches can greatly improve the reliability of SDMs in data-scarce locations, which can contribute to ecological understanding and conservation efforts.

artificial intelligence, bayesian inference, machine learning, (20 more...)

2510.19749

Country:

North America > United States (0.94)
Africa (0.71)
North America > Canada (0.68)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.82)
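
The BATIS abstract above describes prior predictions being updated iteratively with limited observational data. As a hedged illustration of that general idea only, the sketch below uses a conjugate Beta-Bernoulli update for a single species at a single site. This is a textbook stand-in, not the BATIS method itself; the function name and the `prior_strength` pseudo-count parameter are hypothetical.

```python
def update_occurrence(prior_prob, prior_strength, detections, visits):
    """Update a prior occurrence probability with new site observations.

    prior_prob:     prior probability that the species occurs (0 to 1),
                    e.g. the output of a pre-trained model
    prior_strength: pseudo-count expressing how much the prior is trusted
                    (hypothetical parameter, must be positive)
    detections:     number of visits on which the species was observed
    visits:         total number of visits to the site
    Returns (posterior_mean, alpha, beta) of the Beta posterior.
    """
    if not 0.0 <= prior_prob <= 1.0:
        raise ValueError("prior_prob must lie in [0, 1]")
    if prior_strength <= 0:
        raise ValueError("prior_strength must be positive")
    if not 0 <= detections <= visits:
        raise ValueError("detections must lie between 0 and visits")

    # Encode the prior as Beta(alpha, beta) with mean equal to prior_prob.
    alpha = prior_prob * prior_strength
    beta = (1.0 - prior_prob) * prior_strength

    # Conjugate update: add observed successes and failures.
    alpha += detections
    beta += visits - detections
    return alpha / (alpha + beta), alpha, beta
```

With a prior of 0.2 held at a strength of 10 pseudo-observations, seeing the species on 3 of 5 visits moves the estimate to 5/15, or one third. The posterior variance shrinks as visits accumulate, which is one simple way to express the epistemic uncertainty that the abstract says matters in data-scarce locations.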

Quantifying Climate Policy Action and Its Links to Development Outcomes: A Cross-National Data-Driven Analysis

Dutta, Aditi

Addressing climate change effectively requires more than cataloguing the number of policies in place; it calls for tools that can reveal their thematic priorities and their tangible impacts on development outcomes. Existing assessments often rely on qualitative descriptions or composite indices, which can mask crucial differences between key domains such as mitigation, adaptation, disaster risk management, and loss and damage. To bridge this gap, we develop a quantitative indicator of climate policy orientation by applying a multilingual transformer-based language model to official national policy documents, achieving a classification accuracy of 0.90 (F1-score). Linking these indicators with World Bank development data in panel regressions reveals that mitigation policies are associated with higher GDP and GNI; disaster risk management correlates with greater GNI and debt but reduced foreign direct investment; adaptation and loss and damage show limited measurable effects. This integrated NLP-econometric framework enables comparable, theme-specific analysis of climate governance, offering a scalable method to monitor progress, evaluate trade-offs, and align policy emphasis with development goals.

artificial intelligence, machine learning, natural language, (15 more...)

2510.17425

Country:

Europe (0.47)
North America (0.28)
Africa (0.28)
Asia > India (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Government (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

A Critical Review of the Need for Knowledge-Centric Evaluation of Quranic Recitation

Al-Kharusi, Mohammed Hilal, Hayat, Khizar, Ruqeishi, Khalil Bader Al, Lone, Haroon Rashid

The art and science of Quranic recitation (Tajweed), a discipline governed by meticulous phonetic, rhythmic, and theological principles, confronts substantial educational challenges in today's digital age. Although modern technology offers unparalleled opportunities for learning, existing automated systems for evaluating recitation have struggled to gain broad acceptance or demonstrate educational effectiveness. This literature review examines this crucial disparity, offering a thorough analysis of scholarly research, digital platforms, and commercial tools developed over the past twenty years. Our analysis uncovers a fundamental flaw in current approaches that adapt Automatic Speech Recognition (ASR) systems, which emphasize word identification over qualitative acoustic evaluation. These systems suffer from limitations such as reliance on biased datasets, demographic disparities, and an inability to deliver meaningful feedback for improvement. Challenging these data-centric methodologies, we advocate for a paradigm shift toward a knowledge-based computational framework. By leveraging the unchanging nature of the Quranic text and the well-defined rules of Tajweed, we propose that an effective evaluation system should be built upon rule-based acoustic modeling centered on canonical pronunciation principles and articulation points (Makhraj), rather than depending on statistical patterns derived from flawed or biased data. The review concludes that the future of automated Quranic recitation assessment lies in hybrid systems that combine linguistic expertise with advanced audio processing. Such an approach paves the way for developing reliable, fair, and pedagogically effective tools that can authentically assist learners across the globe.

data mining, machine learning, natural language, (16 more...)

2510.12858

Country:

Asia > Middle East (0.92)
Africa > Middle East > Egypt (0.46)

Genre:

Instructional Material (0.93)
Overview (0.88)
Research Report > New Finding (0.67)
Research Report > Promising Solution (0.46)

Industry:

Education > Educational Setting > Online (0.93)
Education > Educational Technology > Educational Software > Computer Based Training (0.68)
Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)

WIRED, Nov-13-2025, 20:19:22 GMT

Jeffrey Epstein Claimed Intimate Knowledge of Donald Trump's Views in Texts With Bill Gates Adviser

In text messages from 2017, Jeffrey Epstein seemingly represented himself as positioned to pass information from the Trump White House to Bill Gates through an intermediary. In text messages sent in 2017, disgraced financier and registered sex offender Jeffrey Epstein appears to position himself as a middleman between president Donald Trump's administration and Microsoft cofounder Bill Gates, even seemingly representing himself as passing on information directly from Trump to Gates through an intermediary. The messages, which the House Committee on Oversight and Government Reform released on Wednesday and originated with the Epstein estate, begin on January 27, 2017, years after Epstein had already pleaded guilty to state prostitution solicitation charges. In them, Epstein purports to show intimate awareness of Trump's plans for domestic and global public health policy, and to be directly familiar with the president's thinking. Trump has continued to claim, as recently as this summer, that he stopped speaking with Epstein around 2004.

artificial intelligence, epstein, trump, (12 more...)

WIRED

Country:

Asia > Nepal (0.14)
North America > United States > Illinois > Cook County > Chicago (0.05)
Asia > Middle East > Republic of Türkiye (0.05)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Mobile (0.70)
Information Technology > Artificial Intelligence > Robots (0.47)

BBC News, Nov-13-2025, 19:41:54 GMT

UK billionaire Joe Lewis receives pardon from Trump

Billionaire UK businessman Joe Lewis, whose family trust owns Tottenham Hotspur football club, has received a pardon from US President Donald Trump. Lewis, 88, pleaded guilty to insider trading as part of an agreement with prosecutors in 2024 that saw him avoid prison. He was accused of passing on information about his companies to his private pilots, friends, personal assistants and romantic partners in a fraud that authorities said netted millions of dollars in profit. A White House official said Trump approved the pardon for Lewis, who requested it so he could receive medical treatment and visit his grandchildren and great-grandchildren in the US. Mr Lewis admitted he made a terrible mistake, did not fight extradition in the case, and paid a $5 million fine, the official told the BBC.

artificial intelligence, lewis, pardon, (10 more...)

BBC News

Country:

South America (0.16)
North America > Central America (0.16)
North America > United States > California (0.07)

Industry:

Leisure & Entertainment (1.00)
Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology: Information Technology > Artificial Intelligence (0.36)

BBC News, Nov-13-2025, 16:03:44 GMT

Lack of trust and racism concerns: Five key failings in Sara Sharif review

An independent review of the Sara Sharif case has identified multiple failings from agencies before her murder in Surrey in 2023, following two years of abuse. The child safeguarding practice review, published on Thursday, said there were clearly several points in Sara's life, in particular during the last few months, where different actions could and should have been taken by the authorities. The system failed to keep her safe, it added. Responding to the report, the Children's Commissioner said the case was a catalogue of missed opportunities, poor communication and ill-informed assumptions. The education secretary said there had been glaring failures across all agencies.

artificial intelligence, sara sharif, sharif, (15 more...)

BBC News

Country:

North America > United States (0.35)
South America (0.15)
North America > Central America (0.15)

Industry:

Leisure & Entertainment (0.73)
Education (0.70)
Law > Civil Rights & Constitutional Law (0.52)
Government > Regional Government (0.50)

Technology: Information Technology > Artificial Intelligence (0.32)