AITopics

This paper investigates a critical design decision in the practice of massively multilingual continual pre-training -- the inclusion of parallel data. Specifically, we study the impact of bilingual translation data for massively multilingual language adaptation of the Llama3 family of models to 500 languages. To this end, we construct the MaLA bilingual translation corpus, containing data from more than 2,500 language pairs. Subsequently, we develop the EMMA-500 Llama 3 suite of four massively multilingual models -- continually pre-trained from the Llama 3 family of base models extensively on diverse data mixes up to 671B tokens -- and explore the effect of continual pre-training with or without bilingual translation data. Comprehensive evaluation across 7 tasks and 12 benchmarks demonstrates that bilingual data tends to enhance language transfer and performance, particularly for low-resource languages. We open-source the MaLA corpus, EMMA-500 Llama 3 suite artefacts, code, and model generations.

large language model, latn, machine learning, (21 more...)

2506.00469

Country:

Europe (1.00)
North America > Canada (0.27)
North America > United States > Pennsylvania (0.27)
Asia > Middle East (0.27)

Genre: Research Report > New Finding (0.92)

Industry:

Education (0.46)
Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TreeRare: Syntax Tree-Guided Retrieval and Reasoning for Knowledge-Intensive Question Answering

Zhang, Boyi, Liu, Zhuo, He, Hangfeng

In real practice, questions are typically complex and knowledge-intensive, requiring Large Language Models (LLMs) to recognize the multifaceted nature of the question and reason across multiple information sources. Iterative and adaptive retrieval, where LLMs decide when and what to retrieve based on their reasoning, has been shown to be a promising approach to resolve complex, knowledge-intensive questions. However, the performance of such retrieval frameworks is limited by the accumulation of reasoning errors and misaligned retrieval results. To overcome these limitations, we propose TreeRare (Syntax Tree-Guided Retrieval and Reasoning), a framework that utilizes syntax trees to guide information retrieval and reasoning for question answering. Following the principle of compositionality, TreeRare traverses the syntax tree in a bottom-up fashion, and in each node, it generates subcomponent-based queries and retrieves relevant passages to resolve localized uncertainty. A subcomponent question answering module then synthesizes these passages into concise, context-aware evidence. Finally, TreeRare aggregates the evidence across the tree to form a final answer. Experiments across five question answering datasets involving ambiguous or multi-hop reasoning demonstrate that TreeRare achieves substantial improvements over existing state-of-the-art methods.

large language model, machine learning, question answering, (20 more...)

doi: 10.18653/v1/2025.emnlp-main.947

2506.00331

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.46)

Genre: Research Report > Promising Solution (0.54)

Industry:

Media > Film (0.95)
Leisure & Entertainment > Sports > Motorsports (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference

Nrusimha, Aniruddha, Brandon, William, Mishra, Mayank, Shen, Yikang, Panda, Rameswar, Ragan-Kelley, Jonathan, Kim, Yoon

The size and compute characteristics of modern large language models have led to an increased interest in developing specialized kernels tailored for particular training and inference workloads. Existing kernels primarily optimize for compute utilization, targeting the large-batch training and inference settings. However, low-batch inference, where memory bandwidth and kernel launch overheads are significant factors, remains important for many applications of interest such as in edge deployment and latency-sensitive applications. This paper describes FlashFormer, which fuses the entire transformer forward pass into a single kernel for accelerating low-batch inference of large language models. Across various model sizes and quantizations settings, FlashFormer achieves nontrivial speedups compared to existing inference kernels.

large language model, machine learning, natural language, (21 more...)

2505.22758

Country: North America > United States > Massachusetts (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Turing Test 2.0: The General Intelligence Threshold

Mappouras, Georgios

With the rise of artificial intelligence (A.I.) and large language models like ChatGPT, a new race for achieving artificial general intelligence (A.G.I) has started. While many speculate how and when A.I. will achieve A.G.I., there is no clear agreement on how A.G.I. can be detected in A.I. models, even when popular tools like the Turing test (and its modern variations) are used to measure their intelligence. In this work, we discuss why traditional methods like the Turing test do not suffice for measuring or detecting A.G.I. and provide a new, practical method that can be used to decide if a system (computer or any other) has reached or surpassed A.G.I. To achieve this, we make two new contributions. First, we present a clear definition for general intelligence (G.I.) and set a G.I. Threshold (G.I.T.) that can be used to distinguish between systems that achieve A.G.I. and systems that do not. Second, we present a new framework on how to construct tests that can detect if a system has achieved G.I. in a simple, comprehensive, and clear-cut fail/pass way. We call this novel framework the Turing test 2.0. We then demonstrate real-life examples of applying tests that follow our Turing test 2.0 framework on modern A.I. models.

information, large language model, machine learning, (16 more...)

2505.1955

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry:

Education (1.00)
Leisure & Entertainment (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.92)
Media (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Turing's Test (1.00)
(2 more...)

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition

Gu, Xinran, Lyu, Kaifeng, Li, Jiazheng, Zhang, Jingzhao

Large Language Models (LLMs) are typically trained on data mixtures: most data come from web scrapes, while a small portion is curated from high-quality sources with dense domain-specific knowledge. In this paper, we show that when training LLMs on such data mixtures, knowledge acquisition from knowledge-dense datasets, unlike training exclusively on knowledge-dense data (arXiv:2404.05405), does not always follow a smooth scaling law but can exhibit phase transitions with respect to the mixing ratio and model size. Through controlled experiments on a synthetic biography dataset mixed with web-scraped data, we demonstrate that: (1) as we increase the model size to a critical value, the model suddenly transitions from memorizing very few to most of the biographies; (2) below a critical mixing ratio, the model memorizes almost nothing even with extensive training, but beyond this threshold, it rapidly memorizes more biographies. We attribute these phase transitions to a capacity allocation phenomenon: a model with bounded capacity must act like a knapsack problem solver to minimize the overall test loss, and the optimal allocation across datasets can change discontinuously as the model size or mixing ratio varies. We formalize this intuition in an information-theoretic framework and reveal that these phase transitions are predictable, with the critical mixing ratio following a power-law relationship with the model size. Our findings highlight a concrete case where a good mixing recipe for large models may not be optimal for small models, and vice versa.

knowledge management, large language model, machine learning, (18 more...)

2505.18091

Country:

Asia (0.67)
North America > United States (0.46)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.66)
Research Report > Strength High (0.54)

Industry:

Health & Medicine (0.68)
Leisure & Entertainment > Sports (0.45)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding

Wu, Haoyuan, Ming, Rui, Gao, Jilong, Zhao, Hangyu, Chen, Xueyi, Yang, Yikai, Zheng, Haisheng, He, Zhuolun, Yu, Bei

Large language models (LLMs) achieve remarkable performance in code generation tasks. However, a significant performance disparity persists between popular programming languages (e.g., Python, C++) and others. To address this capability gap, we leverage the code translation task to train LLMs, thereby facilitating the transfer of coding proficiency across diverse programming languages. Moreover, we introduce OORL for training, a novel reinforcement learning (RL) framework that integrates on-policy and off-policy strategies. Within OORL, on-policy RL is applied during code translation, guided by a rule-based reward signal derived from unit tests. Complementing this coarse-grained rule-based reward, we propose Group Equivalent Preference Optimization (GEPO), a novel preference optimization method. Specifically, GEPO trains the LLM using intermediate representations (IRs) groups. LLMs can be guided to discern IRs equivalent to the source code from inequivalent ones, while also utilizing signals about the mutual equivalence between IRs within the group. This process allows LLMs to capture nuanced aspects of code functionality. By employing OORL for training with code translation tasks, LLMs improve their recognition of code functionality and their understanding of the relationships between code implemented in different languages. Extensive experiments demonstrate that our OORL for LLMs training with code translation tasks achieves significant performance improvements on code benchmarks across multiple programming languages.

large language model, natural language, programming language, (16 more...)

2505.12723

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents

Widyasari, Ratnadira, Weyssow, Martin, Irsan, Ivana Clairine, Ang, Han Wei, Liauw, Frank, Ouh, Eng Lieh, Shar, Lwin Khin, Kang, Hong Jin, Lo, David

Detecting vulnerabilities in source code remains a critical yet challenging task, especially when benign and vulnerable functions share significant similarities. In this work, we introduce VulTrial, a courtroom-inspired multi-agent framework designed to identify vulnerable code and to provide explanations. It employs four role-specific agents, which are security researcher, code author, moderator, and review board. Using GPT-4o as the base LLM, VulTrial almost doubles the efficacy of prior best-performing baselines. Additionally, we show that role-specific instruction tuning with small quantities of data significantly further boosts VulTrial's efficacy. Our extensive experiments demonstrate the efficacy of VulTrial across different LLMs, including an open-source, in-house-deployable model (LLaMA-3.1-8B), as well as the high quality of its generated explanations and its ability to uncover multiple confirmed zero-day vulnerabilities in the wild.

large language model, machine learning, natural language, (18 more...)

doi: 10.1145/3744916.3773256

2505.10961

Country: South America > Brazil > Rio de Janeiro (0.16)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments

Zhang, Ziyi, Sun, Zhen, Zhang, Zongmin, Peng, Zifan, Zhao, Yuemeng, Wang, Zichun, Luo, Zeren, Zuo, Ruiting, He, Xinlei

The visually impaired population faces significant challenges in daily activities. While prior works employ vision language models for assistance, most focus on static content and cannot address real-time perception needs in complex environments. Recent VideoLLMs enable real-time vision and speech interaction, offering promising potential for assistive tasks. In this work, we conduct the first study evaluating their effectiveness in supporting daily life for visually impaired individuals. We first conducted a user survey with visually impaired participants to design the benchmark VisAssistDaily for daily life evaluation. Using VisAssistDaily, we evaluate popular VideoLLMs and find GPT-4o achieves the highest task success rate. We further conduct a user study to reveal concerns about hazard perception. To address this, we propose SafeVid, an environment-awareness dataset, and fine-tune VITA-1.5, improving risk recognition accuracy from 25.00% to 76.00%.We hope this work provides valuable insights and inspiration for future research in this field.

large language model, machine learning, natural language, (20 more...)

2505.04488

Country:

Europe (0.93)
North America > United States (0.28)
Asia > China (0.28)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Transportation > Infrastructure & Services (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

WIREDDec-4-2025, 22:04:31 GMT

Cloudflare Has Blocked 416 Billion AI Bot Requests Since July 1

Cloudflare CEO Matthew Prince claims the internet infrastructure company's efforts to block AI crawlers are already seeing big results. As the large language models powering generative AI tools slurp up ever more data across the web, Cloudflare cofounder and CEO Matthew Prince said at WIRED's Big Interview event in San Francisco on Thursday that the internet infrastructure company has blocked more than 400 billion AI bot requests for its customers since July 1. The action comes after the company announced a Content Independence Day in July--an initiative with prominent publishers and AI firms to block AI crawlers by default on content creators' work unless the AI companies pay for access. Since July 2024, Cloudflare has offered customers tools to block AI bots from scraping their content. Cloudflare told WIRED that the number of AI bots blocked since July 1, 2025 is 416 billion.

large language model, machine learning, natural language, (17 more...)

WIRED

Country:

North America > United States > California > San Francisco County > San Francisco (0.26)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.05)
North America > United States > Arizona (0.05)
(3 more...)

Genre: Press Release (0.75)

Industry: Media (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.36)

The Atlantic - TechnologyDec-4-2025, 21:41:00 GMT

The Strange Disappearance of an Anti-AI Activist

Sam Kirchner wants to save the world from artificial superintelligence. He's been missing for two weeks. B efore Sam Kirchner vanished, before the San Francisco Police Department began to warn that he could be armed and dangerous, before OpenAI locked down its offices over the potential threat, those who encountered him saw him as an ordinary, if ardent, activist. Phoebe Thomas Sorgen met Kirchner a few months ago at Travis Air Force Base, northeast of San Francisco, at a protest against immigration policy and U.S. military aid to Israel. Sorgen, a longtime activist whose first protests were against the Vietnam War, was going to block an entrance to the base with six other older women. Kirchner, 27 years old, was there with a couple of other members of a new group called Stop AI, and they all agreed to go along to record video on their phones in case of a confrontation with the police.

large language model, machine learning, natural language, (19 more...)

The Atlantic - Technology

Country:

North America > United States > California > San Francisco County > San Francisco (0.46)
Asia > Vietnam (0.24)
Asia > Middle East > Israel (0.24)
Asia > China > Hong Kong (0.04)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Military > Air Force (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.37)