AITopics

Ensuring Large Language Model (LLM) safety remains challenging due to the absence of universal standards and reliable content validators, making it difficult to obtain effective training signals. We discover that aligned models already possess robust internal safety beliefs: they consistently produce high-confidence refusals to harmful requests while exhibiting high entropy when generating potentially dangerous content. This entropy gap reveals an untapped signal--models intrinsically "know" when to refuse. We introduce Safety Instincts Reinforcement Learning (SIRL), which transforms this internal confidence into a self-generated reward signal, eliminating dependence on external validators or human annotations. SIRL teaches models to trust their safety instincts by reinforcing low-entropy refusal behaviors. Evaluated on Llama and Qwen models, SIRL maintains 89%+ Defense Success Rates (DSRs) against 20+ jailbreak methods, from static prompts to adaptive attacks. Using only 15,000 unlabeled prompts, SIRL surpasses resource-intensive supervised methods while preserving performance on mathematics, coding, and conversation benchmarks. Our work demonstrates that effective alignment can emerge from within, paving the way for more autonomous and robust AI safety mechanisms that scale without extensive human oversight. The widespread deployment of large language models (LLMs) has made defending against jailbreak attacks a critical priority (Yi et al., 2024; Wei et al., 2023; Shen et al., 2025b). Unlike well-defined tasks with clear metrics, determining what constitutes "safe" behavior requires expensive human annotation, carefully crafted reward models, or predefined rules that often fail to generalize (Casper et al., 2023; Zou et al., 2023b). As sophisticated jailbreak techniques continue to evolve (Samvelyan et al., 2024; Zou et al., 2023b; Chao et al., 2025; Andriushchenko & Flammarion, 2024; Andriushchenko et al., 2025), the question remains: can models learn to enhance their own safety without relying on these external validators? Recent advances in self-alignment (Burns et al., 2023; Christiano et al., 2018) and the pursuit of su-peralignment (Leike & Sutskever, 2023) suggest that models may possess untapped internal signals for improvement. Inspired by this possibility, we investigate whether aligned LLMs harbor intrinsic safety beliefs that could guide self-improvement.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2510.01088

Country:

North America > United States (0.46)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Education (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Goworek, Roksana, Macmillan-Scott, Olivia, Özyiğit, Eda B.

Bridging Language Gaps: Advances in Cross-Lingual Information Retrieval with Multilingual LLMs

Cross-lingual information retrieval (CLIR) addresses the challenge of retrieving relevant documents written in languages different from that of the original query. Research in this area has typically framed the task as monolingual retrieval augmented by translation, treating retrieval methods and cross-lingual capabilities in isolation. Both monolingual and cross-lingual retrieval usually follow a pipeline of query expansion, ranking, re-ranking and, increasingly, question answering. Recent advances, however, have shifted from translation-based methods toward embedding-based approaches and leverage multilingual large language models (LLMs), for which aligning representations across languages remains a central challenge. The emergence of cross-lingual embeddings and multilingual LLMs has introduced a new paradigm, offering improved retrieval performance and enabling answer generation. This survey provides a comprehensive overview of developments from early translation-based methods to state-of-the-art embedding-driven and generative techniques. It presents a structured account of core CLIR components, evaluation practices, and available resources. Persistent challenges such as data imbalance and linguistic variation are identified, while promising directions are suggested for advancing equitable and effective cross-lingual information retrieval. By situating CLIR within the broader landscape of information retrieval and multilingual language processing, this work not only reviews current capabilities but also outlines future directions for building retrieval systems that are robust, inclusive, and adaptable.

information retrieval, large language model, machine learning, (19 more...)

2510.00908

Country:

Europe (1.00)
Asia > Middle East (1.00)
North America > United States > California (0.67)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Education (1.00)
Government (0.67)
Media > News (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs

Wang, Ziliang, An, Kang, Zheng, Xuhui, Qian, Faqiang, Zhang, Weikun, Ouyang, Cijun, Cai, Jialu, Wang, Yuhang, Wu, Yichao

While search-augmented large language models (LLMs) exhibit impressive capabilities, their reliability in complex multi-hop reasoning remains limited. This limitation arises from three fundamental challenges: decomposition errors, where tasks are incorrectly broken down; retrieval missing, where key evidence fails to be retrieved; and reasoning errors, where flawed logic propagates through the reasoning chain. A single failure in any of these stages can derail the final answer. We propose Erasable Reinforcement Learning (ERL), a novel framework that transforms fragile reasoning into a robust process. ERL explicitly identifies faulty steps, erases them, and regenerates reasoning in place, preventing defective logic from propagating through the reasoning chain. This targeted correction mechanism turns brittle reasoning into a more resilient process. Models trained with ERL, termed ESearch, achieve substantial improvements on HotpotQA, MuSiQue, 2Wiki, and Bamboogle, with the 3B model achieving +8.48% EM and +11.56% F1, and the 7B model achieving +5.38% EM and +7.22% F1 over previous state-of-the-art(SOTA) results. These findings suggest that erasable reinforcement learning provides a powerful paradigm shift for robust multi-step reasoning in LLMs.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

2510.00861

Country:

Europe (1.00)
Asia > India (0.68)
North America > United States > Massachusetts > Norfolk County (0.28)

Genre: Research Report > New Finding (0.87)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Government > Voting & Elections (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Kala, Divy, Khandelwal, Eshika, Tapaswi, Makarand

What You See is What You Ask: Evaluating Audio Descriptions

Audio descriptions (ADs) narrate important visual details in movies, enabling Blind and Low Vision (BLV) users to understand narratives and appreciate visual details. Existing works in automatic AD generation mostly focus on few-second trimmed clips, and evaluate them by comparing against a single ground-truth reference AD. However, writing ADs is inherently subjective. Through alignment and analysis of two independent AD tracks for the same movies, we quantify the subjectivity in when and whether to describe, and what and how to highlight. Thus, we show that working with trimmed clips is inadequate. We propose ADQA, a QA benchmark that evaluates ADs at the level of few-minute long, coherent video segments, testing whether they would help BLV users understand the story and appreciate visual details. ADQA features visual appreciation (VA) questions about visual facts and narrative understanding (NU) questions based on the plot. Through ADQA, we show that current AD generation methods lag far behind human-authored ADs. We conclude with several recommendations for future work and introduce a public leaderboard for benchmarking.

large language model, machine learning, natural language, (20 more...)

2510.00808

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Choi, Woongjib, Lee, Sangmin, Lim, Hyungseob, Kang, Hong-Goo

UniverSR: Unified and Versatile Audio Super-Resolution via Vocoder-Free Flow Matching

In this paper, we present a vocoder-free framework for audio super-resolution that employs a flow matching generative model to capture the conditional distribution of complex-valued spectral coefficients. Unlike conventional two-stage diffusion-based approaches that predict a mel-spectrogram and then rely on a pre-trained neural vocoder to synthesize waveforms, our method directly reconstructs waveforms via the inverse Short-Time Fourier Transform (iSTFT), thereby eliminating the dependence on a separate vocoder. This design not only simplifies end-to-end optimization but also overcomes a critical bottleneck of two-stage pipelines, where the final audio quality is fundamentally constrained by vocoder performance. Experiments show that our model consistently produces high-fidelity 48 kHz audio across diverse upsampling factors, achieving state-of-the-art performance on both speech and general audio datasets.

artificial intelligence, dataset, machine learning, (17 more...)

2510.00771

Genre: Research Report (0.50)

Industry:

Media > Music (0.46)
Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct

Zheng, Haoyang, Liu, Xinyang, Kong, Cindy Xiangrui, Jiang, Nan, Hu, Zheyuan, Luo, Weijian, Deng, Wei, Lin, Guang

Fast and high-quality language generation is the holy grail that people pursue in the age of AI. In this work, we introduce Discrete Diffusion Divergence Instruct (DiDi-Instruct), a training-based method that initializes from a pre-trained (masked) discrete diffusion language model (dLLM) and distills a few-step student for fast generation. The resulting DiDi-Instruct model achieves comparable or superior performance to its dLLM teacher and the GPT-2 baseline while enabling up to 64$\times$ acceleration. The theoretical foundation of DiDi-Instruct is a novel framework based on integral KL-divergence minimization, which yields a practical training algorithm. We further introduce grouped reward normalization, intermediate-state matching, and the reward-guided ancestral sampler that significantly improve training stability, model coverage, and inference quality. On OpenWebText, DiDi-Instruct achieves perplexity from 62.2 (8 NFEs) to 18.4 (128 NFEs), which outperforms prior accelerated dLLMs and GPT-2 baseline. These gains come with a negligible entropy loss (around $1\%$) and reduce additional training wall-clock time by more than $20\times$ compared to competing dLLM distillation methods. We further validate the robustness and effectiveness of DiDi-Instruct through extensive ablation studies, model scaling, and the generation of discrete protein sequences. In conclusion, DiDi-Instruct is an efficient yet effective distillation method, enabling language generation in the blink of an eye. We will release both code and models at github.com/haoyangzheng-ai/didi-instruct.

didi-instruct, large language model, machine learning, (14 more...)

2509.25035

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > Syria (1.00)
Asia > Russia (0.93)

Genre:

Research Report > New Finding (1.00)
Personal > Interview (0.92)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
Information Technology (1.00)
(13 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Fuoli, Matteo, Huang, Weihang, Littlemore, Jeannette, Turner, Sarah, Wilding, Ellen

Metaphor identification using large language models: A comparison of RAG, prompt engineering, and fine-tuning

Metaphor is a pervasive feature of discourse and a powerful lens for examining cognition, emotion, and ideology. Large-scale analysis, however, has been constrained by the need for manual annotation due to the context-sensitive nature of metaphor. This study investigates the potential of large language models (LLMs) to automate metaphor identification in full texts. We compare three methods: (i) retrieval-augmented generation (RAG), where the model is provided with a codebook and instructed to annotate texts based on its rules and examples; (ii) prompt engineering, where we design task-specific verbal instructions; and (iii) fine-tuning, where the model is trained on hand-coded texts to optimize performance. Within prompt engineering, we test zero-shot, few-shot, and chain-of-thought strategies. Our results show that state-of-the-art closed-source LLMs can achieve high accuracy, with fine-tuning yielding a median F1 score of 0.79. A comparison of human and LLM outputs reveals that most discrepancies are systematic, reflecting well-known grey areas and conceptual challenges in metaphor theory. We propose that LLMs can be used to at least partly automate metaphor identification and can serve as a testbed for developing and refining metaphor identification protocols and the theory that underpins them.

large language model, machine learning, natural language, (17 more...)

2509.24866

Country: Europe (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.93)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Beyond Content: How Grammatical Gender Shapes Visual Representation in Text-to-Image Models

Saeed, Muhammed, Raza, Shaina, Vayani, Ashmal, Abdul-Mageed, Muhammad, Emami, Ali, Shehata, Shady

Research on bias in Text-to-Image (T2I) models has primarily focused on demographic representation and stereotypical attributes, overlooking a fundamental question: how does grammatical gender influence visual representation across languages? We introduce a cross-linguistic benchmark examining words where grammatical gender contradicts stereotypical gender associations (e.g., ``une sentinelle'' - grammatically feminine in French but referring to the stereotypically masculine concept ``guard''). Our dataset spans five gendered languages (French, Spanish, German, Italian, Russian) and two gender-neutral control languages (English, Chinese), comprising 800 unique prompts that generated 28,800 images across three state-of-the-art T2I models. Our analysis reveals that grammatical gender dramatically influences image generation: masculine grammatical markers increase male representation to 73% on average (compared to 22% with gender-neutral English), while feminine grammatical markers increase female representation to 38% (compared to 28% in English). These effects vary systematically by language resource availability and model architecture, with high-resource languages showing stronger effects. Our findings establish that language structure itself, not just content, shapes AI-generated visual outputs, introducing a new dimension for understanding bias and fairness in multilingual, multimodal systems.

large language model, machine learning, natural language, (18 more...)

2508.03199

Country:

North America > United States (0.67)
Asia (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.49)

What if Othello-Playing Language Models Could See?

Chen, Xinyi, Yuan, Yifei, Li, Jiaang, Belongie, Serge, de Rijke, Maarten, Søgaard, Anders

Language models are often said to face a symbol grounding problem. While some have argued the problem can be solved without resort to other modalities, many have speculated that grounded learning is more efficient. We explore this question in Othello, a simplified, rule-based world that offers a controlled and interpretable testbed for studying world understanding. Building on prior work, we introduce VISOTHELLO, a multi-modal model trained jointly on move sequences and board images. Using the Othello rule understanding task, we examine whether multi-modal learning provides advantages over text-only approaches. We further evaluate robustness under semantically irrelevant perturbations and analyze the consistency of cross-modal alignment. Our results suggest that multi-modal training not only improves performance and robustness but also promotes convergence toward shared internal representations across different model architectures.

artificial intelligence, machine learning, natural language, (17 more...)

2507.1452

Country: Europe (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Media > Theater (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Salles, Marcel Mateos, Goyal, Praney, Sekhsaria, Pradyut, Huang, Hai, Balestriero, Randall

LoRA Users Beware: A Few Spurious Tokens Can Manipulate Your Finetuned Model

Large Language Models (LLMs) are commonly finetuned for a variety of use cases and domains. A common approach is to leverage Low-Rank Adaptation (LoRA) -- known to provide strong performance at low resource costs. In this study, we demonstrate that LoRA actually opens the door to short-cut vulnerabilities -- and the more resource efficient is the LoRA setup, the more vulnerable will be the finetuned model to aggressive attacks. To measure that vulnerability, we introduce Seamless Spurious Token Injection (SSTI), where we find that LoRA exclusively focuses on even just a single token that is spuriously correlated with downstream labels. In short, injection of that spurious token during finetuning ensure that the model's prediction at test-time can be manipulated on-demand. We conducted experiments across model families and datasets to evaluate the impact of SSTI during LoRA finetuning while providing possible mitigations. Our experiments conclude that none of the existing checkers and preprocessors can sanitize a dataset raising new concerns for data quality and AI safety.

large language model, machine learning, ssti, (18 more...)

2506.11402

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)