AITopics

Fine-tuning pretrained LLMs has been shown to be an effective strategy for reaching state-of-the-art performance on specific tasks like machine translation. However, this process of adaptation often implies sacrificing general-purpose capabilities, such as conversational reasoning and instruction-following, hampering the utility of the system in real-world applications that require a mixture of skills. In this paper, we introduce Tower+, a suite of models designed to deliver strong performance across both translation and multilingual general-purpose text capabilities. We achieve a Pareto frontier between translation specialization and multilingual general-purpose capabilities by introducing a novel training recipe that builds on Tower (Alves et al., 2024), comprising continued pretraining, supervised fine-tuning, preference optimization, and reinforcement learning with verifiable rewards. At each stage of training, we carefully generate and curate data to strengthen performance on translation as well as general-purpose tasks involving code generation, mathematics problem solving, and general instruction-following. We develop models at multiple scales: 2B, 9B, and 72B. Our smaller models often outperform larger general-purpose open-weight and proprietary LLMs (e.g., Llama 3.3 70B, GPT-4o). Our largest model delivers best-in-class translation performance for high-resource languages and top results in multilingual Arena Hard evaluations and in IF-MT, a benchmark we introduce for evaluating both translation and instruction-following. Our findings highlight that it is possible to rival frontier models in general capabilities, while optimizing for specific business domains, such as translation and localization.

large language model, machine learning, translation, (20 more...)

2506.1708

Country:

Europe > Portugal > Lisbon > Lisbon (0.14)
South America (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(7 more...)

Genre: Research Report > New Finding (0.34)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

DynScaling: Efficient Verifier-free Inference Scaling via Dynamic and Integrated Sampling

Wang, Fei, Wan, Xingchen, Sun, Ruoxi, Chen, Jiefeng, Arık, Sercan Ö.

Inference-time scaling has proven effective in boosting large language model (LLM) performance through increased test-time computation. Yet, its practical application is often hindered by reliance on external verifiers or a lack of optimization for realistic computational constraints. We propose DynScaling, which addresses these limitations through two primary innovations: an integrated parallel-sequential sampling strategy and a bandit-based dynamic budget allocation framework. The integrated sampling strategy unifies parallel and sequential sampling by constructing synthetic sequential reasoning chains from initially independent parallel responses, promoting diverse and coherent reasoning trajectories. The dynamic budget allocation framework formulates the allocation of computational resources as a multi-armed bandit problem, adaptively distributing the inference budget across queries based on the uncertainty of previously sampled responses, thereby maximizing computational efficiency. By combining these components, DynScaling effectively improves LLM performance under practical resource constraints without the need for external verifiers. Experimental results demonstrate that DynScaling consistently surpasses existing verifier-free inference scaling baselines in both task performance and computational cost.

data mining, large language model, machine learning, (16 more...)

2506.16043

Country:

North America > United States > California (0.14)
Asia > Middle East > Jordan (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

On Path to Multimodal Historical Reasoning: HistBench and HistAgent

Qiu, Jiahao, Xiao, Fulian, Wang, Yimin, Mao, Yuchen, Chen, Yijia, Juan, Xinzhe, Zhang, Shu, Wang, Siran, Qi, Xuan, Zhang, Tongcheng, Yao, Zixin, Guo, Jiacheng, Lu, Yifu, Argon, Charles, Cui, Jundi, Chen, Daixin, Zhou, Junran, Zhou, Shuyao, Zhou, Zhanpeng, Yang, Ling, Liu, Shilong, Wang, Hongru, Huang, Kaixuan, Jiang, Xun, Cao, Yuming, Chen, Yue, Chen, Yunfei, Chen, Zhengyi, Dai, Ruowei, Deng, Mengqiu, Fu, Jiye, Gu, Yunting, Guan, Zijie, Huang, Zirui, Ji, Xiaoyan, Jiang, Yumeng, Kong, Delong, Li, Haolong, Li, Jiaqi, Li, Ruipeng, Li, Tianze, Li, Zhuoran, Lian, Haixia, Lin, Mengyue, Liu, Xudong, Lu, Jiayi, Lu, Jinghan, Luo, Wanyu, Luo, Ziyue, Pu, Zihao, Qiao, Zhi, Ren, Ruihuan, Wan, Liang, Wang, Ruixiang, Wang, Tianhui, Wang, Yang, Wang, Zeyu, Wang, Zihua, Wu, Yujia, Wu, Zhaoyi, Xin, Hao, Xing, Weiao, Xiong, Ruojun, Xu, Weijie, Shu, Yao, Xiao, Yao, Yang, Xiaorui, Yang, Yuchen, Yi, Nan, Yu, Jiadong, Yu, Yangyuxuan, Zeng, Huiting, Zhang, Danni, Zhang, Yunjie, Zhang, Zhaoyu, Zhang, Zhiheng, Zheng, Xiaofeng, Zhou, Peirong, Zhong, Linyan, Zong, Xiaoyin, Zhao, Ying, Chen, Zhenxin, Ding, Lin, Gao, Xiaoyu, Gong, Bingbing, Li, Yichao, Liao, Yang, Ma, Guang, Ma, Tianyuan, Sun, Xinrui, Wang, Tianyi, Xia, Han, Xian, Ruobing, Ye, Gen, Yu, Tengfei, Zhang, Wentao, Wang, Yuxi, Gao, Xi, Wang, Mengdi

Recent advances in large language models (LLMs) have led to remarkable progress across domains, yet their capabilities in the humanities, particularly history, remain underexplored. Historical reasoning poses unique challenges for AI, involving multimodal source interpretation, temporal inference, and cross-linguistic analysis. While general-purpose agents perform well on many existing benchmarks, they lack the domain-specific expertise required to engage with historical materials and questions. To address this gap, we introduce HistBench, a new benchmark of 414 high-quality questions designed to evaluate AI's capacity for historical reasoning and authored by more than 40 expert contributors. The tasks span a wide range of historical problems-from factual retrieval based on primary sources to interpretive analysis of manuscripts and images, to interdisciplinary challenges involving archaeology, linguistics, or cultural history. Furthermore, the benchmark dataset spans 29 ancient and modern languages and covers a wide range of historical periods and world regions. Finding the poor performance of LLMs and other agents on HistBench, we further present HistAgent, a history-specific agent equipped with carefully designed tools for OCR, translation, archival search, and image understanding in History. On HistBench, HistAgent based on GPT-4o achieves an accuracy of 27.54% pass@1 and 36.47% pass@2, significantly outperforming LLMs with online search and generalist agents, including GPT-4o (18.60%), DeepSeek-R1(14.49%) and Open Deep Research-smolagents(20.29% pass@1 and 25.12% pass@2). These results highlight the limitations of existing LLMs and generalist agents and demonstrate the advantages of HistAgent for historical reasoning.

large language model, machine learning, natural language, (16 more...)

2505.20246

Country:

South America (0.04)
North America > United States > Michigan (0.04)
North America > Central America (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Large Language Models as Psychological Simulators: A Methodological Guide

Lin, Zhicheng

Large language models (LLMs) offer emerging opportunities for psychological and behavioral research, but methodological guidance is lacking. This article provides a framework for using LLMs as psychological simulators across two primary applications: simulating roles and personas to explore diverse contexts, and serving as computational models to investigate cognitive processes. For simulation, we present methods for developing psychologically grounded personas that move beyond demographic categories, with strategies for validation against human data and use cases ranging from studying inaccessible populations to prototyping research instruments. For cognitive modeling, we synthesize emerging approaches for probing internal representations, methodological advances in causal interventions, and strategies for relating model behavior to human cognition. We address overarching challenges including prompt sensitivity, temporal limitations from training data cutoffs, and ethical considerations that extend beyond traditional human subjects review. Throughout, we emphasize the need for transparency about model capabilities and constraints. Together, this framework integrates emerging empirical evidence about LLM performance--including systematic biases, cultural limitations, and prompt brittleness--to help researchers wrangle these challenges and leverage the unique capabilities of LLMs in psychological research.

large language model, machine learning, natural language, (20 more...)

2506.16702

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(10 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Education (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.46)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

van der Linden, Putri A., Timans, Alexander, Bekkers, Erik J.

CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization

arXiv.org Machine LearningJun-23-2025

We study the problem of conformal prediction (CP) under geometric data shifts, where data samples are susceptible to transformations such as rotations or flips. While CP endows prediction models with post-hoc uncertainty quantification and formal coverage guarantees, their practicality breaks under distribution shifts that deteriorate model performance. To address this issue, we propose integrating geometric information--such as geometric pose--into the conformal procedure to reinstate its guarantees and ensure robustness under geometric shifts. In particular, we explore recent advancements on pose canonicalization as a suitable information extractor for this purpose. Evaluating the combined approach across discrete and continuous shifts and against equivariant and augmentation-based baselines, we find that integrating geometric information with CP yields a principled way to address geometric shifts while maintaining broad applicability to black-box predictors.

artificial intelligence, machine learning, prediction, (14 more...)

2506.16189

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.46)
Transportation (0.34)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Graham, Robert, Stevinson, Edward, Richter, Leo, Chia, Alexander, Miller, Joseph, Bloom, Joseph Isaac

ContextBench: Modifying Contexts for Targeted Latent Activation

arXiv.org Machine LearningJun-23-2025

Identifying inputs that trigger specific behaviours or latent features in language models could have a wide range of safety use cases. We investigate a class of methods capable of generating targeted, linguistically fluent inputs that activate specific latent features or elicit model behaviours. We formalise this approach as context modification and present ContextBench -- a benchmark with tasks assessing core method capabilities and potential safety applications. Our evaluation framework measures both elicitation strength (activation of latent features or behaviours) and linguistic fluency, highlighting how current state-of-the-art methods struggle to balance these objectives. We enhance Evolutionary Prompt Optimisation (EPO) with LLM-assistance and diffusion model inpainting, and demonstrate that these variants achieve state-of-the-art performance in balancing elicitation effectiveness and fluency.

large language model, machine learning, natural language, (18 more...)

2506.15735

Country:

North America > United States (0.14)
Europe > Ukraine (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

Usevich, Konstantin, Dérand, Clara, Borsoi, Ricardo, Clausel, Marianne

Identifiability of Deep Polynomial Neural Networks

arXiv.org Machine LearningJun-23-2025

Polynomial Neural Networks (PNNs) possess a rich algebraic and geometric structure. However, their identifiability -- a key property for ensuring interpretability -- remains poorly understood. In this work, we present a comprehensive analysis of the identifiability of deep PNNs, including architectures with and without bias terms. Our results reveal an intricate interplay between activation degrees and layer widths in achieving identifiability. As special cases, we show that architectures with non-increasing layer widths are generically identifiable under mild conditions, while encoder-decoder networks are identifiable when the decoder widths do not grow too rapidly. Our proofs are constructive and center on a connection between deep PNNs and low-rank tensor decompositions, and Kruskal-type uniqueness theorems. This yields both generic conditions determined by the architecture, and effective conditions that depend on the network's parameters. We also settle an open conjecture on the expected dimension of PNN's neurovarieties, and provide new bounds on the activation degrees required for it to reach its maximum.

artificial intelligence, identifiability, machine learning, (17 more...)

2506.17093

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > France (0.04)
Asia > Middle East > Israel (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

SlepNet: Spectral Subgraph Representation Learning for Neural Dynamics

Viswanath, Siddharth, Singh, Rahul, Zhang, Yanlei, Noah, J. Adam, Hirsch, Joy, Krishnaswamy, Smita

Graph neural networks have been useful in machine learning on graph-structured data, particularly for node classification and some types of graph classification tasks. However, they have had limited use in representing patterning of signals over graphs. Patterning of signals over graphs and in subgraphs carries important information in many domains including neuroscience. Neural signals are spatiotemporally patterned, high dimensional and difficult to decode. Graph signal processing and associated GCN models utilize the graph Fourier transform and are unable to efficiently represent spatially or spectrally localized signal patterning on graphs. Wavelet transforms have shown promise here, but offer non-canonical representations and cannot be tightly confined to subgraphs. Here we propose SlepNet, a novel GCN architecture that uses Slepian bases rather than graph Fourier harmonics. In SlepNet, the Slepian harmonics optimally concentrate signal energy on specifically relevant subgraphs that are automatically learned with a mask. Thus, they can produce canonical and highly resolved representations of neural activity, focusing energy of harmonics on areas of the brain which are activated. We evaluated SlepNet across three fMRI datasets, spanning cognitive and visual tasks, and two traffic dynamics datasets, comparing its performance against conventional GNNs and graph signal processing constructs. SlepNet outperforms the baselines in all datasets. Moreover, the extracted representations of signal patterns from SlepNet offers more resolution in distinguishing between similar patterns, and thus represent brain signaling transients as informative trajectories. Here we have shown that these extracted trajectory representations can be used for other downstream untrained tasks. Thus we establish that SlepNet is useful both for prediction and representation learning in spatiotemporal data.

artificial intelligence, machine learning, slepnet, (20 more...)

2506.16602

Country:

South America (0.28)
North America (0.28)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Stradi, Francesco Emanuele, Castiglioni, Matteo, Marchesi, Alberto, Gatti, Nicola, Kroer, Christian

No-Regret Learning Under Adversarial Resource Constraints: A Spending Plan Is All You Need!

arXiv.org Machine LearningJun-19-2025

We study online decision making problems under resource constraints, where both reward and cost functions are drawn from distributions that may change adversarially over time. We focus on two canonical settings: $(i)$ online resource allocation where rewards and costs are observed before action selection, and $(ii)$ online learning with resource constraints where they are observed after action selection, under full feedback or bandit feedback. It is well known that achieving sublinear regret in these settings is impossible when reward and cost distributions may change arbitrarily over time. To address this challenge, we analyze a framework in which the learner is guided by a spending plan--a sequence prescribing expected resource usage across rounds. We design general (primal-)dual methods that achieve sublinear regret with respect to baselines that follow the spending plan. Crucially, the performance of our algorithms improves when the spending plan ensures a well-balanced distribution of the budget across rounds. We additionally provide a robust variant of our methods to handle worst-case scenarios where the spending plan is highly imbalanced. To conclude, we study the regret of our algorithms when competing against benchmarks that deviate from the prescribed spending plan.

artificial intelligence, constraint-based reasoning, regret minimizer, (17 more...)

2506.13244

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)

Wu, Ruihan, Garov, Konstantin, Chaudhuri, Kamalika

Learning-Time Encoding Shapes Unlearning in LLMs

arXiv.org Artificial IntelligenceJun-19-2025

As large language models (LLMs) are increasingly deployed in the real world, the ability to ``unlearn'', or remove specific pieces of knowledge post hoc, has become essential for a variety of reasons ranging from privacy regulations to correcting outdated or harmful content. Prior work has proposed unlearning benchmarks and algorithms, and has typically assumed that the training process and the target model are fixed. In this work, we empirically investigate how learning-time choices in knowledge encoding impact the effectiveness of unlearning factual knowledge. Our experiments reveal two key findings: (1) learning with paraphrased descriptions improves unlearning performance and (2) unlearning individual piece of knowledge from a chunk of text is challenging. Our results suggest that learning-time knowledge encoding may play a central role in enabling reliable post-hoc unlearning.

large language model, machine learning, natural language, (19 more...)

2506.15076

Country:

Europe (0.28)
South America > Chile (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)