AITopics | vocab

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis


Compact Proofs of Model Performance via Mechanistic Interpretability

Jason Gross, Rajashree Agrawal, Thomas Kwa, Euan Ong, Chun Hei Yip, Alex Gibson, Soufiane Noubir, Lawrence Chan

Neural Information Processing Systems, Feb-16-2026, 15:52:08 GMT

We propose using mechanistic interpretability – techniques for reverse engineering model weights into human-interpretable algorithms – to derive and compactly prove formal guarantees on model performance. We prototype this approach by formally proving accuracy lower bounds for a small transformer trained on Max-of-K, validating proof transferability across 151 random seeds and four values of K. We create 102 different computer-assisted proof strategies and assess their length and tightness of bound on each of our models. Using quantitative metrics, we find that shorter proofs seem to require and provide more mechanistic understanding. Moreover, we find that more faithful mechanistic understanding leads to tighter performance bounds. We confirm these connections by qualitatively examining a subset of our proofs. Finally, we identify compounding structureless errors as a key challenge for using mechanistic interpretability to generate compact proofs on model performance.
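As a rough illustration of the Max-of-K task named in the abstract, the sketch below certifies an exact accuracy by checking a model on every possible length-K input. This is the most direct and least compact way to obtain such a guarantee. The `toy_model` function, the vocabulary size, and the value of K are hypothetical stand-ins chosen for illustration; they are not the authors' transformer or any of their proof strategies.

```python
from itertools import product


def brute_force_accuracy(model, vocab_size, k):
    """Exact accuracy of `model` on Max-of-K, found by enumerating all vocab_size**k inputs."""
    correct = 0
    total = 0
    for seq in product(range(vocab_size), repeat=k):
        total += 1
        if model(seq) == max(seq):
            correct += 1
    return correct / total


def toy_model(seq):
    # Hypothetical imperfect model (an assumption): it ignores the first token.
    return max(seq[1:])


accuracy = brute_force_accuracy(toy_model, vocab_size=4, k=3)
print(accuracy)
```

The enumeration visits vocab_size**k sequences, so its cost grows exponentially in K, which is one way to see why shorter proofs that exploit structure in the model are attractive.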

logic & formal reasoning, machine learning, natural language, (23 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Poland > Lower Silesia Province > Wroclaw (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.45)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training

Bu, Dake, Huang, Wei, Han, Andi, Nitanda, Atsushi, Wong, Hau-San, Zhang, Qingfu, Suzuki, Taiji

arXiv.org Artificial Intelligence, Nov-25-2025

Recent curriculum techniques in the post-training stage of LLMs have been widely observed to outperform non-curriculum approaches in enhancing reasoning performance, yet a principled understanding of why and to what extent they work remains elusive. To address this gap, we develop a theoretical framework grounded in the intuition that progressively learning through manageable steps is more efficient than directly tackling a hard reasoning task, provided each stage stays within the model's effective competence. Under mild complexity conditions linking consecutive curriculum stages, we show that curriculum post-training avoids the exponential complexity bottleneck. To substantiate this result, drawing insights from the Chain-of-Thoughts (CoTs) solving mathematical problems such as Countdown and parity, we model CoT generation as a states-conditioned autoregressive reasoning tree, define a uniform-branching base model to capture pretrained behavior, and formalize curriculum stages as either depth-increasing (longer reasoning chains) or hint-decreasing (shorter prefixes) subtasks. Our analysis shows that, under outcome-only reward signals, reinforcement learning finetuning achieves high accuracy with polynomial sample complexity, whereas direct learning suffers from an exponential bottleneck. We further establish analogous guarantees for test-time scaling, where curriculum-aware querying reduces both reward oracle calls and sampling cost from exponential to polynomial order.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.07372

Genre: Research Report (1.00)

Industry: Education > Educational Setting (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)

Compact Proofs of Model Performance via Mechanistic Interpretability

Jason Gross, Rajashree Agrawal, Thomas Kwa, Euan Ong, Chun Hei Yip, Alex Gibson, Soufiane Noubir, Lawrence Chan

Neural Information Processing Systems, Oct-10-2025, 09:38:10 GMT

query, sequence, vocab, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Poland > Lower Silesia Province > Wroclaw (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.45)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Syntactic Learnability of Echo State Neural Language Models at Scale

Ueda, Ryo, Kuribayashi, Tatsuki, Kando, Shunsuke, Inui, Kentaro

arXiv.org Artificial Intelligence, Mar-3-2025

What is a neural model with minimum architectural complexity that exhibits reasonable language learning capability? To explore such a simple but sufficient neural language model, we revisit a basic reservoir computing (RC) model, Echo State Network (ESN), a restricted class of simple Recurrent Neural Networks. Our experiments showed that ESN with a large hidden state is comparable or superior to Transformer in grammaticality judgment tasks when trained with about 100M words, suggesting that architectures as complex as that of Transformer may not always be necessary for syntactic learning.
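For readers unfamiliar with Echo State Networks, the following is a minimal sketch of the standard reservoir update, in which the input and recurrent weights are fixed at random and only a readout (not shown) would be trained. The weight scale, the sizes, and the one-hot token encoding are assumptions made for illustration; the paper's exact ESN variant and hyperparameters are not reproduced here.

```python
import math
import random


def make_reservoir(input_size, hidden_size, scale=0.1, seed=0):
    """Create fixed random input and recurrent weights (never trained in an ESN)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-scale, scale) for _ in range(input_size)]
            for _ in range(hidden_size)]
    w_rec = [[rng.uniform(-scale, scale) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    return w_in, w_rec


def step(w_in, w_rec, state, x):
    """One reservoir update: h_t = tanh(W_in x_t + W_rec h_{t-1})."""
    new_state = []
    for i in range(len(state)):
        pre = sum(w * xi for w, xi in zip(w_in[i], x))
        pre += sum(w * hj for w, hj in zip(w_rec[i], state))
        new_state.append(math.tanh(pre))
    return new_state


# Feed a short sequence of hypothetical token ids through the reservoir.
w_in, w_rec = make_reservoir(input_size=3, hidden_size=5)
state = [0.0] * 5
for token_id in [0, 2, 1]:
    one_hot = [1.0 if j == token_id else 0.0 for j in range(3)]
    state = step(w_in, w_rec, state, one_hot)
```

Because the reservoir weights stay fixed, training reduces to fitting a map from `state` to next-token scores, which is what makes the architecture simple compared with a Transformer.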

esn, nll, validation nll, (14 more...)

arXiv.org Artificial Intelligence

2503.01724

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(6 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Notes on the Mathematical Structure of GPT LLM Architectures

Becker-Kahn, Spencer

arXiv.org Artificial Intelligence, Oct-25-2024

Introduction: When considered from a purely mathematical point of view, the building and training of a large (transformer) language model (LLM) is the construction of a function - which can be taken to be a map from some Euclidean space to another - that has certain interesting properties. And therefore, from the point of view of a mathematician, it may be frustrating to find that many key papers announcing significant new LLMs seem reluctant to simply spell out the details of the function that they have constructed in plain mathematical language or indeed even in complete pseudo-code (and the latter form of this complaint appears to be one of the motivations behind a recent article of Phuong and Hutter [1]). Here, we seek to give a relatively 'pure' mathematical description of the architecture of a GPT-3-style LLM. There is then a separate process - the training of the model - in which a particular value θ ∈ Θ is selected using a training algorithm. We will draw attention to such parameters as we introduce them, as opposed to attempting to give a definition of Θ up front.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2410.1937

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

AutoIRT: Calibrating Item Response Theory Models with Automated Machine Learning

Sharpnack, James, Mulcaire, Phoebe, Bicknell, Klinton, LaFlair, Geoff, Yancey, Kevin

arXiv.org Artificial Intelligence, Sep-13-2024

Item response theory (IRT) is a class of interpretable factor models that are widely used in computerized adaptive tests (CATs), such as language proficiency tests. Traditionally, these are fit using parametric mixed effects models on the probability of a test taker getting the correct answer to a test item (i.e., question). Neural net extensions of these models, such as BertIRT, require specialized architectures and parameter tuning. We propose a multistage fitting procedure that is compatible with out-of-the-box Automated Machine Learning (AutoML) tools. It is based on a Monte Carlo EM (MCEM) outer loop with a two-stage inner loop, which trains a non-parametric AutoML grade model using item features followed by an item-specific parametric model. This greatly accelerates the modeling workflow for scoring tests. We demonstrate its effectiveness by applying it to the Duolingo English Test, a high-stakes, online English proficiency test. We show that the resulting model is typically better calibrated, achieves better predictive performance, and yields more accurate scores than existing methods (non-explanatory IRT models and explanatory IRT models like BERT-IRT). Along the way, we provide a brief survey of machine learning methods for calibration of item parameters for CATs.
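To make "the probability of a test taker getting the correct answer to a test item" concrete, here is a minimal sketch of one common parametric form, the two-parameter logistic (2PL) item response function. The abstract does not state which parametric form AutoIRT uses, so the 2PL choice and the parameter names below are assumptions for illustration only.

```python
import math


def p_correct(ability, discrimination, difficulty):
    """2PL item response function: P(correct) = sigmoid(a * (theta - b))."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# When ability equals item difficulty, the 2PL model gives a 50% chance of success.
p_match = p_correct(ability=0.0, discrimination=1.0, difficulty=0.0)

# A stronger test taker is more likely to answer the same item correctly.
p_strong = p_correct(ability=1.0, discrimination=1.0, difficulty=0.0)
p_weak = p_correct(ability=-1.0, discrimination=1.0, difficulty=0.0)
```

Calibration, in this setting, means estimating the per-item parameters (here `discrimination` and `difficulty`) from response data.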

autoirt, item parameter, test taker, (15 more...)

arXiv.org Artificial Intelligence

2409.08823

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(5 more...)

Genre:

Research Report (0.82)
Overview (0.68)

Industry: Education > Curriculum > Subject-Specific Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

PyMarian: Fast Neural Machine Translation and Evaluation in Python

Gowda, Thamme, Grundkiewicz, Roman, Rippeth, Elijah, Post, Matt, Junczys-Dowmunt, Marcin

arXiv.org Artificial Intelligence, Aug-14-2024

The deep learning language of choice these days is Python; measured by factors such as available libraries and technical support, it is hard to beat. At the same time, software written in lower-level programming languages like C++ retains advantages in speed. We describe a Python interface to Marian NMT, a C++-based training and inference toolkit for sequence-to-sequence models, focusing on machine translation. This interface enables models trained with Marian to be connected to the rich, wide range of tools available in Python. A highlight of the interface is the ability to compute state-of-the-art COMET metrics from Python but using Marian's inference engine, with a speedup of up to 7.8× over existing implementations. We also briefly spotlight a number of other integrations, including Jupyter notebooks, connection with prebuilt models, and a web app interface provided with the package. PyMarian is available on PyPI via pip install pymarian.

implementation, machine translation, translation, (10 more...)

arXiv.org Artificial Intelligence

2408.11853

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Portugal > Lisbon > Lisbon (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
(5 more...)

Genre: Research Report (0.40)

Industry: Information Technology > Software (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation

Chen, Liang, Ma, Shuming, Zhang, Dongdong, Wei, Furu, Chang, Baobao

arXiv.org Artificial Intelligence, Jun-1-2023

While multilingual neural machine translation has achieved great success, it suffers from the off-target issue, where the translation is in the wrong language. This problem is more pronounced on zero-shot translation tasks. In this work, we find that failing to encode a discriminative target-language signal leads to off-target translation, and that a closer lexical distance (i.e., KL-divergence) between two languages' vocabularies is related to a higher off-target rate. We also find that solely isolating the vocab of different languages in the decoder can alleviate the problem. Motivated by these findings, we propose Language Aware Vocabulary Sharing (LAVS), a simple and effective algorithm for constructing the multilingual vocabulary that greatly alleviates the off-target problem of the translation model by increasing the KL-divergence between languages. We conduct experiments on a multilingual machine translation benchmark in 11 languages. Experiments show that the off-target rate for 90 translation tasks is reduced from 29% to 8%, while the overall BLEU score is improved by an average of 1.9 points without extra training cost or sacrificing the supervised directions' performance. We release the code at https://github.com/PKUnlp-icler/Off-Target-MNMT for reproduction.
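The lexical distance mentioned in the abstract is a KL-divergence between the vocabulary distributions of two languages. The sketch below shows one way such a quantity could be computed from token counts. The toy corpora, the shared vocabulary, and the add-one smoothing are assumptions for illustration; the abstract does not say how the paper estimates these distributions.

```python
import math
from collections import Counter


def token_distribution(tokens, vocab, alpha=1.0):
    """Smoothed unigram distribution over a shared vocabulary (add-alpha smoothing)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


# Hypothetical tokenized corpora for two languages.
lang_a = ["the", "cat", "the", "dog"]
lang_b = ["le", "chat", "le", "chien"]
vocab = sorted(set(lang_a) | set(lang_b))

p = token_distribution(lang_a, vocab)
q = token_distribution(lang_b, vocab)

kl_same = kl_divergence(p, p)  # identical distributions
kl_diff = kl_divergence(p, q)  # disjoint token usage gives a positive divergence
```

Smoothing keeps every probability strictly positive, so the logarithm is always defined even when one language never uses a given token.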

artificial intelligence, natural language, translation, (17 more...)

arXiv.org Artificial Intelligence

2305.1093

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Samoa (0.05)
North America > Dominican Republic (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Nonparametric Decoding for Generative Retrieval

Lee, Hyunji, Kim, Jaeyoung, Chang, Hoyeon, Oh, Hanseok, Yang, Sohee, Karpukhin, Vlad, Lu, Yi, Seo, Minjoon

arXiv.org Artificial Intelligence, May-28-2023

Because the generative retrieval model depends solely on the information encoded in its model parameters without external memory, its information capacity is limited and fixed. To overcome this limitation, we propose Nonparametric Decoding (Np Decoding), which can be applied to existing generative retrieval models. Np Decoding uses nonparametric contextualized vocab embeddings (external memory) rather than vanilla vocab embeddings as decoder vocab embeddings. By leveraging the contextualized vocab embeddings, the generative retrieval model is able to utilize both the parametric and nonparametric space. Evaluation over 9 datasets (8 single-hop and 1 multi-hop) in the document retrieval task shows that applying Np Decoding to generative retrieval models significantly improves performance. We also show that Np Decoding is data- and parameter-efficient, and shows high performance in the zero-shot setting.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.02068

Country:

Europe > United Kingdom > England > Lincolnshire (0.15)
North America > United States (0.14)
Africa > South Africa > Western Cape > Cape Town (0.05)
(4 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.93)

Understanding Text Classification Data and Models Using Aggregated Input Salience

Ebert, Sebastian, Jakobovits, Alice Shoshana, Filippova, Katja

arXiv.org Artificial Intelligence, Jan-11-2023

Realizing when a model is right for a wrong reason is not trivial and requires a significant effort by model developers. In some cases an input salience method, which highlights the most important parts of the input, may reveal problematic reasoning. But scrutinizing highlights over many data instances is tedious and often infeasible. Furthermore, analyzing examples in isolation does not reveal general patterns in the data or in the model's behavior. In this paper we aim to address these issues and go from understanding single examples to understanding entire datasets and models. The methodology we propose is based on aggregated salience maps, to which we apply clustering, nearest neighbor search and visualizations. Using this methodology we address multiple distinct but common model developer needs by showing how problematic data and model behavior can be identified and explained -- a necessary first step for improving the model.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2211.05485

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom (0.14)
(15 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
