AITopics

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Neural Information Processing Systems, Dec-26-2025, 09:49:15 GMT

CLeAR: Continual Learning on Algorithmic Reasoning for Human-like Intelligence

Continual learning (CL) aims to incrementally learn multiple tasks that are presented sequentially. The significance of CL lies not only in the practical importance but also in studying the learning mechanisms of humans who are excellent continual learners. While most research on CL has been done on structured data such as images, there is a lack of research on CL for abstract logical concepts such as counting, sorting, and arithmetic, which humans learn gradually over time in the real world. In this work, for the first time, we introduce novel algorithmic reasoning (AR) methodology for continual tasks of abstract concepts: CLeAR. Our methodology proposes a one-to-many mapping of input distribution to a shared mapping space, which allows the alignment of various tasks of different dimensions and shared semantics. Our tasks of abstract logical concepts, in the form of formal language, can be classified into Chomsky hierarchies based on their difficulty. In this study, we conducted extensive experiments consisting of 15 tasks with various levels of Chomsky hierarchy, ranging from in-hierarchy to inter-hierarchy scenarios. CLeAR not only achieved near zero forgetting but also improved accuracy during following tasks, a phenomenon known as backward transfer, while previous CL methods designed for image classification drastically failed.
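The forgetting and backward transfer mentioned in this abstract are commonly computed from a matrix of per-task accuracies. The following is a minimal sketch under the widely used definitions, where acc[t][i] is the accuracy on task i measured right after training on task t. The function names and the toy numbers are illustrative assumptions and are not taken from the CLeAR paper.

```python
from typing import List


def backward_transfer(acc: List[List[float]]) -> float:
    """Mean change in accuracy on earlier tasks after all tasks are learned.

    acc[t][i] is the accuracy on task i measured right after training on task t.
    A positive value means later training improved earlier tasks.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("need at least two tasks")
    final = acc[num_tasks - 1]
    diffs = [final[i] - acc[i][i] for i in range(num_tasks - 1)]
    return sum(diffs) / len(diffs)


def forgetting(acc: List[List[float]]) -> float:
    """Mean drop from the best earlier accuracy to the final accuracy."""
    num_tasks = len(acc)
    if num_tasks < 2:
        raise ValueError("need at least two tasks")
    final = acc[num_tasks - 1]
    drops = []
    for i in range(num_tasks - 1):
        best = max(acc[t][i] for t in range(i, num_tasks - 1))
        drops.append(best - final[i])
    return sum(drops) / len(drops)


# Toy accuracy matrix for three tasks (invented numbers, for illustration only).
toy_acc = [
    [0.90, 0.00, 0.00],
    [0.92, 0.85, 0.00],
    [0.95, 0.88, 0.80],
]
print(backward_transfer(toy_acc), forgetting(toy_acc))
```

In this toy matrix the accuracy on the first two tasks rises after later training, so backward transfer is positive and forgetting is negative.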

algorithmic reasoning, continual learning, name change, (3 more...)

Technology: Information Technology > Artificial Intelligence (0.77)

Neural Information Processing Systems, Oct-9-2025, 02:10:13 GMT

986e0caad271b59417287737416d8594-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (17 more...)

Country:

Asia > South Korea > Seoul > Seoul (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

arXiv.org Artificial Intelligence, Aug-5-2025

StackTrans: From Large Language Model to Large Pushdown Automata Model

Zhang, Kechi, Li, Ge, Li, Jia, Zhang, Huangzhao, Dong, Yihong, Li, Jia, Xu, Jingjing, Jin, Zhi

The Transformer architecture has emerged as a landmark advancement within the broad field of artificial intelligence, effectively catalyzing the advent of large language models (LLMs). However, despite its remarkable capabilities and the substantial progress it has facilitated, the Transformer architecture still has some limitations. One such intrinsic limitation is its inability to effectively capture the Chomsky hierarchy, such as regular expressions or deterministic context-free grammars. Drawing inspiration from pushdown automata, which efficiently resolve deterministic context-free grammars using stacks, we propose StackTrans to address the aforementioned issue within LLMs. Unlike previous approaches that modify the attention computation, StackTrans explicitly incorporates hidden state stacks between Transformer layers. This design maintains compatibility with existing frameworks like flash-attention. Specifically, our design features stack operations -- such as pushing and popping hidden states -- that are differentiable and can be learned in an end-to-end manner. Our comprehensive evaluation spans benchmarks for both Chomsky hierarchies and large-scale natural languages. Across these diverse tasks, StackTrans consistently outperforms standard Transformer models and other baselines. We have successfully scaled StackTrans up from 360M to 7B parameters. In particular, our from-scratch pretrained model StackTrans-360M outperforms several larger open-source LLMs with 2-3x more parameters, showcasing its superior efficiency and reasoning capability.
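To illustrate what a differentiable push or pop of hidden states can look like, here is a minimal sketch of a soft stack update in plain Python. It is not the StackTrans implementation: the fixed depth, the function name soft_stack_update, and the three action probabilities passed in as plain numbers are assumptions made for illustration. In a trained model those probabilities would typically come from a learned projection of the hidden state.

```python
from typing import List

Vector = List[float]


def soft_stack_update(stack: List[Vector], new_item: Vector,
                      p_push: float, p_pop: float, p_noop: float) -> List[Vector]:
    """Blend the results of push, pop and no-op by their probabilities.

    stack[0] is the top. The stack has a fixed depth: pushing drops the
    bottom row, and popping pads the bottom with zeros.
    """
    depth, width = len(stack), len(new_item)
    zero = [0.0] * width
    pushed = [new_item] + stack[:-1]   # result of a hard push
    popped = stack[1:] + [zero]        # result of a hard pop
    # A convex combination of the three outcomes is smooth in the
    # probabilities and in the item, so gradients can flow through it.
    return [
        [p_push * pushed[d][k] + p_pop * popped[d][k] + p_noop * stack[d][k]
         for k in range(width)]
        for d in range(depth)
    ]


empty = [[0.0, 0.0] for _ in range(3)]
s1 = soft_stack_update(empty, [1.0, 2.0], 1.0, 0.0, 0.0)  # hard push
s2 = soft_stack_update(s1, [3.0, 4.0], 1.0, 0.0, 0.0)     # hard push
s3 = soft_stack_update(s2, [0.0, 0.0], 0.0, 1.0, 0.0)     # hard pop
print(s3[0])  # top of the stack after push, push, pop
```

With probabilities of exactly 0 or 1 the update behaves like an ordinary stack; intermediate values produce a weighted blend of the three outcomes.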

large language model, machine learning, natural language, (17 more...)

2507.15343

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Goodale, Michael, Mascarenhas, Salvador, Lakretz, Yair

Meta-Learning Neural Mechanisms rather than Bayesian Priors

arXiv.org Artificial Intelligence, Mar-20-2025

Children acquire language despite being exposed to several orders of magnitude less data than large language models require. Meta-learning has been proposed as a way to integrate human-like learning biases into neural-network architectures, combining both the structured generalizations of symbolic models with the scalability of neural-network models. But what does meta-learning exactly imbue the model with? We investigate the meta-learning of formal languages and find that, contrary to previous claims, meta-trained models are not learning simplicity-based priors when meta-trained on datasets organised around simplicity. Rather, we find evidence that meta-training imprints neural mechanisms (such as counters) into the model, which function like cognitive primitives for the network on downstream tasks. Most surprisingly, we find that meta-training on a single formal language can provide as much improvement to a model as meta-training on 5000 different formal languages, provided that the formal language incentivizes the learning of useful neural mechanisms. Taken together, our findings provide practical implications for efficient meta-learning paradigms and new theoretical insights into linking symbolic theories and neural mechanisms.

artificial intelligence, machine learning, mechanism, (19 more...)

2503.16048

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Neural Information Processing Systems, Jan-19-2025, 16:22:58 GMT

CLeAR: Continual Learning on Algorithmic Reasoning for Human-like Intelligence

Continual learning (CL) aims to incrementally learn multiple tasks that are presented sequentially. The significance of CL lies not only in the practical importance but also in studying the learning mechanisms of humans who are excellent continual learners. While most research on CL has been done on structured data such as images, there is a lack of research on CL for abstract logical concepts such as counting, sorting, and arithmetic, which humans learn gradually over time in the real world. In this work, for the first time, we introduce novel algorithmic reasoning (AR) methodology for continual tasks of abstract concepts: CLeAR. Our methodology proposes a one-to-many mapping of input distribution to a shared mapping space, which allows the alignment of various tasks of different dimensions and shared semantics.

algorithmic reasoning, continual learning, human-like intelligence, (1 more...)

Technology: Information Technology > Artificial Intelligence (0.40)

Young, Nathan, Witbrock, Michael

Transformers As Approximations of Solomonoff Induction

arXiv.org Artificial Intelligence, Aug-21-2024

Solomonoff Induction is an optimal-in-the-limit unbounded algorithm for sequence prediction, representing a Bayesian mixture of every computable probability distribution and performing close to optimally in predicting any computable sequence. Being an optimal form of computational sequence prediction, it seems plausible that it may be used as a model against which other methods of sequence prediction might be compared. We put forth and explore the hypothesis that Transformer models - the basis of Large Language Models - approximate Solomonoff Induction better than any other extant sequence prediction method. We explore evidence for and against this hypothesis, give alternate hypotheses that take this evidence into account, and outline next steps for modelling Transformers and other kinds of AI in this way.
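Solomonoff Induction itself is incomputable, but the idea of a Bayesian mixture weighted by description length can be shown on a toy scale. The sketch below is an illustration only: the two hand-picked hypotheses, their "description lengths", and the function name mixture_predict are assumptions, and a finite hypothesis list stands in for the set of all computable distributions.

```python
from typing import Callable, List, Tuple

# A hypothesis maps the bit history seen so far to P(next bit = 1).
Hypothesis = Callable[[str], float]


def mixture_predict(hyps: List[Tuple[int, Hypothesis]], history: str) -> float:
    """Return P(next bit = 1 | history) under a prior of 2**-length per hypothesis."""
    weights = []
    for length, h in hyps:
        w = 2.0 ** -length  # shorter descriptions get more prior mass
        for t, bit in enumerate(history):
            p1 = h(history[:t])
            w *= p1 if bit == "1" else 1.0 - p1  # likelihood of the observed bit
        weights.append(w)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("history has zero probability under every hypothesis")
    return sum(w * h(history) for w, (_, h) in zip(weights, hyps)) / total


def mostly_ones(history: str) -> float:
    return 0.9


def fair_coin(history: str) -> float:
    return 0.5


# (assumed description length in bits, hypothesis)
HYPOTHESES = [(1, mostly_ones), (2, fair_coin)]
print(mixture_predict(HYPOTHESES, "1111"))
```

As more ones are observed, the posterior weight shifts toward the hypothesis that predicts them well, and the mixture prediction moves toward that hypothesis.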

hypothesis, solind, transformer, (11 more...)

2408.12065

Country: Oceania > New Zealand > North Island > Auckland Region > Auckland (0.05)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)

arXiv.org Artificial Intelligence, Feb-28-2023

Neural Networks and the Chomsky Hierarchy

Delétang, Grégoire, Ruoss, Anian, Grau-Moya, Jordi, Genewein, Tim, Wenliang, Li Kevin, Catt, Elliot, Cundy, Chris, Hutter, Marcus, Legg, Shane, Veness, Joel, Ortega, Pedro A.

Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (20'910 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grouping tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. This includes negative results where even extensive amounts of data and training time never lead to any non-trivial generalization, despite models having sufficient capacity to fit the training data perfectly. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks.
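The kind of evaluation described in this abstract can be sketched with two small tasks and a length-based split. This is an assumption-laden illustration, not the paper's benchmark code: the task choice (parity as a regular task, a^n b^n as a task needing a counter), the helper names, and the length ranges are all made up for the example.

```python
import random
from typing import Callable, List, Tuple


def parity_label(bits: str) -> int:
    """Regular task: return 1 if the string contains an odd number of '1's."""
    state = 0
    for ch in bits:
        if ch == "1":
            state ^= 1  # two states are enough, so a finite automaton suffices
    return state


def is_anbn(s: str) -> bool:
    """Non-regular task: accept a^n b^n (n >= 0) using one unbounded counter."""
    count = 0
    seen_b = False
    for ch in s:
        if ch == "a":
            if seen_b:
                return False  # an 'a' after a 'b' breaks the pattern
            count += 1
        elif ch == "b":
            seen_b = True
            count -= 1
            if count < 0:
                return False  # more 'b's than 'a's so far
        else:
            return False
    return count == 0


def sample(label_fn: Callable[[str], object], alphabet: str, min_len: int,
           max_len: int, n: int, rng: random.Random) -> List[Tuple[str, int]]:
    """Draw n random strings with lengths in [min_len, max_len] and label them."""
    data = []
    for _ in range(n):
        length = rng.randint(min_len, max_len)
        s = "".join(rng.choice(alphabet) for _ in range(length))
        data.append((s, int(label_fn(s))))
    return data


rng = random.Random(0)
train = sample(parity_label, "01", 1, 10, 100, rng)   # short, in-distribution
test = sample(parity_label, "01", 11, 50, 100, rng)   # longer, out-of-distribution
```

A model is then trained on the short strings and scored on the longer ones; good accuracy on the longer set suggests it learned the underlying procedure rather than memorizing lengths seen in training.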

artificial intelligence, machine learning, sequence, (18 more...)

2207.02098

Country:

Africa > Mali (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Maryland (0.04)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Hou, Kaiwen, Rabusseau, Guillaume

Spectral Regularization: an Inductive Bias for Sequence Modeling

arXiv.org Artificial Intelligence, Nov-4-2022

Various forms of regularization in learning tasks strive for different notions of simplicity. This paper presents a spectral regularization technique, which attaches a unique inductive bias to sequence modeling based on an intuitive concept of simplicity defined in the Chomsky hierarchy. From fundamental connections between Hankel matrices and regular grammars, we propose to use the trace norm of the Hankel matrix, the tightest convex relaxation of its rank, as the spectral regularizer. To cope with the fact that the Hankel matrix is bi-infinite, we propose an unbiased stochastic estimator for its trace norm. Ultimately, we demonstrate experimental results on Tomita grammars, which exhibit the potential benefits of spectral regularization and validate the proposed stochastic estimator.
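The link between Hankel matrices and regular languages can be made concrete with a small finite block. The sketch below builds the block H[u][v] = f(uv) over short prefixes and suffixes and computes its rank exactly, using only the standard library. It is an illustration under stated assumptions, not the paper's estimator: the helper names are hypothetical, the block is finite rather than bi-infinite, and it computes the rank rather than the trace norm (the sum of singular values), which would need an SVD routine.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, List


def strings_up_to(alphabet: str, max_len: int) -> List[str]:
    """All strings over the alphabet with length 0..max_len."""
    return ["".join(p) for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)]


def hankel_block(f: Callable[[str], int], prefixes: List[str],
                 suffixes: List[str]) -> List[List[Fraction]]:
    """Finite Hankel block with entries H[u][v] = f(u + v)."""
    return [[Fraction(f(u + v)) for v in suffixes] for u in prefixes]


def rank(matrix: List[List[Fraction]]) -> int:
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def even_ones(s: str) -> int:
    """Indicator of a regular language: an even number of '1' symbols."""
    return 1 if s.count("1") % 2 == 0 else 0


basis = strings_up_to("01", 2)
print(rank(hankel_block(even_ones, basis, basis)))
```

For a regular language such as this one, the rank stays bounded as the block grows, whereas for a non-regular language such as a^n b^n it keeps increasing. This is why a low-rank preference, relaxed to the trace norm, acts as a bias toward regular-language-like simplicity.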

artificial intelligence, hankel matrix, machine learning, (12 more...)

2211.02255

Country:

North America > United States > California > Los Angeles County > Santa Monica (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

#artificialintelligence, Jul-21-2022, 09:20:25 GMT

DeepMind's Latest Study on Artificial Intelligence Explains How Neural Network Generalize and Rise in the Chomsky Hierarchy

A DeepMind research group conducted a comprehensive generalization study on neural network architectures in the paper 'Neural Networks and the Chomsky Hierarchy', which investigates whether insights from the theory of computation and the Chomsky hierarchy can predict the actual limitations of neural network generalization. Developing powerful machine learning models requires accurate generalization to out-of-distribution inputs, yet how and why neural networks generalize on algorithmic sequence-prediction tasks remains unclear. The research group performed a thorough generalization study on more than 2000 individual models, spanning cutting-edge neural network architectures and memory-augmented neural networks, across 16 sequence-prediction tasks that encompass all tiers of the Chomsky hierarchy that can be evaluated practically with finite-time computation. They demonstrated that, for various architectures, larger quantities of training data do not enable generalization on tasks further up the hierarchy, possibly suggesting hard limits for scaling laws.

chomsky hierarchy, generalization, neural network generalize and rise, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.77)