McAllester, David
PENCIL: Long Thoughts with Short Memory
Yang, Chenxiao, Srebro, Nathan, McAllester, David, Li, Zhiyuan
While recent works (e.g., o1, DeepSeek R1) have demonstrated the great promise of using long Chain-of-Thought (CoT) to improve the reasoning capabilities of language models, scaling it up at test time is challenging due to inefficient memory usage -- intermediate computations accumulate indefinitely in context even when they are no longer needed for future thoughts. We propose PENCIL, which incorporates a reduction mechanism into the autoregressive generation process, allowing the model to recursively clean up intermediate thoughts based on patterns learned from training. With this reduction mechanism, PENCIL significantly reduces the maximal context length required during generation, and thus can generate longer thoughts with limited memory, solving larger-scale problems given more thinking time. For example, we demonstrate that PENCIL achieves 97\% accuracy on the challenging Einstein's puzzle -- a task even large models like GPT-4 struggle with -- using only a small 25M-parameter transformer with a 2048-token context length. Theoretically, we prove PENCIL can perform universal space-efficient computation by simulating Turing machines with optimal time and space complexity, and thus can solve arbitrary computational tasks that would otherwise be intractable given context window constraints.
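A minimal sketch of how such a reduction step might be applied during decoding, assuming hypothetical marker tokens [CALL], [SEP], and [RETURN] and a hypothetical model.next_token interface (the token names and the exact rule shape are illustrative assumptions, not the paper's specification):

    # Illustrative sketch only; the marker names and the rule
    # C [CALL] T [SEP] A [RETURN]  ->  C A  are assumptions made for exposition.
    CALL, SEP, RETURN = "[CALL]", "[SEP]", "[RETURN]"

    def reduce_context(tokens):
        """Erase the intermediate thoughts T of the most recently completed
        sub-computation, keeping the surrounding context C and the answer A."""
        if not tokens or tokens[-1] != RETURN:
            return tokens                        # no completed sub-computation yet
        seps = [i for i, t in enumerate(tokens) if t == SEP]
        if not seps:
            return tokens
        sep = seps[-1]                           # last [SEP]
        calls = [i for i in range(sep) if tokens[i] == CALL]
        if not calls:
            return tokens
        call = calls[-1]                         # matching [CALL]
        answer = tokens[sep + 1:-1]              # A, without the closing [RETURN]
        return tokens[:call] + answer            # keep C, then A; T is discarded

    def generate(model, prompt, max_steps=4096):
        """Autoregressive loop applying the reduction after every emitted token,
        so memory tracks the live context rather than the full generation trace."""
        ctx = list(prompt)
        for _ in range(max_steps):
            ctx.append(model.next_token(ctx))    # hypothetical decoding interface
            ctx = reduce_context(ctx)
        return ctx

Because the erased span T may itself contain already-reduced sub-computations, applying the rule repeatedly keeps only what later reasoning still needs, which is what bounds the maximal context length during generation.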
On the Mathematics of Diffusion Models
McAllester, David
This paper gives direct derivations of the differential equations and likelihood formulas of diffusion models, assuming only knowledge of Gaussian distributions. A VAE analysis derives both the forward and backward stochastic differential equations (SDEs) as well as non-variational integral expressions for the likelihood. A score-matching analysis derives the reverse-diffusion ordinary differential equation (ODE) and a family of reverse-diffusion SDEs parameterized by noise level. The paper presents the mathematics directly, with attributions saved for a final section.
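For orientation, the objects derived here take the following standard forms in the score-based notation common in the wider literature (which may differ from the paper's own notation): a forward noising SDE, a family of reverse-time SDEs sharing its marginals and parameterized by a noise level $\sigma(t)$, and the deterministic probability-flow ODE obtained at $\sigma = 0$:

$$\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(t)\,\mathrm{d}w \qquad \text{(forward SDE)}$$

$$\mathrm{d}x = \Big[f(x,t) - \tfrac{1}{2}\big(g(t)^2 + \sigma(t)^2\big)\,\nabla_x \log p_t(x)\Big]\,\mathrm{d}t + \sigma(t)\,\mathrm{d}\bar{w} \qquad \text{(reverse-time SDEs)}$$

$$\frac{\mathrm{d}x}{\mathrm{d}t} = f(x,t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(x) \qquad \text{(probability-flow ODE)}$$

Taking $\sigma(t) = g(t)$ recovers the usual reverse-diffusion SDE, while $\sigma = 0$ gives the ODE; all members of the family share the same time-marginals $p_t$.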
MathZero, The Classification Problem, and Set-Theoretic Type Theory
McAllester, David
AlphaZero learns to play Go, chess, and shogi at a superhuman level through self-play, given only the rules of the game. This raises the question of whether a similar thing could be done for mathematics -- a MathZero. MathZero would require a formal foundation and an objective. We propose the foundation of set-theoretic dependent type theory and an objective defined in terms of the classification problem -- the problem of classifying concept instances up to isomorphism. The natural numbers arise as the solution to the classification problem for finite sets. Here we generalize classical Bourbaki set-theoretic isomorphism to set-theoretic dependent type theory. To our knowledge, we give the first isomorphism inference rules for set-theoretic dependent type theory with propositional set-theoretic equality. The presentation is intended to be accessible to mathematicians with no prior exposure to type theory.
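As an illustrative instance of the classification problem (an example chosen here for exposition, not taken from the paper): for bare finite sets an isomorphism is simply a bijection, so the isomorphism classes are exactly the cardinalities and the natural numbers appear as the classifying invariants; for a structured concept such as a directed graph, the bijection must additionally respect the structure, in the Bourbaki style:

$$s \cong s' \;\iff\; \exists\ \text{bijection } f : s \to s', \qquad\quad \{\text{finite sets}\}/\!\cong\ \;\simeq\; \mathbb{N}$$

$$(s,E) \cong (s',E') \;\iff\; \exists\ \text{bijection } f : s \to s' \ \text{such that}\ E(x,y) \leftrightarrow E'(f(x),f(y)) \ \text{for all } x,y \in s$$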
Formal Limitations on the Measurement of Mutual Information
McAllester, David, Stratos, Karl
Motivated by applications to unsupervised learning, we consider the problem of measuring mutual information. Recent analysis has shown that naive kNN estimators of mutual information have serious statistical limitations, motivating more refined methods. In this paper we prove that serious statistical limitations are inherent to any measurement method. More specifically, we show that any distribution-free high-confidence lower bound on mutual information cannot be larger than $O(\ln N)$, where $N$ is the size of the data sample. We also analyze the Donsker-Varadhan lower bound on KL divergence in particular and show that, when simple statistical considerations are taken into account, this bound can never produce a high-confidence value larger than $\ln N$. While large high-confidence lower bounds are impossible, in practice one can use estimators without formal guarantees. We suggest expressing mutual information as a difference of entropies and using cross-entropy as an entropy estimator. We observe that, although cross-entropy is only an upper bound on entropy, cross-entropy estimates converge to the true cross-entropy at the rate of $1/\sqrt{N}$.
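For reference, the two standard objects discussed are the Donsker-Varadhan variational representation of KL divergence and the difference-of-entropies form of mutual information with cross-entropy as the (upper-bounding) entropy estimator, stated here in standard notation:

$$\mathrm{KL}(P\,\|\,Q) \;=\; \sup_{f}\ \Big(\mathbb{E}_{x \sim P}\big[f(x)\big] \;-\; \ln \mathbb{E}_{x \sim Q}\big[e^{f(x)}\big]\Big)$$

$$I(x;y) \;=\; H(y) - H(y\,|\,x), \qquad H(y) \;\le\; \mathbb{E}_{y}\big[-\ln P_\Theta(y)\big], \qquad H(y\,|\,x) \;\le\; \mathbb{E}_{x,y}\big[-\ln P_\Theta(y\,|\,x)\big]$$

The impossibility result concerns high-confidence lower bounds of the first kind evaluated on a finite sample; the cross-entropy upper bounds carry no such high-confidence guarantee on mutual information but converge to the true cross-entropy at rate $1/\sqrt{N}$.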
Information Theoretic Co-Training
McAllester, David
This paper introduces an information theoretic co-training objective for unsupervised learning. We consider the problem of predicting the future. Rather than predict future sensations (image pixels or sound waves), we predict "hypotheses" to be confirmed by future sensations. More formally, we assume a population distribution on pairs $(x,y)$ where we can think of $x$ as a past sensation and $y$ as a future sensation. We train both a predictor model $P_\Phi(z|x)$ and a confirmation model $P_\Psi(z|y)$, where we view $z$ as a hypothesis (when predicted) or a fact (when confirmed). Given this population distribution on pairs $(x,y)$, we focus on the problem of measuring the mutual information between $x$ and $y$. By the data processing inequality, this mutual information is at least as large as the mutual information between $x$ and $z$ under the distribution on triples $(x,z,y)$ defined by the confirmation model $P_\Psi(z|y)$. The information theoretic training objective for $P_\Phi(z|x)$ and $P_\Psi(z|y)$ can be viewed as a form of co-training where we want the prediction from $x$ to match the confirmation from $y$.
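The lower-bound argument can be written out explicitly (a restatement of the data processing inequality under the sampling process described above, not necessarily the paper's exact training objective): since $z$ is drawn from $y$ alone, $x \to y \to z$ forms a Markov chain, and hence

$$I(x;y) \;\ge\; I(x;z) \qquad \text{for } (x,y) \sim \text{population}, \quad z \sim P_\Psi(z\,|\,y).$$

Training then seeks hypotheses $z$ that are simultaneously predictable from the past via $P_\Phi(z|x)$ and confirmable from the future via $P_\Psi(z|y)$, which is the co-training reading given above.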