Denain, Jean-Stanislas
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI
Glazer, Elliot, Erdil, Ege, Besiroglu, Tamay, Chicharro, Diego, Chen, Evan, Gunning, Alex, Olsson, Caroline Falkman, Denain, Jean-Stanislas, Ho, Anson, Santos, Emily de Oliveira, Järviniemi, Olli, Barnett, Matthew, Sandler, Robert, Vrzala, Matej, Sevilla, Jaime, Ren, Qiuyu, Pratt, Elizabeth, Levine, Lionel, Barkley, Grant, Stewart, Natalie, Grechuk, Bogdan, Grechuk, Tetiana, Enugandla, Shreepranav Varma, Wildon, Mark
Recent AI systems have demonstrated remarkable proficiency in tackling challenging mathematical tasks, from achieving olympiad-level performance in geometry (Trinh et al. 2024) to improving upon existing research results in combinatorics (Romera-Paredes et al. 2024). However, existing benchmarks face two key limitations.

Saturation of existing benchmarks. Current standard mathematics benchmarks such as the MATH dataset (Hendrycks, Burns, Kadavath, et al. 2021) and GSM8K (Cobbe et al. 2021) primarily assess competency at the high-school and early undergraduate level. As state-of-the-art models achieve near-perfect performance on these benchmarks, we lack rigorous ways to evaluate their capabilities in advanced mathematical domains that require deeper theoretical understanding, creative insight, and specialized expertise. Furthermore, to assess AI's potential contributions to mathematics research, we require benchmarks that better reflect the challenges faced by working mathematicians.

Benchmark contamination in training data. A significant challenge in evaluating large language models (LLMs) is data contamination: the inadvertent inclusion of benchmark problems in training data.
AI capabilities can be significantly improved without expensive retraining
Davidson, Tom, Denain, Jean-Stanislas, Villalobos, Pablo, Bas, Guillem
State-of-the-art AI systems can be significantly improved without expensive retraining via "post-training enhancements": techniques applied after initial training, such as fine-tuning the system to use a web browser. We review recent post-training enhancements, categorizing them into five types: tool-use, prompting methods, scaffolding, solution selection, and data generation. Different enhancements improve performance on different tasks, making it hard to compare their significance. So we translate improvements from different enhancements into a common currency, the compute-equivalent gain: how much additional training compute would be needed to improve performance by the same amount as the enhancement. Our non-experimental work shows that post-training enhancements have significant benefits: most surveyed enhancements improve benchmark performance by more than the equivalent of a 5x increase in training compute, some by more than 20x. Post-training enhancements are relatively cheap to develop: fine-tuning costs are typically <1% of the original training cost. Governing the development of capable post-training enhancements may be challenging because frontier models could be enhanced by a wide range of actors.
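To make the compute-equivalent gain concrete, here is a minimal sketch assuming a toy log-linear scaling law between training compute and benchmark accuracy; the function names and coefficients are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def accuracy_from_compute(compute_flop, a=0.05, b=-0.9):
    """Toy scaling law: accuracy rises log-linearly with training compute.
    The coefficients a and b are made up for illustration."""
    return a * np.log10(compute_flop) + b

def compute_equivalent_gain(base_compute, enhanced_accuracy, a=0.05, b=-0.9):
    """How much extra training compute (as a multiplier) would match the
    enhancement's accuracy gain, under the toy scaling law above."""
    # Invert the scaling law: compute needed to reach enhanced_accuracy.
    needed_compute = 10 ** ((enhanced_accuracy - b) / a)
    return needed_compute / base_compute

base = 1e24                                # training compute of the base model (FLOP)
base_acc = accuracy_from_compute(base)     # accuracy without the enhancement
enhanced_acc = base_acc + 0.04             # accuracy after, e.g., adding tool use
print(f"CEG: {compute_equivalent_gain(base, enhanced_acc):.1f}x")
```

Under these toy numbers, a 4-percentage-point accuracy gain corresponds to roughly a 6x compute-equivalent gain, in line with the paper's observation that many enhancements exceed the 5x mark.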
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
Halawi, Danny, Denain, Jean-Stanislas, Steinhardt, Jacob
Modern language models can imitate complex patterns through few-shot learning, enabling them to complete challenging tasks without fine-tuning. However, imitation can also lead models to reproduce inaccuracies or harmful content if present in the context. We study harmful imitation through the lens of a model's internal representations, and identify two related phenomena: overthinking and false induction heads. The first phenomenon, overthinking, appears when we decode predictions from intermediate layers, given correct vs. incorrect few-shot demonstrations. At early layers, both demonstrations induce similar model behavior, but the behavior diverges sharply at some "critical layer", after which the accuracy given incorrect demonstrations progressively decreases. The second phenomenon, false induction heads, is a possible mechanistic cause of overthinking: these are heads in late layers that attend to and copy false information from previous demonstrations, and whose ablation reduces overthinking. Beyond scientific understanding, our results suggest that studying intermediate model computations could be a promising avenue for understanding and guarding against harmful model behaviors.
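The intermediate-layer decoding described above can be reproduced in spirit with a logit-lens-style probe; the sketch below uses GPT-2 via Hugging Face transformers. The prompt and the idea of flipping labels to form incorrect demonstrations are illustrative, not the paper's exact experimental setup.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Few-shot sentiment prompt; flipping the labels would give "incorrect
# demonstrations" whose layer-by-layer effect can then be compared.
prompt = "great -> positive\nawful -> negative\nterrible ->"
ids = tok(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    out = model(ids, output_hidden_states=True)

# Decode the next-token prediction at every layer by applying the final
# layer norm and the unembedding matrix to each intermediate hidden state.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    print(f"layer {layer:2d}: {tok.decode(logits.argmax())!r}")
```

Running this with correct versus flipped labels and tracking where the decoded predictions diverge is one simple way to locate a "critical layer" of the kind the paper describes.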
Auditing Visualizations: Transparency Methods Struggle to Detect Anomalous Behavior
Denain, Jean-Stanislas, Steinhardt, Jacob
Model visualizations provide information that outputs alone might miss. But can we trust that model visualizations reflect model behavior? For instance, can they diagnose abnormal behavior such as planted backdoors or overregularization? To evaluate visualization methods, we test whether they assign different visualizations to anomalously trained models and normal models. We find that while existing methods can detect models with starkly anomalous behavior, they struggle to identify more subtle anomalies. Moreover, they often fail to recognize the inputs that induce anomalous behavior, e.g., images containing a spurious cue. These results reveal blind spots and limitations of some popular model visualizations. By introducing a novel evaluation framework for visualizations, our work paves the way for developing more reliable model transparency methods.
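The evaluation logic can be illustrated with a toy check: do a method's visualizations for an anomalous model differ measurably from those for normal models? The distance measure, threshold, and synthetic saliency maps below are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

def visualization_distance(vis_a, vis_b):
    """1 - Pearson correlation between two flattened saliency maps."""
    return 1.0 - np.corrcoef(vis_a.ravel(), vis_b.ravel())[0, 1]

def detects_anomaly(normal_maps, candidate_maps, threshold=0.3):
    """Flag the candidate model as anomalous if its visualizations sit far
    from the corresponding visualizations of normally trained models."""
    dists = [visualization_distance(n, c)
             for n, c in zip(normal_maps, candidate_maps)]
    return float(np.mean(dists)) > threshold

# Synthetic saliency maps standing in for a real visualization method.
rng = np.random.default_rng(0)
normal = [rng.random((32, 32)) for _ in range(10)]
backdoored = []
for _ in range(10):
    m = rng.random((32, 32)) * 0.1
    m[:4, :4] = 1.0            # saliency collapses onto a trigger patch
    backdoored.append(m)

print(detects_anomaly(normal, normal))      # False: visualizations match
print(detects_anomaly(normal, backdoored))  # True: visualizations diverge
```

The paper's finding, in these terms, is that real visualization methods pass this check only when the anomaly is stark; for subtler anomalies the distance stays small.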
MetFlow: A New Efficient Method for Bridging the Gap between Markov Chain Monte Carlo and Variational Inference
Thin, Achille, Kotelevskii, Nikita, Denain, Jean-Stanislas, Grinsztajn, Leo, Durmus, Alain, Panov, Maxim, Moulines, Eric
In this contribution, we propose a new computationally efficient method to combine Variational Inference (VI) with Markov Chain Monte Carlo (MCMC). This approach can be used with generic MCMC kernels, but is especially well suited to MetFlow, a novel family of MCMC algorithms we introduce, in which proposals are obtained using Normalizing Flows. The marginal distribution produced by such MCMC algorithms is a mixture of flow-based distributions, thus drastically increasing the expressivity of the variational family. Unlike previous methods following this direction, our approach is amenable to the reparametrization trick and does not rely on computationally expensive reverse kernels. Extensive numerical experiments show clear computational and performance improvements over state-of-the-art methods.
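As a toy illustration of flow-based proposals in MCMC, the sketch below runs an independence Metropolis-Hastings sampler whose proposal is a fixed affine "flow". MetFlow itself uses learned flows and a reparametrizable construction, so this is only a conceptual stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: unnormalized log-density of a 1-D mixture of two Gaussians.
def log_target(x):
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

# "Flow": an affine map applied to a standard normal base sample.
# A trained normalizing flow would replace these fixed parameters.
scale, shift = 2.5, 0.0

def flow_sample():
    return scale * rng.standard_normal() + shift

def flow_logpdf(x):
    z = (x - shift) / scale                   # inverse of the flow
    return -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi) - np.log(scale)

# Independence Metropolis-Hastings with the flow as proposal.
x, samples = 0.0, []
for _ in range(5000):
    x_new = flow_sample()
    log_alpha = (log_target(x_new) + flow_logpdf(x)
                 - log_target(x) - flow_logpdf(x_new))
    if np.log(rng.random()) < log_alpha:
        x = x_new
    samples.append(x)

print(f"sample mean ~ {np.mean(samples):.2f} (target is symmetric, ~0)")
```

The accept/reject step here involves a hard comparison, which is exactly what breaks the reparametrization trick; avoiding that obstacle without resorting to reverse kernels is the problem the paper's construction addresses.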