AITopics | Ruoss, Anian

Collaborating Authors

Ruoss, Anian

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Why is prompting hard? Understanding prompts on binary sequence predictors

Wenliang, Li Kevin, Ruoss, Anian, Grau-Moya, Jordi, Hutter, Marcus, Genewein, Tim

arXiv.org Machine LearningFeb-15-2025

Large language models (LLMs) can be prompted to do many tasks, but finding good prompts is not always easy, nor is understanding some performant prompts. We explore these issues by viewing prompting as conditioning a near-optimal sequence predictor (LLM) pretrained on diverse data sources. Through numerous prompt search experiments, we show that the unintuitive patterns in optimal prompts can be better understood given the pretraining distribution, which is often unavailable in practice. Moreover, even using exhaustive search, reliably identifying optimal prompts from practical neural predictors can be difficult. Further, we demonstrate that common prompting methods, such as using intuitive prompts or samples from the targeted task, are in fact suboptimal. Thus, this work takes an initial step towards understanding the difficulties in finding and understanding optimal prompts from a statistical and empirical perspective.

binary sequence predictor, large language model, natural language, (1 more...)

arXiv.org Machine Learning

2502.1076

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

Add feedback

LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations

Ruoss, Anian, Pardo, Fabio, Chan, Harris, Li, Bonnie, Mnih, Volodymyr, Genewein, Tim

arXiv.org Artificial IntelligenceDec-2-2024

Today's largest foundation models have increasingly general capabilities, yet when used as agents, they often struggle with simple reasoning and decision-making tasks, even though they possess good factual knowledge of the task and how to solve it. In this paper, we present a benchmark to pressure-test these models' multimodal decision-making capabilities in the very long-context regime (up to one million tokens) and investigate whether they can learn from a large number of expert demonstrations in their context. We evaluate a wide range of state-of-the-art frontier models as policies across a battery of simple interactive decision-making tasks: playing tic-tac-toe, chess, and Atari, navigating grid worlds, solving crosswords, and controlling a simulated cheetah. We measure the performance of Claude 3.5 Sonnet, Gemini 1.5 Flash, Gemini 1.5 Pro, GPT-4o, o1-mini, and o1-preview under increasing amounts of expert demonstrations in the context $\unicode{x2013}$ from no demonstrations up to 512 full episodes, pushing these models' multimodal long-context reasoning capabilities to their limits. Across our tasks, today's frontier models rarely manage to fully reach expert performance, showcasing the difficulty of our benchmark. Presenting more demonstrations often has little effect, but some models steadily improve with more demonstrations on a few tasks. We investigate the effect of encoding observations as text or images and the impact of chain-of-thought prompting. Overall, our results suggest that even today's most capable models often struggle to imitate desired behavior by generalizing purely from in-context demonstrations. To help quantify the impact of other approaches and future innovations aiming to tackle this problem, we open source our benchmark that covers the zero-, few-, and many-shot regimes in a unified evaluation.

gemini 1, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2412.01441

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Games > Chess (0.49)
Leisure & Entertainment > Games > Tic-Tac-Toe (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Mastering Board Games by External and Internal Planning with Language Models

Schultz, John, Adamek, Jakub, Jusup, Matej, Lanctot, Marc, Kaisers, Michael, Perrin, Sarah, Hennes, Daniel, Shar, Jeremy, Lewis, Cannada, Ruoss, Anian, Zahavy, Tom, Veličković, Petar, Prince, Laurel, Singh, Satinder, Malmi, Eric, Tomašev, Nenad

arXiv.org Artificial IntelligenceDec-2-2024

While large language models perform well on a range of complex tasks (e.g., text generation, question answering, summarization), robust multi-step planning and reasoning remains a considerable challenge for them. In this paper we show that search-based planning can significantly improve LLMs' playing strength across several board games (Chess, Fischer Random / Chess960, Connect Four, and Hex). We introduce, compare and contrast two major approaches: In external search, the model guides Monte Carlo Tree Search (MCTS) rollouts and evaluations without calls to an external engine, and in internal search, the model directly generates in-context a linearized tree of potential futures and a resulting final choice. Both build on a language model pre-trained on relevant domain knowledge, capturing the transition and value functions across these games. We find that our pre-training method minimizes hallucinations, as our model is highly accurate regarding state prediction and legal moves. Additionally, both internal and external search indeed improve win-rates against state-of-the-art bots, even reaching Grandmaster-level performance in chess while operating on a similar move count search budget per decision as human Grandmasters. The way we combine search with domain knowledge is not specific to board games, suggesting direct extensions into more general language model inference and training techniques.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.12119

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data

Heurtel-Depeiges, David, Ruoss, Anian, Veness, Joel, Genewein, Tim

arXiv.org Artificial IntelligenceOct-7-2024

Foundation models have recently been shown to be strong data compressors. However, when accounting for their excessive parameter count, their compression ratios are actually inferior to standard compression algorithms. Moreover, naively reducing the number of parameters may not necessarily help as it leads to worse predictions and thus weaker compression. In this paper, we conduct a large-scale empirical study to investigate whether there is a sweet spot where competitive compression ratios with pre-trained vanilla transformers are possible. To this end, we train families of models on 165GB of raw byte sequences of either text, image, or audio data (and all possible combinations of the three) and then compress 1GB of out-of-distribution (OOD) data from each modality. We find that relatively small models (i.e., millions of parameters) can outperform standard general-purpose compression algorithms (gzip, LZMA2) and even domain-specific compressors (PNG, JPEG 2000, FLAC) - even when factoring in parameter count. We achieve, e.g., the lowest compression ratio of 0.49 on OOD audio data (vs. 0.54 for FLAC). To study the impact of model- and dataset scale, we conduct extensive ablations and hyperparameter sweeps, and we investigate the effect of unimodal versus multimodal training. We find that even small models can be trained to perform well on multiple modalities, but, in contrast to previously reported results with large-scale foundation models, transfer to unseen modalities is generally weak.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.05078

Country:

North America > Canada > Quebec (0.14)
Europe (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

Add feedback

Evaluating Frontier Models for Dangerous Capabilities

Phuong, Mary, Aitchison, Matthew, Catt, Elliot, Cogan, Sarah, Kaskasoli, Alexandre, Krakovna, Victoria, Lindner, David, Rahtz, Matthew, Assael, Yannis, Hodkinson, Sarah, Howard, Heidi, Lieberum, Tom, Kumar, Ramana, Raad, Maria Abi, Webson, Albert, Ho, Lewis, Lin, Sharon, Farquhar, Sebastian, Hutter, Marcus, Deletang, Gregoire, Ruoss, Anian, El-Sayed, Seliem, Brown, Sasha, Dragan, Anca, Shah, Rohin, Dafoe, Allan, Shevlane, Toby

arXiv.org Artificial IntelligenceApr-5-2024

To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous capabilities in the models we evaluated, but we flag early warning signs. Our goal is to help advance a rigorous science of dangerous capability evaluation, in preparation for future models.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2403.13793

Country:

North America > United States (1.00)
Europe > United Kingdom (0.92)
Asia > Middle East > Palestine (0.27)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Overview (0.92)
Research Report > Experimental Study (0.67)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
(6 more...)

Technology:

Information Technology > Software (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Web (1.00)
(7 more...)

Add feedback

Grandmaster-Level Chess Without Search

Ruoss, Anian, Delétang, Grégoire, Medapati, Sourabh, Grau-Moya, Jordi, Wenliang, Li Kevin, Catt, Elliot, Reid, John, Genewein, Tim

arXiv.org Artificial IntelligenceFeb-6-2024

The recent breakthrough successes in machine learning are mainly attributed to scale: namely large-scale attention-based architectures and datasets of unprecedented scale. This paper investigates the impact of training at scale for chess. Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance only arises at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.

grandmaster-level chess, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2402.04494

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)

Add feedback

Learning Universal Predictors

Grau-Moya, Jordi, Genewein, Tim, Hutter, Marcus, Orseau, Laurent, Delétang, Grégoire, Catt, Elliot, Ruoss, Anian, Wenliang, Li Kevin, Mattern, Christopher, Aitchison, Matthew, Veness, Joel

arXiv.org Artificial IntelligenceJan-26-2024

Meta-learning has emerged as a powerful approach to train neural networks to learn new tasks quickly from limited data. Broad exposure to different tasks leads to versatile representations enabling general problem solving. But, what are the limits of meta-learning? In this work, we explore the potential of amortizing the most powerful universal predictor, namely Solomonoff Induction (SI), into neural networks via leveraging meta-learning to its limits. We use Universal Turing Machines (UTMs) to generate training data used to expose networks to a broad range of patterns. We provide theoretical analysis of the UTM data generation processes and meta-training protocols. We conduct comprehensive experiments with neural architectures (e.g. LSTMs, Transformers) and algorithmic data generators of varying complexity and universality. Our results suggest that UTM data is a valuable resource for meta-learning, and that it can be used to train neural networks capable of learning universal prediction strategies.

artificial intelligence, machine learning, sequence, (17 more...)

arXiv.org Artificial Intelligence

2401.14953

Country:

Europe (0.28)
Oceania > Australia > Victoria > Melbourne (0.14)
North America > Canada > Quebec (0.14)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Distributional Bellman Operators over Mean Embeddings

Wenliang, Li Kevin, Déletang, Grégoire, Aitchison, Matthew, Hutter, Marcus, Ruoss, Anian, Gretton, Arthur, Rowland, Mark

arXiv.org Machine LearningDec-9-2023

We propose a novel algorithmic framework for distributional reinforcement learning, based on learning finite-dimensional mean embeddings of return distributions. We derive several new algorithms for dynamic programming and temporal-difference learning based on this framework, provide asymptotic convergence theory, and examine the empirical performance of the algorithms on a suite of tabular tasks. Further, we show that this approach can be straightforwardly combined with deep reinforcement learning, and obtain a new deep RL agent that improves over baseline distributional approaches on the Arcade Learning Environment.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2312.07358

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.46)
Leisure & Entertainment > Sports (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.45)

Add feedback

Finding Increasingly Large Extremal Graphs with AlphaZero and Tabu Search

Mehrabian, Abbas, Anand, Ankit, Kim, Hyunjik, Sonnerat, Nicolas, Balog, Matej, Comanici, Gheorghe, Berariu, Tudor, Lee, Andrew, Ruoss, Anian, Bulanova, Anna, Toyama, Daniel, Blackwell, Sam, Paredes, Bernardino Romera, Veličković, Petar, Orseau, Laurent, Lee, Joonkyung, Naredla, Anurag Murty, Precup, Doina, Wagner, Adam Zsolt

arXiv.org Artificial IntelligenceNov-6-2023

This work studies a central extremal graph theory problem inspired by a 1975 conjecture of Erd\H{o}s, which aims to find graphs with a given size (number of nodes) that maximize the number of edges without having 3- or 4-cycles. We formulate this problem as a sequential decision-making problem and compare AlphaZero, a neural network-guided tree search, with tabu search, a heuristic local search method. Using either method, by introducing a curriculum -- jump-starting the search for larger graphs using good graphs found at smaller sizes -- we improve the state-of-the-art lower bounds for several sizes. We also propose a flexible graph-generation environment and a permutation-invariant network architecture for learning to search in the space of graphs.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2311.03583

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Language Modeling Is Compression

Delétang, Grégoire, Ruoss, Anian, Duquenne, Paul-Ambroise, Catt, Elliot, Genewein, Tim, Mattern, Christopher, Grau-Moya, Jordi, Wenliang, Li Kevin, Aitchison, Matthew, Orseau, Laurent, Hutter, Marcus, Veness, Joel

arXiv.org Artificial IntelligenceSep-19-2023

It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2309.10668

Country: Europe > United Kingdom (0.28)

Genre: Research Report (0.40)

Industry:

Government (0.68)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.66)

Add feedback