Schellaert, Wout
PredictaBoard: Benchmarking LLM Score Predictability
Pacchiardi, Lorenzo, Voudouris, Konstantinos, Slater, Ben, Martínez-Plumed, Fernando, Hernández-Orallo, José, Zhou, Lexin, Schellaert, Wout
Despite possessing impressive skills, Large Language Models (LLMs) often fail unpredictably, demonstrating inconsistent success in even basic common sense reasoning tasks. This unpredictability poses a significant challenge to ensuring their safe deployment, as identifying and operating within a reliable "safe zone" is essential for mitigating risks. To address this, we present PredictaBoard, a novel collaborative benchmarking framework designed to evaluate the ability of score predictors (referred to as assessors) to anticipate LLM errors on specific task instances (i.e., prompts) from existing datasets. PredictaBoard evaluates pairs of LLMs and assessors by considering the rejection rate at different error tolerances. As such, PredictaBoard stimulates research into developing better assessors and into making LLMs more predictable, not merely more performant on average. We conduct illustrative experiments using baseline assessors and state-of-the-art LLMs. PredictaBoard highlights the critical need to evaluate predictability alongside performance, paving the way for safer AI systems where errors are not only minimised but also anticipated and effectively mitigated. Code for our benchmark can be found at https://github.com/Kinds-of-Intelligence-CFI/PredictaBoard
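As a minimal sketch of the rejection-rate idea described in this abstract: given an assessor's predicted success probabilities and the LLM's actual per-instance correctness, one can compute the fraction of instances that must be rejected so that the error rate on the accepted instances stays within a tolerance. The function name, thresholding strategy, and data below are illustrative assumptions, not the benchmark's actual API.

import numpy as np

def rejection_rate_at_tolerance(pred_success, correct, tolerance):
    """Fraction of instances an assessor must reject so that the error
    rate among the accepted (highest-confidence) instances <= tolerance.
    Hypothetical helper; PredictaBoard's own code may differ."""
    order = np.argsort(-np.asarray(pred_success))  # most confident first
    correct_sorted = np.asarray(correct, dtype=float)[order]
    n = len(correct_sorted)
    for k in range(n, 0, -1):  # try the largest accepted set first
        error_rate = 1.0 - correct_sorted[:k].mean()
        if error_rate <= tolerance:
            return 1.0 - k / n  # reject the remaining low-confidence items
    return 1.0  # no non-empty accepted set meets the tolerance

# Example: an assessor scoring five prompts for one LLM (toy data)
probs = [0.95, 0.90, 0.60, 0.40, 0.20]
truth = [1, 1, 1, 0, 0]
print(rejection_rate_at_tolerance(probs, truth, tolerance=0.1))  # -> 0.4

A lower rejection rate at a given tolerance indicates a more predictable LLM-assessor pair, which is the quantity the leaderboard ranks.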
Animal-AI 3: What's New & Why You Should Care
Voudouris, Konstantinos, Alhas, Ibrahim, Schellaert, Wout, Crosby, Matthew, Holmes, Joel, Burden, John, Chaubey, Niharika, Donnelly, Niall, Patel, Matishalin, Halina, Marta, Hernández-Orallo, José, Cheke, Lucy G.
The Animal-AI Environment is a unique game-based research platform designed to serve both the artificial intelligence and cognitive science research communities. In this paper, we present Animal-AI 3, the latest version of the environment, outlining several major new features that make the game more engaging for humans and more complex for AI systems. New features include interactive buttons, reward dispensers, and player notifications, as well as an overhaul of the environment's graphics and processing for significant improvements in agent training speed and in the quality of the human player experience. We provide detailed guidance on how to build computational and behavioural experiments with Animal-AI 3. We present results from a series of agents, including the state-of-the-art deep reinforcement learning agent DreamerV3, on newly designed tests and the Animal-AI Testbed of 900 tasks inspired by research in comparative psychology. Animal-AI 3 is designed to facilitate collaboration between the cognitive sciences and artificial intelligence. This paper serves as a stand-alone document that motivates, describes, and demonstrates Animal-AI 3 for the end user.
Predictable Artificial Intelligence
Zhou, Lexin, Moreno-Casares, Pablo A., Martínez-Plumed, Fernando, Burden, John, Burnell, Ryan, Cheke, Lucy, Ferri, Cèsar, Marcoci, Alexandru, Mehrbakhsh, Behzad, Moros-Daval, Yael, hÉigeartaigh, Seán Ó, Rutar, Danaja, Schellaert, Wout, Voudouris, Konstantinos, Hernández-Orallo, José
We introduce the fundamental ideas and challenges of Predictable AI, a nascent research area that explores the ways in which we can anticipate key indicators of present and future AI ecosystems. We argue that achieving predictability is crucial for fostering trust, liability, control, alignment and safety of AI ecosystems, and thus should be prioritised over performance. While distinct from other areas of technical and non-technical AI research, the questions, hypotheses and challenges relevant to Predictable AI had yet to be clearly described. This paper aims to elucidate them, calls for the identification of paths towards AI predictability, and outlines the potential impact of this emergent field.
Your Prompt is My Command: On Assessing the Human-Centred Generality of Multimodal Models
Schellaert, Wout, Martínez-Plumed, Fernando, Vold, Karina, Burden, John, A. M. Casares, Pablo, Sheng Loe, Bao, Reichart, Roi, Ó hÉigeartaigh, Sean, Korhonen, Anna, Hernández-Orallo, José
Even with obvious deficiencies, large prompt-commanded multimodal models are proving to be flexible cognitive tools of unprecedented generality. But the directness, diversity, and degree of user interaction create a distinctive "human-centred generality" (HCG), rather than a fully autonomous one. HCG implies that, for a specific user, a system is only as general as it is effective for the user's relevant tasks and their prevalent ways of prompting. A human-centred evaluation of general-purpose AI systems therefore needs to reflect the personal nature of interaction, tasks and cognition. We argue that the best way to understand these systems is as highly coupled cognitive extenders, and to analyse the bidirectional cognitive adaptations between them and humans. In this paper, we give a formulation of HCG, as well as a high-level overview of the elements and trade-offs involved in the prompting process. We end the paper by outlining some essential research questions and suggestions for improving evaluation practices, which we envision as characteristic of the evaluation of general artificial intelligence in the future. This paper appears in the AI & Society track.