
Collaborating Authors

 Sevilla, Jaime


FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

arXiv.org Artificial Intelligence

Recent AI systems have demonstrated remarkable proficiency in tackling challenging mathematical tasks, from achieving olympiad-level performance in geometry (Trinh et al. 2024) to improving upon existing research results in combinatorics (Romera-Paredes et al. 2024). However, existing benchmarks face some limitations. Saturation of existing benchmarks: Current standard mathematics benchmarks such as the MATH dataset (Hendrycks, Burns, Kadavath, et al. 2021) and GSM8K (Cobbe et al. 2021) primarily assess competency at the high-school and early undergraduate level. As state-of-the-art models achieve near-perfect performance on these benchmarks, we lack rigorous ways to evaluate their capabilities in advanced mathematical domains that require deeper theoretical understanding, creative insight, and specialized expertise. Furthermore, to assess AI's potential contributions to mathematics research, we require benchmarks that better reflect the challenges faced by working mathematicians. Benchmark contamination in training data: A significant challenge in evaluating large language models (LLMs) is data contamination, the inadvertent inclusion of benchmark problems in training data.


Explaining Bayesian Networks in Natural Language using Factor Arguments. Evaluation in the medical domain

arXiv.org Artificial Intelligence

In this paper, we propose a model for building natural language explanations of Bayesian network reasoning in terms of factor arguments, which are argumentation graphs of flowing evidence relating the observed evidence to a target variable we want to learn about. We introduce the notion of factor argument independence to address the outstanding question of when arguments should be presented jointly or separately, and we present an algorithm that, starting from the evidence nodes and a target node, produces a list of all independent factor arguments ordered by their strength. Finally, we implement a scheme to build natural language explanations of Bayesian reasoning using this approach. Our proposal has been validated in the medical domain through a human-driven evaluation study that compares factor-argument explanations of Bayesian network reasoning with an alternative explanation method. Evaluation results indicate that users deem our approach significantly more useful for understanding Bayesian network reasoning than the existing explanation method it is compared to.
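
The algorithm is only summarized here, but its general shape can be sketched. As a rough, hypothetical illustration (not the paper's actual procedure), the Python sketch below enumerates evidence-to-target paths in the network's undirected skeleton, treats node-disjoint paths as "independent" arguments, and orders them by a user-supplied strength function; networkx and the strength callable are assumptions made for the example.

# Illustrative sketch (not the paper's algorithm): enumerate undirected paths
# from each evidence node to the target as candidate "arguments", treat paths
# that share no intermediate node as independent, and keep the strongest ones.
import networkx as nx

def candidate_arguments(bn: nx.DiGraph, evidence, target):
    """Return simple evidence-to-target paths in the BN's undirected skeleton."""
    skeleton = bn.to_undirected()
    paths = []
    for e in evidence:
        paths.extend(nx.all_simple_paths(skeleton, source=e, target=target))
    return paths

def independent_arguments(paths, strength):
    """Greedily keep the strongest paths that share no intermediate nodes."""
    chosen, used = [], set()
    for p in sorted(paths, key=strength, reverse=True):
        inner = set(p[1:-1])
        if inner.isdisjoint(used):
            chosen.append(p)
            used |= inner
    return chosen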


Algorithmic progress in language models

arXiv.org Artificial Intelligence

We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months, substantially faster than hardware gains per Moore's Law. We estimate augmented scaling laws, which enable us to quantify algorithmic progress and determine the relative contributions of scaling models versus innovations in training algorithms. Despite the rapid pace of algorithmic progress and the development of new architectures such as the transformer, our analysis reveals that the increase in compute made an even larger contribution to overall performance improvements over this time period. Though limited by noisy benchmark data, our analysis quantifies the rapid progress in language modeling, shedding light on the relative contributions from compute and algorithms.
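
The headline halving time translates directly into an effective-compute multiplier. The sketch below (not the paper's code) shows how a given halving time maps to a reduction in required compute over a fixed horizon, using the reported central estimate and confidence bounds from the abstract.

# Back-of-the-envelope calculation, assuming the reported ~8-month halving time
# (95% CI roughly 5 to 14 months) for the compute needed to reach a fixed
# language-modeling performance threshold.
def compute_reduction_factor(months_elapsed: float, halving_months: float) -> float:
    """Factor by which required training compute shrinks after `months_elapsed`."""
    return 2 ** (months_elapsed / halving_months)

for halving in (5.0, 8.0, 14.0):  # CI lower bound, central estimate, CI upper bound
    print(f"halving time {halving:>4} months -> 24-month reduction: "
          f"{compute_reduction_factor(24, halving):.1f}x")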


Power Law Trends in Speedrunning and Machine Learning

arXiv.org Artificial Intelligence

We find that improvements in speedrunning world records follow a power law pattern. Using this observation, we answer an outstanding question from previous work: how do we improve on the baseline of predicting no improvement when forecasting speedrunning world records out to some time horizon, such as one month? Using a random effects model, we improve on this baseline, with relative mean squared error on out-of-sample world record improvements as the comparison metric, at a $p < 10^{-5}$ significance level. The same set-up even improves on the ex-post best exponential moving average forecasts at a $p = 0.15$ significance level while having access to substantially fewer data points. We demonstrate the effectiveness of this approach by applying it to Machine Learning benchmarks and achieving forecasts that exceed a baseline. Finally, we interpret the resulting model to suggest that 1) ML benchmarks are far from saturation and 2) sudden large improvements in Machine Learning are unlikely but cannot be ruled out.
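
To give a rough sense of the comparison described, the sketch below fits a power law to a synthetic record trajectory and scores it against the no-improvement baseline by relative mean squared error. It is an ordinary least-squares illustration on invented data, not the paper's random effects model.

# Illustrative only: fit record ~ a * attempt**(-b) by least squares in
# log-log space and compare forecasts against the "predict no improvement"
# baseline via relative mean squared error. Synthetic data.
import numpy as np

attempts = np.arange(1, 11, dtype=float)
records = 100.0 * attempts ** -0.3 + np.random.default_rng(0).normal(0, 0.5, 10)

train, test = slice(0, 7), slice(7, 10)
b, log_a = np.polyfit(np.log(attempts[train]), np.log(records[train]), 1)
power_law_pred = np.exp(log_a) * attempts[test] ** b
baseline_pred = np.full(3, records[train][-1])  # "no improvement" forecast

rel_mse = (np.mean((power_law_pred - records[test]) ** 2)
           / np.mean((baseline_pred - records[test]) ** 2))
print(f"relative MSE (power law vs. no-improvement baseline): {rel_mse:.2f}")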


Machine Learning Model Sizes and the Parameter Gap

arXiv.org Artificial Intelligence

We study trends in model size of notable machine learning systems over time using a curated dataset. From 1950 to 2018, model size in language models increased steadily by seven orders of magnitude. The trend then accelerated, with model size increasing by another five orders of magnitude in just 4 years from 2018 to 2022. Vision models grew at a more constant pace, totaling seven orders of magnitude of growth between 1950 and 2022. We also identify that, since 2020, there have been many language models below 20B parameters, many models above 70B parameters, but a scarcity of models in the 20-70B parameter range. We refer to that scarcity as the parameter gap. We provide some stylized facts about the parameter gap and propose a few hypotheses to explain it. The explanations we favor are: (a) increasing model size beyond 20B parameters requires adopting different parallelism techniques, which makes mid-sized models less cost-effective, and (b) GPT-3 was one order of magnitude larger than previous language models, and researchers afterwards primarily experimented with bigger models to outperform it. While these dynamics likely exist, and we believe they play some role in generating the gap, we don't have high confidence that there are no other, more important dynamics at play.
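
The parameter gap itself amounts to a tally over size bins; the toy example below (with invented model sizes, not the paper's curated dataset) shows the kind of binning in question.

# Toy illustration of the binning behind the "parameter gap": count models
# below 20B, between 20B and 70B, and above 70B parameters.
# The sizes are invented placeholders, not the curated dataset.
sizes_in_billions = [1.5, 6, 13, 7, 70, 175, 280, 530, 3, 11]

bins = {"<20B": 0, "20-70B": 0, ">70B": 0}
for s in sizes_in_billions:
    if s < 20:
        bins["<20B"] += 1
    elif s <= 70:
        bins["20-70B"] += 1
    else:
        bins[">70B"] += 1
print(bins)  # a sparse middle bin is the "parameter gap"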


Compute Trends Across Three Eras of Machine Learning

arXiv.org Artificial Intelligence

Compute, data, and algorithmic advances are the three fundamental factors that guide the progress of modern Machine Learning (ML). In this paper we study trends in the most readily quantified factor: compute. We show that before 2010 training compute grew in line with Moore's law, doubling roughly every 20 months. Since the advent of Deep Learning in the early 2010s, the scaling of training compute has accelerated, doubling approximately every 6 months. In late 2015, a new trend emerged as firms developed large-scale ML models with 10 to 100-fold larger requirements in training compute. Based on these observations we split the history of compute in ML into three eras: the Pre-Deep Learning Era, the Deep Learning Era, and the Large-Scale Era. Overall, our work highlights the fast-growing compute requirements for training advanced ML systems.
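
The quoted doubling times come from fitting exponential trends to training compute over time. The sketch below estimates a doubling time by log-linear regression on synthetic values chosen to roughly resemble the Deep Learning Era trend; it is not the paper's dataset or method code.

# Minimal sketch of estimating a compute doubling time: regress log2(compute)
# on publication year and invert the slope. Values are made up.
import numpy as np

years = np.array([2012.0, 2013.5, 2015.0, 2016.5, 2018.0, 2019.5, 2021.0])
flops = np.array([1e17, 8e17, 6e18, 5e19, 4e20, 3e21, 3e22])  # synthetic

slope, _ = np.polyfit(years, np.log2(flops), 1)  # doublings per year
doubling_time_months = 12.0 / slope
print(f"estimated doubling time: {doubling_time_months:.1f} months")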


Finding, Scoring and Explaining Arguments in Bayesian Networks

arXiv.org Artificial Intelligence

We propose a new approach to explain Bayesian Networks. The approach revolves around a new definition of a probabilistic argument and the evidence it provides. We define a notion of independent arguments, and propose an algorithm to extract a list of relevant, independent arguments given a Bayesian Network, a target node and a set of observations. To demonstrate the relevance of the arguments, we show how we can use the extracted arguments to approximate message passing. Finally, we show a simple scheme to explain the arguments in natural language.
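
One common way to read the scoring of independent arguments (a naive-Bayes-style approximation offered here only as an illustration, not necessarily the paper's exact definition) is as likelihood-ratio contributions that multiply into the posterior odds on the target $T$ given evidence $e_1, \ldots, e_k$:

$$ \frac{P(T \mid e_1, \ldots, e_k)}{P(\neg T \mid e_1, \ldots, e_k)} \;\approx\; \frac{P(T)}{P(\neg T)} \prod_{i=1}^{k} \mathrm{LR}(A_i), \qquad \mathrm{LR}(A_i) = \frac{P(e_i \mid T)}{P(e_i \mid \neg T)} $$

so each argument's strength can be reported as $\log \mathrm{LR}(A_i)$, and independent arguments contribute additively on the log-odds scale.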


Implications of Quantum Computing for Artificial Intelligence alignment research

arXiv.org Artificial Intelligence

Quantum Computing (QC) is a disruptive technology that may not be far over the horizon. Small proof-of-concept quantum computers have already been built [1], and major obstacles to large-scale quantum computing are being heavily researched [2]. Among its potential uses, QC would allow breaking classical cryptographic codes, simulating large quantum systems, and performing faster search and optimization [3]. This last use case is of particular interest to Artificial Intelligence (AI) Strategy. In particular, variants of the Grover algorithm can be exploited to gain a quadratic speedup in search problems, and some recent Quantum Machine Learning (QML) developments have led to exponential gains in certain Machine Learning tasks [4] (though with important caveats which may invalidate their practical use [5]). These ideas have the potential to exert a transformative effect on research in AI (as noted in [6], for example). Furthermore, the technical aspects of QC, which put physical limits on observing the inner workings of a quantum machine and hinder the verification of quantum computations [7], may pose an additional challenge for AI Alignment. In this short article we introduce a heuristic model of quantum computing that captures the most relevant characteristics of QC for technical AI Alignment research.
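
The quadratic speedup from Grover-style search is easiest to see in query counts: unstructured search over N items needs on the order of N classical queries but only about (pi/4) * sqrt(N) quantum queries. The snippet below illustrates this standard textbook comparison; it is not tied to the article's heuristic model.

# Standard Grover query-count comparison for unstructured search over N items:
# roughly N/2 expected classical queries vs. about (pi/4) * sqrt(N) quantum
# queries. Illustrative figures only.
import math

for n in (10**6, 10**9, 10**12):
    classical = n / 2
    grover = (math.pi / 4) * math.sqrt(n)
    print(f"N={n:.0e}: classical ~{classical:.2e} queries, Grover ~{grover:.2e}")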