Collaborating Authors

 Magka, Despoina


MLGym: A New Framework and Benchmark for Advancing AI Research Agents

arXiv.org Artificial Intelligence

We introduce Meta MLGym and MLGym-Bench, a new framework and benchmark for evaluating and developing LLM agents on AI research tasks. This is the first Gym environment for machine learning (ML) tasks, enabling research on reinforcement learning (RL) algorithms for training such agents. MLGym-Bench consists of 13 diverse and open-ended AI research tasks drawn from domains such as computer vision, natural language processing, reinforcement learning, and game theory. Solving these tasks requires real-world AI research skills such as generating new ideas and hypotheses, creating and processing data, implementing ML methods, training models, running experiments, analyzing the results, and iterating through this process to improve on a given task. We evaluate a number of frontier large language models (LLMs) on our benchmark, including Claude-3.5-Sonnet, Llama-3.1 405B, GPT-4o, o1-preview, and Gemini-1.5 Pro. Our MLGym framework makes it easy to add new tasks, integrate and evaluate models or agents, generate synthetic data at scale, and develop new learning algorithms for training agents on AI research tasks. We find that current frontier models can improve on the given baselines, usually by finding better hyperparameters, but do not generate novel hypotheses, algorithms, architectures, or substantial improvements. We open-source our framework and benchmark to facilitate future research in advancing the AI research capabilities of LLM agents.
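Since the abstract describes MLGym as a Gym environment, an agent can in principle be driven through the standard Gym-style interaction loop. The sketch below is illustrative only: the environment id, the observation contents, and the agent's act() method are assumptions for the example, not MLGym's actual API.

# Minimal sketch of a Gym-style agent loop, assuming a Gymnasium-compatible
# environment. The env_id and the agent's act() interface are hypothetical
# and do not reflect MLGym's actual API.
import gymnasium as gym

def run_episode(env_id: str, agent, max_steps: int = 50) -> float:
    env = gym.make(env_id)                  # hypothetical research-task environment id
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)             # e.g. an LLM policy proposing the next step
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:         # task finished or step budget exhausted
            break
    env.close()
    return total_reward

Wrapping tasks in this interface is what lets RL algorithms treat each research task as a sequential decision problem, as the abstract suggests.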


Computing Stable Models for Nonmonotonic Existential Rules

AAAI Conferences

In this work, we consider function-free existential rules extended with nonmonotonic negation under a stable model semantics. We present new acyclicity and stratification conditions that identify a large class of rule sets having finite, unique stable models, and we show how the addition of constraints on the input facts can further extend this class. Checking these conditions is computationally feasible, and we provide tight complexity bounds. Finally, we demonstrate how these new methods allowed us to solve relevant reasoning problems over a real-world knowledge base from biochemistry using an off-the-shelf answer set programming engine.
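To make the "off-the-shelf answer set programming engine" step concrete, here is a toy sketch using the clingo Python API on a small stratified rule set with nonmonotonic negation. The program is a made-up example with a single finite stable model, not the biochemistry knowledge base or the rule language studied in the paper.

# Toy illustration: a stratified program with default negation has a unique,
# finite stable model, which an off-the-shelf ASP engine (clingo) computes.
from clingo import Control

PROGRAM = """
compound(a). compound(b). compound(c).
reacts(a, b).
involved(X) :- reacts(X, _).
involved(Y) :- reacts(_, Y).
% default negation: a compound is inert unless some reaction involves it
inert(X) :- compound(X), not involved(X).
"""

def stable_models(program: str):
    ctl = Control(["0"])                    # "0" requests all stable models
    ctl.add("base", [], program)
    ctl.ground([("base", [])])
    models = []
    ctl.solve(on_model=lambda m: models.append(
        sorted(str(atom) for atom in m.symbols(shown=True))))
    return models

print(stable_models(PROGRAM))               # exactly one stable model; inert(c) holds

Stratification here plays the same role as the conditions discussed in the abstract: it guarantees that the negation can be evaluated layer by layer, so the stable model is unique.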