Collaborating Authors: Precup, Doina


Capturing Individual Human Preferences with Reward Features

arXiv.org Machine Learning

Reinforcement learning from human feedback usually models preferences using a reward model that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for disagreement, like in the training of large language models. We propose a method to specialise a reward model to a person or group of people. Our approach builds on the observation that individual preferences can be captured as a linear combination of a set of general reward features. We show how to learn such features and subsequently use them to quickly adapt the reward model to a specific individual, even if their preferences are not reflected in the training data. We present experiments with large language models comparing the proposed architecture with a non-adaptive reward model and also adaptive counterparts, including models that do in-context personalisation. Depending on how much disagreement there is in the training data, our model either significantly outperforms the baselines or matches their performance with a simpler architecture and more stable training.
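
As a rough illustration of the idea (not the paper's implementation), the sketch below fits per-user weights over a fixed set of reward features from pairwise preference data, Bradley-Terry style; the feature function and toy data are illustrative assumptions.

```python
# Minimal sketch: individual preferences as a linear combination of shared
# reward features. Feature extractor and preference data are placeholders.
import numpy as np

rng = np.random.default_rng(0)

def features(response_embedding):
    """Stand-in for learned reward features phi(x); here just the embedding."""
    return response_embedding

def fit_user_weights(pairs, steps=500, lr=0.1):
    """Fit per-user weights w so that sigma(w . (phi(a) - phi(b))) matches
    the user's preference for a over b (Bradley-Terry style)."""
    d = features(pairs[0][0]).shape[0]
    w = np.zeros(d)
    for _ in range(steps):
        grad = np.zeros(d)
        for a, b in pairs:                    # the user preferred a over b
            diff = features(a) - features(b)
            p = 1.0 / (1.0 + np.exp(-w @ diff))
            grad += (1.0 - p) * diff          # gradient of the log-likelihood
        w += lr * grad / len(pairs)
    return w

# Toy preference data: 3-dimensional "reward features" for chosen vs. rejected responses.
pairs = [(rng.normal(size=3) + np.array([1.0, 0.0, 0.0]), rng.normal(size=3))
         for _ in range(20)]
print("adapted user weights:", np.round(fit_user_weights(pairs), 2))
```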


Agency Is Frame-Dependent

arXiv.org Artificial Intelligence

Agency is a system's capacity to steer outcomes toward a goal, and is a central topic of study across biology, philosophy, cognitive science, and artificial intelligence. Determining if a system exhibits agency is a notoriously difficult question: Dennett (1989), for instance, highlights the puzzle of determining which principles can decide whether a rock, a thermostat, or a robot each possess agency. Here we address this puzzle from the viewpoint of reinforcement learning by arguing that agency is fundamentally frame-dependent: any measurement of a system's agency must be made relative to a reference frame. We support this claim by presenting a philosophical argument that each of the essential properties of agency proposed by Barandiaran et al. (2009) and Moreno (2018) is itself frame-dependent. We conclude that any basic science of agency requires frame-dependence, and discuss the implications of this claim for reinforcement learning.


Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning

arXiv.org Artificial Intelligence

Existing actor-critic algorithms, which are popular for continuous control reinforcement learning (RL) tasks, suffer from poor sample efficiency due to the lack of a principled exploration mechanism. Motivated by the success of Thompson sampling for efficient exploration in RL, we propose a novel model-free RL algorithm, Langevin Soft Actor-Critic (LSAC), which prioritizes enhancing critic learning through uncertainty estimation over policy optimization. LSAC employs three key innovations: approximate Thompson sampling through distributional Langevin Monte Carlo (LMC) based Q updates, parallel tempering for exploring multiple modes of the posterior of the Q function, and diffusion-synthesized state-action samples regularized with Q action gradients. Our extensive experiments demonstrate that LSAC outperforms or matches the performance of mainstream model-free RL algorithms for continuous control tasks. Notably, LSAC marks the first successful application of LMC-based Thompson sampling in continuous control tasks with continuous action spaces.
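
The core ingredient, an LMC-based critic update, can be sketched for a toy linear Q-function as a gradient step on the TD loss plus scaled Gaussian noise, so that repeated updates behave like approximate posterior samples of Q. This is an illustrative sketch only; the linear critic, hyperparameters, and fixed batch are placeholders, not LSAC's actual architecture.

```python
# SGLD-style Langevin update on the parameters of a toy linear Q-function.
import numpy as np

rng = np.random.default_rng(0)

def q_value(theta, state_action):
    return theta @ state_action

def lmc_q_update(theta, batch, lr=1e-2, temperature=1e-3, gamma=0.99):
    """One Langevin step: gradient descent on the TD loss plus Gaussian noise
    with scale sqrt(2 * lr * temperature)."""
    grad = np.zeros_like(theta)
    for sa, r, next_sa, done in batch:
        target = r + (0.0 if done else gamma * q_value(theta, next_sa))
        td_error = q_value(theta, sa) - target   # semi-gradient: target treated as fixed
        grad += td_error * sa
    grad /= len(batch)
    noise = rng.normal(size=theta.shape) * np.sqrt(2.0 * lr * temperature)
    return theta - lr * grad + noise

theta = np.zeros(4)
batch = [(rng.normal(size=4), rng.normal(), rng.normal(size=4), False) for _ in range(32)]
for _ in range(100):                             # successive approximate posterior samples
    theta = lmc_q_update(theta, batch)
print("sampled critic parameters:", np.round(theta, 3))
```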


Fairness in Reinforcement Learning with Bisimulation Metrics

arXiv.org Artificial Intelligence

Ensuring long-term fairness is crucial when developing automated decision making systems, especially in dynamic and sequential environments. By maximizing their reward without consideration of fairness, AI agents can introduce disparities in their treatment of groups or individuals. In this paper, we establish the connection between bisimulation metrics and group fairness in reinforcement learning. We propose a novel approach that leverages bisimulation metrics to learn reward functions and observation dynamics, ensuring that learners treat groups fairly while reflecting the original problem. We demonstrate the effectiveness of our method in addressing disparities in sequential decision making problems through empirical evaluation on a standard fairness benchmark consisting of lending and college admission scenarios.

As machine learning continues to shape decision making systems, understanding and addressing its potential risks and biases becomes increasingly imperative. This concern is especially pronounced in sequential decision making, where neglecting algorithmic fairness can create a self-reinforcing cycle that amplifies existing disparities (Jabbari et al., 2017; D'Amour et al., 2020). In response, there is growing recognition of the importance of leveraging reinforcement learning (RL) to tackle decision making problems that have traditionally been approached through supervised learning paradigms, in order to achieve long-term fairness (Nashed et al., 2023). Yin et al. (2023) define long-term fairness in RL as the optimization of the cumulative reward subject to a constraint on the cumulative utility, reflecting fairness over a time horizon. Recent efforts to achieve fairness in RL have primarily relied on metrics adopted from supervised learning, such as demographic parity (Dwork et al., 2012) or equality of opportunity (Hardt et al., 2016b). These metrics are typically integrated into a constrained Markov decision process (MDP) framework to learn a policy that adheres to the criterion (Wen et al., 2021; Yin et al., 2023; Satija et al., 2023; Hu & Zhang, 2022). However, this approach is limited by its requirement for complex constrained optimization, which can introduce additional complexity and hyperparameters into the underlying RL algorithm. Moreover, these methods implicitly assume that stakeholders incorporate the fairness constraints into their decision making process; in reality, this may not occur due to various external and uncontrollable factors (Kusner & Loftus, 2020).
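
For intuition, the sketch below iterates the standard bisimulation-metric fixed point d(s, s') = |R(s) - R(s')| + gamma * W_d(P(s), P(s')) for a toy single-action MDP, computing the Kantorovich (optimal-transport) term with a small linear program. It illustrates the metric the paper builds on, not the paper's fairness-specific learning procedure; the toy chain at the end is an assumption for demonstration.

```python
# Fixed-point computation of a bisimulation metric for a toy single-action MDP.
import numpy as np
from scipy.optimize import linprog

def kantorovich(p, q, D):
    """Optimal-transport distance between discrete distributions p, q
    under ground metric D (n x n), via a linear program over the plan."""
    n = len(p)
    c = D.reshape(-1)                        # cost of moving mass from i to j
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0     # row marginals must equal p
        A_eq[n + i, i::n] = 1.0              # column marginals must equal q
    b_eq = np.concatenate([p, q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

def bisimulation_metric(R, P, gamma=0.9, iters=20):
    """Iterate d(s,s') = |R(s) - R(s')| + gamma * W_d(P(s), P(s'))."""
    n = len(R)
    d = np.zeros((n, n))
    for _ in range(iters):
        d_new = np.zeros((n, n))
        for s in range(n):
            for t in range(n):
                d_new[s, t] = abs(R[s] - R[t]) + gamma * kantorovich(P[s], P[t], d)
        d = d_new
    return d

# Toy 3-state chain: per-state rewards and next-state distributions.
R = np.array([0.0, 0.5, 1.0])
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
print(np.round(bisimulation_metric(R, P), 3))
```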


Mitigating Downstream Model Risks via Model Provenance

arXiv.org Artificial Intelligence

Research and industry are rapidly advancing the innovation and adoption of foundation model-based systems, yet the tools for managing these models have not kept pace. Understanding the provenance and lineage of models is critical for researchers, industry, regulators, and public trust. While model cards and system cards were designed to provide transparency, they fall short in key areas: tracing model genealogy, enabling machine readability, offering reliable centralized management systems, and fostering consistent creation incentives. This challenge mirrors issues in software supply chain security, but AI/ML remains at an earlier stage of maturity. Addressing these gaps requires industry-standard tooling that can be adopted by foundation model publishers, open-source model innovators, and major distribution platforms. We propose a machine-readable model specification format to simplify the creation of model records, thereby reducing error-prone human effort, notably when a new model inherits most of its design from a foundation model. Our solution explicitly traces relationships between upstream and downstream models, enhancing transparency and traceability across the model lifecycle. To facilitate adoption, we introduce the unified model record (UMR) repository, a semantically versioned system that automates the publication of model records to multiple formats (PDF, HTML, LaTeX) and provides a hosted web interface (https://modelrecord.com/). This proof of concept aims to set a new standard for managing foundation models, bridging the gap between innovation and responsible model management.
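
A minimal sketch of what a machine-readable model record with explicit upstream lineage might look like is given below; the field names and values are illustrative assumptions, not the actual UMR specification.

```python
# Hypothetical machine-readable model record; fields are illustrative only.
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class ModelRecord:
    name: str
    version: str                                        # semantic version of the record
    upstream: List[str] = field(default_factory=list)   # parent model identifiers (lineage)
    training_data: List[str] = field(default_factory=list)
    license: str = "unknown"

record = ModelRecord(
    name="my-finetuned-model",
    version="1.0.0",
    upstream=["example-foundation-model@2.3.1"],   # traces the downstream/upstream relationship
    training_data=["internal-instruction-set"],
    license="apache-2.0",
)
print(json.dumps(asdict(record), indent=2))        # machine-readable serialization
```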


MaestroMotif: Skill Design from Artificial Intelligence Feedback

arXiv.org Artificial Intelligence

Describing skills in natural language has the potential to provide an accessible way to inject human knowledge about decision-making into an AI system. We present MaestroMotif, a method for AI-assisted skill design, which yields high-performing and adaptable agents. MaestroMotif leverages the capabilities of Large Language Models (LLMs) to effectively create and reuse skills. It first uses an LLM's feedback to automatically design rewards corresponding to each skill, starting from their natural language description. Then, it employs an LLM's code generation abilities, together with reinforcement learning, for training the skills and combining them to implement complex behaviors specified in language. We evaluate MaestroMotif using a suite of complex tasks in the NetHack Learning Environment (NLE), demonstrating that it surpasses existing approaches in both performance and usability.
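
A highly simplified sketch of the overall architecture (a set of skill policies combined by generated selection code) is shown below; the skill names, observation fields, and selection rule are hypothetical stand-ins, not MaestroMotif's actual interfaces.

```python
# Hypothetical names throughout; illustrates skills combined by generated code.
import random

# Skill policies, in the paper trained with RL on LLM-designed rewards.
skill_policies = {
    "explore": lambda obs: random.choice(obs["legal_actions"]),
    "fight":   lambda obs: obs["legal_actions"][0],
}

def select_skill(obs):
    """Stand-in for an LLM-generated policy over skills written from
    natural-language skill descriptions."""
    return "fight" if obs.get("monster_adjacent") else "explore"

def act(obs):
    return skill_policies[select_skill(obs)](obs)

print(act({"legal_actions": ["move_n", "move_s"], "monster_adjacent": False}))
```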


Parseval Regularization for Continual Reinforcement Learning

arXiv.org Artificial Intelligence

Loss of plasticity, trainability loss, and primacy bias have been identified as issues arising when training deep neural networks on sequences of tasks, all referring to the increased difficulty of training on new tasks. We propose to use Parseval regularization, which maintains orthogonality of weight matrices, to preserve useful optimization properties and improve training in a continual reinforcement learning setting. We show that it provides significant benefits to RL agents on a suite of gridworld, CARL, and MetaWorld tasks. We conduct comprehensive ablations to identify the source of its benefits and investigate its effect on metrics associated with network trainability, including weight matrix rank, weight norms, and policy entropy.
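
A minimal numpy sketch of the regularizer itself is below: the penalty beta * ||W W^T - I||_F^2 and its gradient applied to a random weight matrix. The surrounding loop and constants are illustrative; in practice the penalty gradient is added to the task loss gradient for each layer.

```python
# Parseval regularization: penalize deviation of a weight matrix from row-orthogonality.
import numpy as np

def parseval_penalty_and_grad(W, beta=1e-3):
    """Return beta * ||W W^T - I||_F^2 and its gradient with respect to W."""
    k = W.shape[0]
    M = W @ W.T - np.eye(k)
    penalty = beta * np.sum(M ** 2)
    grad = beta * 4.0 * M @ W        # d/dW ||W W^T - I||_F^2 = 4 (W W^T - I) W
    return penalty, grad

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16)) * 0.3
for _ in range(200):                 # regularization-only steps, for illustration
    _, g = parseval_penalty_and_grad(W)
    W -= 10.0 * g
print("deviation from orthogonality:", np.linalg.norm(W @ W.T - np.eye(8)))
```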


Identifying and Addressing Delusions for Target-Directed Decision-Making

arXiv.org Artificial Intelligence

Target-directed agents utilize self-generated targets to guide their behaviors for better generalization. These agents are prone to blindly chasing problematic targets, resulting in worse generalization and safety catastrophes. We show that these behaviors can be the result of delusions stemming from improper design choices around training: the agent may naturally come to hold false beliefs about certain targets. We identify delusions via intuitive examples in controlled environments and investigate their causes and mitigations. With these insights, we demonstrate how we can make agents address delusions preemptively and autonomously. We empirically validate the effectiveness of the proposed strategies in correcting delusional behaviors and improving out-of-distribution generalization.


Reaction-conditioned De Novo Enzyme Design with GENzyme

arXiv.org Artificial Intelligence

The introduction of models like RFDiffusionAA, AlphaFold3, AlphaProteo, and Chai1 has revolutionized protein structure modeling and interaction prediction, primarily from a binding perspective, focusing on creating ideal lock-and-key models. However, these methods can fall short for enzyme-substrate interactions, where perfect binding models are rare and induced-fit states are more common. To address this, we shift to a functional perspective for enzyme design, where the enzyme's function is defined by the reaction it catalyzes. Here, we introduce GENzyme, a de novo enzyme design model that takes a catalytic reaction as input and generates the catalytic pocket, full enzyme structure, and enzyme-substrate binding complex. GENzyme is an end-to-end, three-stage model that integrates (1) a catalytic pocket generation and sequence co-design module, (2) a pocket inpainting and enzyme inverse folding module, and (3) a binding and screening module to optimize and predict enzyme-substrate complexes. The entire design process is driven by the targeted catalytic reaction. This reaction-first approach allows for more accurate and biologically relevant enzyme design, potentially surpassing structure-based and binding-focused models in creating enzymes capable of catalyzing specific reactions. We provide GENzyme code at https://github.com/WillHua127/GENzyme.
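
The three-stage, reaction-first pipeline can be summarized with the hypothetical stubs below; the function names, inputs, and return values are placeholders for the modules described above, not GENzyme's actual code, which operates on molecular structures rather than strings.

```python
# Hypothetical stubs sketching a reaction-first, three-stage enzyme design pipeline.

def generate_pocket(reaction: str) -> dict:
    """Stage 1: catalytic pocket generation and sequence co-design from the reaction."""
    return {"pocket_residues": ["H57", "D102", "S195"], "reaction": reaction}

def inpaint_enzyme(pocket: dict) -> dict:
    """Stage 2: pocket inpainting and inverse folding to a full enzyme."""
    return {"sequence": "MKT", "structure": "full_backbone", **pocket}

def bind_and_screen(enzyme: dict, substrate: str) -> dict:
    """Stage 3: predict and rank enzyme-substrate binding complexes."""
    return {"complex": (enzyme["sequence"], substrate), "score": 0.0}

candidate = bind_and_screen(inpaint_enzyme(generate_pocket("ester hydrolysis")),
                            substrate="ethyl acetate")
print(candidate)
```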


Soft Condorcet Optimization for Ranking of General Agents

arXiv.org Artificial Intelligence

A common way to drive progress of AI models and agents is to compare their performance on standardized benchmarks. Comparing the performance of general agents requires aggregating their individual performances across a potentially wide variety of different tasks. In this paper, we describe a novel ranking scheme inspired by social choice frameworks, called Soft Condorcet Optimization (SCO), to compute the optimal ranking of agents: the one that makes the fewest mistakes in predicting the agent comparisons in the evaluation data. This optimal ranking is the maximum likelihood estimate when evaluation data (which we view as votes) are interpreted as noisy samples from a ground truth ranking, a solution to Condorcet's original voting system criteria. SCO ratings are maximal for Condorcet winners when they exist, which we show is not necessarily true for the classical rating system Elo. We propose three optimization algorithms to compute SCO ratings and evaluate their empirical performance. When serving as an approximation to the Kemeny-Young voting method, SCO rankings are on average 0 to 0.043 away from the optimal ranking in normalized Kendall-tau distance across 865 preference profiles from the PrefLib open ranking archive. In a simulated noisy tournament setting, SCO achieves accurate approximations to the ground truth ranking and the best among several baselines when 59% or more of the preference data is missing. Finally, SCO ranking provides the best approximation to the optimal ranking, measured on held-out test sets, in a problem containing 52,958 human players across 31,049 games of the classic seven-player game of Diplomacy.
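
As a rough illustration of the underlying idea, the sketch below learns scalar ratings by gradient descent on a differentiable "soft" count of pairwise ordering mistakes, using a sigmoid of rating differences. This is an assumed surrogate objective for illustration, not necessarily the exact SCO loss or any of the three optimization algorithms from the paper.

```python
# Soft pairwise-mistake minimization over agent ratings (illustrative surrogate).
import numpy as np

def soft_mistake_loss(ratings, votes, tau=1.0):
    """Sum over votes (winner, loser) of sigma((r_loser - r_winner) / tau):
    near 1 when the ranking gets the pair wrong, near 0 when it gets it right."""
    loss, grad = 0.0, np.zeros_like(ratings)
    for w, l in votes:
        s = 1.0 / (1.0 + np.exp(-(ratings[l] - ratings[w]) / tau))
        loss += s
        g = s * (1.0 - s) / tau      # derivative of the sigmoid term
        grad[l] += g
        grad[w] -= g
    return loss, grad

votes = [(0, 1), (0, 2), (1, 2), (2, 1), (0, 1)]   # toy pairwise outcomes (winner, loser)
ratings = np.zeros(3)
for _ in range(500):
    _, grad = soft_mistake_loss(ratings, votes)
    ratings -= 0.1 * grad
print("ratings:", np.round(ratings, 2), "ranking:", np.argsort(-ratings))
```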