AITopics | Meger, David

Collaborating Authors

Meger, David

Tractable Representations for Convergent Approximation of Distributional HJB Equations

Alhosh, Julie, Wiltzer, Harley, Meger, David

arXiv.org Machine Learning | Mar-7-2025

In reinforcement learning (RL), the long-term behavior of decision-making policies is evaluated based on their average returns. Distributional RL has emerged, presenting techniques for learning return distributions, which provide additional statistics for evaluating policies, incorporating risk-sensitive considerations. When the passage of time cannot naturally be divided into discrete time increments, researchers have studied the continuous-time RL (CTRL) problem, where agent states and decisions evolve continuously. In this setting, the Hamilton-Jacobi-Bellman (HJB) equation is well established as the characterization of the expected return, and many solution methods exist. However, the study of distributional RL in the continuous-time setting is in its infancy. Recent work has established a distributional HJB (DHJB) equation, providing the first characterization of return distributions in CTRL. These equations and their solutions are intractable to solve and represent exactly, requiring novel approximation techniques. This work takes strides towards this end, establishing conditions on the method of parameterizing return distributions under which the DHJB equation can be approximately solved. Particularly, we show that under a certain topological property of the mapping between statistics learned by a distributional RL algorithm and corresponding distributions, approximation of these statistics leads to close approximations of the solution of the DHJB equation. Concretely, we demonstrate that the quantile representation common in distributional RL satisfies this topological property, certifying an efficient approximation algorithm for continuous-time distributional RL.
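The quantile representation named in the abstract can be illustrated with a minimal sketch. It assumes the midpoint quantile levels tau_i = (2i - 1) / (2N) that are common in quantile-based distributional RL; the function names are hypothetical, and the code only summarizes sampled returns. It does not solve the DHJB equation or reproduce the paper's algorithm.

```python
import numpy as np

def quantile_midpoints(n):
    """Quantile levels tau_i = (2i - 1) / (2n) for i = 1..n (an assumed, common choice)."""
    return (2 * np.arange(1, n + 1) - 1) / (2 * n)

def quantile_representation(returns, n):
    """Summarize sampled returns by n quantile values at the midpoint levels."""
    return np.quantile(np.asarray(returns, dtype=float), quantile_midpoints(n))

def imputed_mean(quantiles):
    """Mean of the uniform mixture of point masses placed at the quantile values."""
    return float(np.mean(quantiles))

# Hypothetical return samples, used only to exercise the functions.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.0, scale=2.0, size=10_000)
theta = quantile_representation(samples, n=32)
```

The vector `theta` is the finite set of statistics an algorithm would learn; the distribution it stands for is the equally weighted mixture of point masses at those values.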

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2503.05563

Country: North America > Canada > Quebec > Montreal (0.30)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

Fairness in Reinforcement Learning with Bisimulation Metrics

Rezaei-Shoshtari, Sahand, Yurchyk, Hanna, Fujimoto, Scott, Precup, Doina, Meger, David

arXiv.org Artificial Intelligence | Dec-31-2024

Ensuring long-term fairness is crucial when developing automated decision making systems, specifically in dynamic and sequential environments. By maximizing their reward without consideration of fairness, AI agents can introduce disparities in their treatment of groups or individuals. In this paper, we establish the connection between bisimulation metrics and group fairness in reinforcement learning. We propose a novel approach that leverages bisimulation metrics to learn reward functions and observation dynamics, ensuring that learners treat groups fairly while reflecting the original problem. We demonstrate the effectiveness of our method in addressing disparities in sequential decision making problems through empirical evaluation on a standard fairness benchmark consisting of lending and college admission scenarios.

As machine learning continues to shape decision making systems, understanding and addressing its potential risks and biases becomes increasingly imperative. This concern is especially pronounced in sequential decision making, where neglecting algorithmic fairness can create a self-reinforcing cycle that amplifies existing disparities (Jabbari et al., 2017; D'Amour et al., 2020). In response, there is a growing recognition of the importance of leveraging reinforcement learning (RL) to tackle decision making problems that have traditionally been approached through supervised learning paradigms, in order to achieve long-term fairness (Nashed et al., 2023). Yin et al. (2023) define long-term fairness in RL as the optimization of the cumulative reward subject to a constraint on the cumulative utility, reflecting fairness over a time horizon. Recent efforts to achieve fairness in RL have primarily relied on metrics adopted from supervised learning, such as demographic parity (Dwork et al., 2012) or equality of opportunity (Hardt et al., 2016b). These metrics are typically integrated into a constrained Markov decision process (MDP) framework to learn a policy that adheres to the criterion (Wen et al., 2021; Yin et al., 2023; Satija et al., 2023; Hu & Zhang, 2022). However, this approach is limited by its requirement for complex constrained optimization, which can introduce additional complexity and hyperparameters into the underlying RL algorithm. Moreover, these methods make the implicit assumption that stakeholders are incorporating these fairness constraints into their decision making process. However, in reality, this may not occur due to various external and uncontrollable factors (Kusner & Loftus, 2020).
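Demographic parity, one of the supervised-learning metrics mentioned above, compares the rate of positive decisions across groups. The sketch below is a minimal illustration of that metric only; the function name and the toy data are hypothetical, and it is not the bisimulation-based method the paper proposes.

```python
import numpy as np

def demographic_parity_gap(decisions, groups):
    """Largest difference in positive-decision rate between any two groups."""
    decisions = np.asarray(decisions, dtype=float)
    groups = np.asarray(groups)
    rates = [decisions[groups == g].mean() for g in np.unique(groups)]
    return float(max(rates) - min(rates))

# Hypothetical lending decisions (1 = approved) for two groups, "a" and "b".
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(decisions, groups)  # 0.75 - 0.25 = 0.5
```

A gap of zero means every group receives positive decisions at the same rate; a constrained-MDP formulation would bound a quantity of this kind during policy learning.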

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2412.17123

Country: North America > Canada (0.28)

Genre: Research Report > Promising Solution (0.48)

Industry:

Education > Educational Setting > Higher Education (0.34)
Banking & Finance > Credit (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Parseval Regularization for Continual Reinforcement Learning

Chung, Wesley, Cherif, Lynn, Meger, David, Precup, Doina

arXiv.org Artificial Intelligence | Dec-10-2024

Loss of plasticity, trainability loss, and primacy bias have been identified as issues arising when training deep neural networks on sequences of tasks -- all referring to the increased difficulty in training on new tasks. We propose to use Parseval regularization, which maintains orthogonality of weight matrices, to preserve useful optimization properties and improve training in a continual reinforcement learning setting. We show that it provides significant benefits to RL agents on a suite of gridworld, CARL and MetaWorld tasks. We conduct comprehensive ablations to identify the source of its benefits and investigate the effect of certain metrics associated with network trainability, including weight matrix rank, weight norms and policy entropy.
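An orthogonality-preserving penalty of the kind the abstract describes can be sketched as follows. This assumes the common form ||W W^T - I||_F^2 applied to a single weight matrix; the exact form, scaling and per-layer treatment used in the paper are not stated here, and the function names are hypothetical.

```python
import numpy as np

def parseval_penalty(W):
    """Squared Frobenius norm of (W W^T - I); zero when the rows of W are orthonormal."""
    W = np.asarray(W, dtype=float)
    residual = W @ W.T - np.eye(W.shape[0])
    return float(np.sum(residual ** 2))

def parseval_penalty_grad(W):
    """Analytic gradient of the penalty with respect to W: 4 (W W^T - I) W."""
    W = np.asarray(W, dtype=float)
    return 4.0 * (W @ W.T - np.eye(W.shape[0])) @ W

# An orthogonal matrix (from a QR decomposition) incurs essentially no penalty.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
penalty_orthogonal = parseval_penalty(Q)
```

In training, a term like this would be added to the RL loss with a small coefficient so that weight matrices stay close to orthogonal as tasks change.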

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2412.07224

Country: North America (0.28)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Action Gaps and Advantages in Continuous-Time Distributional Reinforcement Learning

Wiltzer, Harley, Bellemare, Marc G., Meger, David, Shafto, Patrick, Jhaveri, Yash

arXiv.org Machine Learning | Oct-14-2024

When decisions are made at high frequency, traditional reinforcement learning (RL) methods struggle to accurately estimate action values. In turn, their performance is inconsistent and often poor. Whether the performance of distributional RL (DRL) agents suffers similarly, however, is unknown. In this work, we establish that DRL agents are sensitive to the decision frequency. We prove that action-conditioned return distributions collapse to their underlying policy's return distribution as the decision frequency increases. We quantify the rate of collapse of these return distributions and show that their statistics collapse at different rates. Moreover, we define distributional perspectives on action gaps and advantages. In particular, we introduce the superiority as a probabilistic generalization of the advantage -- the core object of approaches to mitigating performance issues in high-frequency value-based RL. In addition, we build a superiority-based DRL algorithm. Through simulations in an option-trading domain, we validate that proper modeling of the superiority distribution produces improved controllers at high decision frequencies.
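The difficulty with expected action values at high decision frequency can be seen in a deterministic toy problem. The sketch below assumes constant reward rates and continuous discounting by gamma**t, both chosen purely for illustration; it shows only the expected-value effect (the action value approaching the policy's value as the holding time h shrinks), not the paper's distributional results or its superiority-based algorithm.

```python
import math

def value_constant_rate(reward_rate, gamma):
    """Discounted return of earning reward_rate forever: integral of gamma**t * r dt."""
    return reward_rate / -math.log(gamma)

def q_hold_then_follow(r_action, r_policy, gamma, h):
    """Value of holding an action (reward rate r_action) for time h, then following the policy."""
    rate = -math.log(gamma)
    discount = gamma ** h
    reward_while_holding = r_action * (1.0 - discount) / rate
    return reward_while_holding + discount * value_constant_rate(r_policy, gamma)

def action_gap(r_action, r_policy, gamma, h):
    """Difference between the action value and the policy's value."""
    return q_hold_then_follow(r_action, r_policy, gamma, h) - value_constant_rate(r_policy, gamma)

# The gap shrinks roughly in proportion to h as decisions become more frequent.
gaps = [action_gap(2.0, 1.0, 0.9, h) for h in (1.0, 0.1, 0.01, 0.001)]
```

Because the gap scales with h, any fixed amount of estimation noise eventually swamps it, which is why rescaled quantities such as the advantage are used in high-frequency value-based RL.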

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2410.11022

Country: North America > Canada > Quebec (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Topological mapping for traversability-aware long-range navigation in off-road terrain

Tremblay, Jean-François, Alhosh, Julie, Petit, Louis, Lotfi, Faraz, Landauro, Lara, Meger, David

arXiv.org Artificial Intelligence | Oct-2-2024

Autonomous robots navigating in off-road terrain like forests open new opportunities for automation. While off-road navigation has been studied, existing work often relies on clearly delineated pathways. We present a method allowing for long-range planning, exploration and low-level control in unknown off-trail forest terrain, using vision and GPS only. We represent outdoor terrain with a topological map, which is a set of panoramic snapshots connected with edges containing traversability information. A novel traversability analysis method is demonstrated, predicting the existence of a safe path towards a target in an image. Navigating between nodes is done using goal-conditioned behavior cloning, leveraging the power of a pretrained vision transformer. An exploration planner is presented, efficiently covering an unknown off-road area with unknown traversability using a frontiers-based approach. The approach is successfully deployed to autonomously explore two 400 meters squared forest sites unseen during training, in difficult conditions for navigation.
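A topological map of the kind described, with snapshot nodes and edges that carry traversability information, might be organized as in the sketch below. The class, the use of a GPS pair as the only node payload, the (0, 1] traversability score and the -log cost used for planning are all assumptions made for illustration; the paper's actual representation and planner are not specified in the abstract.

```python
import heapq
import math
from dataclasses import dataclass, field

@dataclass
class TopoMap:
    nodes: dict = field(default_factory=dict)  # node id -> (lat, lon); snapshot image omitted
    edges: dict = field(default_factory=dict)  # node id -> {neighbor id: traversability in (0, 1]}

    def add_node(self, node_id, gps):
        self.nodes[node_id] = gps
        self.edges.setdefault(node_id, {})

    def add_edge(self, a, b, traversability):
        self.edges[a][b] = traversability
        self.edges[b][a] = traversability

    def plan(self, start, goal):
        """Dijkstra with edge cost -log(traversability), maximizing the product of scores."""
        dist, prev, queue = {start: 0.0}, {}, [(0.0, start)]
        while queue:
            d, u = heapq.heappop(queue)
            if u == goal:
                break
            if d > dist.get(u, math.inf):
                continue  # stale queue entry
            for v, score in self.edges[u].items():
                if score <= 0.0:
                    continue  # treat as untraversable
                nd = d - math.log(score)
                if nd < dist.get(v, math.inf):
                    dist[v], prev[v] = nd, u
                    heapq.heappush(queue, (nd, v))
        if goal not in dist:
            return None
        path = [goal]
        while path[-1] != start:
            path.append(prev[path[-1]])
        return path[::-1]

# Hypothetical four-node map; coordinates are placeholders.
m = TopoMap()
for name in "ABCD":
    m.add_node(name, (0.0, 0.0))
m.add_edge("A", "B", 0.9)
m.add_edge("B", "D", 0.9)
m.add_edge("A", "C", 0.99)
m.add_edge("C", "D", 0.5)
route = m.plan("A", "D")  # A-B-D scores 0.81, A-C-D scores 0.495
```

Keeping traversability on the edges lets the planner trade path length against the predicted chance of getting through each segment.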

artificial intelligence, machine learning, robot, (18 more...)

2410.01925

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Shedding Light on Large Generative Networks: Estimating Epistemic Uncertainty in Diffusion Models

Berry, Lucas, Brando, Axel, Meger, David

arXiv.org Artificial Intelligence | Jun-5-2024

Generative diffusion models, notable for their large parameter count (exceeding 100 million) and operation within high-dimensional image spaces, pose significant challenges for traditional uncertainty estimation methods due to computational demands. In this work, we introduce an innovative framework, Diffusion Ensembles for Capturing Uncertainty (DECU), designed for estimating epistemic uncertainty for diffusion models. The DECU framework introduces a novel method that efficiently trains ensembles of conditional diffusion models by incorporating a static set of pre-trained parameters, drastically reducing the computational burden and the number of parameters that require training. Additionally, DECU employs Pairwise-Distance Estimators (PaiDEs) to accurately measure epistemic uncertainty by evaluating the mutual information between model outputs and weights in high-dimensional spaces. The effectiveness of this framework is demonstrated through experiments on the ImageNet dataset, highlighting its capability to capture epistemic uncertainty, specifically in under-sampled image classes.

artificial intelligence, epistemic uncertainty, machine learning, (14 more...)

2406.1858

Country:

North America > Canada > Quebec (0.29)
Europe > Estonia > Järva County > Paide (0.27)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Transportation (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)

Constrained Robotic Navigation on Preferred Terrains Using LLMs and Speech Instruction: Exploiting the Power of Adverbs

Lotfi, Faraz, Faraji, Farnoosh, Kakodkar, Nikhil, Manderson, Travis, Meger, David, Dudek, Gregory

arXiv.org Artificial Intelligence | Apr-2-2024

This paper explores leveraging large language models for map-free off-road navigation using generative AI, reducing the need for traditional data collection and annotation. We propose a method where a robot receives verbal instructions, converted to text through Whisper, and a large language model (LLM) extracts landmarks, preferred terrains, and crucial adverbs translated into speed settings for constrained navigation. A language-driven semantic segmentation model generates text-based masks for identifying landmarks and terrain types in images. By translating 2D image points to the vehicle's motion plane using camera parameters, an MPC controller guides the vehicle towards the desired terrain. This approach enhances adaptation to diverse environments and facilitates the use of high-level instructions for navigating complex and challenging terrains. Keywords: Constrained map-free navigation, large language models, language-driven semantic segmentation, preferred terrains, speech instruction, adverbs.
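The step of translating 2D image points to the vehicle's motion plane can be sketched with a pinhole camera model. The abstract does not give the camera model, so the sketch assumes a forward-looking camera with its optical axis parallel to flat ground, mounted at a known height; the function name and the intrinsic values in the example are hypothetical.

```python
import numpy as np

def pixel_to_ground(u, v, fx, fy, cx, cy, cam_height):
    """Back-project pixel (u, v) onto flat ground.

    Camera frame: x to the right, y down, z forward. The ground is the plane
    y = cam_height. Returns (lateral, forward) in the units of cam_height,
    or None when the pixel lies on or above the horizon.
    """
    ray = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    if ray[1] <= 0.0:
        return None  # the ray never reaches the ground
    point = (cam_height / ray[1]) * ray
    return float(point[0]), float(point[2])

# A pixel 100 rows below the principal point, camera 1 m above the ground.
target = pixel_to_ground(320, 340, fx=500, fy=500, cx=320, cy=240, cam_height=1.0)
```

Points obtained this way, for example the centroid of a segmented terrain mask, could then serve as goals for a controller operating in the ground plane.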

large language model, machine learning, natural language, (20 more...)

2404.02294

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Learning active tactile perception through belief-space control

Tremblay, Jean-François, Meger, David, Hogan, Francois, Dudek, Gregory

arXiv.org Artificial Intelligence | Nov-30-2023

Robots operating in an open world will encounter novel objects with unknown physical properties, such as mass, friction, or size. These robots will need to sense these properties through interaction prior to performing downstream tasks with the objects. We propose a method that autonomously learns tactile exploration policies by developing a generative world model that is leveraged to 1) estimate the object's physical parameters using a differentiable Bayesian filtering algorithm and 2) develop an exploration policy using an information-gathering model predictive controller. We evaluate our method on three simulated tasks where the goal is to estimate a desired object property (mass, height or toppling height) through physical interaction. We find that our method is able to discover policies that efficiently gather information about the desired property in an intuitive manner. Finally, we validate our method on a real robot system for the height estimation task, where our method is able to successfully learn and execute an information-gathering policy from scratch.
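The idea of estimating a physical parameter by Bayesian filtering, and of preferring interactions that are more informative, can be illustrated with a scalar Gaussian update. The sketch assumes a linear measurement model, force = mass * acceleration + noise, with a Gaussian prior on mass; this is a textbook Kalman-style update with hypothetical names and numbers, not the learned world model or differentiable filter used in the paper.

```python
def gaussian_update(mean, var, measurement, h, noise_var):
    """Posterior over theta after observing measurement = h * theta + Gaussian noise."""
    innovation = measurement - h * mean
    innovation_var = h * h * var + noise_var
    gain = var * h / innovation_var
    return mean + gain * innovation, (1.0 - gain * h) * var

def posterior_variance(var, h, noise_var):
    """Posterior variance depends only on the action (through h), not on the measurement."""
    return var * noise_var / (h * h * var + noise_var)

# Prior belief about mass: mean 1.0, variance 1.0. Pushing with acceleration 2.0
# and reading a force of 3.0 moves the estimate toward 1.5 and shrinks the variance.
mean, var = gaussian_update(1.0, 1.0, measurement=3.0, h=2.0, noise_var=0.5)

# An information-gathering controller would prefer the action leaving less uncertainty.
var_gentle = posterior_variance(1.0, h=1.0, noise_var=0.5)
var_firm = posterior_variance(1.0, h=2.0, noise_var=0.5)
```

Here the firmer push is more informative because it leaves a smaller posterior variance, which is the kind of criterion an information-gathering model predictive controller can optimize over candidate actions.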

artificial intelligence, estimation, machine learning, (15 more...)

2312.00215

Country:

North America > Canada > Quebec > Montreal (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Generalizable Imitation Learning Through Pre-Trained Representations

Chang, Wei-Di, Hogan, Francois, Meger, David, Dudek, Gregory

arXiv.org Artificial Intelligence | Nov-15-2023

In this paper we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization abilities of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Visual Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data and evaluation approach are made available to facilitate further study of generalization in Imitation Learners.

artificial intelligence, machine learning, natural language, (17 more...)

2311.0935

Country:

North America > Canada (0.14)
Oceania > New Zealand (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)

For SALE: State-Action Representation Learning for Deep Reinforcement Learning

Fujimoto, Scott, Chang, Wei-Di, Smith, Edward J., Gu, Shixiang Shane, Precup, Doina, Meger, David

arXiv.org Machine Learning | Nov-5-2023

In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-level states. We extensively study the design space of these embeddings and highlight important design considerations. We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm, which significantly outperforms existing continuous control algorithms. On OpenAI gym benchmark tasks, TD7 has an average performance gain of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively, and works in both the online and offline settings.

machine learning, reinforcement learning, time step, (16 more...)

2306.02451

Country: North America > Canada (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
