Goto

Collaborating Authors

 Fleming, Cody


HopCast: Calibration of Autoregressive Dynamics Models

arXiv.org Artificial Intelligence

Deep learning models are often trained to approximate dynamical systems that can be modeled using differential equations. These models are optimized to predict one step ahead and can produce calibrated predictions when the predictive model quantifies its uncertainty, as deep ensembles do. At inference time, multi-step predictions are generated via autoregression, which requires a sound uncertainty propagation method (e.g., Trajectory Sampling) to yield calibrated multi-step predictions. This paper introduces an approach named HopCast that uses a Modern Hopfield Network (MHN) to learn the residuals of a deterministic model that approximates the dynamical system. The MHN predicts the density of residuals from a context vector at any timestep during autoregression. This approach produces calibrated multi-step predictions without uncertainty propagation and turns a deterministic model into a calibrated probabilistic model. This work is also the first to benchmark existing uncertainty propagation methods with deep ensembles in terms of calibration error for multi-step predictions.
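As a rough, non-authoritative illustration of the residual-retrieval idea, the sketch below attaches Hopfield-style (softmax-attention) residual quantiles to a deterministic one-step model during autoregression. The context construction (a short window of past predictions), the inverse temperature `beta`, the quantile levels, and the function names are assumptions for illustration, not the paper's implementation.

```python
# Sketch only: softmax (Hopfield-style) retrieval over stored (context, residual)
# pairs, used to wrap a deterministic one-step model with prediction bands.
# Context window length, beta, and quantile levels are illustrative choices.
import numpy as np
from collections import deque

def residual_quantiles(context, keys, residuals, beta=4.0, qs=(0.05, 0.5, 0.95)):
    """Weighted empirical quantiles of stored residuals, retrieved by similarity."""
    scores = beta * keys @ context                  # (N,) attention energies
    w = np.exp(scores - scores.max())
    w /= w.sum()                                    # softmax retrieval weights
    order = np.argsort(residuals)
    cdf = np.cumsum(w[order])
    idx = [min(np.searchsorted(cdf, q), len(residuals) - 1) for q in qs]
    return [residuals[order][i] for i in idx]

def calibrated_rollout(f, x0, horizon, keys, residuals, window=4):
    """Autoregress the deterministic model f and attach residual-based bands."""
    ctx = deque([x0] * window, maxlen=window)       # context = recent predictions
    x, bands = x0, []
    for _ in range(horizon):
        x = f(x)                                    # deterministic one-step prediction
        lo, med, hi = residual_quantiles(np.asarray(ctx), keys, residuals)
        bands.append((x + lo, x + med, x + hi))     # lower / median / upper band
        ctx.append(x)
    return bands
```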


FAWAC: Feasibility Informed Advantage Weighted Regression for Persistent Safety in Offline Reinforcement Learning

arXiv.org Artificial Intelligence

Safe offline reinforcement learning aims to learn policies that maximize cumulative rewards while adhering to safety constraints, using only offline data for training. A key challenge is balancing safety and performance, particularly when the policy encounters out-of-distribution (OOD) states and actions, which can lead to safety violations or overly conservative behavior during deployment. To address these challenges, we introduce Feasibility Informed Advantage Weighted Actor-Critic (FAWAC), a method that prioritizes persistent safety in constrained Markov decision processes (CMDPs). FAWAC formulates policy optimization with feasibility conditions derived specifically for offline datasets, enabling safe policy updates in a non-parametric policy space, followed by projection into the parametric space for constrained actor training. By incorporating a cost-advantage term into Advantage Weighted Regression (AWR), FAWAC ensures that safety constraints are respected while performance is maximized. Additionally, we propose a strategy for a more challenging class of problems involving tempting datasets, in which trajectories are predominantly high-reward but unsafe. Empirical evaluations on standard benchmarks demonstrate that FAWAC achieves strong results, effectively balancing safety and performance when learning policies from static datasets.
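A minimal sketch of the kind of cost-aware, feasibility-masked AWR actor update the abstract describes is given below. The exact feasibility condition, weighting, and hyperparameters in FAWAC may differ; `fawac_actor_loss`, `lam`, and `temperature` are illustrative names and assumptions.

```python
# Hedged sketch (not the authors' code): an AWR-style actor loss with a
# cost-advantage penalty and a feasibility mask over states.
import torch

def fawac_actor_loss(log_probs, adv_reward, adv_cost, feasible,
                     lam=1.0, temperature=1.0, max_weight=20.0):
    """log_probs:  log pi(a|s) for dataset actions             (B,)
       adv_reward: reward-advantage estimates from the critic  (B,)
       adv_cost:   cost-advantage estimates from a cost critic (B,)
       feasible:   1.0 where the state satisfies the feasibility condition, else 0.0
    """
    # Up-weight actions with high reward advantage, down-weight those with high
    # cost advantage, and zero out samples drawn from infeasible states.
    score = (adv_reward - lam * adv_cost) / temperature
    weights = torch.clamp(score.exp(), max=max_weight) * feasible
    return -(weights.detach() * log_probs).mean()
```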


Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning

arXiv.org Machine Learning

In safe offline reinforcement learning (RL), the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints, utilizing only offline data. Traditional methods often face difficulties in balancing these constraints, leading to either diminished performance or increased safety risks. We address these issues with a novel approach that begins by learning a conservatively safe policy through the use of Conditional Variational Autoencoders, which model the latent safety constraints. Subsequently, we frame this as a Constrained Reward-Return Maximization problem, wherein the policy aims to optimize rewards while complying with the inferred latent safety constraints. This is achieved by training an encoder with a reward-Advantage Weighted Regression objective within the latent constraint space. Our methodology is supported by theoretical analysis, including bounds on policy performance and sample complexity. Extensive empirical evaluation on benchmark datasets, including challenging autonomous driving scenarios, demonstrates that our approach not only maintains safety compliance but also excels in cumulative reward optimization, surpassing existing methods. Additional visualizations provide further insights into the effectiveness and underlying mechanisms of our approach.

Although reinforcement learning (RL) is a popular approach for decision-making and control applications across various domains, its deployment in industrial contexts is limited by safety concerns during the training phase. In traditional online RL, agents learn optimal policies through trial and error, interacting with their environments to maximize cumulative rewards. This process inherently involves exploration, which can lead to the agent encountering unsafe states and/or taking unsafe actions, posing substantial risks in industrial applications such as autonomous driving, robotics, and manufacturing systems (García & Fernández, 2015; Gu et al., 2022; Moldovan & Abbeel, 2012; Shen et al., 2014; Yang et al., 2020). The primary challenge lies in ensuring that the agent's learning process does not compromise safety, as failures during training can result in costly damages, operational disruptions, or even endanger human lives (Achiam et al., 2017; Stooke et al., 2020). To address these challenges, researchers have explored several approaches aimed at minimizing safety risks while maintaining the efficacy of RL algorithms. One effective method to mitigate the safety risks associated with training an agent is offline RL, in which the agent learns from a previously collected dataset rather than through online interaction. This dataset comprises trajectory rollouts generated by an arbitrary behavior policy or multiple policies, collected beforehand.
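For concreteness, here is a hedged sketch of the two ingredients named in the abstract: a conditional VAE over dataset actions and a reward-advantage-weighted training objective in the latent space. Layer sizes, the KL weight, and the weighting scheme are assumptions, not the paper's specification.

```python
# Illustrative sketch only: a conditional VAE that embeds dataset actions in a
# latent space, trained with per-sample weights exp(A_r / temperature).
import torch
import torch.nn as nn

class LatentSafetyCVAE(nn.Module):
    def __init__(self, state_dim, action_dim, latent_dim=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * latent_dim))   # outputs mu, log_var
        self.dec = nn.Sequential(nn.Linear(state_dim + latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, action_dim))

    def forward(self, s, a):
        mu, log_var = self.enc(torch.cat([s, a], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()     # reparameterization
        return self.dec(torch.cat([s, z], -1)), mu, log_var

def weighted_cvae_loss(model, s, a, adv_reward, temperature=1.0, beta_kl=0.5):
    """Reconstruct dataset actions, weighting each sample by its reward advantage."""
    a_hat, mu, log_var = model(s, a)
    recon = ((a_hat - a) ** 2).sum(-1)                            # reconstruction error
    kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).sum(-1) # KL to N(0, I)
    w = torch.clamp((adv_reward / temperature).exp(), max=20.0).detach()
    return (w * recon + beta_kl * kl).mean()
```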


ShieldNN: A Provably Safe NN Filter for Unsafe NN Controllers

arXiv.org Artificial Intelligence

In this paper, we develop a novel closed-form Control Barrier Function (CBF) and associated controller shield for the Kinematic Bicycle Model (KBM) with respect to obstacle avoidance. The proposed CBF and shield -- designed by an algorithm we call ShieldNN -- provide two crucial advantages over existing methodologies. First, ShieldNN considers steering and velocity constraints directly with the non-affine KBM dynamics; this is in contrast to more general methods, which typically consider only affine dynamics and do not guarantee invariance properties under control constraints. Second, ShieldNN provides a closed-form set of safe controls for each state, unlike more general methods, which typically rely on optimization algorithms to generate a single instantaneous safe control for each state. Together, these advantages make ShieldNN uniquely suited to providing efficient multi-obstacle safe actions (i.e., multiple-barrier-function shielding) during training of a Reinforcement Learning (RL)-enabled NN controller. We show via experiments that ShieldNN dramatically increases the completion rate of RL training episodes in the presence of multiple obstacles, thus establishing the value of ShieldNN in training RL-based controllers.
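The filter below is a schematic, not ShieldNN's actual construction: it assumes a hypothetical `safe_steering_interval(state, obstacle)` that returns the closed-form per-barrier safe steering range derived in the paper for the KBM, and simply intersects those ranges before clipping the RL controller's command.

```python
# Schematic multi-obstacle shield; `safe_steering_interval` stands in for the
# paper's closed-form per-barrier safe set and is a hypothetical placeholder.
import numpy as np

def shield(raw_steering, raw_velocity, state, obstacles, safe_steering_interval,
           v_max=8.0):
    """Project the NN controller's command onto the intersection of the
    per-obstacle safe steering intervals (multiple-barrier shielding)."""
    lo, hi = -np.inf, np.inf
    for obs in obstacles:
        o_lo, o_hi = safe_steering_interval(state, obs)  # closed-form, per barrier
        lo, hi = max(lo, o_lo), min(hi, o_hi)
    steering = float(np.clip(raw_steering, lo, hi))      # minimal intervention
    velocity = float(np.clip(raw_velocity, 0.0, v_max))  # respect velocity bound
    return steering, velocity
```

During RL training, the raw action from the policy network would be passed through this filter before being applied to the simulator, so unsafe steering commands are replaced by the nearest value in the safe set.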


Towards Robust Car Following Dynamics Modeling via Blackbox Models: Methodology, Analysis, and Recommendations

arXiv.org Artificial Intelligence

The selection of the target variable is important when learning the parameters of classical car-following models such as GIPPS and IDM. There is a vast body of literature on which target variable is optimal for classical car-following models, but no study empirically evaluates the selection of optimal target variables for black-box models such as LSTMs. Black-box models like LSTMs and Gaussian Processes (GPs) are increasingly being used to model car-following behavior without careful selection of target variables. The current work tests different target variables, namely acceleration, velocity, and headway, for three black-box models: GP, LSTM, and Kernel Ridge Regression. These models have different objective functions and work in different vector spaces; e.g., GP works in function space, and LSTM works in parameter space. The experiments show that the optimal target-variable recommendations for black-box models differ from those for classical car-following models, depending on the objective function and the vector space. It is worth mentioning that the models and datasets used during evaluation are diverse in nature: the datasets contain both automated and human-driven vehicle trajectories, and the black-box models belong to both parametric and non-parametric classes of models. This diversity is important during the analysis of variance, wherein we try to find the interaction between datasets, models, and target variables. It is shown that the models and target variables interact, and the recommended target variables do not depend on the dataset under consideration.
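A hedged sketch of the target-variable comparison follows: one feature matrix is built from a car-following trajectory, and only the regression target (acceleration, velocity, or headway) is swapped. The feature choices and the Kernel Ridge Regression hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: same features, different targets, same black-box model.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def make_dataset(leader_pos, follower_pos, leader_speed, follower_speed, dt,
                 target="acceleration"):
    """Build (X, y) from one trajectory; all inputs are 1-D arrays of length T."""
    gap = leader_pos - follower_pos                    # space headway
    rel_speed = leader_speed - follower_speed
    X = np.column_stack([gap[:-1], rel_speed[:-1], follower_speed[:-1]])
    targets = {
        "acceleration": np.diff(follower_speed) / dt,  # a at the next step
        "velocity": follower_speed[1:],                # v at the next step
        "headway": gap[1:],                            # s at the next step
    }
    return X, targets[target]

# Example: fit the same black-box model to each candidate target variable.
# X, y = make_dataset(lp, fp, lv, fv, dt=0.1, target="velocity")
# model = KernelRidge(kernel="rbf", alpha=1e-2).fit(X, y)
```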


Reframing Offline Reinforcement Learning as a Regression Problem

arXiv.org Artificial Intelligence

The study proposes the reformulation of offline reinforcement learning as a regression problem that can be solved with decision trees. Aiming to predict actions based on input states, return-to-go (RTG), and timestep information, we observe that with gradient-boosted trees both agent training and inference are very fast, with training taking less than a minute. Despite the simplification inherent in this reformulated problem, our agent demonstrates performance that is at least on par with established methods. This assertion is validated by testing it across standard datasets associated with D4RL Gym-MuJoCo tasks. We further discuss the agent's ability to generalize by testing it on two extreme cases: how it learns to model the return distributions effectively even with highly skewed expert datasets, and how it exhibits robust performance in scenarios with sparse/delayed rewards.
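A minimal sketch of the regression reformulation is shown below; the scikit-learn estimators and the feature layout are assumptions for illustration and are not necessarily the paper's exact pipeline.

```python
# Sketch: regress actions on (state, return-to-go, timestep) with boosted trees.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

def fit_offline_agent(states, returns_to_go, timesteps, actions):
    """states: (N, d), returns_to_go: (N,), timesteps: (N,), actions: (N, a)."""
    X = np.column_stack([states, returns_to_go, timesteps])
    model = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=200))
    return model.fit(X, actions)

def act(model, state, target_return, t):
    """At evaluation, condition on a desired return-to-go and the current timestep."""
    x = np.concatenate([state, [target_return, t]])[None, :]
    return model.predict(x)[0]
```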