AITopics

Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges associated with multi-agent learning, these tasks cannot be straightforwardly addressed using training paradigms designed for single-agent scenarios. In this paper, we introduce Cooperative Human-Object Interaction (CooHOI), a novel framework that addresses multi-character objects transporting through a two-phase learning paradigm: individual skill acquisition and subsequent transfer. Initially, a single agent learns to perform tasks using the Adversarial Motion Priors (AMP) framework. Following this, the agent learns to collaborate with others by considering the shared dynamics of the manipulated object during parallel training using Multi-Agent Proximal Policy Optimization (MAPPO). When one agent interacts with the object, resulting in specific object dynamics changes, the other agents learn to respond appropriately, thereby achieving implicit communication and coordination between teammates. Unlike previous approaches that relied on tracking-based methods for multi-character HOI, CooHOI is inherently efficient, does not depend on motion capture data of multi-character interactions, and can be seamlessly extended to include more participants and a wide range of object types.

agent, arxiv preprint arxiv, interaction, (14 more...)

2406.14558

Country:

North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Autonomous Decision Making for Air Taxi Networks

Vesel, Alex

Future urban air mobility systems are expected to be operated by rideshare companies as fleets, which will require fully autonomous air traffic control systems and an order of magnitude increase in airspace capacity. Such a system must not only be safe, but also highly responsive to customer demand. This paper proposes the air traffic network problem (ATNP), which models the optimization problem of future cooperative air taxi networks. We propose a three-phase decision making model that efficiently assigns vehicles to passengers, determines flight levels to reduce collision risk, and resolves aircraft conflicts by selectively applying Monte Carlo tree search. We develop a simulator for the ATNP and show that our approach has increased safety and reduced passenger waiting time compared to greedy and first-dispatch protocols over potential vertiport layouts across the Bay Area and New York City.

agent, passenger, vertiport, (16 more...)

2406.14832

Country:

North America > United States > New York (0.25)
North America > United States > California > San Francisco County > San Francisco (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)

arXiv.org Machine LearningJun-20-2024

Latent Variable Sequence Identification for Cognitive Models with Neural Bayes Estimation

Pan, Ti-Fen, Li, Jing-Jing, Thompson, Bill, Collins, Anne

Extracting time-varying latent variables from computational cognitive models is a key step in model-based neural analysis, which aims to understand the neural correlates of cognitive processes. However, existing methods only allow researchers to infer latent variables that explain subjects' behavior in a relatively small class of cognitive models. For example, a broad class of relevant cognitive models with analytically intractable likelihood is currently out of reach from standard techniques, based on Maximum a Posteriori parameter estimation. Here, we present an approach that extends neural Bayes estimation to learn a direct mapping between experimental data and the targeted latent variable space using recurrent neural networks and simulated datasets. We show that our approach achieves competitive performance in inferring latent variable sequences in both tractable and intractable models. Furthermore, the approach is generalizable across different computational models and is adaptable for both continuous and discrete latent spaces. We then demonstrate its applicability in real world datasets. Our work underscores that combining recurrent neural networks and simulation-based inference to identify latent variable sequences can enable researchers to access a wider class of cognitive models for model-based neural analyses, and thus test a broader set of theories.

estimator, lasenet, latent variable, (17 more...)

arXiv.org Machine Learning

2406.14742

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > New York (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.46)
Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(3 more...)

Olivos-Castillo, Itzel, Schrater, Paul, Pitkow, Xaq

Control when confidence is costly

We develop a version of stochastic control that accounts for computational costs of inference. Past studies identified efficient coding without control, or efficient control that neglects the cost of synthesizing information. Here we combine these concepts into a framework where agents rationally approximate inference for efficient control. Specifically, we study Linear Quadratic Gaussian (LQG) control with an added internal cost on the relative precision of the posterior probability over the world state. This creates a trade-off: an agent can obtain more utility overall by sacrificing some task performance, if doing so saves enough bits during inference. We discover that the rational strategy that solves the joint inference and control problem goes through phase transitions depending on the task demands, switching from a costly but optimal inference to a family of suboptimal inferences related by rotation transformations, each misestimate the stability of the world. In all cases, the agent moves more to think less. This work provides a foundation for a new type of rational computations that could be used by both brains and machines for efficient but computationally constrained control.

agent, inference, information, (16 more...)

2406.14427

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Texas > Harris County > Houston (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Delecki, Harrison, Schlichting, Marc R., Arief, Mansur, Corso, Anthony, Vazquez-Chanlatte, Marcell, Kochenderfer, Mykel J.

Diffusion-Based Failure Sampling for Cyber-Physical Systems

Validating safety-critical autonomous systems in high-dimensional domains such as robotics presents a significant challenge. Existing black-box approaches based on Markov chain Monte Carlo may require an enormous number of samples, while methods based on importance sampling often rely on simple parametric families that may struggle to represent the distribution over failures. We propose to sample the distribution over failures using a conditional denoising diffusion model, which has shown success in complex high-dimensional problems such as robotic task planning. We iteratively train a diffusion model to produce state trajectories closer to failure. We demonstrate the effectiveness of our approach on high-dimensional robotic validation tasks, improving sample efficiency and mode coverage compared to existing black-box techniques.

diffusion model, failure distribution, trajectory, (16 more...)

2406.14761

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Air (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Towards Global Optimality for Practical Average Reward Reinforcement Learning without Mixing Time Oracles

Patel, Bhrij, Suttle, Wesley A., Koppel, Alec, Aggarwal, Vaneet, Sadler, Brian M., Bedi, Amrit Singh, Manocha, Dinesh

In the context of average-reward reinforcement learning, the requirement for oracle knowledge of the mixing time, a measure of the duration a Markov chain under a fixed policy needs to achieve its stationary distribution, poses a significant challenge for the global convergence of policy gradient methods. This requirement is particularly problematic due to the difficulty and expense of estimating mixing time in environments with large state spaces, leading to the necessity of impractically long trajectories for effective gradient estimation in practical applications. To address this limitation, we consider the Multi-level Actor-Critic (MAC) framework, which incorporates a Multi-level Monte-Carlo (MLMC) gradient estimator. With our approach, we effectively alleviate the dependency on mixing time knowledge, a first for average-reward MDPs global convergence. Furthermore, our approach exhibits the tightest available dependence of $\mathcal{O}\left( \sqrt{\tau_{mix}} \right)$known from prior work. With a 2D grid world goal-reaching navigation experiment, we demonstrate that MAC outperforms the existing state-of-the-art policy gradient-based method for average reward settings.

algorithm, convergence, dependence, (15 more...)

2403.11925

Country:

North America > United States > Maryland > Prince George's County > College Park (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Hiremath, Shruthi K., Ploetz, Thomas

Maintenance Required: Updating and Extending Bootstrapped Human Activity Recognition Systems for Smart Homes

Developing human activity recognition (HAR) systems for smart homes is not straightforward due to varied layouts of the homes and their personalized settings, as well as idiosyncratic behaviors of residents. As such, off-the-shelf HAR systems are effective in limited capacity for an individual home, and HAR systems often need to be derived "from scratch", which comes with substantial efforts and often is burdensome to the resident. Previous work has successfully targeted the initial phase. At the end of this initial phase, we identify seed points. We build on bootstrapped HAR systems and introduce an effective updating and extension procedure for continuous improvement of HAR systems with the aim of keeping up with ever changing life circumstances. Our method makes use of the seed points identified at the end of the initial bootstrapping phase. A contrastive learning framework is trained using these seed points and labels obtained for the same. This model is then used to improve the segmentation accuracy of the identified prominent activities. Improvements in the activity recognition system through this procedure help model the majority of the routine activities in the smart home. We demonstrate the effectiveness of our procedure through experiments on the CASAS datasets that show the practical value of our approach.

procedure, resident, smart home, (16 more...)

2406.14446

Country:

North America > Aruba (0.05)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.05)
North America > United States (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry:

Information Technology > Smart Houses & Appliances (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Internet of Things (0.95)
Information Technology > Data Science > Data Mining (0.94)
(3 more...)

Chen, Weiqin, Squillante, Mark S., Wu, Chai Wah, Paternain, Santiago

A General Control-Theoretic Approach for Reinforcement Learning: Theory and Algorithms

For many years now, reinforcement learning (RL) has succeeded in solving a wide variety of decision-making problems and control for robotics [1, 2, 3, 4, 5]. Generally speaking, modelfree methods [6, 7] often suffer from high sample complexity that can require an inordinate amount of samples, making them unsuitable for robotic applications where collecting large amounts of data is time-consuming, costly and potentially dangerous for the system and its surroundings [8, 9, 10, 11, 12]. On the other hand, model-based RL methods have been successful in demonstrating significantly reduced sample complexity and in outperforming model-free approaches for various decision making under uncertainty problems (see, e.g., [13, 14]). However, such modelbased approaches can suffer from the difficulty of learning an appropriate model and from worse asymptotic performance than model-free approaches due to model bias from inherently assuming the learned system dynamics model accurately represents the true system environment (see, e.g., [15, 16, 17]). In this paper we propose a novel form of RL that seeks to directly learn an optimal control policy for a general underlying (unknown) dynamical system and to directly apply the corresponding learned optimal control policy within the dynamical system. This general approach is in strong contrast to many traditional model-based RL methods that, after learning the system dynamics model which is often of high complexity and dimensionality, then use this system dynamics model to compute an approximate solution of a corresponding (stochastic) dynamic programming problem, often applying model predictive control (see, e.g., [18]). Our control-based RL (CBRL) approach instead directly learns the unknown parameters that derive, through control-theoretic means, an optimal control policy function from a family of control policy functions, often of much lower complexity and dimensionality, from which the optimal control policy is directly obtained. The theoretical foundation and analysis of our CRBL approach is presented within the context of a general Markov decision process (MDP) framework that extends the family of policies associated with the classical Bellman operator to a family of control-policy functions mapping a vector of (unknown) parameters from a corresponding parameter set to a control policy which is optimal under those parameters, and that extends the domain of these control policies from a single state to span across all (or a large subset of) states, with the (unknown) parameter vector encoding global and local information that needs to be learned. Within the context of this MDP framework and our general CBRL approach, we establish theoretical results on convergence and optimality with respect to (w.r.t.) a CBRL contraction operator, analogous to the Bellman operator.

cbrl approach, control policy, unknown parameter, (16 more...)

2406.14753

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Huang, Kaixuan, Guo, Xudong, Wang, Mengdi

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

Speculative decoding reduces the inference latency of a target large language model via utilizing a smaller and faster draft model. Its performance depends on a hyperparameter K -- the candidate length, i.e., the number of candidate tokens for the target model to verify in each round. However, previous methods often use simple heuristics to choose K, which may result in sub-optimal performance. We study the choice of the candidate length K and formulate it as a Markov Decision Process. We theoretically show that the optimal policy of this Markov decision process takes the form of a threshold policy, i.e., the current speculation should stop and be verified when the probability of getting a rejection exceeds a threshold value. Motivated by this theory, we propose SpecDec++, an enhanced version of speculative decoding that adaptively determines the candidate length on the fly. We augment the draft model with a trained acceptance prediction head to predict the conditional acceptance probability of the candidate tokens. SpecDec++ will stop the current speculation when the predicted probability that at least one token gets rejected exceeds a threshold. We implement SpecDec++ and apply it to the llama-2-chat 7B & 70B model pair. Our adaptive method achieves a 2.04x speedup on the Alpaca dataset (an additional 7.2% improvement over the baseline speculative decoding). On the GSM8K and HumanEval datasets, our method achieves a 2.26x speedup (9.4% improvement) and 2.23x speedup (11.1% improvement), respectively.

arxiv preprint arxiv, prefix, target model, (13 more...)

2405.19715

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.54)

Eshwar, S. R., Felipe, Lucas Lopes, Reiffers-Masson, Alexandre, Menasché, Daniel Sadoc, Thoppe, Gugan

Online Learning of Weakly Coupled MDP Policies for Load Balancing and Auto Scaling

Load balancing and auto scaling are at the core of scalable, contemporary systems, addressing dynamic resource allocation and service rate adjustments in response to workload changes. This paper introduces a novel model and algorithms for tuning load balancers coupled with auto scalers, considering bursty traffic arriving at finite queues. We begin by presenting the problem as a weakly coupled Markov Decision Processes (MDP), solvable via a linear program (LP). However, as the number of control variables of such LP grows combinatorially, we introduce a more tractable relaxed LP formulation, and extend it to tackle the problem of online parameter learning and policy optimization using a two-timescale algorithm based on the LP Lagrangian.

constraint, probability, queue, (14 more...)

2406.14141

Country:

South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Pays de la Loire (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Energy > Power Industry (0.64)
Education > Educational Setting > Online (0.41)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)