AITopics

2411.04246

Country: Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Air (1.00)
Leisure & Entertainment > Sports (1.00)
Information Technology > Robotics & Automation (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceNov-6-2024

ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy

Tie, Chenrui, Chen, Yue, Wu, Ruihai, Dong, Boxuan, Li, Zeyi, Gao, Chongkai, Dong, Hao

Imitation learning, e.g., diffusion policy, has been proven effective in various robotic manipulation tasks. However, extensive demonstrations are required for policy robustness and generalization. To reduce the demonstration reliance, we leverage spatial symmetry and propose ET-SEED, an efficient trajectory-level SE(3) equivariant diffusion model for generating action sequences in complex robot manipulation tasks. Further, previous equivariant diffusion models require the per-step equivariance in the Markov process, making it difficult to learn policy under such strong constraints. We theoretically extend equivariant Markov kernels and simplify the condition of equivariant diffusion process, thereby significantly improving training efficiency for trajectory-level SE(3) equivariant diffusion policy in an end-to-end manner. We evaluate ET-SEED on representative robotic manipulation tasks, involving rigid body, articulated and deformable object. Experiments demonstrate superior data efficiency and manipulation proficiency of our proposed method, as well as its ability to generalize to unseen configurations with only a few demonstrations. Website: https://et-seed.github.io/

diffusion process, equivariant, trajectory, (15 more...)

2411.0399

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Słupiński, Mikołaj, Lipiński, Piotr

Bayesian Inference in Recurrent Explicit Duration Switching Linear Dynamical Systems

arXiv.org Machine LearningNov-6-2024

In this paper, we propose a novel model called Recurrent Explicit Duration Switching Linear Dynamical Systems (REDSLDS) that incorporates recurrent explicit duration variables into the rSLDS model. We also propose an inference and learning scheme that involves the use of P\'olya-gamma augmentation. We demonstrate the improved segmentation capabilities of our model on three benchmark datasets, including two quantitative datasets and one qualitative dataset.

dataset, redsld, rsld, (13 more...)

arXiv.org Machine Learning

2411.0428

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(5 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Słupiński, Mikołaj, Lipiński, Piotr

The Recurrent Sticky Hierarchical Dirichlet Process Hidden Markov Model

arXiv.org Machine LearningNov-6-2024

The Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM) is a natural Bayesian nonparametric extension of the classical Hidden Markov Model for learning from (spatio-)temporal data. A sticky HDP-HMM has been proposed to strengthen the self-persistence probability in the HDP-HMM. Then, disentangled sticky HDP-HMM has been proposed to disentangle the strength of the self-persistence prior and transition prior. However, the sticky HDP-HMM assumes that the self-persistence probability is stationary, limiting its expressiveness. Here, we build on previous work on sticky HDP-HMM and disentangled sticky HDP-HMM, developing a more general model: the recurrent sticky HDP-HMM (RS-HDP-HMM). We develop a novel Gibbs sampling strategy for efficient inference in this model. We show that RS-HDP-HMM outperforms disentangled sticky HDP-HMM, sticky HDP-HMM, and HDP-HMM in both synthetic and real data segmentation.

hdp-hmm, probability, sticky hdp-hmm, (13 more...)

arXiv.org Machine Learning

2411.04278

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine LearningNov-6-2024

Hybrid Transfer Reinforcement Learning: Provable Sample Efficiency from Shifted-Dynamics Data

Qu, Chengrui, Shi, Laixi, Panaganti, Kishan, You, Pengcheng, Wierman, Adam

Online Reinforcement learning (RL) typically requires high-stakes online interaction data to learn a policy for a target task. This prompts interest in leveraging historical data to improve sample efficiency. The historical data may come from outdated or related source environments with different dynamics. It remains unclear how to effectively use such data in the target task to provably enhance learning and sample efficiency. To address this, we propose a hybrid transfer RL (HTRL) setting, where an agent learns in a target environment while accessing offline data from a source environment with shifted dynamics. We show that -- without information on the dynamics shift -- general shifted-dynamics data, even with subtle shifts, does not reduce sample complexity in the target environment. However, with prior information on the degree of the dynamics shift, we design HySRL, a transfer algorithm that achieves problem-dependent sample complexity and outperforms pure online RL. Finally, our experimental results demonstrate that HySRL surpasses state-of-the-art online RL baseline.

proceedings, sample complexity, tar, (11 more...)

arXiv.org Machine Learning

2411.0381

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Transportation > Ground > Road (0.45)
Automobiles & Trucks (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Zbeeb, Mohammad, Ghorayeb, Mohammad, Salman, Mariam

Exploring the Landscape for Generative Sequence Models for Specialized Data Synthesis

arXiv.org Artificial IntelligenceNov-6-2024

Artificial Intelligence (AI) research often aims to develop models that can generalize reliably across complex datasets, yet this remains challenging in fields where data is scarce, intricate, or inaccessible. This paper introduces a novel approach that leverages three generative models of varying complexity to synthesize one of the most demanding structured datasets: Malicious Network Traffic. Our approach uniquely transforms numerical data into text, re-framing data generation as a language modeling task, which not only enhances data regularization but also significantly improves generalization and the quality of the synthetic data. Extensive statistical analyses demonstrate that our method surpasses state-of-the-art generative models in producing high-fidelity synthetic data. Additionally, we conduct a comprehensive study on synthetic data applications, effectiveness, and evaluation strategies, offering valuable insights into its role across various domains. Our code and pre-trained models are openly accessible at Github, enabling further exploration and application of our methodology. Index Terms: Data synthesis, machine learning, traffic generation, privacy preserving data, generative models.

data mining, machine learning, natural language, (21 more...)

doi: 10.33140/JDAEDM

2411.01929

Country: North America > United States (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Kumar, Rajul, Bhatti, Adam, Yao, Ningshi

Can Robotic Cues Manipulate Human Decisions? Exploring Consensus Building via Bias-Controlled Non-linear Opinion Dynamics and Robotic Eye Gaze Mediated Interaction in Human-Robot Teaming

Although robots are becoming more advanced with human-like anthropomorphic features and decision-making abilities to improve collaboration, the active integration of humans into this process remains under-explored. This article presents the first experimental study exploring decision-making interactions between humans and robots with visual cues from robotic eyes, which can dynamically influence human opinion formation. The cues generated by robotic eyes gradually guide human decisions towards alignment with the robot's choices. Both human and robot decision-making processes are modeled as non-linear opinion dynamics with evolving biases. To examine these opinion dynamics under varying biases, we conduct numerical parametric and equilibrium continuation analyses using tuned parameters designed explicitly for the presented human-robot interaction experiment. Furthermore, to facilitate the transition from disagreement to agreement, we introduced a human opinion observation algorithm integrated with the formation of the robot's opinion, where the robot's behavior is controlled based on its formed opinion. The algorithms developed aim to enhance human involvement in consensus building, fostering effective collaboration between humans and robots. Experiments with 51 participants (N = 51) show that human-robot teamwork can be improved by guiding human decisions using robotic cues. Finally, we provide detailed insights on the effects of trust, cognitive load, and participant demographics on decision-making based on user feedback and post-experiment interviews.

interaction, participant, robot, (8 more...)

2411.03581

Country:

North America > United States > Virginia > Fairfax County > Fairfax (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Virginia > Fairfax County > McLean (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.89)
(2 more...)

Pre-trained Visual Dynamics Representations for Efficient Policy Learning

Luo, Hao, Zhou, Bohan, Lu, Zongqing

Pre-training for Reinforcement Learning (RL) with purely video data is a valuable yet challenging problem. Although in-the-wild videos are readily available and inhere a vast amount of prior world knowledge, the absence of action annotations and the common domain gap with downstream tasks hinder utilizing videos for RL pre-training. To address the challenge of pre-training with videos, we propose Pre-trained Visual Dynamics Representations (PVDR) to bridge the domain gap between videos and downstream tasks for efficient policy learning. By adopting video prediction as a pre-training task, we use a Transformer-based Conditional Variational Autoencoder (CVAE) to learn visual dynamics representations. The pre-trained visual dynamics representations capture the visual dynamics prior knowledge in the videos. This abstract prior knowledge can be readily adapted to downstream tasks and aligned with executable actions through online adaptation. We conduct experiments on a series of robotics visual control tasks and verify that PVDR is an effective form for pre-training with videos to promote policy learning.

pvdr, representation, video, (10 more...)

2411.03169

Country:

Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Giral, Francisco, Gómez, Ignacio, Vinuesa, Ricardo, Le-Clainche, Soledad

Transformer-Based Fault-Tolerant Control for Fixed-Wing UAVs Using Knowledge Distillation and In-Context Adaptation

Abstract-- This study presents a transformer-based approach for fault-tolerant control in fixed-wing Unmanned Aerial Vehicles (UAVs), designed to adapt in real time to dynamic changes caused by structural damage or actuator failures. Employing a teacher-student knowledge distillation framework, the proposed approach trains a student agent with partial observations by transferring knowledge from a privileged expert agent with full observability, enabling robust performance across diverse failure scenarios. In recent years, Unmanned Aerial Vehicles (UAVs) have been widely used to perform various applications in complex However, complex environments and demanding tasks can and critical scenarios, such as search and rescue or cause structural damage to the UAV, altering its aerodynamic autonomous medical transportation. Fixed-wing UAVs, in particular, and reliability of these aerial robots have become major exhibit highly complex, nonlinear dynamics, which can concerns due to the potential implications of system failures. Unlike other robotics fields, such as manipulation and Although current FCSs are robust, they struggle to maintain humanoid locomotion, where advanced control methods are performance when the vehicle dynamics deviate from the essential for managing complex joint movements, UAV original design specifications, sometimes leading to control Flight Control Systems (FCSs) in industry typically rely divergence and catastrophic failure.

agent, scenario, uav, (16 more...)

2411.02975

Country:

Europe > Spain > Galicia > Madrid (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (1.00)

Industry:

Aerospace & Defense > Aircraft (0.68)
Transportation > Air (0.46)
Education > Educational Technology > Educational Software (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Milosevic, Nikola, Müller, Johannes, Scherf, Nico

Embedding Safety into RL: A New Take on Trust Region Methods

Reinforcement Learning (RL) agents are able to solve a wide variety of tasks but are prone to producing unsafe behaviors. Constrained Markov Decision Processes (CMDPs) provide a popular framework for incorporating safety constraints. However, common solution methods often compromise reward maximization by being overly conservative or allow unsafe behavior during training. We propose Constrained Trust Region Policy Optimization (C-TRPO), a novel approach that modifies the geometry of the policy space based on the safety constraints and yields trust regions composed exclusively of safe policies, ensuring constraint satisfaction throughout training. We theoretically study the convergence and update properties of C-TRPO and highlight connections to TRPO, Natural Policy Gradient (NPG), and Constrained Policy Optimization (CPO). Finally, we demonstrate experimentally that C-TRPO significantly reduces constraint violations while achieving competitive reward maximization compared to state-of-theart CMDP algorithms. Reinforcement Learning (RL) has emerged as a highly successful paradigm in machine learning for solving sequential decision and control problems, with policy gradient (PG) algorithms as a popular approach (Williams, 1992; Sutton et al., 1999; Konda & Tsitsiklis, 1999).

algorithm, c-trpo, divergence, (14 more...)

2411.02957

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany > Saxony > Leipzig (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)