AITopics

2409.18735

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Poland > Lesser Poland Province > Kraków (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.46)

Industry:

Energy (0.89)
Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Chen, Mingyu, Pacchiano, Aldo, Zhang, Xuezhou

State-free Reinforcement Learning

arXiv.org Artificial IntelligenceSep-27-2024

In this work, we study the \textit{state-free RL} problem, where the algorithm does not have the states information before interacting with the environment. Specifically, denote the reachable state set by ${S}^\Pi := \{ s|\max_{\pi\in \Pi}q^{P, \pi}(s)>0 \}$, we design an algorithm which requires no information on the state space $S$ while having a regret that is completely independent of ${S}$ and only depend on ${S}^\Pi$. We view this as a concrete first step towards \textit{parameter-free RL}, with the goal of designing RL algorithms that require no hyper-parameter tuning.

algorithm, neural information processing system, suffice, (13 more...)

2409.18439

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Olivares, David, Fournier, Pierre, Vasishta, Pavan, Marzat, Julien

Model-Free versus Model-Based Reinforcement Learning for Fixed-Wing UAV Attitude Control Under Varying Wind Conditions

This paper evaluates and compares the performance of model-free and model-based reinforcement learning for the attitude control of fixed-wing unmanned aerial vehicles using PID as a reference point. The comparison focuses on their ability to handle varying flight dynamics and wind disturbances in a simulated environment. Our results show that the Temporal Difference Model Predictive Control agent outperforms both the PID controller and other model-free reinforcement learning methods in terms of tracking accuracy and robustness over different reference difficulties, particularly in nonlinear flight regimes. Furthermore, we introduce actuation fluctuation as a key metric to assess energy efficiency and actuator wear, and we test two different approaches from the literature: action variation penalty and conditioning for action policy smoothness. We also evaluate all control methods when subject to stochastic turbulence and gusts separately, so as to measure their effects on tracking performance, observe their limitations and outline their implications on the Markov decision process formalism.

controller, machine learning, reinforcement learning, (14 more...)

2409.17896

Country: Europe (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Aerospace & Defense > Aircraft (1.00)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Tonti, Federica, Rabault, Jean, Vinuesa, Ricardo

Navigation in a simplified Urban Flow through Deep Reinforcement Learning

The increasing number of unmanned aerial vehicles (UAVs) in urban environments requires a strategy to minimize their environmental impact, both in terms of energy efficiency and noise reduction. In order to reduce these concerns, novel strategies for developing prediction models and optimization of flight planning, for instance through deep reinforcement learning (DRL), are needed. Our goal is to develop DRL algorithms capable of enabling the autonomous navigation of UAVs in urban environments, taking into account the presence of buildings and other UAVs, optimizing the trajectories in order to reduce both energetic consumption and noise. This is achieved using fluid-flow simulations which represent the environment in which UAVs navigate and training the UAV as an agent interacting with an urban environment. In this work, we consider a domain domain represented by a two-dimensional flow field with obstacles, ideally representing buildings, extracted from a three-dimensional high-fidelity numerical simulation. The presented methodology, using PPO+LSTM cells, was validated by reproducing a simple but fundamental problem in navigation, namely the Zermelo's problem, which deals with a vessel navigating in a turbulent flow, travelling from a starting point to a target location, optimizing the trajectory. The current method shows a significant improvement with respect to both a simple PPO and a TD3 algorithm, with a success rate (SR) of the PPO+LSTM trained policy of 98.7%, and a crash rate (CR) of 0.1%, outperforming both PPO (SR = 75.6%, CR=18.6%) and TD3 (SR=77.4% and CR=14.5%). This is the first step towards DRL strategies which will guide UAVs in a three-dimensional flow field using real-time signals, making the navigation efficient in terms of flight time and avoiding damages to the vehicle.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2409.17922

Country:

Europe > Sweden (0.14)
Asia > China (0.14)
Europe > Norway (0.14)
Asia > Middle East (0.14)

Genre: Research Report (1.00)

Industry:

Transportation (1.00)
Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Learning Occlusion-aware Decision-making from Agent Interaction via Active Perception

Jia, Jie, Shu, Yiming, Gan, Zhongxue, Ding, Wenchao

Occlusion-aware decision-making is essential in autonomous driving due to the high uncertainty of various occlusions. Recent occlusion-aware decision-making methods encounter issues such as high computational complexity, scenario scalability challenges, or reliance on limited expert data. Benefiting from automatically generating data by exploration randomization, we uncover that reinforcement learning (RL) may show promise in occlusion-aware decision-making. However, previous occlusion-aware RL faces challenges in expanding to various dynamic and static occlusion scenarios, low learning efficiency, and lack of predictive ability. To address these issues, we introduce Pad-AI, a self-reinforcing framework to learn occlusion-aware decision-making through active perception. Pad-AI utilizes vectorized representation to represent occluded environments efficiently and learns over the semantic motion primitives to focus on high-level active perception exploration. Furthermore, Pad-AI integrates prediction and RL within a unified framework to provide risk-aware learning and security guarantees. Our framework was tested in challenging scenarios under both dynamic and static occlusions and demonstrated efficient and general perception-aware exploration performance to other strong baselines in closed-loop evaluations.

artificial intelligence, machine learning, occlusion, (16 more...)

2409.17618

Country: Asia > China > Hong Kong (0.04)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (0.68)
Automobiles & Trucks (0.67)
Information Technology > Robotics & Automation (0.49)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

BoT-Drive: Hierarchical Behavior and Trajectory Planning for Autonomous Driving using POMDPs

Jin, Xuanjin, Zeng, Chendong, Zhu, Shengfa, Liu, Chunxiao, Cai, Panpan

Uncertainties in dynamic road environments pose significant challenges for behavior and trajectory planning in autonomous driving. This paper introduces BoT-Drive, a planning algorithm that addresses uncertainties at both behavior and trajectory levels within a Partially Observable Markov Decision Process (POMDP) framework. BoT-Drive employs driver models to characterize unknown behavioral intentions and utilizes their model parameters to infer hidden driving styles. By also treating driver models as decision-making actions for the autonomous vehicle, BoT-Drive effectively tackles the exponential complexity inherent in POMDPs. To enhance safety and robustness, the planner further applies importance sampling to refine the driving trajectory conditioned on the planned high-level behavior. Evaluation on real-world data shows that BoT-Drive consistently outperforms both existing planning methods and learning-based methods in regular and complex urban driving scenes, demonstrating significant improvements in driving safety and reliability.

artificial intelligence, machine learning, trajectory, (17 more...)

2409.18411

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

The Sandbox Environment for Generalizable Agent Research (SEGAR)

Hjelm, R Devon, Mazoure, Bogdan, Golemo, Florian, Kahou, Samira Ebrahimi, Braga, Pedro, Frujeri, Felipe, Jalobeanu, Mihai, Kolobov, Andrey

A broad challenge of research on generalization for sequential decision-making tasks in interactive environments is designing benchmarks that clearly landmark progress. While there has been notable headway, current benchmarks either do not provide suitable exposure nor intuitive control of the underlying factors, are not easy-to-implement, customizable, or extensible, or are computationally expensive to run. We built the Sandbox Environment for Generalizable Agent Research (SEGAR) with all of these things in mind. SEGAR improves the ease and accountability of generalization research in RL, as generalization objectives can be easy designed by specifying task distributions, which in turns allows the researcher to measure the nature of the generalization objective. We present an overview of SEGAR and how it contributes to these goals, as well as experiments that demonstrate a few types of research questions SEGAR can help answer.

agent, factor type, segar, (14 more...)

2203.10351

Country:

Europe > Sweden > Skåne County > Malmö (0.04)
Europe > Portugal > Braga > Braga (0.04)
Asia (0.04)

Genre:

Research Report > Experimental Study (0.66)
Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots

Wu, Peilin, Xie, Weiji, Cao, Jiahang, Lai, Hang, Zhang, Weinan

Reinforcement Learning (RL) has shown its remarkable and generalizable capability in legged locomotion through sim-to-real transfer. However, while adaptive methods like domain randomization are expected to make policy more robust to diverse environments, such comprehensiveness potentially detracts from the policy's performance in any specific environment according to the No Free Lunch theorem, leading to a suboptimal solution once deployed in the real world. To address this issue, we propose a lifelong policy adaptation framework named LoopSR, which utilizes a transformer-based encoder to project real-world trajectories into a latent space, and accordingly reconstruct the real-world environments back in simulation for further improvement. Autoencoder architecture and contrastive learning methods are adopted to better extract the characteristics of real-world dynamics. The simulation parameters for continual training are derived by combining predicted parameters from the decoder with retrieved parameters from the simulation trajectory dataset. By leveraging the continual training, LoopSR achieves superior data efficiency compared with strong baselines, with only a limited amount of data to yield eminent performance in both sim-to-sim and sim-to-real experiments.

loopsr, simulation, trajectory, (15 more...)

2409.17992

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Hierarchical End-to-End Autonomous Driving: Integrating BEV Perception with Deep Reinforcement Learning

Lu, Siyi, He, Lei, Li, Shengbo Eben, Luo, Yugong, Wang, Jianqiang, Li, Keqiang

End-to-end autonomous driving offers a streamlined alternative to the traditional modular pipeline, integrating perception, prediction, and planning within a single framework. While Deep Reinforcement Learning (DRL) has recently gained traction in this domain, existing approaches often overlook the critical connection between feature extraction of DRL and perception. In this paper, we bridge this gap by mapping the DRL feature extraction network directly to the perception phase, enabling clearer interpretation through semantic segmentation. By leveraging Bird's-Eye-View (BEV) representations, we propose a novel DRL-based end-to-end driving framework that utilizes multi-sensor inputs to construct a unified three-dimensional understanding of the environment. This BEV-based system extracts and translates critical environmental features into high-level abstract states for DRL, facilitating more informed control. Extensive experimental evaluations demonstrate that our approach not only enhances interpretability but also significantly outperforms state-of-the-art methods in autonomous driving control tasks, reducing the collision rate by 20%.

autonomous driving, reinforcement, vehicle, (12 more...)

2409.17659

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)

Genre: Research Report (0.84)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)

arXiv.org Artificial IntelligenceSep-25-2024

Topological Foundations of Reinforcement Learning

Kadurha, David Krame

The goal of this work is to serve as a foundation for deep studies of the topology of state, action, and policy spaces in reinforcement learning. By studying these spaces from a mathematical perspective, we expect to gain more insight into how to build better algorithms to solve decision problems. Therefore, we focus on presenting the connection between the Banach fixed point theorem and the convergence of reinforcement learning algorithms, and we illustrate how the insights gained from this can practically help in designing more efficient algorithms. Before doing so, however, we first introduce relevant concepts such as metric spaces, normed spaces and Banach spaces for better understanding, before expressing the entire reinforcement learning problem in terms of Markov decision processes. This allows us to properly introduce the Banach contraction principle in a language suitable for reinforcement learning, and to write the Bellman equations in terms of operators on Banach spaces to show why reinforcement learning algorithms converge. Finally, we show how the insights gained from the mathematical study of convergence are helpful in reasoning about the best ways to make reinforcement learning algorithms more efficient.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2410.03706

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Africa > Cameroon (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre:

Workflow (0.92)
Research Report (0.81)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)