AITopics

2404.15696

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Transportation > Passenger (0.71)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

arXiv.org Artificial Intelligence, May-11-2024

Multi-agent Traffic Prediction via Denoised Endpoint Distribution

Liu, Yao, Wang, Ruoyu, Cao, Yuanjiang, Sheng, Quan Z., Yao, Lina

The exploration of high-speed movement by robots or road traffic agents is crucial for autonomous driving and navigation. Trajectory prediction at high speeds requires considering historical features and interactions with surrounding entities, a complexity not as pronounced in lower-speed environments. Prior methods have assessed the spatio-temporal dynamics of agents but often neglected intrinsic intent and uncertainty, thereby limiting their effectiveness. We present the Denoised Endpoint Distribution model for trajectory prediction, which distinctively models agents' spatio-temporal features alongside their intrinsic intentions and uncertainties. By employing Diffusion and Transformer models to focus on agent endpoints rather than entire trajectories, our approach significantly reduces model complexity and enhances performance through endpoint information. Our experiments on open datasets, coupled with comparison and ablation studies, demonstrate our model's efficacy and the importance of its components. This approach advances trajectory prediction in high-speed scenarios and lays groundwork for future developments.

agent, prediction, trajectory, (16 more...)

2405.07041

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (0.82)

Industry:

Automobiles & Trucks (0.67)
Transportation > Ground > Road (0.48)
Information Technology > Robotics & Automation (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)

arXiv.org Artificial Intelligence, May-11-2024

Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning

Duan, Wei, Lu, Jie, Xuan, Junyu

Cooperative Multi-Agent Reinforcement Learning (MARL) necessitates seamless collaboration among agents, often represented by an underlying relation graph. Existing methods for learning this graph primarily focus on agent-pair relations, neglecting higher-order relationships. While several approaches attempt to extend cooperation modelling to encompass behaviour similarities within groups, they commonly fall short in concurrently learning the latent graph, thereby constraining the information exchange among partially observed agents. To overcome these limitations, we present a novel approach to infer the Group-Aware Coordination Graph (GACG), which is designed to capture both the cooperation between agent pairs based on current observations and group-level dependencies from behaviour patterns observed across trajectories. This graph is further used in graph convolution for information exchange between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss, which promotes group cohesion and encourages specialization between groups. Our evaluations, conducted on StarCraft II micromanagement tasks, demonstrate GACG's superior performance. An ablation study further provides experimental evidence of the effectiveness of each component of our method.

agent, graph, international conference, (14 more...)

2404.10976

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Ruíz-Guirola, David E., López, Onel L. A., Montejo-Sánchez, Samuel, Mayorga, Israel Leyva, Han, Zhu, Popovski, Petar

Intelligent Duty Cycling Management and Wake-up for Energy Harvesting IoT Networks with Correlated Activity

arXiv.org Artificial Intelligence, May-10-2024

This paper presents an approach for energy-neutral Internet of Things (IoT) scenarios where the IoT devices (IoTDs) rely entirely on their energy harvesting capabilities to sustain operation. We use a Markov chain to represent the operation and transmission states of the IoTDs, a modulated Poisson process to model their energy harvesting process, and a discrete-time Markov chain to model their battery state. The aim is to efficiently manage the duty cycling of the IoTDs, so as to prolong their battery life and reduce instances of low-energy availability. We propose a duty-cycling management scheme based on K-nearest neighbors, aiming to strike a trade-off between energy efficiency and detection accuracy. This is done by incorporating spatial and temporal correlations among IoTDs' activity, as well as their energy harvesting capabilities. We also allow the base station to wake up specific IoTDs if more information about an event is needed upon initial detection. Our proposed scheme shows significant improvements in energy savings and performance, with up to 11 times lower misdetection probability and 50% lower energy consumption for high-density scenarios compared to a random duty cycling benchmark.

information, iotd, probability, (13 more...)

2405.06372

Country:

North America > United States > Texas > Harris County > Houston (0.14)
Europe > Finland > Northern Ostrobothnia > Oulu (0.05)
Europe > Denmark > North Jutland > Aalborg (0.05)

Genre: Research Report (1.00)

Industry: Energy > Energy Storage (1.00)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

As, Yarden, Sukhija, Bhavya, Krause, Andreas

Safe Exploration Using Bayesian World Models and Log-Barrier Optimization

A major challenge in deploying reinforcement learning in online tasks is ensuring that safety is maintained throughout the learning process. In this work, we propose CERL, a new method for solving constrained Markov decision processes while keeping the policy safe during learning. Our method leverages Bayesian world models and suggests policies that are pessimistic w.r.t. the model's epistemic uncertainty. This makes CERL robust towards model inaccuracies and leads to safe exploration during learning. In our experiments, we demonstrate that CERL outperforms the current state-of-the-art in terms of safety and optimality in solving CMDPs from image observations.
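The log-barrier idea named in the title can be illustrated on a toy problem. The sketch below is not CERL and involves no world model or policy; it is a minimal example, under assumed values, of how a barrier term keeps every iterate strictly feasible while optimizing. The objective (maximize x subject to x < 1), the function name `barrier_ascent`, and the parameters `eta`, `lr`, and `steps` are all hypothetical choices made for illustration.

```python
def barrier_ascent(eta=0.1, bound=1.0, x0=0.0, lr=0.05, steps=2000):
    """Maximize x + eta * log(bound - x) by gradient ascent.

    Toy problem (an assumption, not taken from the paper): the true
    constrained optimum is x = bound, and the barrier optimum is
    x = bound - eta, which stays strictly inside the feasible region.
    Returns the list of all iterates so feasibility can be inspected.
    """
    x = x0
    trace = [x]
    for _ in range(steps):
        # Gradient of x + eta * log(bound - x) with respect to x.
        grad = 1.0 - eta / (bound - x)
        step = lr * grad
        # Safeguard: shrink the step until the next iterate is feasible.
        while x + step >= bound:
            step *= 0.5
        x += step
        trace.append(x)
    return trace


trace = barrier_ascent()
print(f"final x = {trace[-1]:.6f}, max iterate = {max(trace):.6f}")
```

Because the barrier term tends to minus infinity at the constraint boundary, the iterates approach the bound from inside and never cross it; a smaller `eta` moves the barrier optimum closer to the boundary.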

andreas krause, exploration, optimization, (13 more...)

2405.0589

Country:

Europe > Switzerland > Zürich > Zürich (0.05)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion

Tian, Huanyu, Huber, Martin, Mower, Christopher E., Han, Zhe, Li, Changsheng, Duan, Xingguang, Bergeles, Christos

In this study, we introduce a novel shared-control system for key-hole docking operations, combining a commercial camera with occlusion-robust pose estimation and a hand-eye information fusion technique. This system is used to enhance docking precision and force-compliance safety. To train a hand-eye information fusion network model, we generated a self-supervised dataset using this docking system. After training, our pose estimation method showed improved accuracy compared to traditional methods, including observation-only approaches, hand-eye calibration, and conventional state estimation filters. In real-world phantom experiments, our approach demonstrated its effectiveness with reduced position dispersion (1.23 ± 0.81 mm vs. 2.47 ± 1.22 mm) and force dispersion (0.78 ± 0.57 N vs. 1.15 ± 0.97 N) compared to the control group. These advancements in semi-autonomous co-manipulation scenarios enhance interaction and stability. The study presents an anti-interference, steady, and precise solution with potential applications extending beyond laparoscopic surgery to other minimally invasive procedures.

pose estimation, robot, trocar, (15 more...)

2405.05817

Country:

Asia > China > Beijing > Beijing (0.05)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Spain > Galicia > Madrid (0.04)
Europe > Germany (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Koops, Wietze, Junges, Sebastian, Jansen, Nils

Approximate Dec-POMDP Solving Using Multi-Agent A*

We present an A*-based algorithm to compute policies for finite-horizon Dec-POMDPs. Our goal is to sacrifice optimality in favor of scalability for larger horizons. The main ingredients of our approach are (1) using clustered sliding window memory, (2) pruning the A* search tree, and (3) using novel A* heuristics. Our experiments show competitive performance to the state-of-the-art. Moreover, for multiple benchmarks, we achieve superior performance. In addition, we provide an A* algorithm that finds upper bounds for the optimum, tailored towards problems with long horizons. The main ingredient is a new heuristic that periodically reveals the state, thereby limiting the number of reachable beliefs. Our experiments demonstrate the efficacy and scalability of the approach.

algorithm, dec-pomdp, window memory, (16 more...)

2405.05662

Country:

Europe > Slovenia (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Cui, Christopher Z., Peng, Xiangyu, Riedl, Mark O.

A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds

Open-ended worlds are those in which there are no pre-specified goals and no environmental reward signal. As a consequence, an agent must know how to perform a multitude of tasks. However, when a new task is presented to an agent, we expect it to be able to reuse some of what it knows from previous tasks to rapidly learn that new task. We introduce a novel technique whereby policies for different a priori known tasks are combined into a Mixture-of-Experts model with an attention mechanism across a mix of frozen and unfrozen experts. The model learns to attend to frozen task-specific experts when appropriate and learns new experts to handle novel situations. We work in an open-ended text-based environment in which the agent is tasked with behaving like different types of character roles and must rapidly learn behaviors associated with new character role types. We show that our agent both obtains more rewards in the zero-shot setting and discovers these rewards with greater sample efficiency in the few-shot learning settings.

agent, moe agent, target role, (13 more...)

2405.06059

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Singapore (0.04)

Genre: Research Report > Promising Solution (0.66)

Industry: Leisure & Entertainment > Games > Computer Games (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Gupta, Shashank, Jeunen, Olivier, Oosterhuis, Harrie, de Rijke, Maarten

Optimal Baseline Corrections for Off-Policy Contextual Bandits

The off-policy learning paradigm allows for recommender systems and general ranking applications to be framed as decision-making problems, where we aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric. With unbiasedness comes potentially high variance, and prevalent methods exist to reduce estimation variance. These methods typically make use of control variates, either additive (i.e., baseline corrections or doubly robust methods) or multiplicative (i.e., self-normalisation). Our work unifies these approaches by proposing a single framework built on their equivalence in learning scenarios. The foundation of our framework is the derivation of an equivalent baseline ...

Additive control variates give rise to baseline corrections [16], regression adjustments [15], and doubly robust estimators [13]. Multiplicative control variates lead to self-normalised estimators [32, 59]. Previous work has proven that for off-policy learning tasks, the multiplicative control variates can be re-framed using an equivalent additive variate [6, 30], enabling mini-batch optimization methods to be used. We note that the self-normalised estimator is only asymptotically unbiased: a clear disadvantage for evaluation with finite samples. The common problem which most existing methods tackle is that of variance reduction in offline value estimation, either for learning or for evaluation. The common solution is the application of a control variate, either multiplicative or additive [42].
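The relationship between additive baseline corrections and multiplicative self-normalisation can be made concrete with a small numerical sketch. The code below is an illustration under assumed values, not the paper's derivation: the two-action bandit, the dictionaries `LOGGING`, `TARGET`, and `REWARD_PROB`, and all function names are hypothetical. It computes the inverse propensity scoring (IPS) estimate, a baseline-corrected estimate with baseline `beta`, and the self-normalised estimate, and checks the simple algebraic fact that plugging the self-normalised value in as the baseline reproduces that same value.

```python
import random

# Hypothetical two-action bandit used only for illustration.
LOGGING = {0: 0.2, 1: 0.8}      # logging policy probabilities
TARGET = {0: 0.7, 1: 0.3}       # target policy probabilities
REWARD_PROB = {0: 0.2, 1: 0.9}  # Bernoulli reward probability per action


def make_logged_data(n, seed=0):
    """Sample n logged interactions; return importance weights and rewards."""
    rng = random.Random(seed)
    weights, rewards = [], []
    for _ in range(n):
        action = 1 if rng.random() < LOGGING[1] else 0
        reward = 1.0 if rng.random() < REWARD_PROB[action] else 0.0
        weights.append(TARGET[action] / LOGGING[action])
        rewards.append(reward)
    return weights, rewards


def ips(weights, rewards):
    """Plain IPS estimate: mean of w_i * r_i."""
    return sum(w * r for w, r in zip(weights, rewards)) / len(rewards)


def baseline_corrected(weights, rewards, beta):
    """Additive control variate: beta + mean of w_i * (r_i - beta)."""
    n = len(rewards)
    return beta + sum(w * (r - beta) for w, r in zip(weights, rewards)) / n


def snips(weights, rewards):
    """Multiplicative control variate: sum(w_i * r_i) / sum(w_i)."""
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)


weights, rewards = make_logged_data(1000)
beta = snips(weights, rewards)
print("IPS:", ips(weights, rewards))
print("SNIPS:", beta)
print("baseline-corrected with beta = SNIPS:",
      baseline_corrected(weights, rewards, beta))
```

With `beta = 0` the baseline-corrected estimate reduces to plain IPS, and with `beta` set to the self-normalised value the correction term sums to zero, so the two estimates coincide on the same logged data.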

estimator, proceedings, variance, (14 more...)

2405.05736

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence, May-8-2024

Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management

Hu, Gang, Gu, Ming

Investment portfolios, central to finance, balance potential returns and risks. This paper introduces a hybrid approach combining Markowitz's portfolio theory with reinforcement learning, utilizing knowledge distillation for training agents. In particular, our proposed method, called KDD (Knowledge Distillation DDPG), consists of two training stages: a supervised learning stage and a reinforcement learning stage. The trained agents optimize portfolio assembly. A comparative analysis against standard financial models and AI frameworks, using metrics like returns, the Sharpe ratio, and nine evaluation indices, reveals our model's superiority. It notably achieves the highest yield and Sharpe ratio of 2.03, ensuring top profitability with the lowest risk in comparable return scenarios.
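For readers unfamiliar with the Sharpe ratio used as a metric here, the following is a minimal sketch of the standard per-period definition: mean excess return divided by the sample standard deviation of excess returns. The function names, the example return series, and the zero risk-free rate are assumptions for illustration; the abstract does not state how the paper annualises or estimates the ratio.

```python
import math


def portfolio_returns(weights, asset_returns):
    """Per-period portfolio return: weighted sum of asset returns.

    asset_returns is a list of periods, each a list with one return per asset.
    """
    return [sum(w * r for w, r in zip(weights, period))
            for period in asset_returns]


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio using the sample standard deviation."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance)


# Hypothetical two-asset example with equal weights.
periods = [[0.00, 0.02], [0.04, 0.02], [0.03, 0.01]]
rets = portfolio_returns([0.5, 0.5], periods)
print("portfolio returns:", rets)
print("Sharpe ratio:", sharpe_ratio(rets))
```

A higher ratio means more excess return per unit of volatility, which is why it is commonly reported alongside raw yield when comparing strategies.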

ddpg, investment strategy, portfolio management, (11 more...)

2405.05449

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)

Genre: Research Report (0.64)

Industry:

Banking & Finance > Trading (1.00)
Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)