AITopics

doi: 10.1109/IROS55552.2023.10342421

2404.03943

Country:

Asia > China > Hong Kong (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceApr-5-2024

Probabilistically Informed Robot Object Search with Multiple Regions

Collins, Matthew, Beard, Jared J., Ohi, Nicholas, Gu, Yu

The increasing use of autonomous robot systems in hazardous environments underscores the need for efficient search and rescue operations. Despite significant advancements, existing literature on object search often falls short in overcoming the difficulty of long planning horizons and dealing with sensor limitations, such as noise. This study introduces a novel approach that formulates the search problem as a belief Markov decision processes with options (BMDP-O) to make Monte Carlo tree search (MCTS) a viable tool for overcoming these challenges in large scale environments. The proposed formulation incorporates sequences of actions (options) to move between regions of interest, enabling the algorithm to efficiently scale to large environments. This approach also enables the use of customizable fields of view, for use with multiple types of sensors. Experimental results demonstrate the superiority of this approach in large environments when compared to the problem without options and alternative tools such as receding horizon planners. Given compute time for the proposed formulation is relatively high, a further approximated "lite" formulation is proposed. The lite formulation finds objects in a comparable number of steps with faster computation.

formulation, probability, robot, (13 more...)

2404.04186

Country:

North America > United States > West Virginia > Monongalia County > Morgantown (0.04)
North America > United States > Virginia (0.04)
Europe > Italy > Lazio > Rome (0.04)
Asia (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

arXiv.org Machine LearningApr-4-2024

Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm

Lu, Miao, Zhong, Han, Zhang, Tong, Blanchet, Jose

The sim-to-real gap, which represents the disparity between training and testing environments, poses a significant challenge in reinforcement learning (RL). A promising approach to addressing this challenge is distributionally robust RL, often framed as a robust Markov decision process (RMDP). In this framework, the objective is to find a robust policy that achieves good performance under the worst-case scenario among all environments within a pre-specified uncertainty set centered around the training environment. Unlike previous work, which relies on a generative model or a pre-collected offline dataset enjoying good coverage of the deployment environment, we tackle robust RL via interactive data collection, where the learner interacts with the training environment only and refines the policy through trial and error. In this robust RL paradigm, two main challenges emerge: managing distributional robustness while striking a balance between exploration and exploitation during data collection. Initially, we establish that sample-efficient learning without additional assumptions is unattainable owing to the curse of support shift; i.e., the potential disjointedness of the distributional supports between the training and testing environments. To circumvent such a hardness result, we introduce the vanishing minimal value assumption to RMDPs with a total-variation (TV) distance robust set, postulating that the minimal value of the optimal robust value function is zero. We prove that such an assumption effectively eliminates the support shift issue for RMDPs with a TV distance robust set, and present an algorithm with a provable sample complexity guarantee. Our work makes the initial step to uncovering the inherent difficulty of robust RL via interactive data collection and sufficient conditions for designing a sample-efficient algorithm accompanied by sharp sample complexity analysis.

machine learning, reinforcement learning, robust rl, (13 more...)

arXiv.org Machine Learning

2404.03578

Country: North America > United States (0.67)

Genre:

Research Report (0.83)
Workflow (0.66)

Industry:

Education (0.54)
Leisure & Entertainment > Games (0.34)
Energy > Oil & Gas (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Gallego-Viñarás, Lorena, Mira-Tomás, Juan Miguel, Michela-Gaeta, Anna, Pinol-Ripoll, Gerard, Barbé, Ferrán, Olmos, Pablo M., Muñoz-Barrutia, Arrate

Alzheimer's disease detection in PSG signals

Alzheimer's disease (AD) and sleep disorders exhibit a close association, where disruptions in sleep patterns often precede the onset of Mild Cognitive Impairment (MCI) and early-stage AD. This study delves into the potential of utilizing sleep-related electroencephalography (EEG) signals acquired through polysomnography (PSG) for the early detection of AD. Our primary focus is on exploring semi-supervised Deep Learning techniques for the classification of EEG signals due to the clinical scenario characterized by the limited data availability. The methodology entails testing and comparing the performance of semi-supervised SMATE and TapNet models, benchmarked against the supervised XCM model, and unsupervised Hidden Markov Models (HMMs). The study highlights the significance of spatial and temporal analysis capabilities, conducting independent analyses of each sleep stage. Results demonstrate the effectiveness of SMATE in leveraging limited labeled data, achieving stable metrics across all sleep stages, and reaching 90% accuracy in its supervised form. Comparative analyses reveal SMATE's superior performance over TapNet and HMM, while XCM excels in supervised scenarios with an accuracy range of 92 - 94%. These findings underscore the potential of semi-supervised models in early AD detection, particularly in overcoming the challenges associated with the scarcity of labeled data. Ablation tests affirm the critical role of spatio-temporal feature extraction in semi-supervised predictive performance, and t-SNE visualizations validate the model's proficiency in distinguishing AD patterns. Overall, this research contributes to the advancement of AD detection through innovative Deep Learning approaches, highlighting the crucial role of semi-supervised learning in addressing data limitations.

alzheimer, detection, sleep stage, (16 more...)

2404.03549

Country:

North America > United States (0.14)
Europe > Spain > Galicia > Madrid (0.05)
Europe > Spain > Catalonia > Lleida Province > Lleida (0.04)
Asia > Middle East > Iran > Kermanshah Province > Kermanshah (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Molinghen, Yannick, Avalos, Raphaël, Van Achter, Mark, Nowé, Ann, Lenaerts, Tom

Laser Learning Environment: A new environment for coordination-critical multi-agent tasks

We introduce the Laser Learning Environment (LLE), a collaborative multi-agent reinforcement learning environment in which coordination is central. In LLE, agents depend on each other to make progress (interdependence), must jointly take specific sequences of actions to succeed (perfect coordination), and accomplishing those joint actions does not yield any intermediate reward (zero-incentive dynamics). The challenge of such problems lies in the difficulty of escaping state space bottlenecks caused by interdependence steps since escaping those bottlenecks is not rewarded. We test multiple state-of-the-art value-based MARL algorithms against LLE and show that they consistently fail at the collaborative task because of their inability to escape state space bottlenecks, even though they successfully achieve perfect coordination. We show that Q-learning extensions such as prioritised experience replay and n-steps return hinder exploration in environments with zero-incentive dynamics, and find that intrinsic curiosity with random network distillation is not sufficient to escape those bottlenecks. We demonstrate the need for novel methods to solve this problem and the relevance of LLE as cooperative MARL benchmark.

agent, coordination, interdependence, (10 more...)

2404.03596

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Education (0.91)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Waltz, Martin, Okhrin, Ostap, Schultz, Michael

Self-organized arrival system for urban air mobility

Urban air mobility is an innovative mode of transportation in which electric vertical takeoff and landing (eVTOL) vehicles operate between nodes called vertiports. We outline a self-organized vertiport arrival system based on deep reinforcement learning. The airspace around the vertiport is assumed to be circular, and the vehicles can freely operate inside. Each aircraft is considered an individual agent and follows a shared policy, resulting in decentralized actions that are based on local information. We investigate the development of the reinforcement learning policy during training and illustrate how the algorithm moves from suboptimal local holding patterns to a safe and efficient final policy. The latter is validated in simulation-based scenarios and also deployed on small-scale unmanned aerial vehicles to showcase its real-world usability.

aircraft, reinforcement, vehicle, (13 more...)

2404.0371

Country:

North America > United States > Massachusetts (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)

Genre:

Overview (0.93)
Research Report > New Finding (0.93)

Industry:

Transportation > Air (1.00)
Aerospace & Defense > Aircraft (1.00)
Transportation > Infrastructure & Services (0.93)
Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

Guo, Xudong, Shi, Daming, Yu, Junjie, Fan, Wenhui

The rise of multi-agent systems, especially the success of multi-agent reinforcement learning (MARL), is reshaping our future across diverse domains like autonomous vehicle networks. However, MARL still faces significant challenges, particularly in achieving zero-shot scalability, which allows trained MARL models to be directly applied to unseen tasks with varying numbers of agents. In addition, real-world multi-agent systems usually contain agents with different functions and strategies, while the existing scalable MARL methods only have limited heterogeneity. To address this, we propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO), integrating heterogeneity into parameter-shared PPO-based MARL networks. we first leverage a latent network to adaptively learn strategy patterns for each agent. Second, we introduce a heterogeneous layer for decision-making, whose parameters are specifically generated by the learned latent variables. Our approach is scalable as all the parameters are shared except for the heterogeneous layer, and gains both inter-individual and temporal heterogeneity at the same time. We implement our approach based on the state-of-the-art backbone PPO-based algorithm as SHPPO, while our approach is agnostic to the backbone and can be seamlessly plugged into any parameter-shared MARL method. SHPPO exhibits superior performance over the baselines such as MAPPO and HAPPO in classic MARL environments like Starcraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showcasing enhanced zero-shot scalability and offering insights into the learned latent representation's impact on team performance by visualization.

agent, heterogeneity, learning, (13 more...)

2404.03869

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > Middle East > Jordan (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Games > Computer Games (0.48)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Chen, Xingran, NaderiAlizadeh, Navid, Ribeiro, Alejandro, Bidokhti, Shirin Saeedi

Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks

We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a multi-hop wireless network with statistically-identical agents. Agents cache the most recent samples from others and communicate over wireless collision channels governed by an underlying graph topology. Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies, considering both oblivious (where decision-making is independent of the physical processes) and non-oblivious policies (where decision-making depends on physical processes). We prove that in oblivious policies, minimizing estimation error is equivalent to minimizing the age of information. The complexity of the problem, especially the multi-dimensional action spaces and arbitrary network topologies, makes theoretical methods for finding optimal transmission policies intractable. We optimize the policies using a graphical multi-agent reinforcement learning framework, where each agent employs a permutation-equivariant graph neural network architecture. Theoretically, we prove that our proposed framework exhibits desirable transferability properties, allowing transmission policies trained on small- or moderate-size networks to be executed effectively on large-scale topologies. Numerical experiments demonstrate that (i) Our proposed framework outperforms state-of-the-art baselines; (ii) The trained policies are transferable to larger networks, and their performance gains increase with the number of agents; (iii) The training procedure withstands non-stationarity even if we utilize independent learning techniques; and, (iv) Recurrence is pivotal in both independent learning and centralized training and decentralized execution, and improves the resilience to non-stationarity in independent learning.

agent, packet, th agent, (16 more...)

2404.03227

Country:

North America > United States > Pennsylvania (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)

arXiv.org Artificial IntelligenceApr-3-2024

Data Quality in Crowdsourcing and Spamming Behavior Detection

Ba, Yang, Mancenido, Michelle V., Chiou, Erin K., Pan, Rong

As crowdsourcing emerges as an efficient and cost-effective method for obtaining labels for machine learning datasets, it is important to assess the quality of crowd-provided data, so as to improve analysis performance and reduce biases in subsequent machine learning tasks. Given the lack of ground truth in most cases of crowdsourcing, we refer to data quality as annotators' consistency and credibility. Unlike the simple scenarios where Kappa coefficient and intraclass correlation coefficient usually can apply, online crowdsourcing requires dealing with more complex situations. We introduce a systematic method for evaluating data quality and detecting spamming threats via variance decomposition, and we classify spammers into three categories based on their different behavioral patterns. A spammer index is proposed to assess entire data consistency and two metrics are developed to measure crowd worker's credibility by utilizing the Markov chain and generalized random effects models. Furthermore, we showcase the practicality of our techniques and their advantages by applying them on a face verification task with both simulation and real-world data collected from two crowdsourcing platforms.

experiment, participant, spammer, (16 more...)

2404.17582

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology (0.67)
Health & Medicine (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Nieves, David, Ramírez-Quintana, María José, Monserrat, Carlos, Ferri, César, Hernández-Orallo, José

Learning Alternative Ways of Performing a Task

arXiv.org Artificial IntelligenceApr-3-2024

A common way of learning to perform a task is to observe how it is carried out by experts. However, it is well known that for most tasks there is no unique way to perform them. This is especially noticeable the more complex the task is because factors such as the skill or the know-how of the expert may well affect the way she solves the task. In addition, learning from experts also suffers of having a small set of training examples generally coming from several experts (since experts are usually a limited and expensive resource), being all of them positive examples (i.e. examples that represent successful executions of the task). Traditional machine learning techniques are not useful in such scenarios, as they require extensive training data. Starting from very few executions of the task presented as activity sequences, we introduce a novel inductive approach for learning multiple models, with each one representing an alternative strategy of performing a task. By an iterative process based on generalisation and specialisation, we learn the underlying patterns that capture the different styles of performing a task exhibited by the examples. We illustrate our approach on two common activity recognition tasks: a surgical skills training task and a cooking domain. We evaluate the inferred models with respect to two metrics that measure how well the models represent the examples and capture the different forms of executing a task showed by the examples. We compare our results with the traditional process mining approach and show that a small set of meaningful examples is enough to obtain patterns that capture the different strategies that are followed to solve the tasks.

dependency graph, graph, sequence, (17 more...)

doi: 10.1016/j.eswa.2020.113263

2404.02579

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > British Columbia > East Kootenay Region > Fernie (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
(4 more...)

Genre:

Workflow (0.89)
Research Report > New Finding (0.34)

Industry:

Information Technology (1.00)
Health & Medicine > Surgery (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.68)