Goto

Collaborating Authors

 Undirected Networks


Estimating manipulation intentions to ease teleoperation

AIHub

Teleoperation is one of the longest-standing application fields in robotics. While full autonomy is still work in progress, the possibility to remotely operate a robot has already opened scenarios where humans can act in risky environments without endangering their own safety, such as when defusing explosives or decommissioning nuclear waste. It also allows one to be present and act even at great distance: underwater, in space, or inside a patient miles away from the surgeon. These are all critical applications, where skilled and qualified operators control the robot after receiving specific training to learn to use the system safely. The recent pandemic has yet made even more apparent the need for immersive telepresence and remote action also for non-expert users: not only could teleoperated robots take vitals or bring drugs to infectious patients, but we could assist our elderly living far away with chores like moving heavy stuff, or cooking, for example.


Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP

arXiv.org Artificial Intelligence

This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP), given access to a generative model (simulator). When the ground-truth MDP is weakly communicating, we prove an upper bound of $\widetilde O(H \varepsilon^{-3} \ln \frac{1}{\delta})$ samples per state-action pair, where $H := sp(h^*)$ is the span of bias of any optimal policy, $\varepsilon$ is the accuracy and $\delta$ is the failure probability. This bound improves the best-known mixing-time-based approaches in [Jin & Sidford 2021], which assume the mixing-time of every deterministic policy is bounded. The core of our analysis is a proper reduction bound from AMDP problems to discounted MDP (DMDP) problems, which may be of independent interests since it allows the application of DMDP algorithms for AMDP in other settings. We complement our upper bound by proving a minimax lower bound of $\Omega(|\mathcal S| |\mathcal A| H \varepsilon^{-2} \ln \frac{1}{\delta})$ total samples, showing that a linear dependent on $H$ is necessary and that our upper bound matches the lower bound in all parameters of $(|\mathcal S|, |\mathcal A|, H, \ln \frac{1}{\delta})$ up to some logarithmic factors.


Proceedings of the 2nd International Workshop on Reading Music Systems

arXiv.org Artificial Intelligence

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 2nd International Workshop on Reading Music Systems, held in Delft on the 2nd of November 2019.


Reward Function Optimization of a Deep Reinforcement Learning Collision Avoidance System

arXiv.org Artificial Intelligence

The Traffic Alert Collision Avoidance System (TCAS) has been an integral part of the increased safety of air transport since it was federally mandated in the 1991 for all passenger carrying aircraft with more than 30 seats flying in U.S. airspace [1, 2]. TCAS led to a dramatic reduction in the occurrence of mid air collisions in modern aviation; however the heuristic based approach undertaken in TCAS has made it difficult to adapt the system to the evolving complexity of the National Airspace System (NAS), which includes new cooperative surveillance systems (e.g., ADS-B) and new vehicle entrants. In response, the Federal Aviation Administration (FAA) commissioned the development of a replacement for TCAS. This new system, referred to as the Next Generation Airborne Collision Avoidance System X (ACAS X), which is currently in development at MIT Lincoln Laboratory and John Hopkins Applied Physics Laboratory, is expected to integrate into multiple aircraft platforms and reduce nuisance alerts as well as reduce the risk of Near Mid Air Collisions (NMAC) [3]. ACAS X introduced several variants designed to reduce the risk of NMAC for a particular operation, such as commercial aviation (ACAS Xa) [4], large uncrewed aerial systems (ACAS Xu) [5], smaller uncrewed aerial vehicles (ACAS sXu) [6], and ACAS Xr which is under development for advanced air mobility and helicopter operations. Each variant adds capabilities and design considerations for the operational environment and platforms that will be commonly seen by the ACAS X equipped vehicle. For example, ACAS sXu introduced vehicle to vehicle surveillance to accommodate a future link that sUAS may use to interrogate and coordinate with each other. While, ACAS Xu added Remain Well Clear alerting due to its use in remotely piloted or autonomous UAS.


Deep Learning-based Beam Tracking for Millimeter-wave Communications under Mobility

arXiv.org Artificial Intelligence

In this paper, we propose a deep learning-based beam tracking method for millimeter-wave (mmWave)communications. Beam tracking is employed for transmitting the known symbols using the sounding beams and tracking time-varying channels to maintain a reliable communication link. When the pose of a user equipment (UE) device varies rapidly, the mmWave channels also tend to vary fast, which hinders seamless communication. Thus, models that can capture temporal behavior of mmWave channels caused by the motion of the device are required, to cope with this problem. Accordingly, we employa deep neural network to analyze the temporal structure and patterns underlying in the time-varying channels and the signals acquired by inertial sensors. We propose a model based on long short termmemory (LSTM) that predicts the distribution of the future channel behavior based on a sequence of input signals available at the UE. This channel distribution is used to 1) control the sounding beams adaptively for the future channel state and 2) update the channel estimate through the measurement update step under a sequential Bayesian estimation framework. Our experimental results demonstrate that the proposed method achieves a significant performance gain over the conventional beam tracking methods under various mobility scenarios.


Targets in Reinforcement Learning to solve Stackelberg Security Games

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) algorithms have been successfully applied to real world situations like illegal smuggling, poaching, deforestation, climate change, airport security, etc. These scenarios can be framed as Stackelberg security games (SSGs) where defenders and attackers compete to control target resources. The algorithm's competency is assessed by which agent is controlling the targets. This review investigates modeling of SSGs in RL with a focus on possible improvements of target representations in RL algorithms.


A Reinforcement Learning Approach to Optimize Available Network Bandwidth Utilization

arXiv.org Artificial Intelligence

Efficient data transfers over high-speed, long-distance shared networks require proper utilization of available network bandwidth. Using parallel TCP streams enables an application to utilize network parallelism and can improve transfer throughput; however, finding the optimum number of parallel TCP streams is challenging due to nondeterministic background traffic sharing the same network. Additionally, the non-stationary, multi-objectiveness, and partially-observable nature of network signals in the host systems add extra complexity in finding the current network condition. In this work, we present a novel approach to finding the optimum number of parallel TCP streams using deep reinforcement learning (RL). We devise a learning-based algorithm capable of generalizing different network conditions and utilizing the available network bandwidth intelligently. Contrary to rule-based heuristics that do not generalize well in unknown network scenarios, our RL-based solution can dynamically discover and adapt the parallel TCP stream numbers to maximize the network bandwidth utilization without congesting the network and ensure fairness among contending transfers. We extensively evaluated our RL-based algorithm's performance, comparing it with several state-of-the-art online optimization algorithms. The results show that our RL-based algorithm can find near-optimal solutions 40% faster while achieving up to 15% higher throughput. We also show that, unlike a greedy algorithm, our devised RL-based algorithm can avoid network congestion and fairly share the available network resources among contending transfers.


Discrete Control in Real-World Driving Environments using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Training self-driving cars is often challenging since they require a vast amount of labeled data in multiple real-world contexts, which is computationally and memory intensive. Researchers often resort to driving simulators to train the agent and transfer the knowledge to a real-world setting. Since simulators lack realistic behavior, these methods are quite inefficient. To address this issue, we introduce a framework (perception, planning, and control) in a real-world driving environment that transfers the real-world environments into gaming environments by setting up a reliable Markov Decision Process (MDP). We propose variations of existing Reinforcement Learning (RL) algorithms in a multi-agent setting to learn and execute the discrete control in real-world environments. Experiments show that the multi-agent setting outperforms the single-agent setting in all the scenarios. We also propose reliable initialization, data augmentation, and training techniques that enable the agents to learn and generalize to navigate in a real-world environment with minimal input video data, and with minimal training. Additionally, to show the efficacy of our proposed algorithm, we deploy our method in the virtual driving environment TORCS.


Value-based CTDE Methods in Symmetric Two-team Markov Game: from Cooperation to Team Competition

arXiv.org Artificial Intelligence

In this paper, we identify the best learning scenario to train a team of agents to compete against multiple possible strategies of opposing teams. We evaluate cooperative value-based methods in a mixed cooperative-competitive environment. We restrict ourselves to the case of a symmetric, partially observable, two-team Markov game. We selected three training methods based on the centralised training and decentralised execution (CTDE) paradigm: QMIX, MAVEN and QVMix. For each method, we considered three learning scenarios differentiated by the variety of team policies encountered during training. For our experiments, we modified the StarCraft Multi-Agent Challenge environment to create competitive environments where both teams could learn and compete simultaneously. Our results suggest that training against multiple evolving strategies achieves the best results when, for scoring their performances, teams are faced with several strategies.


Towards True Lossless Sparse Communication in Multi-Agent Systems

arXiv.org Artificial Intelligence

Communication enables agents to cooperate to achieve their goals. Learning when to communicate, i.e., sparse (in time) communication, and whom to message is particularly important when bandwidth is limited. Recent work in learning sparse individualized communication, however, suffers from high variance during training, where decreasing communication comes at the cost of decreased reward, particularly in cooperative tasks. We use the information bottleneck to reframe sparsity as a representation learning problem, which we show naturally enables lossless sparse communication at lower budgets than prior art. In this paper, we propose a method for true lossless sparsity in communication via Information Maximizing Gated Sparse Multi-Agent Communication (IMGS-MAC). Our model uses two individualized regularization objectives, an information maximization autoencoder and sparse communication loss, to create informative and sparse communication. We evaluate the learned communication `language' through direct causal analysis of messages in non-sparse runs to determine the range of lossless sparse budgets, which allow zero-shot sparsity, and the range of sparse budgets that will inquire a reward loss, which is minimized by our learned gating function with few-shot sparsity. To demonstrate the efficacy of our results, we experiment in cooperative multi-agent tasks where communication is essential for success. We evaluate our model with both continuous and discrete messages. We focus our analysis on a variety of ablations to show the effect of message representations, including their properties, and lossless performance of our model.