AITopics | Undirected Networks

Collaborating Authors

Undirected Networks

News Overviews Instructional Materials AI-Alerts Classics

Adaptive Belief Discretization for POMDP Planning

arXiv.org Artificial IntelligenceApr-15-2021

Partially Observable Markov Decision Processes (POMDP) is a widely used model to represent the interaction of an environment and an agent, under state uncertainty. Since the agent does not observe the environment state, its uncertainty is typically represented through a probabilistic belief. While the set of possible beliefs is infinite, making exact planning intractable, the belief space's complexity (and hence planning complexity) is characterized by its covering number. Many POMDP solvers uniformly discretize the belief space and give the planning error in terms of the (typically unknown) covering number. We instead propose an adaptive belief discretization scheme, and give its associated planning error. We furthermore characterize the covering number with respect to the POMDP parameters. This allows us to specify the exact memory requirements on the planner, needed to bound the value function error. We then propose a novel, computationally efficient solver using this scheme. We demonstrate that our algorithm is highly competitive with the state of the art in a variety of scenarios.

algorithm, belief space, inference, (15 more...)

arXiv.org Artificial Intelligence

2104.07276

Country:

North America > United States (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Sweden (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Learning Multimodal Contact-Rich Skills from Demonstrations Without Reward Engineering

Balakuntala, Mythra V., Kaur, Upinder, Ma, Xin, Wachs, Juan, Voyles, Richard M.

arXiv.org Artificial IntelligenceApr-14-2021

Everyday contact-rich tasks, such as peeling, cleaning, and writing, demand multimodal perception for effective and precise task execution. However, these present a novel challenge to robots as they lack the ability to combine these multimodal stimuli for performing contact-rich tasks. Learning-based methods have attempted to model multi-modal contact-rich tasks, but they often require extensive training examples and task-specific reward functions which limits their practicality and scope. Hence, we propose a generalizable model-free learning-from-demonstration framework for robots to learn contact-rich skills without explicit reward engineering. We present a novel multi-modal sensor data representation which improves the learning performance for contact-rich skills. We performed training and experiments using the real-life Sawyer robot for three everyday contact-rich skills -- cleaning, writing, and peeling. Notably, the framework achieves a success rate of 100\% for the peeling and writing skill, and 80\% for the cleaning skill. Hence, this skill learning framework can be extended for learning other physical manipulation skills.

demonstration, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2103.01296

Country: North America > United States (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(2 more...)

Add feedback

Muesli: Combining Improvements in Policy Optimization

Hessel, Matteo, Danihelka, Ivo, Viola, Fabio, Guez, Arthur, Schmitt, Simon, Sifre, Laurent, Weber, Theophane, Silver, David, van Hasselt, Hado

arXiv.org Artificial IntelligenceApr-13-2021

We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.

combining improvement, muesli, optimization, (13 more...)

arXiv.org Artificial Intelligence

2104.06159

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
(4 more...)

Add feedback

Two-stage training algorithm for AI robot soccer

Kim, Taeyoung, Vecchietti, Luiz Felipe, Choi, Kyujin, Sariel, Sanem, Har, Dongsoo

arXiv.org Artificial IntelligenceApr-13-2021

In multi-agent reinforcement learning, the cooperative learning behavior of agents is very important. In the field of heterogeneous multi-agent reinforcement learning, cooperative behavior among different types of agents in a group is pursued. Learning a joint-action set during centralized training is an attractive way to obtain such cooperative behavior, however, this method brings limited learning performance with heterogeneous agents. To improve the learning performance of heterogeneous agents during centralized training, two-stage heterogeneous centralized training which allows the training of multiple roles of heterogeneous agents is proposed. During training, two training processes are conducted in a series. One of the two stages is to attempt training each agent according to its role, aiming at the maximization of individual role rewards. The other is for training the agents as a whole to make them learn cooperative behaviors while attempting to maximize shared collective rewards, e.g., team rewards. Because these two training processes are conducted in a series in every timestep, agents can learn how to maximize role rewards and team rewards simultaneously. The proposed method is applied to 5 versus 5 AI robot soccer for validation. Simulation results show that the proposed method can train the robots of the robot soccer team effectively, achieving higher role rewards and higher team rewards as compared to other approaches that can be used to solve problems of training cooperative multi-agent.

agent, score-concede rate, tshct, (12 more...)

arXiv.org Artificial Intelligence

2104.05931

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models

Zeineldeen, Mohammad, Glushko, Aleksandr, Michel, Wilfried, Zeyer, Albert, Schlüter, Ralf, Ney, Hermann

arXiv.org Machine LearningApr-12-2021

Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. The integration with an external LM trained on much more unpaired text usually leads to better performance. A Bayesian interpretation as in the hybrid autoregressive transducer (HAT) suggests dividing by the prior of the discriminative acoustic model, which corresponds to this implicit LM, similarly as in the hybrid hidden Markov model approach. The implicit LM cannot be calculated efficiently in general and it is yet unclear what are the best methods to estimate it. In this work, we compare different approaches from the literature and propose several novel methods to estimate the ILM directly from the AED model. Our proposed methods outperform all previous approaches. We also investigate other methods to suppress the ILM mainly by decreasing the capacity of the AED model, limiting the label context, and also by training the AED model together with a pre-existing LM.

aed model, decoder, speech recognition, (12 more...)

arXiv.org Machine Learning

2104.05544

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)
(3 more...)

Genre: Research Report (0.84)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.82)

Add feedback

Survey on reinforcement learning for language processing

Uc-Cetina, Victor, Navarro-Guerrero, Nicolas, Martin-Gonzalez, Anabel, Weber, Cornelius, Wermter, Stefan

arXiv.org Artificial IntelligenceApr-12-2021

Machine learning algorithms have been very successful to solve problems in the natural language processing (NLP) domain for many years, especially supervised and unsupervised methods. However, this is not the case with reinforcement learning (RL), which is somewhat surprising since in other domains, reinforcement learning methods have experienced an increased level of success with some impressive results, for instance in board games such as AlphaGo Zero [106]. Yet, deep reinforcement learning for natural language processing is still in its infancy when compared to supervised learning [65]. Thus, the goal of this article is to provide a review of applications of reinforcement learning to NLP and we present an analysis of the underlying structure of the problems that make them viable to be treated entirely or partially as RL problems intended as an aid to newcomers to the field. We also analyze some existing research gaps and provide a list of promising research directions in which natural language systems might benefit from reinforcement learning algorithms.

computational linguistic, reinforcement, reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

2104.05565

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
(45 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

MPTP: Motion-Planning-aware Task Planning for Navigation in Belief Space

Thomas, Antony, Mastrogiovanni, Fulvio, Baglietto, Marco

arXiv.org Artificial IntelligenceApr-10-2021

We present an integrated Task-Motion Planning (TMP) framework for navigation in large-scale environments. Of late, TMP for manipulation has attracted significant interest resulting in a proliferation of different approaches. In contrast, TMP for navigation has received considerably less attention. Autonomous robots operating in real-world complex scenarios require planning in the discrete (task) space and the continuous (motion) space. In knowledge-intensive domains, on the one hand, a robot has to reason at the highest-level, for example, the objects to procure, the regions to navigate to in order to acquire them; on the other hand, the feasibility of the respective navigation tasks have to be checked at the execution level. This presents a need for motion-planning-aware task planners. In this paper, we discuss a probabilistically complete approach that leverages this task-motion interaction for navigating in large knowledge-intensive domains, returning a plan that is optimal at the task-level. The framework is intended for motion planning under motion and sensing uncertainty, which is formally known as belief space planning. The underlying methodology is validated in simulation, in an office environment and its scalability is tested in the larger Willow Garage world. A reasonable comparison with a work that is closest to our approach is also provided. We also demonstrate the adaptability of our approach by considering a building floor navigation domain. Finally, we also discuss the limitations of our approach and put forward suggestions for improvements and future work.

artificial intelligence, machine learning, robot, (18 more...)

arXiv.org Artificial Intelligence

2104.04696

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
(8 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Risk-Aware Lane Selection on Highway with Dynamic Obstacles

Bae, Sangjae, Isele, David, Fujimura, Kikuo, Moura, Scott J.

arXiv.org Artificial IntelligenceApr-8-2021

This paper proposes a discretionary lane selection algorithm. In particular, highway driving is considered as a targeted scenario, where each lane has a different level of traffic flow. When lane-changing is discretionary, it is advised not to change lanes unless highly beneficial, e.g., reducing travel time significantly or securing higher safety. Evaluating such "benefit" is a challenge, along with multiple surrounding vehicles in dynamic speed and heading with uncertainty. We propose a real-time lane-selection algorithm with careful cost considerations and with modularity in design. The algorithm is a search-based optimization method that evaluates uncertain dynamic positions of other vehicles under a continuous time and space domain. For demonstration, we incorporate a state-of-the-art motion planner framework (Neural Networks integrated Model Predictive Control) under a CARLA simulation environment.

ground transportation, neural network, vehicle, (22 more...)

arXiv.org Artificial Intelligence

2104.04105

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Asia (0.14)

Genre: Research Report (0.50)

Industry:

Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.88)
Energy > Oil & Gas (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Learning to Coordinate via Multiple Graph Neural Networks

Xu, Zhiwei, Zhang, Bin, Bai, Yunpeng, Li, Dapeng, Fan, Guoliang

arXiv.org Artificial IntelligenceApr-8-2021

The collaboration between agents has gradually become an important topic in multi-agent systems. The key is how to efficiently solve the credit assignment problems. This paper introduces MGAN for collaborative multi-agent reinforcement learning, a new algorithm that combines graph convolutional networks and value-decomposition methods. MGAN learns the representation of agents from different perspectives through multiple graph networks, and realizes the proper allocation of attention between all agents. We show the amazing ability of the graph network in representation learning by visualizing the output of the graph network, and therefore improve interpretability for the actions of each agent in the multi-agent system.

agent, graph convolutional network, scenario, (14 more...)

arXiv.org Artificial Intelligence

2104.03503

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Add feedback

Informative Path Planning for Extreme Anomaly Detection in Environment Exploration and Monitoring

Blanchard, Antoine, Sapsis, Themistoklis

arXiv.org Machine LearningApr-7-2021

This includes missions related to environment exploration and monitoring in which an UAV is tasked with producing a map for a quantity of interest (e.g., pollutant concentration, terrain elevation, or vegetation growth) by collecting measurements at various locations across a region of interest (e.g., a reservoir, a city, or a crop) [10, 13, 17, 23, 40]. The data collected by the UAV can be used to construct a statistical model for the quantity of interest, which in turn can be used for analysis and policy making. Of course, the statistical model is only as good as the measurements made by the UAV. Therefore, the question of data collection (i.e., how, when, and where to make measurements) is of paramount importance, especially from the standpoint of detecting anomalies in the environment. Path-planning algorithms for environment exploration come in two flavors. Approaches in which the UAV decides on its next move one step at a time are referred to as myopic [24, 42]. Myopic algorithms are suitable for most situations but lack a mechanism for anticipation, which may be problematic in cases where path-planning decisions may have negative longterm consequences (e.g., the UAV gets stuck because of maneuverability constraints).

bayesian inference, planning & scheduling, uav, (19 more...)

arXiv.org Machine Learning

2005.1004

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback