AITopics

We present a reinforcement learning-based solution to autonomously race on a miniature race car platform. We show that a policy that is trained purely in simulation using a relatively simple vehicle model, including model randomization, can be successfully transferred to the real robotic setup. We achieve this by using a novel policy output regularization approach and a lifted action space, which enable smooth actions while still allowing aggressive race car driving. We show that this regularized policy outperforms the Soft Actor Critic (SAC) baseline method, both in simulation and on the real car, but it is still outperformed by a state-of-the-art Model Predictive Controller (MPC). Refining the policy with three hours of real-world interaction data allows the reinforcement learning policy to achieve lap times similar to the MPC controller while reducing track constraint violations by 50%.

artificial intelligence, downstream oil & gas, regularization, (14 more...)

2011.13332

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Belgium (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry:

Automobiles & Trucks (0.94)
Leisure & Entertainment > Sports > Motorsports (0.88)
Energy > Oil & Gas > Downstream (0.58)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Raziei, Zohreh, Moghaddam, Mohsen

Adaptable Automation with Modular Deep Reinforcement Learning and Policy Transfer

The need for "intelligence" in such automation systems stems from the fact that most robotic operations in industry are currently limited to rote and repetitive tasks performed within structured environments. This leaves an entire swath of more complex tasks with high degrees of uncertainty and dynamic environments [7] difficult or even impossible to automate. Examples include maintenance and material handling for producing the desired product in manufacturing systems [8], robot surgeries and pharmacy automation in healthcare systems [9], safe working environments in disaster management for deep-sea operation and nuclear energy [10], and fruit picking, crop sensing, and selective weeding in agriculture systems [11]. A fundamental question concerning the notion of intelligent automation in this context then becomes: How can we enable adaptable industrial automation systems that can analyze and act upon their perceived environment rather than merely executing a set of predefined programs? Adaptability is among the key characteristics of industrial automation systems in response to unpredictable changes or disruptions in the process [12].

algorithm, arxiv preprint arxiv, learning, (14 more...)

2012.01934

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Dai, Tianhong, Liu, Hengyan, Bharath, Anil Anthony

Episodic Self-Imitation Learning with Hindsight

Episodic self-imitation learning, a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function, is proposed to speed up reinforcement learning. Compared to the original self-imitation learning algorithm, which samples good state-action pairs from the experience replay buffer, our agent leverages entire episodes with hindsight to aid self-imitation learning. A selection module is introduced to filter uninformative samples from each episode of the update. The proposed method overcomes the limitations of the standard self-imitation learning algorithm, a transitions-based method which performs poorly in handling continuous control environments with sparse rewards. From the experiments, episodic self-imitation learning is shown to perform better than baseline on-policy algorithms, achieving comparable performance to state-of-the-art off-policy algorithms in several simulated robot control tasks. The trajectory selection module is shown to prevent the agent from learning undesirable hindsight experiences. With the capability of solving sparse reward problems in continuous control settings, episodic self-imitation learning has the potential to be applied to real-world problems that have continuous action spaces, such as robot guidance and manipulation.
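
The idea of using an entire episode "with hindsight" can be illustrated with generic goal relabeling: the goal of every step is replaced by what the episode actually achieved at its end, and the sparse reward is recomputed against that goal. The sketch below is a minimal illustration under assumptions, not the paper's algorithm; the function name `relabel_with_hindsight`, the dictionary keys, the tolerance `tol`, and the 0 / -1 reward convention are all hypothetical.

```python
def relabel_with_hindsight(episode, tol=0.05):
    """Relabel every step of an episode with the finally achieved goal.

    `episode` is a list of dicts with keys 'state', 'action', 'achieved'
    and 'goal' (hypothetical layout). Returns new dicts; the input is
    left unmodified.
    """
    # Pretend the outcome of the episode was the goal all along.
    new_goal = episode[-1]["achieved"]
    relabeled = []
    for step in episode:
        # Sparse reward: 0 when the achieved state matches the goal, else -1.
        dist = max(abs(a - g) for a, g in zip(step["achieved"], new_goal))
        reward = 0.0 if dist <= tol else -1.0
        relabeled.append({**step, "goal": new_goal, "reward": reward})
    return relabeled
```

With this relabeling, an episode that missed its original goal still contains at least one rewarded step, which is what makes sparse-reward episodes usable as learning signal.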

hindsight experience, learning, self-imitation learning, (13 more...)

2011.13467

Country: Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Zhong, Ruikang, Liu, Xiao, Liu, Yuanwei, Chen, Yue, Wang, Xianbin

Path Design and Resource Management for NOMA enhanced Indoor Intelligent Robots

A communication-enabled service framework for indoor intelligent robots (IRs) is proposed, where the non-orthogonal multiple access (NOMA) technique is adopted to enable highly reliable communications. In cooperation with the ultramodern indoor channel model recently proposed by the International Telecommunication Union (ITU), the Lego modeling method is proposed, which can deterministically describe the indoor layout and channel state in order to construct the radio map. The investigated radio map is invoked as a virtual environment to train the reinforcement learning agent, which can save training time and hardware costs. Building on the proposed communication model, the motions of IRs that need to reach designated mission destinations and their corresponding downlink power allocation policy are jointly optimized to maximize the mission efficiency and communication reliability of IRs. To solve this optimization problem, a novel reinforcement learning approach named the deep transfer deterministic policy gradient (DT-DPG) algorithm is proposed. Our simulation results demonstrate that 1) with the aid of NOMA techniques, the communication reliability of IRs is effectively improved; 2) the radio map is qualified to be a virtual training environment, and its statistical channel state information improves training efficiency by about 30%; 3) the proposed DT-DPG algorithm is superior to the conventional deep deterministic policy gradient (DDPG) algorithm in terms of optimization performance, training time, and anti-local optimum ability.

Xianbin Wang is with the Department of Electrical and Computer Engineering, Western University, London, ON N6A5B9, Canada (email: xianbin.wang@uwo.ca).

The explosive development of robotics and artificial intelligence technologies has changed, is changing, and will continue to transform human lives. In recent years, intelligent robots (IRs) have proven competent to provide a variety of services, such as security monitoring, sanitation, and travel guides [1]. The various new services offered by IRs require a large amount of communication, computation, and data resources, which are not necessarily provided locally [2].

algorithm, dt-dpg algorithm, radio map, (14 more...)

2011.11745

Country:

North America > Canada > Ontario > Middlesex County > London (0.24)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.88)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Simm, Gregor N. C., Pinsler, Robert, Csányi, Gábor, Hernández-Lobato, José Miguel

Symmetry-Aware Actor-Critic for 3D Molecular Design

arXiv.org Machine Learning, Nov-25-2020

Automating molecular design using deep reinforcement learning (RL) has the potential to greatly accelerate the search for novel materials. Despite recent progress on leveraging graph representations to design molecules, such methods are fundamentally limited by the lack of three-dimensional (3D) information. In light of this, we propose a novel actor-critic architecture for 3D molecular design that can generate molecular structures unattainable with previous approaches. This is achieved by exploiting the symmetries of the design process through a rotationally covariant state-action representation based on a spherical harmonics series expansion. We demonstrate the benefits of our approach on several 3D molecular design tasks, where we find that building in such symmetries significantly improves generalization and the quality of generated molecules.

deep learning, molecule, neural network, (19 more...)

2011.12747

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Lee, Sanghwa, Lee, Jaeyoung, Hasuo, Ichiro

Predictive PER: Balancing Priority and Diversity towards Stable Deep Reinforcement Learning

Prioritized experience replay (PER) samples important transitions preferentially, rather than uniformly, to improve the performance of a deep reinforcement learning agent. We claim that such prioritization has to be balanced with sample diversity to stabilize the DQN and prevent forgetting. Our proposed improvement over PER, called Predictive PER (PPER), takes three countermeasures (TDInit, TDClip, TDPred) to (i) eliminate priority outliers and explosions and (ii) improve the sample diversity and distributions, weighted by priorities, both of which help stabilize the DQN. The most notable of the three is the introduction of a second DNN, called TDPred, to generalize the in-distribution priorities. An ablation study and full experiments with Atari games show that each countermeasure in its own way, and PPER as a whole, contribute to enhancing stability and thus performance over PER.
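
To make the tension between priority and diversity concrete, the sketch below computes standard proportional PER sampling probabilities, P(i) = p_i^alpha / sum_k p_k^alpha with p_i = |delta_i| + eps, and adds an optional upper bound on priorities. The bound is only an illustrative stand-in for the idea of suppressing priority outliers; it is an assumption and should not be read as the paper's TDClip rule. The names `sampling_probabilities`, `sample_indices`, and the `clip` parameter are hypothetical.

```python
import random


def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6, clip=None):
    """Proportional prioritization over a list of TD errors.

    alpha = 0 recovers uniform sampling; larger alpha sharpens the
    preference for high-error transitions. `clip` (assumed, optional)
    caps each priority before exponentiation.
    """
    priorities = []
    for delta in td_errors:
        p = abs(delta) + eps          # eps keeps zero-error samples reachable
        if clip is not None:
            p = min(p, clip)          # bound outliers so they cannot dominate
        priorities.append(p ** alpha)
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, rng, **kwargs):
    """Draw `batch_size` buffer indices (with replacement) by priority."""
    probs = sampling_probabilities(td_errors, **kwargs)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)
```

For example, `sample_indices(td_errors, 32, random.Random(0), clip=2.0)` draws a batch in which a single transition with a very large TD error receives a smaller share of the samples than it would without the bound, leaving more room for the rest of the buffer.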

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2011.13093

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Puerto Rico > San Juan > San Juan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.56)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Cordier, Thibault, Urvoy, Tanguy, Rojas-Barahona, Lina M., Lefèvre, Fabrice

Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation

These interactions can be taken from either human-to-human or human-machine conversations. However, human interactions are scarce and costly, making learning from few interactions essential. One solution to speed up the learning process is to guide the agent's exploration with the help of an expert. In this paper we present several imitation learning strategies for dialogue policy where the guiding expert is a near-optimal handcrafted policy. We combine these strategies with state-of-the-art reinforcement learning methods based on Q-learning and actor-critic. We notably propose a randomised exploration policy which allows for a seamless hybridisation of the learned policy and the expert, which can be seen as a dilution of the expert's demonstration into the resulting policy. Our experiments show that our hybridisation strategy outperforms several baselines, and that it could accelerate the learning when facing real humans.

demonstration, learning, reinforcement learning, (14 more...)

2012.04687

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > France (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Moghadam, Majid, Alizadeh, Ali, Tekin, Engin, Elkaim, Gabriel Hugh

An End-to-end Deep Reinforcement Learning Approach for the Long-term Short-term Planning on the Frenet Space

Tactical decision making and strategic motion planning for autonomous highway driving are challenging due to the complication of predicting other road users' behaviors, diversity of environments, and complexity of the traffic interactions. This paper presents a novel end-to-end continuous deep reinforcement learning approach towards autonomous cars' decision-making and motion planning. For the first time, we define both states and action spaces on the Frenet space to make the driving behavior less variant to the road curvatures than the surrounding actors' dynamics and traffic interactions. The agent receives time-series data of past trajectories of the surrounding vehicles and applies convolutional neural networks along the time channels to extract features in the backbone. The algorithm generates continuous spatiotemporal trajectories on the Frenet frame for the feedback controller to track. Extensive high-fidelity highway simulations on CARLA show the superiority of the presented approach compared with commonly used baselines and discrete reinforcement learning on various traffic scenarios. Furthermore, the proposed method's advantage is confirmed with a more comprehensive performance evaluation against 1000 randomly generated test scenarios.
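
For readers unfamiliar with the Frenet frame used here, it describes a position by its arc length s along a reference path and its signed lateral offset d from that path, so that a curved road looks "straight" to the planner. The following is a minimal sketch assuming a piecewise-linear reference path; the function name `cartesian_to_frenet` and the sign convention (positive d to the left of the direction of travel) are illustrative choices, not taken from the paper.

```python
import math


def cartesian_to_frenet(point, path):
    """Project an (x, y) point onto a polyline and return (s, d).

    s: arc length along `path` to the closest point on it.
    d: signed lateral offset, positive to the left of the path direction.
    """
    px, py = point
    best = None          # (distance, s, d) of the closest projection so far
    s_start = 0.0        # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0.0:
            continue     # skip degenerate (repeated) waypoints
        # Parameter of the orthogonal projection, clamped to the segment.
        t = ((px - x0) * dx + (py - y0) * dy) / seg_len ** 2
        t = max(0.0, min(1.0, t))
        cx, cy = x0 + t * dx, y0 + t * dy
        dist = math.hypot(px - cx, py - cy)
        if best is None or dist < best[0]:
            # The 2D cross product tells which side of the segment we are on.
            cross = dx * (py - y0) - dy * (px - x0)
            sign = 1.0 if cross >= 0.0 else -1.0
            best = (dist, s_start + t * seg_len, sign * dist)
        s_start += seg_len
    if best is None:
        raise ValueError("path needs at least one non-degenerate segment")
    return best[1], best[2]
```

On an L-shaped path such as `[(0, 0), (10, 0), (10, 10)]`, points keep a small, slowly varying d as long as they follow the lane, regardless of the corner, which is the property that makes the representation insensitive to road curvature.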

agent, trajectory, vehicle, (16 more...)

2011.13098

Country:

North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Gopalakrishnan, Anand, van Steenkiste, Sjoerd, Schmidhuber, Jürgen

Unsupervised Object Keypoint Learning using Local Spatial Predictability

Hence, which layer(s) we choose as our feature embedding will have an effect on the outcome of the local spatial prediction problem. While more abstract high-level features are expected to better capture the internal predictive structure of an object, it will be more difficult to attribute the error of the prediction network to the exact image location. On the other hand, while more low-level features can be localized more accurately, they may lack the expressiveness to capture high-level properties of objects. Nonetheless, in practice we find that a spatial feature embedding based on earlier layers of the encoder works well (see also Section 5.3 for an ablation).

Local Spatial Prediction Task

Using the learned spatial feature embedding we seek out salient regions of the input image that correspond to object parts. Our approach is based on the idea that objects correspond to local regions in feature space that have high internal predictive structure, which allows us to formulate the following local spatial prediction (LSP) task. For each location in the learned spatial feature embedding, we seek to predict the value of the features (across the feature maps) from its neighbouring feature values. When neighbouring areas correspond to the same object(-part), i.e. they regularly appear together, we expect that this prediction problem is easy (green arrow in Figure 3).

keypoint, representation, transporter, (13 more...)

2011.1293

Country:

Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.74)

Nguyen, Phuong D. H., Georgie, Yasmin Kim, Kayhan, Ezgi, Eppe, Manfred, Hafner, Verena Vanessa, Wermter, Stefan

Sensorimotor representation learning for an "active self" in robots: A model survey

For example, sensorimotor experiences are used to learn a forward model, and a forward model can be the basis for learning high-level cognitive conceptual representations. In agreement with Schillaci et al. (2016), we aim to go deeper into the role of multisensory information collected through exploration in the formation of an agent's body and peripersonal space representation, and how these sensorimotor representations affect the agent's sense of the active self, including the sense of agency and the sense of body ownership. Thus, motor explorations will be mentioned but not exhaustively discussed in this surveyed work.

[...] birth, infants spend their first months of life undergoing many developmental milestones to incrementally develop the representation of their body. This body schema is related mainly to touch, proprioception, and vision (see Table 1) as these sensory modalities continue to develop from the fetal stage (see Hoffmann, 2017; Adolph and Joh, 2007 for reviews). Later on, the representation of the surrounding space of the body--the PPS--is aggregated from the proprioceptive and exteroceptive modalities (see Table 1). In addition, infants develop the capability to generate motor actions corresponding [...]

body schema, infant, representation, (16 more...)

2011.1286

Country:

Europe > Germany > Hamburg (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > Berlin (0.04)
(4 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
(4 more...)