AITopics

2009.12068

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Sep-24-2020, 04:35:17 GMT

Watch a Robot AI Beat World-Class Curling Competitors

Artificial intelligence still needs to bridge the "sim-to-real" gap. Deep-learning techniques that are all the rage in AI log superlative performances in mastering cerebral games, including chess and Go, both of which can be played on a computer. But translating simulations to the physical world remains a bigger challenge. A robot named Curly that uses "deep reinforcement learning"--making improvements as it corrects its own errors--came out on top in three of four games against top-ranked human opponents from South Korean teams that included a women's team and a reserve squad for the national wheelchair team. One crucial finding was that the AI system demonstrated its ability to adapt to changing ice conditions.

artificial intelligence, machine learning, reinforcement learning, (2 more...)

#artificialintelligence

Country: Asia > South Korea (0.37)

Industry: Leisure & Entertainment > Games > Chess (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Huang, Yizhou, Xie, Kevin, Bharadhwaj, Homanga, Shkurti, Florian

Continual Model-Based Reinforcement Learning with Hypernetworks

arXiv.org Artificial Intelligence Sep-24-2020

Lifelong model-based robot learning is predicated upon continual adaptation to the dynamics of new tasks. For example, robots need to learn to manipulate unseen objects with various mass distributions, walk on new types of terrain with different friction, elasticity, and other physical properties, or even learn to adapt to different tasks, such as walking, running, or climbing stairs. This presents at least two challenges for many model-based reinforcement learning (MBRL) and model-predictive control (MPC) formulations, which typically consist of a dynamics learning phase followed by a planning/policy optimization and execution phase. First, these methods are not scalable because the time required to train the dynamics model grows linearly with the size of the collected experience. Second, as the robot learner encounters and adapts to new tasks, it has to avoid catastrophic forgetting of the dynamics of old tasks, and should ideally exhibit both forward transfer (old tasks improve the learning performance on the new task) and backward transfer (the new task improves the performance on old tasks). Many MBRL and MPC methods lack this type of adaptation and positive transfer. In this work, we propose to extend the task-aware continual learning approach based on hypernetworks in [1] to adapt to changing environment dynamics and to address the scalability and positive transfer challenges mentioned above in a reinforcement learning setting.
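The hypernetwork idea in the abstract above can be illustrated with a minimal sketch. It assumes a linear hypernetwork that maps a per-task embedding to the two weights of a toy 1-D linear dynamics model, and an output regularizer that penalizes drift of the weights generated for earlier task embeddings. All names (`hypernet`, `predict_next_state`, `continual_loss`, `beta`) and the exact form of the regularizer are illustrative assumptions, not the authors' implementation.

```python
def hypernet(theta, embedding):
    """Linear hypernetwork: task embedding -> flat weight vector of the dynamics model."""
    W, b = theta
    return [sum(w * e for w, e in zip(row, embedding)) + bias
            for row, bias in zip(W, b)]


def predict_next_state(weights, state, action):
    """Toy 1-D linear dynamics model whose two weights come from the hypernetwork."""
    a, c = weights
    return a * state + c * action


def continual_loss(theta, new_embedding, transitions,
                   old_embeddings, old_outputs, beta):
    """Dynamics loss on the new task plus an output regularizer for old tasks.

    old_outputs holds snapshots of hypernet(theta_old, e) taken before
    training on the new task (hypothetical bookkeeping for this sketch).
    """
    weights = hypernet(theta, new_embedding)
    task_loss = sum((predict_next_state(weights, s, a) - s_next) ** 2
                    for s, a, s_next in transitions) / len(transitions)

    # Penalize changes in the weights generated for previously seen tasks.
    reg = 0.0
    for emb, snapshot in zip(old_embeddings, old_outputs):
        current = hypernet(theta, emb)
        reg += sum((c - t) ** 2 for c, t in zip(current, snapshot))
    if old_embeddings:
        reg /= len(old_embeddings)
    return task_loss + beta * reg
```

In this sketch only the hypernetwork parameters and the small task embeddings are stored, so the regularizer needs no replay of old transitions; whether that matches the paper's training procedure is not stated in the abstract.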

hypernetwork, neural network, upstream oil & gas, (15 more...)

2009.11997

Country:

North America > Canada > Ontario > Toronto (0.29)
North America > United States (0.14)

Genre: Research Report (0.50)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Chaudhury, Subhajit, Kimura, Daiki, Talamadupula, Kartik, Tatsubori, Michiaki, Munawar, Asim, Tachibana, Ryuki

Bootstrapped Q-learning with Context Relevant Observation Pruning to Generalize in Text-based Games

arXiv.org Machine Learning Sep-24-2020

We show that Reinforcement Learning (RL) methods for solving Text-Based Games (TBGs) often fail to generalize to unseen games, especially in small data regimes. To address this issue, we propose Context Relevant Episodic State Truncation (CREST) for irrelevant token removal in observation text for improved generalization. Our method first trains a base model using Q-learning, which typically overfits the training games. The base model's action token distribution is used to perform observation pruning that removes irrelevant tokens. A second bootstrapped model is then retrained on the pruned observation text. Our bootstrapped agent shows improved generalization in solving unseen TextWorld games, using 10x-20x fewer training games compared to previous state-of-the-art methods despite requiring fewer training episodes.
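The pruning step described above can be sketched roughly as follows. This is a simplified illustration only: it assumes the base model's action commands are available as strings and uses exact token matching as a stand-in for whatever relevance scoring the method actually applies. The function names and the `min_prob` threshold are hypothetical.

```python
from collections import Counter
import string


def action_token_distribution(action_commands):
    """Empirical distribution over tokens in the base model's action commands."""
    counts = Counter(tok for cmd in action_commands for tok in cmd.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def prune_observation(observation, token_dist, min_prob=0.0):
    """Keep only observation tokens whose probability under the action-token
    distribution exceeds min_prob (exact match; an assumption of this sketch)."""
    kept = []
    for raw in observation.lower().split():
        tok = raw.strip(string.punctuation)
        if token_dist.get(tok, 0.0) > min_prob:
            kept.append(tok)
    return " ".join(kept)
```

A second agent would then be trained on the output of `prune_observation` instead of the raw observation text.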

generalization, observation text, training game, (15 more...)

2009.11896

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine Learning Sep-24-2020

Multimodal Safety-Critical Scenarios Generation for Decision-Making Algorithms Evaluation

Ding, Wenhao, Chen, Baiming, Li, Bo, Eun, Kim Ji, Zhao, Ding

Existing neural network-based autonomous systems have been shown to be vulnerable to adversarial attacks; therefore, sophisticated evaluation of their robustness is of great importance. However, evaluating robustness only under worst-case scenarios based on known attacks is not comprehensive, not to mention that some of these scenarios rarely occur in the real world. In addition, the distribution of safety-critical data is usually multimodal, while most traditional attacks and evaluation methods focus on a single modality. To solve the above challenges, we propose a flow-based multimodal safety-critical scenario generator for evaluating decision-making algorithms. The proposed generative model is optimized with weighted likelihood maximization, and a gradient-based sampling procedure is integrated to improve the sampling efficiency. The safety-critical scenarios are generated by querying the task algorithms, and the log-likelihood of the generated scenarios is in proportion to the risk level. Experiments on a self-driving task demonstrate our advantages in terms of testing efficiency and multimodal modeling capability. We evaluate six Reinforcement Learning algorithms with our generated traffic scenarios and provide empirical conclusions about their robustness.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2009.08311

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois (0.04)
North America > Canada > British Columbia (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.40)

Industry:

Transportation (0.95)
Information Technology > Security & Privacy (0.69)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Machine Learning Sep-24-2020

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Cen, Shicong, Cheng, Chen, Chen, Yuxin, Wei, Yuting, Chi, Yuejie

Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms in contemporary reinforcement learning. This class of methods is often applied in conjunction with entropy regularization -- an algorithmic scheme that encourages exploration -- and is closely related to soft policy iteration and trust region policy optimization. Despite the empirical success, the theoretical underpinnings for NPG methods remain limited even for the tabular setting. This paper develops non-asymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on discounted Markov decision processes (MDPs). Assuming access to exact policy evaluation, we demonstrate that the algorithm converges linearly -- or even quadratically once it enters a local region around the optimal policy -- when computing optimal value functions of the regularized MDP. Moreover, the algorithm is provably stable vis-à-vis inexactness of policy evaluation. Our convergence results accommodate a wide range of learning rates, and shed light upon the role of entropy regularization in enabling fast convergence.
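For a tabular MDP, the entropy-regularized NPG iteration under softmax parameterization is commonly written as a multiplicative update, pi_new(a|s) proportional to pi(a|s)^(1 - eta*tau/(1-gamma)) * exp(eta*Q_tau(s,a)/(1-gamma)), where Q_tau is the soft Q-function of the current policy. The sketch below implements that update in plain Python. The update form, the learning-rate range 0 < eta <= (1-gamma)/tau, and all function names are assumptions of this sketch and are not taken from the abstract; policy evaluation is approximated here by repeated sweeps rather than computed exactly.

```python
import math


def soft_q_values(P, r, pi, gamma, tau, sweeps=200):
    """Soft policy evaluation by fixed-point iteration.

    Q(s,a) = r(s,a) + gamma * sum_s' P[s][a][s'] * V(s')
    V(s)   = sum_a pi(a|s) * (Q(s,a) - tau * log pi(a|s))
    """
    S, A = len(r), len(r[0])
    Q = [[0.0] * A for _ in range(S)]
    for _ in range(sweeps):
        V = [sum(pi[s][a] * (Q[s][a] - tau * math.log(pi[s][a])) for a in range(A))
             for s in range(S)]
        Q = [[r[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(S))
              for a in range(A)] for s in range(S)]
    return Q


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    w = [math.exp(x - m) for x in logits]
    z = sum(w)
    return [x / z for x in w]


def npg_step(pi, Q, eta, gamma, tau):
    """One entropy-regularized NPG update, computed in log space."""
    scale = 1.0 - eta * tau / (1.0 - gamma)
    return [softmax([scale * math.log(pi[s][a]) + eta * Q[s][a] / (1.0 - gamma)
                     for a in range(len(pi[s]))])
            for s in range(len(pi))]


def run_npg(P, r, gamma, tau, eta, steps):
    """Run NPG from the uniform policy and return the final policy."""
    S, A = len(r), len(r[0])
    pi = [[1.0 / A] * A for _ in range(S)]
    for _ in range(steps):
        pi = npg_step(pi, soft_q_values(P, r, pi, gamma, tau), eta, gamma, tau)
    return pi
```

With eta = (1-gamma)/tau the exponent on the old policy vanishes and the step reduces to pi_new = softmax(Q_tau / tau), which is soft policy iteration. At a fixed point of the update the policy satisfies pi(a|s) proportional to exp(Q_tau(s,a)/tau).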

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2007.06558

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Yavas, M. Ugur, Ure, N. Kemal, Kumbasar, Tufan

A New Approach for Tactical Decision Making in Lane Changing: Sample Efficient Deep Q Learning with a Safety Feedback Reward

arXiv.org Artificial Intelligence Sep-24-2020

There has been growing interest in self-driving cars from industry since the DARPA Urban Challenge [1]. Despite the great achievements in this competition, deploying self-driving cars into production is a complicated problem due to factors such as the long tail of edge cases, safety verification, and the need for intelligent algorithms that are capable of negotiating with human drivers. There are already level-2 capable cars in production that autonomously control the vehicle at both the longitudinal and lateral levels. The efficient design and implementation of DRL agents involves many steps: choosing state-action representations, balancing a multi-objective reward function, tuning the hyper-parameters of the optimization algorithm, deciding the network architecture, generating rich data from realistic scenarios, and finally conducting a broad evaluation against proper baseline methods with different seeds. Considering the aforementioned steps, [7] lacks a comparison with a fair baseline and uses a very naive simulation environment [...]

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2009.11905

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.05)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government > Military (1.00)
Automobiles & Trucks (1.00)
Transportation > Ground > Road (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Manoury, Alexandre, Nguyen, Sao Mai, Buche, Cédric

Hierarchical Affordance Discovery using Intrinsic Motivation

arXiv.org Artificial Intelligence Sep-23-2020

To be capable of lifelong learning in a real-life environment, robots have to tackle multiple challenges. Being able to relate physical properties they may observe in their environment to possible interactions they may have is one of them. This skill, named affordance learning, is strongly related to embodiment and is mastered through each person's development: each individual learns affordances differently through their own interactions with their surroundings. Current methods for affordance learning usually either use fixed actions to learn these affordances or focus on static setups involving a robotic arm to be operated. In this article, we propose an algorithm using intrinsic motivation to guide the learning of affordances for a mobile robot. This algorithm is capable of autonomously discovering, learning, and adapting interrelated affordances without pre-programmed actions. Once learned, these affordances may be used by the algorithm to plan sequences of actions in order to perform tasks of various difficulties. We then present one experiment and analyse our system before comparing it with other approaches from reinforcement learning and affordance learning.

affordance, machine learning, reinforcement learning, (12 more...)

doi: 10.1145/3349537.3351898

2009.10968

Country:

Europe > France > Brittany > Finistère > Brest (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre:

Research Report (0.50)
Instructional Material (0.48)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Kalagarla, Krishna C., Jain, Rahul, Nuzzo, Pierluigi

A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints

arXiv.org Artificial Intelligence Sep-23-2020

Constrained Markov Decision Processes (CMDPs) formalize sequential decision-making problems whose objective is to minimize a cost function while satisfying constraints on various cost functions. In this paper, we consider the setting of episodic fixed-horizon CMDPs. We propose an online algorithm which leverages the linear programming formulation of finite-horizon CMDPs for repeated optimistic planning to provide a probably approximately correct (PAC) guarantee on the number of episodes needed to ensure an ε-optimal policy, i.e., with resulting objective value within ε of the optimal value and satisfying the constraints within ε-tolerance, with probability at least 1 − δ. [...] S, the number of episodes needed has a linear dependence on the state and action space sizes S and A, respectively, and quadratic dependence on the time horizon H. Markov decision processes (MDPs) [1] offer a natural framework to express sequential decision-making problems and reason about autonomous system behaviors. However, the single cost objective of a traditional MDP formulation may fall short of fully capturing problems with multiple conflicting objectives and additional constraints that must be satisfied.
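The linear programming formulation mentioned above is usually stated over occupancy measures. The following is a sketch of that standard form for a known model, not the paper's exact notation: the symbols q_h (occupancy measure at step h), c_h (objective cost), d_{i,h} and \ell_i (the i-th constraint cost and its threshold), \mu (initial state distribution), and P_h (transition kernel) are assumptions introduced here for illustration.

```latex
\begin{aligned}
\min_{q \ge 0} \quad & \sum_{h=1}^{H} \sum_{s,a} q_h(s,a)\, c_h(s,a) \\
\text{s.t.} \quad
& \sum_{h=1}^{H} \sum_{s,a} q_h(s,a)\, d_{i,h}(s,a) \le \ell_i, && i = 1, \dots, I, \\
& \sum_{a} q_1(s,a) = \mu(s), && \forall s, \\
& \sum_{a} q_{h+1}(s',a) = \sum_{s,a} P_h(s' \mid s,a)\, q_h(s,a), && \forall s',\ h = 1, \dots, H-1 .
\end{aligned}
```

A policy can be read off a feasible solution as \pi_h(a \mid s) = q_h(s,a) / \sum_{a'} q_h(s,a') wherever the denominator is positive. In an optimistic-planning variant, the unknown P_h would presumably be replaced by a confidence set built from data, which the abstract does not detail.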

ln 4, log 2, probability, (15 more...)

2009.11348

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Chen, Irene Y., Joshi, Shalmali, Ghassemi, Marzyeh, Ranganath, Rajesh

Probabilistic Machine Learning for Healthcare

arXiv.org Machine Learning Sep-23-2020

Machine learning can be used to make sense of healthcare data. Probabilistic machine learning models help provide a complete picture of observed data in healthcare. In this review, we examine how probabilistic machine learning can advance healthcare. We consider challenges in the predictive model building pipeline where probabilistic models can be beneficial including calibration and missing data. Beyond predictive models, we also investigate the utility of probabilistic machine learning models in phenotyping, in generative models for clinical use cases, and in reinforcement learning.

machine learning, probabilistic model, reinforcement learning, (15 more...)

2009.11087

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)