AITopics

2102.09907

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)

Analytics and Machine Learning in Vehicle Routing Research

Bai, Ruibin, Chen, Xinan, Chen, Zhi-Long, Cui, Tianxiang, Gong, Shuhui, He, Wentao, Jiang, Xiaoping, Jin, Huan, Jin, Jiahuan, Kendall, Graham, Li, Jiawei, Lu, Zheng, Ren, Jianfeng, Weng, Paul, Xue, Ning, Zhang, Huayan

The Vehicle Routing Problem (VRP) is one of the most intensively studied combinatorial optimisation problems, for which numerous models and algorithms have been proposed. To tackle the complexities, uncertainties and dynamics involved in real-world VRP applications, Machine Learning (ML) methods have been used in combination with analytical approaches to enhance problem formulations and algorithmic performance across different problem solving scenarios. However, the relevant papers are scattered across several traditional research fields with very different, sometimes confusing, terminologies. This paper presents a first, comprehensive review of hybrid methods that combine analytical techniques with ML tools to address VRPs. Specifically, we review the emerging research streams on ML-assisted VRP modelling and ML-assisted VRP optimisation. We conclude that ML can be beneficial in enhancing VRP modelling, and improving the performance of algorithms for both online and offline VRP optimisation. Finally, challenges and future opportunities of VRP research are discussed.

algorithm, vehicle, vehicle routing problem, (15 more...)

2102.10012

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Zhejiang Province > Ningbo (0.04)
(12 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.68)
Research Report > Promising Solution (0.45)

Industry: Transportation > Freight & Logistics Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(4 more...)

Duminy, Nicolas, Nguyen, Sao Mai, Zhu, Junshuai, Duhaut, Dominique, Kerdreux, Jerome

Intrinsically Motivated Open-Ended Multi-Task Learning Using Transfer Learning to Discover Task Hierarchy

In open-ended continuous environments, robots need to learn multiple parameterised control tasks in hierarchical reinforcement learning. We hypothesise that the most complex tasks can be learned more easily by transferring knowledge from simpler tasks, and faster by adapting the complexity of the actions to the task. We propose a task-oriented representation of complex actions, called procedures, to learn online task relationships and unbounded sequences of action primitives to control the different observables of the environment. Combining goal-babbling with imitation learning, and active learning with transfer of knowledge based on intrinsic motivation, our algorithm self-organises its learning process. It chooses at any given time a task to focus on, and what, how, when and from whom to transfer knowledge. We show with a simulation and a real industrial robot arm, in cross-task and cross-learner transfer settings, that task composition is key to tackling highly complex tasks. Task decomposition is also efficiently transferred across different embodied learners and by active imitation, where the robot requests only a small number of demonstrations and the adequate type of information. The robot learns and exploits task dependencies so as to learn tasks of every complexity.

intrinsically motivated open-ended multi-task learning, learner, learning, (9 more...)

doi: 10.3390/app11030975

2102.09854

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
(10 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.52)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Model-Invariant State Abstractions for Model-Based Reinforcement Learning

Tomar, Manan, Zhang, Amy, Calandra, Roberto, Taylor, Matthew E., Pineau, Joelle

Accuracy and generalization of dynamics models are key to the success of model-based reinforcement learning (MBRL). As the complexity of tasks increases, learning dynamics models becomes increasingly sample inefficient for MBRL methods. However, many tasks also exhibit sparsity in the dynamics, i.e., actions have only a local effect on the system dynamics. In this paper, we exploit this property with a causal invariance perspective in the single-task setting, introducing a new type of state abstraction called model-invariance. Unlike previous forms of state abstractions, a model-invariance state abstraction leverages causal sparsity over state variables. This allows for generalization to novel combinations of unseen values of state variables, something that non-factored forms of state abstractions cannot do. We prove that an optimal policy can be learned over this model-invariance state abstraction. Next, we propose a practical method to approximately learn a model-invariant representation for complex domains. We validate our approach by showing improved modeling performance over standard maximum likelihood approaches on challenging tasks, such as the MuJoCo-based Humanoid. Furthermore, within the MBRL setting we show strong performance gains w.r.t. sample efficiency across a host of other continuous control tasks.

abstraction, model-invariant state abstraction, state variable, (11 more...)

2102.09850

Country:

North America > Canada > Alberta (0.14)
North America > United States > Massachusetts (0.04)
North America > United States > Illinois (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Wu, Minchao, Norrish, Michael, Walder, Christian, Dezfouli, Amir

TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning

We propose a novel approach to interactive theorem-proving (ITP) using deep reinforcement learning. Unlike previous work, our framework is able to prove theorems both end-to-end and from scratch (i.e., without relying on example proofs from human experts). We formulate the process of ITP as a Markov decision process (MDP) in which each state represents a set of potential derivation paths. The agent learns to select promising derivations as well as appropriate tactics within each derivation using deep policy gradients. This structure allows us to introduce a novel backtracking mechanism which enables the agent to efficiently discard (predicted) dead-end derivations and restart the derivation from promising alternatives. Experimental results show that the framework provides comparable performance to that of the approaches that use human experts, and that it is also capable of proving theorems that it has never seen during training. We further elaborate on the role of each component of the framework using ablation studies.

theorem, (14 more...)

2102.09756

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > North Carolina > Wake County > Morrisville (0.04)
North America > United States > New York > New York County > New York City (0.04)
(10 more...)

Genre:

Research Report > New Finding (0.48)
Instructional Material > Course Syllabus & Notes (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Finite-Time Analysis of Asynchronous Q-Learning with Discrete-Time Switching System Models

Lee, Donghwan

This paper develops a novel framework to analyze the convergence of the Q-learning algorithm from a discrete-time switching system perspective. We prove that asynchronous Q-learning with a constant step-size can be naturally formulated as a discrete-time stochastic switched linear system. This formulation offers novel and intuitive insights on Q-learning, mainly based on control theoretic frameworks. For instance, the proposed analysis explains the overestimation phenomenon in Q-learning due to the maximization bias. Based on the control system theoretic argument and some nice structures of Q-learning, a new finite-time analysis of Q-learning is given with a novel error bound.
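As a point of reference for the abstract above, the following is a minimal sketch of asynchronous tabular Q-learning with a constant step size, i.e. the update rule whose convergence is being analyzed. It does not reproduce the paper's switching-system analysis. The two-state toy MDP, the uniform behaviour policy, and the names `async_q_learning`, `transitions` and `rewards` are illustrative assumptions, not taken from the paper.

```python
import random


def async_q_learning(transitions, rewards, n_states, n_actions,
                     gamma=0.9, alpha=0.1, steps=5000, seed=0):
    """Asynchronous tabular Q-learning with a constant step size alpha.

    transitions[s][a] is the next state (deterministic toy MDP, an assumption).
    rewards[s][a] is the immediate reward for taking action a in state s.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)            # uniform behaviour policy
        s_next = transitions[s][a]
        target = rewards[s][a] + gamma * max(q[s_next])
        # "Asynchronous": only the visited entry (s, a) changes at this step.
        q[s][a] += alpha * (target - q[s][a])
        s = s_next
    return q


# Hypothetical toy MDP: action 1 moves to state 1, action 0 moves to state 0;
# the only reward is 1 for taking action 1 while in state 1.
transitions = [[0, 1], [0, 1]]
rewards = [[0.0, 0.0], [0.0, 1.0]]
q = async_q_learning(transitions, rewards, n_states=2, n_actions=2)
```

In this toy problem the fixed point of the Bellman optimality equation assigns value 1 / (1 - 0.9) = 10 to taking action 1 in state 1, so `q[1][1]` should approach 10 from below as the number of steps grows.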

comparison system, convergence, q-learning, (14 more...)

2102.08583

Country:

Asia > Middle East > Jordan (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Feb-18-2021

Learning Memory-Dependent Continuous Control from Demonstrations

Hou, Siqing, Han, Dongqi, Tani, Jun

Efficient exploration has presented a long-standing challenge in reinforcement learning, especially when rewards are sparse. A developmental system can overcome this difficulty by learning from both demonstrations and self-exploration. However, existing methods are not applicable to most real-world robotic control problems because they assume that environments follow Markov decision processes (MDPs); thus, they do not extend to partially observable environments where historical observations are necessary for decision making. This paper builds on the idea of replaying demonstrations for memory-dependent continuous control, by proposing a novel algorithm, Recurrent Actor-Critic with Demonstration and Experience Replay (READER). Experiments involving several memory-crucial continuous control tasks reveal that our method significantly reduces interactions with the environment while using a reasonably small number of demonstration samples. The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.

agent, algorithm, demonstration, (15 more...)

2102.09208

Country:

Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > South Korea (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Cappart, Quentin, Chételat, Didier, Khalil, Elias, Lodi, Andrea, Morris, Christopher, Veličković, Petar

Combinatorial optimization and reasoning with graph neural networks

arXiv.org Machine Learning, Feb-18-2021

Nowadays, combinatorial optimization (CO) is an interdisciplinary field spanning optimization, operations research, discrete mathematics, and computer science, with many critical real-world applications such as vehicle routing or scheduling; see [71] for a general overview. Intuitively, CO deals with selecting a subset from a finite set that optimizes a cost or objective function. Although many CO problems are hard from a complexity theory standpoint due to their discrete nature, many of them are routinely solved in practice. Historically, the optimization and theoretical computer science communities have focused on finding optimal [71], heuristic [12], or approximate [130] solutions for individual problem instances. However, in many practical situations of interest, one often needs to repeatedly solve problem instances that share patterns and characteristics.
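To make the "selecting a subset from a finite set that optimizes an objective" description concrete, here is a small sketch of an exhaustive search over subsets for a knapsack-style problem. The problem, the item data, and the name `best_subset` are illustrative assumptions rather than anything taken from the survey; the survey's subject is learned methods, not brute force.

```python
from itertools import combinations


def best_subset(items, capacity):
    """Exhaustively search all subsets of a finite item set.

    items maps a name to (weight, value). Returns the feasible subset
    (total weight <= capacity) with the highest total value.
    """
    names = sorted(items)
    best, best_value = (), 0
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            weight = sum(items[n][0] for n in subset)
            value = sum(items[n][1] for n in subset)
            if weight <= capacity and value > best_value:
                best, best_value = subset, value
    return best, best_value


# Hypothetical instance: three items given as (weight, value), capacity 5.
items = {"a": (3, 4), "b": (4, 5), "c": (2, 3)}
choice, value = best_subset(items, capacity=5)
```

The search visits all 2^n subsets, which is why exact enumeration only works for tiny instances and why heuristic, approximate, and learned approaches matter for larger ones.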

algorithm, graph, neural network, (15 more...)

2102.09544

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > California > Los Angeles County > Santa Monica (0.04)
(2 more...)

Genre: Overview (1.00)

Industry: Transportation > Ground > Rail (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
(5 more...)

Khodadadian, Sajad, Chen, Zaiwei, Maguluri, Siva Theja

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

arXiv.org Machine Learning, Feb-18-2021

Reinforcement Learning (RL) is a paradigm where an agent aims at maximizing its cumulative reward by searching for an optimal policy, in an environment modeled as a Markov Decision Process (MDP) (Sutton and Barto, 2018). RL algorithms have achieved tremendous successes in a wide range of applications such as self-driving cars with Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al., 2015), and AlphaGo in the game of Go (Silver et al., 2016). The algorithms in RL can be categorized into value space methods, such as Q-learning (Watkins and Dayan, 1992) and TD-learning (Sutton, 1988), and policy space methods, such as actor-critic (AC) (Konda and Tsitsiklis, 2000). Despite great empirical successes (Bahdanau et al., 2016; Wang et al., 2016), the finite-sample convergence of AC-type algorithms is not completely characterized theoretically. An AC algorithm can be thought of as a generalized policy iteration (Puterman, 1995), and consists of two phases, namely actor and critic. The objective of the actor is to improve the policy, while the critic aims at evaluating the performance of a specific policy. A step of the actor can be thought of as a step of Stochastic Gradient Ascent (Bottou et al., 2018) with preconditioning.

algorithm, convergence, theorem 2, (12 more...)

2102.09318

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Games > Go (0.54)
Information Technology (0.54)
Transportation > Passenger (0.34)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Fakoor, Rasool, Mueller, Jonas, Chaudhari, Pratik, Smola, Alexander J.

Continuous Doubly Constrained Batch Reinforcement Learning

arXiv.org Machine Learning, Feb-18-2021

Reliant on too many experiments to learn good actions, current Reinforcement Learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for batch RL, where effective policies are learned using only a fixed offline dataset instead of online interactions with the environment. The limited data in batch RL produces inherent uncertainty in value estimates of states/actions that were insufficiently represented in the training data. This leads to particularly severe extrapolation when our candidate policies diverge from the one that generated the data. We propose to mitigate this issue via two straightforward penalties: a policy-constraint to reduce this divergence and a value-constraint that discourages overly optimistic estimates. Over a comprehensive set of 32 continuous-action batch RL benchmarks, our approach compares favorably to state-of-the-art methods, regardless of how the offline data were collected.

cdc, learning, penalty, (14 more...)

2102.09225

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania (0.04)

Genre:

Research Report > Experimental Study (0.48)
Research Report > New Finding (0.34)

Industry:

Information Technology (0.46)
Health & Medicine (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)