AITopics

2012.08888

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
Europe > United Kingdom (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.94)

Zhou, Dongruo, Gu, Quanquan, Szepesvari, Csaba

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

arXiv.org Machine LearningDec-15-2020

We study reinforcement learning (RL) with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model (Jia et al., 2020; Ayoub et al., 2020; Zhou et al., 2020) and the learning agent has access to either an integration or a sampling oracle of the individual basis kernels. We propose a new Bernstein-type concentration inequality for self-normalized martingales for linear bandit problems with bounded noise. Based on the new inequality, we propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs in the episodic undiscounted setting. We show that $\text{UCRL-VTR}^{+}$ attains an $\tilde O(dH\sqrt{T})$ regret where $d$ is the dimension of feature mapping, $H$ is the length of the episode and $T$ is the number of interactions with the MDP. We also prove a matching lower bound $\Omega(dH\sqrt{T})$ for this setting, which shows that $\text{UCRL-VTR}^{+}$ is minimax optimal up to logarithmic factors. In addition, we propose the $\text{UCLK}^{+}$ algorithm for the same family of MDPs under discounting and show that it attains an $\tilde O(d\sqrt{T}/(1-\gamma)^{1.5})$ regret, where $\gamma\in [0,1)$ is the discount factor. Our upper bound matches the lower bound $\Omega(d\sqrt{T}/(1-\gamma)^{1.5})$ proved in Zhou et al. (2020) up to logarithmic factors, suggesting that $\text{UCLK}^{+}$ is nearly minimax optimal. To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.

algorithm, inequality hold, probability, (10 more...)

arXiv.org Machine Learning

2012.08507

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

arXiv.org Machine LearningDec-15-2020

Variational Beam Search for Online Learning with Distribution Shifts

Li, Aodong, Boyd, Alex, Smyth, Padhraic, Mandt, Stephan

We consider the problem of online learning in the presence of sudden distribution shifts as frequently encountered in applications such as autonomous navigation. Distribution shifts require constant performance monitoring and re-training. They may also be hard to detect and can lead to a slow but steady degradation in model performance. To address this problem we propose a new Bayesian meta-algorithm that can both (i) make inferences about subtle distribution shifts based on minimal sequential observations and (ii) accordingly adapt a model in an online fashion. The approach uses beam search over multiple change point hypotheses to perform inference on a hierarchical sequential latent variable modeling framework. Our proposed approach is model-agnostic, applicable to both supervised and unsupervised learning, and yields significant improvements over state-of-the-art Bayesian online learning approaches.

hypothesis, learning, variational beam search, (13 more...)

arXiv.org Machine Learning

2012.08101

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Orange County > Irvine (0.04)
North America > United States > Iowa (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Education > Educational Setting > Online (0.82)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

Bonet, Blai, Geffner, Hector

General Policies, Serializations, and Planning Width

It has been observed that in many of the benchmark planning domains, atomic goals can be reached with a simple polynomial exploration procedure, called IW, that runs in time exponential in the problem width. Such problems have indeed a bounded width: a width that does not grow with the number of problem variables and is often no greater than two. Yet, while the notion of width has become part of the state-of-the-art planning algorithms like BFWS, there is still no good explanation for why so many benchmark domains have bounded width. In this work, we address this question by relating bounded width and serialized width to ideas of generalized planning, where general policies aim to solve multiple instances of a planning problem all at once. We show that bounded width is a property of planning domains that admit optimal general policies in terms of features that are explicitly or implicitly represented in the domain encoding. The results are extended to much larger class of domains with bounded serialized width where the general policies do not have to be optimal. The study leads also to a new simple, meaningful, and expressive language for specifying domain serializations in the form of policy sketches which can be used for encoding domain control knowledge by hand or for learning it from traces. The use of sketches and the meaning of the theoretical results are all illustrated through a number of examples.

serialization, trajectory, transition, (15 more...)

2012.08033

Country:

Europe > France (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Sweden > Östergötland County > Linköping (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.92)

Feature Selection for Learning to Predict Outcomes of Compute Cluster Jobs with Application to Decision Support

Okanlawon, Adedolapo, Yang, Huichen, Bose, Avishek, Hsu, William, Andresen, Dan, Tanash, Mohammed

We present a machine learning framework and a new test bed for data mining from the Slurm Workload Manager for high-performance computing (HPC) clusters. The focus was to find a method for selecting features to support decisions: helping users decide whether to resubmit failed jobs with boosted CPU and memory allocations or migrate them to a computing cloud. This task was cast as both supervised classification and regression learning, specifically, sequential problem solving suitable for reinforcement learning. Selecting relevant features can improve training accuracy, reduce training time, and produce a more comprehensible model, with an intelligent system that can explain predictions and inferences. We present a supervised learning model trained on a Simple Linux Utility for Resource Management (Slurm) data set of HPC jobs using three different techniques for selecting features: linear regression, lasso, and ridge regression. Our data set represented both HPC jobs that failed and those that succeeded, so our model was reliable, less likely to overfit, and generalizable. Our model achieved an R^2 of 95\% with 99\% accuracy. We identified five predictors for both CPU and memory properties.

information, prediction, slurm data, (15 more...)

2012.07982

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Kansas > Riley County > Manhattan (0.04)

Genre: Research Report (0.65)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
(3 more...)

Learning to Stop: Dynamic Simulation Monte-Carlo Tree Search

Lan, Li-Cheng, Tsai, Meng-Yu, Wu, Ti-Rong, Wu, I-Chen, Hsieh, Cho-Jui

Monte Carlo tree search (MCTS) has achieved state-of-the-art results in many domains such as Go and Atari games when combining with deep neural networks (DNNs). When more simulations are executed, MCTS can achieve higher performance but also requires enormous amounts of CPU and GPU resources. However, not all states require a long searching time to identify the best action that the agent can find. For example, in 19x19 Go and NoGo, we found that for more than half of the states, the best action predicted by DNN remains unchanged even after searching 2 minutes. This implies that a significant amount of resources can be saved if we are able to stop the searching earlier when we are confident with the current searching result. In this paper, we propose to achieve this goal by predicting the uncertainty of the current searching status and use the result to decide whether we should stop searching. With our algorithm, called Dynamic Simulation MCTS (DS-MCTS), we can speed up a NoGo agent trained by AlphaZero 2.5 times faster while maintaining a similar winning rate. Also, under the same average simulation count, our method can achieve a 61% winning rate against the original program.

mct-un, simulation, win rate, (16 more...)

2012.0791

Country:

Asia > Taiwan (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Deshwal, Aryan, Belakaria, Syrine, Doppa, Janardhan Rao, Fern, Alan

Optimizing Discrete Spaces via Expensive Evaluations: A Learning to Search Framework

We consider the problem of optimizing expensive black-box functions over discrete spaces (e.g., sets, sequences, graphs). The key challenge is to select a sequence of combinatorial structures to evaluate, in order to identify high-performing structures as quickly as possible. Our main contribution is to introduce and evaluate a new learning-to-search framework for this problem called L2S-DISCO. The key insight is to employ search procedures guided by control knowledge at each step to select the next structure and to improve the control knowledge as new function evaluations are observed. We provide a concrete instantiation of L2S-DISCO for local search procedure and empirically evaluate it on diverse real-world benchmarks. Results show the efficacy of L2S-DISCO over state-of-the-art algorithms in solving complex optimization problems.

l2s-disco, local search, optimization, (17 more...)

doi: 10.1609/aaai.v34i04.5788

2012.0732

Country:

North America > United States > Washington (0.04)
North America > United States > Oregon (0.04)
North America > United States > California (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

#artificialintelligenceDec-12-2020, 03:50:22 GMT

Understanding The Binary Search Algorithm In Python

Algorithms are an essential aspect of programming. In this article, we will cover one such cool algorithm that can be used to efficiently find the location of a specific element in a list or array. We will cover the binary search algorithm in complete detail and try to implement it with python. In mathematics and computer science, an algorithm is a finite sequence of well-defined, computer-implementable instructions, typically to solve a class of problems or to perform a computation. One such algorithm that we will cover in this article is the binary search algorithm.

algorithm, binary search algorithm, search algorithm, (4 more...)

#artificialintelligence

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Borg, Markus, Abdessalem, Raja Ben, Nejati, Shiva, Jegeden, Francois-Xavier, Shin, Donghwan

Digital Twins Are Not Monozygotic -- Cross-Replicating ADAS Testing in Two Industry-Grade Automotive Simulators

arXiv.org Artificial IntelligenceDec-12-2020

The increasing levels of software- and data-intensive driving automation call for an evolution of automotive software testing. As a recommended practice of the Verification and Validation (V&V) process of ISO/PAS 21448, a candidate standard for safety of the intended functionality for road vehicles, simulation-based testing has the potential to reduce both risks and costs. There is a growing body of research on devising test automation techniques using simulators for Advanced Driver-Assistance Systems (ADAS). However, how similar are the results if the same test scenarios are executed in different simulators? We conduct a replication study of applying a Search-Based Software Testing (SBST) solution to a real-world ADAS (PeVi, a pedestrian vision detection system) using two different commercial simulators, namely, TASS/Siemens PreScan and ESI Pro-SiVIC. Based on a minimalistic scene, we compare critical test scenarios generated using our SBST solution in these two simulators. We show that SBST can be used to effectively and efficiently generate critical test scenarios in both simulators, and the test results obtained from the two simulators can reveal several weaknesses of the ADAS under test. However, executing the same test scenarios in the two simulators leads to notable differences in the details of the test outputs, in particular, related to (1) safety violations revealed by tests, and (2) dynamics of cars and pedestrians. Based on our findings, we recommend future V&V plans to include multiple simulators to support robust simulation-based testing and to base test objectives on measures that are less dependant on the internals of the simulators.

pro-sivic, scenario, simulator, (15 more...)

2012.06822

Country:

North America > United States > Massachusetts > Middlesex County > Burlington (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
Europe > Sweden > Skåne County > Lund (0.04)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)

Huang, Taoan, Dilkina, Bistra, Koenig, Sven

Learning to Resolve Conflicts for Multi-Agent Path Finding with Conflict-Based Search

arXiv.org Artificial IntelligenceDec-10-2020

Conflict-Based Search (CBS) is a state-of-the-art algorithm for multi-agent path finding. At the high level, CBS repeatedly detects conflicts and resolves one of them by splitting the current problem into two subproblems. Previous work chooses the conflict to resolve by categorizing the conflict into three classes and always picking a conflict from the highest-priority class. In this work, we propose an oracle for conflict selection that results in smaller search tree sizes than the one used in previous work. However, the computation of the oracle is slow. Thus, we propose a machine-learning framework for conflict selection that observes the decisions made by the oracle and learns a conflict-selection strategy represented by a linear ranking function that imitates the oracle's decisions accurately and quickly. Experiments on benchmark maps indicate that our method significantly improves the success rates, the search tree sizes and runtimes over the current state-of-the-art CBS solver.

agent, conflict, ranking function, (13 more...)

2012.06005

Country:

North America > United States > California (0.14)
North America > United States > Texas > Tarrant County > Fort Worth (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.67)
Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)