AITopics

2006.12007

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Tarbouriech, Jean, Pirotta, Matteo, Valko, Michal, Lazaric, Alessandro

A Provably Efficient Sample Collection Strategy for Reinforcement Learning

A common assumption in reinforcement learning (RL) is to have access to a generative model (i.e., a simulator of the environment), which allows to generate samples from any desired state-action pair. Nonetheless, in many settings a generative model may not be available and an adaptive exploration strategy is needed to efficiently collect samples from an unknown environment by direct interaction. In this paper, we study the scenario where an algorithm based on the generative model assumption defines the (possibly time-varying) amount of samples $b(s,a)$ required at each state-action pair $(s,a)$ and an exploration strategy has to learn how to generate $b(s,a)$ samples as fast as possible. Building on recent results for regret minimization in the stochastic shortest path (SSP) setting (Cohen et al., 2020; Tarbouriech et al., 2020), we derive an algorithm that requires $\tilde{O}( B D + D^{3/2} S^2 A)$ time steps to collect the $B = \sum_{s,a} b(s,a)$ desired samples, in any unknown and communicating MDP with $S$ states, $A$ actions and diameter $D$. Leveraging the generality of our strategy, we readily apply it to a variety of existing settings (e.g., model estimation, pure exploration in MDPs) for which we obtain improved sample-complexity guarantees, and to a set of new problems such as best-state identification and sparse reward discovery.

optimization problem, requirement, upstream oil & gas, (19 more...)

2007.06437

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.14)

Genre:

Research Report (0.82)
Instructional Material > Course Syllabus & Notes (0.34)

Industry: Energy > Oil & Gas > Upstream (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.67)

Abeille, Marc, Lazaric, Alessandro

Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation

We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting. Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of \ofulq and cast it into a constrained \textit{extended} LQR problem, where an additional control variable implicitly selects the system dynamics within a confidence interval. We then move to the corresponding Lagrangian formulation for which we prove strong duality. As a result, we show that an $\epsilon$-optimistic controller can be computed efficiently by solving at most $O\big(\log(1/\epsilon)\big)$ Riccati equations. Finally, we prove that relaxing the original \ofu problem does not impact the learning performance, thus recovering the $\tilde{O}(\sqrt{T})$ regret of \ofulq. To the best of our knowledge, this is the first computationally efficient confidence-based algorithm for LQR with worst-case optimal regret guarantees.

constraint-based reasoning, efficient optimistic exploration, upstream oil & gas, (17 more...)

2007.06482

Country:

Europe > Sweden (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Gote, Christoph, Casiraghi, Giona, Schweitzer, Frank, Scholtes, Ingo

Predicting Sequences of Traversed Nodes in Graphs using Network Models with Multiple Higher Orders

We propose a novel sequence prediction method for sequential data capturing node traversals in graphs. Our method builds on a statistical modelling framework that combines multiple higher-order network models into a single multi-order model. We develop a technique to fit such multi-order models in empirical sequential data and to select the optimal maximum order. Our framework facilitates both next-element and full sequence prediction given a sequence-prefix of any length. We evaluate our model based on six empirical data sets containing sequences from website navigation as well as public transport systems. The results show that our method out-performs state-of-the-art algorithms for next-element prediction. We further demonstrate the accuracy of our method during out-of-sample sequence prediction and validate that our method can scale to data sets with millions of sequences.

data mining, machine learning, prediction, (15 more...)

2007.06662

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(2 more...)

Genre: Research Report > New Finding (0.88)

Industry: Transportation > Infrastructure & Services (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Shariff, Roshan, Szepesvári, Csaba

Efficient Planning in Large MDPs with Weak Linear Function Approximation

Large-scale Markov decision processes (MDPs) require planning algorithms with runtime independent of the number of states of the MDP. We consider the planning problem in MDPs using linear value function approximation with only weak requirements: low approximation error for the optimal value function, and a small set of "core" states whose features span those of other states. In particular, we make no assumptions about the representability of policies or value functions of non-optimal policies. Our algorithm produces almost-optimal actions for any state using a generative oracle (simulator) for the MDP, while its computation time scales polynomially with the number of features, core states, and actions and the effective horizon.

artificial intelligence, machine learning, reinforcement learning, (22 more...)

2007.06184

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
(2 more...)

#artificialintelligenceJul-12-2020, 02:51:33 GMT

zx

Edit: If you want to see MarkovComposer in action, but you don't want to mess with Java code, you can access a web version of it here. In the following article, I'll present some of the research I've been working on lately. Algorithms, or algorithmic composition, have been used to compose music for centuries. For example, Western punctus contra punctum can be sometimes reduced to algorithmic determinacy. Then, why not use fast-learning computers capable of billions of calculations per second to do what they do best, to follow algorithms?

artificial intelligence, machine learning, matrix, (14 more...)

#artificialintelligence

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Wang, Feng, Zhong, Chen, Gursoy, M. Cenk, Velipasalar, Senem

Adversarial jamming attacks and defense strategies via adaptive deep reinforcement learning

arXiv.org Artificial IntelligenceJul-12-2020

As the applications of deep reinforcement learning (DRL) in wireless communications grow, sensitivity of DRL based wireless communication strategies against adversarial attacks has started to draw increasing attention. In order to address such sensitivity and alleviate the resulting security concerns, we in this paper consider a victim user that performs DRL-based dynamic channel access, and an attacker that executes DRLbased jamming attacks to disrupt the victim. Hence, both the victim and attacker are DRL agents and can interact with each other, retrain their models, and adapt to opponents' policies. In this setting, we initially develop an adversarial jamming attack policy that aims at minimizing the accuracy of victim's decision making on dynamic channel access. Subsequently, we devise defense strategies against such an attacker, and propose three defense strategies, namely diversified defense with proportional-integral-derivative (PID) control, diversified defense with an imitation attacker, and defense via orthogonal policies. We design these strategies to maximize the attacked victim's accuracy and evaluate their performances.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

doi: 10.1109/ACCESS.2021.3133506

2007.06055

Country:

North America > United States > New York > Onondaga County > Syracuse (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

arXiv.org Artificial IntelligenceJul-11-2020

Learning Abstract Models for Strategic Exploration and Fast Reward Transfer

Liu, Evan Zheran, Keramati, Ramtin, Seshadri, Sudarshan, Guu, Kelvin, Pasupat, Panupong, Brunskill, Emma, Liang, Percy

Model-based reinforcement learning (RL) is appealing because (i) it enables planning and thus more strategic exploration, and (ii) by decoupling dynamics from rewards, it enables fast transfer to new reward functions. However, learning an accurate Markov Decision Process (MDP) over high-dimensional states (e.g., raw pixels) is extremely challenging because it requires function approximation, which leads to compounding errors. Instead, to avoid compounding errors, we propose learning an abstract MDP over abstract states: low-dimensional coarse representations of the state (e.g., capturing agent position, ignoring other objects). We assume access to an abstraction function that maps the concrete states to abstract states. In our approach, we construct an abstract MDP, which grows through strategic exploration via planning. Similar to hierarchical RL approaches, the abstract actions of the abstract MDP are backed by learned subpolicies that navigate between abstract states. Our approach achieves strong results on three of the hardest Arcade Learning Environment games (Montezuma's Revenge, Pitfall!, and Private Eye), including superhuman performance on Pitfall! without demonstrations. After training on one task, we can reuse the learned abstract MDP for new reward functions, achieving higher reward in 1000x fewer samples than model-free methods trained from scratch.

machine learning, reinforcement learning, transition, (16 more...)

2007.05896

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
Africa > Togo (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Education (0.87)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

arXiv.org Artificial IntelligenceJul-10-2020

A Survey on Autonomous Vehicle Control in the Era of Mixed-Autonomy: From Physics-Based to AI-Guided Driving Policy Learning

Di, Xuan, Shi, Rongye

This paper serves as an introduction and overview of the potentially useful models and methodologies from artificial intelligence (AI) into the field of transportation engineering for autonomous vehicle (AV) control in the era of mixed autonomy. We will discuss state-of-the-art applications of AI-guided methods, identify opportunities and obstacles, raise open questions, and help suggest the building blocks and areas where AI could play a role in mixed autonomy. We divide the stage of autonomous vehicle (AV) deployment into four phases: the pure HVs, the HV-dominated, the AVdominated, and the pure AVs. This paper is primarily focused on the latter three phases. It is the first-of-its-kind survey paper to comprehensively review literature in both transportation engineering and AI for mixed traffic modeling. Models used for each phase are summarized, encompassing game theory, deep (reinforcement) learning, and imitation learning. While reviewing the methodologies, we primarily focus on the following research questions: (1) What scalable driving policies are to control a large number of AVs in mixed traffic comprised of human drivers and uncontrollable AVs? (2) How do we estimate human driver behaviors? (3) How should the driving behavior of uncontrollable AVs be modeled in the environment? (4) How are the interactions between human drivers and autonomous vehicles characterized? Hopefully this paper will not only inspire our transportation community to rethink the conventional models that are developed in the data-shortage era, but also reach out to other disciplines, in particular robotics and machine learning, to join forces towards creating a safe and efficient mixed traffic ecosystem.

deep learning, neural network, vehicle, (21 more...)

2007.05156

Country:

Asia (1.00)
North America > United States > California (0.45)

Genre:

Overview (1.00)
Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Leisure & Entertainment > Games > Computer Games (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Fruit, Ronan, Pirotta, Matteo, Lazaric, Alessandro

Improved Analysis of UCRL2 with Empirical Bernstein Inequality

arXiv.org Machine LearningJul-10-2020

We consider the problem of exploration-exploitation in communicating Markov Decision Processes. We provide an analysis of UCRL2 with Empirical Bernstein inequalities (UCRL2B). For any MDP with $S$ states, $A$ actions, $\Gamma \leq S$ next states and diameter $D$, the regret of UCRL2B is bounded as $\widetilde{O}(\sqrt{D\Gamma S A T})$.

artificial intelligence, inequality, machine learning, (17 more...)

2007.05456

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)