AITopics

2102.10252

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Florida > Hillsborough County > Tampa (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(5 more...)

arXiv.org Machine Learning Feb-19-2021

Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning

Liao, Luofeng, Fu, Zuyue, Yang, Zhuoran, Kolar, Mladen, Wang, Zhaoran

In offline reinforcement learning (RL), an optimal policy is learned solely from previously collected observational data. However, in observational data, actions are often confounded by unobserved variables. Instrumental variables (IVs), in the context of RL, are variables whose influence on the state variables is entirely mediated through the action. When a valid instrument is present, we can recover the confounded transition dynamics from observational data. We study a confounded Markov decision process where the transition dynamics admit an additive nonlinear functional form. Using IVs, we derive a conditional moment restriction (CMR) through which we can identify transition dynamics based on observational data. We propose a provably efficient IV-aided Value Iteration (IVVI) algorithm based on a primal-dual reformulation of CMR. To the best of our knowledge, this is the first provably efficient algorithm for instrument-aided offline RL.
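The identification idea can be illustrated in the simplest possible setting. A minimal sketch, assuming a hypothetical linear, one-step model with a scalar instrument z, action a, unobserved confounder u and next state s' (all names and coefficients are ours, not the paper's). The moment condition E[z (s' - beta a)] = 0 is a linear special case of a conditional moment restriction, and solving it gives the classic IV estimator cov(z, s') / cov(z, a). The paper's nonlinear, primal-dual IVVI algorithm is not reproduced here.

```python
import random


def simulate(n, beta=2.0, gamma=1.0, seed=0):
    """Hypothetical confounded transition data.

    z: instrument, affects the next state only through the action
    u: unobserved confounder, affects both the action and the next state
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        u = rng.gauss(0.0, 1.0)
        a = gamma * z + u + rng.gauss(0.0, 0.5)      # confounded action
        s_next = beta * a + u + rng.gauss(0.0, 1.0)  # true effect of a is beta
        data.append((z, a, s_next))
    return data


def cov(xs, ys):
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def ols_slope(a, s_next):
    # Regressing s' on a directly is biased: a and u are correlated.
    return cov(a, s_next) / cov(a, a)


def iv_slope(z, a, s_next):
    # Solves the moment condition E[z (s' - beta a)] = 0: only the variation
    # in a that is driven by z (which is independent of u) is used.
    return cov(z, s_next) / cov(z, a)


z, a, s_next = map(list, zip(*simulate(20_000)))
print(f"OLS estimate: {ols_slope(a, s_next):.2f}")  # pushed upward by the confounder
print(f"IV estimate:  {iv_slope(z, a, s_next):.2f}")  # should be close to beta = 2.0
```

In this toy model the OLS slope converges to beta + cov(a, u) / var(a), while the IV slope converges to beta, which is the sense in which a valid instrument recovers confounded dynamics.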

algorithm, assumption, observational data, (13 more...)

2102.09907

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)

Deep Latent Competition: Learning to Race Using Visual Control Policies in Latent Space

Schwarting, Wilko, Seyde, Tim, Gilitschenski, Igor, Liebenwein, Lucas, Sander, Ryan, Karaman, Sertac, Rus, Daniela

Learning competitive behaviors in multi-agent settings such as racing requires long-term reasoning about potential adversarial interactions. This paper presents Deep Latent Competition (DLC), a novel reinforcement learning algorithm that learns competitive visual control policies through self-play in imagination. The DLC agent imagines multi-agent interaction sequences in the compact latent space of a learned world model that combines a joint transition function with opponent viewpoint prediction. Imagined self-play reduces costly sample generation in the real world, while the latent representation enables planning to scale gracefully with observation dimensionality. We demonstrate the effectiveness of our algorithm in learning competitive behaviors on a novel multi-agent racing benchmark that requires planning from image observations. Code and videos are available at https://sites.google.com/view/deep-latent-competition.

agent, learning, prediction, (16 more...)

2102.09812

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.05)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Zhang, Honghua, Juba, Brendan, Broeck, Guy Van den

Probabilistic Generating Circuits

Generating functions, which are widely used in combinatorics and probability theory, encode function values into the coefficients of a polynomial. In this paper, we explore their use as a tractable probabilistic model, and propose probabilistic generating circuits (PGCs) for their efficient representation. PGCs strictly subsume many existing tractable probabilistic models, including determinantal point processes (DPPs), probabilistic circuits (PCs) such as sum-product networks, and tractable graphical models. We contend that PGCs are not just a theoretical framework that unifies vastly different existing models; they also show huge potential in modeling realistic data. We exhibit a simple class of PGCs that are not trivially subsumed by simple combinations of PCs and DPPs, and obtain competitive performance on a suite of density estimation benchmarks. We also highlight PGCs' connection to the theory of strongly Rayleigh distributions.
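To make "encode function values into the coefficients of a polynomial" concrete, here is a minimal sketch that stores a generating polynomial as an explicit dictionary from monomials to coefficients (all helper names are hypothetical). It shows a product of independent Bernoulli variables and a two-item DPP, whose generating polynomial for an L-ensemble kernel L is det(I + L diag(z)) / det(I + L). A PGC would represent such polynomials compactly as a circuit; the explicit expansion used here grows exponentially and only illustrates the semantics.

```python
from fractions import Fraction

# A polynomial over z_1..z_n is a dict: exponent tuple (one entry per binary
# variable) -> coefficient. For a distribution, the coefficient of
# z_1^{x_1} ... z_n^{x_n} is P(X = x).


def poly_mul(p, q):
    """Multiply two polynomials (exponents add, coefficients multiply)."""
    out = {}
    for ep, cp in p.items():
        for eq, cq in q.items():
            e = tuple(i + j for i, j in zip(ep, eq))
            out[e] = out.get(e, 0) + cp * cq
    return out


def bernoulli(i, n, p):
    """(1 - p) + p * z_i: generating polynomial of one Bernoulli(p) variable."""
    zero = (0,) * n
    one = tuple(1 if j == i else 0 for j in range(n))
    return {zero: 1 - p, one: p}


def dpp_pair(a, b, c):
    """Two-item DPP with kernel L = [[a, b], [b, c]]: P(S) = det(L_S) / det(I + L)."""
    norm = (1 + a) * (1 + c) - b * b  # det(I + L)
    return {(0, 0): 1 / norm, (1, 0): a / norm, (0, 1): c / norm,
            (1, 1): (a * c - b * b) / norm}


def prob(g, x):
    """P(X = x) is read off as a single coefficient."""
    return g.get(tuple(x), 0)


def marginal(g, keep):
    """Marginalize by evaluating z_j = 1 for every variable j not in keep."""
    out = {}
    for e, c in g.items():
        key = tuple(e[j] for j in keep)
        out[key] = out.get(key, 0) + c
    return out


indep = poly_mul(bernoulli(0, 2, Fraction(1, 2)), bernoulli(1, 2, Fraction(1, 3)))
print(prob(indep, (1, 1)))  # 1/6

dpp = dpp_pair(Fraction(1), Fraction(1, 2), Fraction(1))
p1 = marginal(dpp, [0])[(1,)]
p2 = marginal(dpp, [1])[(1,)]
print(prob(dpp, (1, 1)), p1 * p2)  # 1/5 49/225: the two items repel each other
```

The DPP example cannot be written as a product of independent factors, which is one reason a unified polynomial representation is useful.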

generating polynomial, pgc, polynomial, (16 more...)

2102.09768

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Wu, Minchao, Norrish, Michael, Walder, Christian, Dezfouli, Amir

TacticZero: Learning to Prove Theorems from Scratch with Deep Reinforcement Learning

We propose a novel approach to interactive theorem-proving (ITP) using deep reinforcement learning. Unlike previous work, our framework is able to prove theorems both end-to-end and from scratch (i.e., without relying on example proofs from human experts). We formulate the process of ITP as a Markov decision process (MDP) in which each state represents a set of potential derivation paths. The agent learns to select promising derivations as well as appropriate tactics within each derivation using deep policy gradients. This structure allows us to introduce a novel backtracking mechanism which enables the agent to efficiently discard (predicted) dead-end derivations and restart the derivation from promising alternatives. Experimental results show that the framework provides performance comparable to that of approaches that use human experts, and that it is also capable of proving theorems that it has never seen during training. We further elaborate on the role of each component of the framework using ablation studies.

theorem, (14 more...)

2102.09756

Country:

Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > United States > North Carolina > Wake County > Morrisville (0.04)
North America > United States > New York > New York County > New York City (0.04)
(10 more...)

Genre:

Research Report > New Finding (0.48)
Instructional Material > Course Syllabus & Notes (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Nioche, Aurélien, Murena, Pierre-Alexandre, de la Torre-Ortiz, Carlos, Oulasvirta, Antti

Improving Artificial Teachers by Considering How People Learn and Forget

Applications for self-regulated teaching are very popular (e.g., Duolingo had an estimated 100M downloads from Google Play at the time of writing). One of the central challenges for research on intelligent user interfaces is to identify algorithmic principles that can pick the best interventions for reliably improving human learning toward stated objectives, given realistically obtainable data on the user. The computational problem we study is how, given some learning materials, to organize them into lessons and reviews such that, over time, human learning is maximized with respect to a set learning objective. Predicting the effects of teaching interventions on human learning is challenging, however. Firstly, the state of user memory is both latent (that is, not directly observable) and non-stationary (that is, evolving over time, on account of such effects as loss of activation and interference), and an intervention that is ideal for one user may be a poor choice for another user: there are large individual differences in forgetting and recall.
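The two difficulties named above, a latent memory state and large differences between users, can be seen in a small model. A minimal sketch, assuming an exponential forgetting model with per-user parameters alpha (initial forgetting rate) and beta (how much each repetition slows forgetting); the model, these parameter names, and the myopic threshold rule are illustrative assumptions, not the planning method proposed in the paper. The same review history leads to different decisions for a fast and a slow forgetter.

```python
import math


def p_recall(alpha, beta, n_reps, delta_t):
    """Assumed exponential forgetting model: recall probability decays with the
    time since the last review; each extra repetition multiplies the decay
    rate by (1 - beta)."""
    if n_reps == 0:
        return 0.0  # an item never shown cannot be recalled
    rate = alpha * (1 - beta) ** (n_reps - 1)
    return math.exp(-rate * delta_t)


def pick_item(history, now, alpha, beta, threshold=0.9):
    """Hypothetical myopic teacher. history maps item -> (n_reps, last_review_time).

    Review the seen item with the lowest predicted recall if it has dropped
    below threshold; otherwise introduce the first unseen item."""
    seen = [(p_recall(alpha, beta, n, now - t), item)
            for item, (n, t) in history.items() if n > 0]
    unseen = [item for item, (n, _) in history.items() if n == 0]
    if seen:
        weakest_p, weakest = min(seen)
        if weakest_p < threshold or not unseen:
            return weakest
    if unseen:
        return unseen[0]
    raise ValueError("history is empty")


history = {"word_a": (1, 0), "word_b": (0, 0)}
print(pick_item(history, now=10, alpha=0.02, beta=0.2))   # fast forgetter: review word_a
print(pick_item(history, now=10, alpha=0.005, beta=0.2))  # slow forgetter: introduce word_b
```

Because alpha and beta are not observable, a practical teacher would have to infer them from the user's recall responses, which is exactly the latent, user-specific state the abstract describes.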

learner, probability, psychologist, (15 more...)

doi: 10.1145/3397481.3450696

2102.04174

Country:

North America > United States > Texas > Brazos County > College Station (0.15)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre:

Instructional Material (1.00)
Research Report > Experimental Study (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.93)
Information Technology > Human Computer Interaction > Interfaces (0.87)
Information Technology > Data Science > Data Mining (0.68)
(3 more...)

Farhi, Elad I., Indelman, Vadim

iX-BSP: Incremental Belief Space Planning

arXiv.org Artificial Intelligence Feb-18-2021

Deciding "what's next?" is a fundamental problem in robotics and Artificial Intelligence. Under belief space planning (BSP), in a partially observable setting, it involves calculating the expected accumulated belief-dependent reward, where the expectation is with respect to all future measurements. Since solving this general un-approximated problem quickly becomes intractable, state-of-the-art approaches turn to approximations while still computing each planning session from scratch. In this work we propose a novel paradigm, Incremental BSP (iX-BSP), based on the key insight that calculations across planning sessions are similar in nature and can be appropriately re-used. We calculate the expectation incrementally by utilizing Multiple Importance Sampling techniques for selective re-sampling and re-use of measurements from previous planning sessions. The formulation of our approach considers general distributions and accounts for data association aspects. We demonstrate how iX-BSP could benefit existing approximations of the general problem, introducing iML-BSP, which re-uses calculations across planning sessions under the common Maximum Likelihood assumption. We evaluate both methods and demonstrate a substantial reduction in computation time while statistically preserving accuracy. The evaluation includes both simulation and real-world experiments considering autonomous vision-based navigation and SLAM. As a further contribution, we introduce to iX-BSP the non-integral wildfire approximation, allowing one to trade accuracy for computational performance by refraining from updating re-used beliefs when they are "close enough". We evaluate iX-BSP under wildfire, demonstrating a substantial reduction in computation time while controlling the accuracy sacrifice. We also provide analytical and empirical bounds on the effect of wildfire on the objective value.
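The re-use step rests on Multiple Importance Sampling (MIS). A minimal sketch of the standard balance-heuristic MIS estimator, assuming one-dimensional Gaussian densities: samples drawn in a previous planning session (from an old proposal) are pooled with freshly drawn samples, and every sample is weighted by the target density divided by the count-weighted mixture of all proposal densities. The densities, sample sizes and function names are hypothetical; iX-BSP's selective re-sampling, belief updates and data association are not modeled.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mis_estimate(f, target_pdf, proposal_pdfs, sample_sets):
    """Balance-heuristic MIS estimate of E_target[f].

    sample_sets[i] holds samples drawn from proposal_pdfs[i]. Under the balance
    heuristic, each sample contributes f(x) * target(x) / sum_j n_j * q_j(x),
    regardless of which proposal produced it."""
    counts = [len(s) for s in sample_sets]
    total = 0.0
    for samples in sample_sets:
        for x in samples:
            mixture = sum(n * q(x) for n, q in zip(counts, proposal_pdfs))
            total += f(x) * target_pdf(x) / mixture
    return total


def old_pdf(x):
    return normal_pdf(x, -1.0, 1.0)  # proposal used in an earlier session


def new_pdf(x):
    return normal_pdf(x, 1.0, 1.5)  # proposal for freshly drawn samples


def target_pdf(x):
    return normal_pdf(x, 0.0, 1.0)  # distribution the expectation is taken under


rng = random.Random(1)
reused = [rng.gauss(-1.0, 1.0) for _ in range(20_000)]
fresh = [rng.gauss(1.0, 1.5) for _ in range(20_000)]

estimate = mis_estimate(lambda x: x * x, target_pdf, [old_pdf, new_pdf], [reused, fresh])
print(f"{estimate:.3f}")  # should be close to E[x^2] = 1 under N(0, 1)
```

Weighting by the mixture of all proposals keeps the weights bounded even where one proposal alone covers the target poorly, which is what makes pooling old and new samples safe.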

planning & scheduling, planning session, upstream oil & gas, (23 more...)

2102.09539

Country:

Asia > Middle East > Israel (0.14)
Europe > Spain (0.14)

Genre:

Workflow (1.00)
Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.92)
(3 more...)

arXiv.org Artificial Intelligence Feb-18-2021

Learning Memory-Dependent Continuous Control from Demonstrations

Hou, Siqing, Han, Dongqi, Tani, Jun

Efficient exploration has presented a long-standing challenge in reinforcement learning, especially when rewards are sparse. A developmental system can overcome this difficulty by learning from both demonstrations and self-exploration. However, existing methods are not applicable to most real-world robotic control problems because they assume that environments follow Markov decision processes (MDPs); thus, they do not extend to partially observable environments where historical observations are necessary for decision making. This paper builds on the idea of replaying demonstrations for memory-dependent continuous control by proposing a novel algorithm, Recurrent Actor-Critic with Demonstration and Experience Replay (READER). Experiments on several memory-crucial continuous control tasks show that our method significantly reduces interactions with the environment while using a reasonably small number of demonstration samples. The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.
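Because the policy is recurrent, replayed data must preserve temporal order. A minimal sketch, assuming a replay buffer that keeps whole episodes, samples fixed-length contiguous windows for training recurrent networks, and reserves a fixed fraction of each batch for demonstration data. The class name, the window length and the demo_fraction parameter are hypothetical and not taken from READER; transitions are plain integers here.

```python
import random


class SequenceReplay:
    """Hypothetical episode-level replay buffer for a recurrent actor-critic.

    Whole episodes are stored so that training windows keep their temporal
    order, and a fixed fraction of every batch is drawn from demonstrations."""

    def __init__(self, seq_len, demo_fraction, seed=0):
        self.seq_len = seq_len
        self.demo_fraction = demo_fraction
        self.demo_episodes = []
        self.agent_episodes = []
        self.rng = random.Random(seed)

    def add_episode(self, transitions, is_demo=False):
        if len(transitions) < self.seq_len:
            return  # too short to provide a full training window
        target = self.demo_episodes if is_demo else self.agent_episodes
        target.append(list(transitions))

    def _window(self, episodes):
        """A contiguous slice of seq_len transitions from a random episode."""
        episode = self.rng.choice(episodes)
        start = self.rng.randrange(len(episode) - self.seq_len + 1)
        return episode[start:start + self.seq_len]

    def sample(self, batch_size):
        if not self.demo_episodes and not self.agent_episodes:
            raise ValueError("replay buffer is empty")
        if not self.agent_episodes:
            n_demo = batch_size
        elif not self.demo_episodes:
            n_demo = 0
        else:
            n_demo = round(batch_size * self.demo_fraction)
        batch = [("demo", self._window(self.demo_episodes)) for _ in range(n_demo)]
        batch += [("agent", self._window(self.agent_episodes))
                  for _ in range(batch_size - n_demo)]
        return batch


buffer = SequenceReplay(seq_len=4, demo_fraction=0.25)
buffer.add_episode(list(range(10)), is_demo=True)  # stand-in demonstration episode
buffer.add_episode(list(range(100, 120)))          # stand-in agent episode
batch = buffer.sample(8)
print(batch[0])
```

Sampling windows rather than single transitions lets the recurrent critic and actor rebuild their hidden state from the history that preceded each decision.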

agent, algorithm, demonstration, (15 more...)

2102.09208

Country:

Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > South Korea (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Khodadadian, Sajad, Chen, Zaiwei, Maguluri, Siva Theja

Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

arXiv.org Machine Learning Feb-18-2021

Reinforcement Learning (RL) is a paradigm in which an agent aims to maximize its cumulative reward by searching for an optimal policy in an environment modeled as a Markov Decision Process (MDP) (Sutton and Barto, 2018). RL algorithms have achieved tremendous success in a wide range of applications, such as self-driving cars with Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al., 2015) and AlphaGo in the game of Go (Silver et al., 2016). RL algorithms can be categorized into value space methods, such as Q-learning (Watkins and Dayan, 1992) and TD-learning (Sutton, 1988), and policy space methods, such as actor-critic (AC) (Konda and Tsitsiklis, 2000). Despite great empirical success (Bahdanau et al., 2016; Wang et al., 2016), the finite-sample convergence of AC-type algorithms is not completely characterized theoretically. An AC algorithm can be thought of as a generalized policy iteration (Puterman, 1995) and consists of two phases, namely actor and critic. The objective of the actor is to improve the policy, while the critic aims at evaluating the performance of a specific policy. A step of the actor can be thought of as a step of Stochastic Gradient Ascent (Bottou et al., 2018) with preconditioning.
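The closing sentence describes an actor step as preconditioned gradient ascent. A minimal sketch of that idea, assuming a tabular softmax policy, a hypothetical two-state MDP, and an exact model-based critic in place of the paper's off-policy, sample-based critic. For tabular softmax policies, preconditioning the policy gradient by the Fisher information (the natural gradient) reduces the actor update to theta(s, a) += eta / (1 - gamma) * A(s, a), a known result for tabular natural policy gradient. None of the paper's off-policy or finite-sample machinery is modeled.

```python
import math

# Hypothetical MDP: P[s][a] is the next-state distribution, R[s][a] the reward.
# Action 1 leads to (and stays in) the rewarding state 1; action 0 in state 0
# gives a small immediate reward but keeps the agent in state 0.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
R = [[0.2, 0.0],
     [0.0, 1.0]]
GAMMA = 0.9


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def critic(pi, sweeps=500):
    """Policy evaluation by repeated Bellman backups; returns Q(s, a) and V(s)."""
    n_s, n_a = len(R), len(R[0])
    v = [0.0] * n_s
    for _ in range(sweeps):
        q = [[R[s][a] + GAMMA * sum(P[s][a][t] * v[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        v = [sum(pi[s][a] * q[s][a] for a in range(n_a)) for s in range(n_s)]
    return q, v


def natural_actor_critic(iterations=50, eta=0.5):
    theta = [[0.0] * len(R[0]) for _ in R]  # softmax logits, one row per state
    for _ in range(iterations):
        pi = [softmax(row) for row in theta]
        q, v = critic(pi)
        for s in range(len(R)):
            for a in range(len(R[0])):
                # Natural-gradient actor step for a tabular softmax policy:
                # Fisher preconditioning turns the gradient step into adding
                # eta / (1 - gamma) times the advantage Q(s, a) - V(s).
                theta[s][a] += eta / (1 - GAMMA) * (q[s][a] - v[s])
    return [softmax(row) for row in theta]


policy = natural_actor_critic()
print([[round(p, 3) for p in row] for row in policy])  # mass concentrates on action 1
```

The critic here is exact, so only the actor's preconditioned step is illustrated; in an actual AC algorithm the advantages would be estimated from samples.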

algorithm, convergence, theorem 2, (12 more...)

2102.09318

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment > Games > Go (0.54)
Information Technology (0.54)
Transportation > Passenger (0.34)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Erdemir, Ecenaz, Dragotti, Pier Luigi, Gunduz, Deniz

Active Privacy-utility Trade-off Against a Hypothesis Testing Adversary

arXiv.org Machine Learning Feb-18-2021

We consider a user releasing her data, which contains some personal information, in return for a service. We model the user's personal information as two correlated random variables: one of them, called the secret variable, is to be kept private, while the other, called the useful variable, is to be disclosed for utility. We consider active sequential data release, where at each time step the user chooses from among a finite set of release mechanisms, each revealing some information about the user's personal information, i.e., the true hypotheses, albeit with different statistics. The user manages data release in an online fashion such that the maximum amount of information is revealed about the latent useful variable, while the confidence for the sensitive variable is kept below a predefined level. For utility, we consider both the probability of correct detection of the useful variable and the mutual information (MI) between the useful variable and the released data. We formulate both problems as a Markov decision process (MDP) and numerically solve them by advantage actor-critic (A2C) deep reinforcement learning (RL).
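The sequential release setting can be illustrated with a Bayesian belief over the four (useful, secret) hypotheses. A minimal sketch, assuming binary useful and secret variables, a hypothetical correlated prior, two hypothetical release mechanisms, and a one-step greedy rule that picks the mechanism maximizing the expected detection confidence on the useful variable while keeping the adversary's worst-case confidence on the secret below a threshold. This lookahead rule stands in for, and is not, the A2C policy used in the paper.

```python
# Hypotheses are (useful, secret) pairs; the hypothetical prior makes them correlated.
PRIOR = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}

# Hypothetical release mechanisms: likelihood P(obs | useful, secret), obs in {0, 1}.
MECHANISMS = {
    "A": lambda obs, u, s: 0.8 if obs == u else 0.2,  # noisy copy of the useful variable
    "B": lambda obs, u, s: 0.8 if obs == s else 0.2,  # noisy copy of the secret variable
}


def update(belief, lik, obs):
    """Bayes rule over the hypotheses after observing obs from one mechanism."""
    unnorm = {h: p * lik(obs, *h) for h, p in belief.items()}
    norm = sum(unnorm.values())
    return {h: p / norm for h, p in unnorm.items()}


def confidence(belief, index):
    """Probability that a MAP detector guesses variable `index` correctly
    (index 0 = useful, 1 = secret)."""
    p_one = sum(p for h, p in belief.items() if h[index] == 1)
    return max(p_one, 1.0 - p_one)


def choose_mechanism(belief, mechanisms, max_secret_conf):
    """Greedy one-step rule; returns None when no release is safe."""
    best, best_score = None, -1.0
    for name, lik in mechanisms.items():
        score, safe = 0.0, True
        for obs in (0, 1):
            p_obs = sum(p * lik(obs, *h) for h, p in belief.items())
            if p_obs == 0.0:
                continue
            post = update(belief, lik, obs)
            if confidence(post, 1) > max_secret_conf:
                safe = False  # some outcome would leak too much about the secret
                break
            score += p_obs * confidence(post, 0)
        if safe and score > best_score:
            best, best_score = name, score
    return best


first = choose_mechanism(PRIOR, MECHANISMS, max_secret_conf=0.75)
print(first)  # A (one release of B could raise the secret confidence to 0.8)
belief = update(PRIOR, MECHANISMS[first], obs=1)
print(choose_mechanism(belief, MECHANISMS, max_secret_conf=0.75))  # None
```

Even mechanism A, which only looks at the useful variable, leaks information about the secret through their correlation, so after one release the budget is exhausted; a learned policy such as A2C addresses the same trade-off over longer horizons.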

adversary, hypothesis, information, (16 more...)

2102.08308

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > East Sussex > Brighton (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Energy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
(2 more...)