AITopics

2201.08115

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.64)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Konan, Sachin, Seraj, Esmaeil, Gombolay, Matthew

Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming

arXiv.org Artificial IntelligenceJan-20-2022

Information sharing is key in building team cognition and enables coordination and cooperation. High-performing human teams also benefit from acting strategically with hierarchical levels of iterated communication and rationalizability, meaning a human agent can reason about the actions of their teammates in their decision-making. Yet, the majority of prior work in Multi-Agent Reinforcement Learning (MARL) does not support iterated rationalizability and only encourage inter-agent communication, resulting in a suboptimal equilibrium cooperation strategy. In this work, we show that reformulating an agent's policy to be conditional on the policies of its neighboring teammates inherently maximizes Mutual Information (MI) lower-bound when optimizing under Policy Gradient (PG). Building on the idea of decision-making under bounded rationality and cognitive hierarchy theory, we show that our modified PG approach not only maximizes local agent rewards but also implicitly reasons about MI between agents without the need for any explicit ad-hoc regularization terms. Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state-of-the-art in decentralized cooperative MARL tasks. Our experiments validate the utility of InfoPG by achieving higher sample efficiency and significantly larger cumulative reward in several complex cooperative multi-agent domains.

agent, conference paper, infopg, (16 more...)

2201.08484

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Khamaru, Koulik, Xia, Eric, Wainwright, Martin J., Jordan, Michael I.

Instance-Dependent Confidence and Early Stopping for Reinforcement Learning

arXiv.org Machine LearningJan-20-2022

Various algorithms for reinforcement learning (RL) exhibit dramatic variation in their convergence rates as a function of problem structure. Such problem-dependent behavior is not captured by worst-case analyses and has accordingly inspired a growing effort in obtaining instance-dependent guarantees and deriving instance-optimal algorithms for RL problems. This research has been carried out, however, primarily within the confines of theory, providing guarantees that explain \textit{ex post} the performance differences observed. A natural next step is to convert these theoretical guarantees into guidelines that are useful in practice. We address the problem of obtaining sharp instance-dependent confidence regions for the policy evaluation problem and the optimal value estimation problem of an MDP, given access to an instance-optimal algorithm. As a consequence, we propose a data-dependent stopping rule for instance-optimal algorithms. The proposed stopping rule adapts to the instance-specific difficulty of the problem and allows for early termination for problems with favorable structure.

algorithm, probability, procedure, (15 more...)

2201.08536

Country:

Asia > Middle East > Jordan (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Mou, Wenlong, Khamaru, Koulik, Wainwright, Martin J., Bartlett, Peter L., Jordan, Michael I.

Optimal variance-reduced stochastic approximation in Banach spaces

arXiv.org Machine LearningJan-20-2022

We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic approximation scheme, and establish non-asymptotic bounds for both the operator defect and the estimation error, measured in an arbitrary semi-norm. In contrast to worst-case guarantees, our bounds are instance-dependent, and achieve the local asymptotic minimax risk non-asymptotically. For linear operators, contractivity can be relaxed to multi-step contractivity, so that the theory can be applied to problems like average reward policy evaluation problem in reinforcement learning. We illustrate the theory via applications to stochastic shortest path problems, two-player zero-sum Markov games, as well as policy evaluation and $Q$-learning for tabular Markov decision processes.

equation, operator, probability, (14 more...)

2201.08518

Country:

Asia > Middle East > Jordan (0.14)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

arXiv.org Artificial IntelligenceJan-19-2022

Reinforcement Learning Textbook

Ivanov, Sergey

This textbook covers principles behind main modern deep reinforcement learning algorithms that achieved breakthrough results in many domains from game AI to robotics. All required theory is explained with proofs using unified notation and emphasize on the differences between different types of algorithms and the reasons why they are constructed the way they are.

arxiv preprint arxiv, mdp, reinforcement learning, (12 more...)

2201.09746

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Africa > Togo (0.04)

Genre:

Research Report (0.69)
Instructional Material (0.65)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.92)

Benveniste, Albert, Raclet, Jean-Baptiste

Mixed Nondeterministic-Probabilistic Automata: Blending graphical probabilistic models with nondeterminism

arXiv.org Artificial IntelligenceJan-19-2022

Graphical models in probability and statistics are a core concept in the area of probabilistic reasoning and probabilistic programming-graphical models include Bayesian networks and factor graphs. In this paper we develop a new model of mixed (nondeterministic/probabilistic) automata that subsumes both nondeterministic automata and graphical probabilistic models. Mixed Automata are equipped with parallel composition, simulation relation, and support message passing algorithms inherited from graphical probabilistic models. Segala's Probabilistic Automatacan be mapped to Mixed Automata.

automata, parallel composition, probability, (16 more...)

2201.07474

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(9 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

arXiv.org Artificial IntelligenceJan-18-2022

Prospective Learning: Back to the Future

Vogelstein, Joshua T., Verstynen, Timothy, Kording, Konrad P., Isik, Leyla, Krakauer, John W., Etienne-Cummings, Ralph, Ogburn, Elizabeth L., Priebe, Carey E., Burns, Randal, Kutten, Kwame, Knierim, James J., Potash, James B., Hartung, Thomas, Smirnova, Lena, Worley, Paul, Savonenko, Alena, Phillips, Ian, Miller, Michael I., Vidal, Rene, Sulam, Jeremias, Charles, Adam, Cowan, Noah J., Bichuch, Maxim, Venkataraman, Archana, Li, Chen, Thakor, Nitish, Kebschull, Justus M, Albert, Marilyn, Xu, Jinchong, Shuler, Marshall Hussain, Caffo, Brian, Ratnanather, Tilak, Geisa, Ali, Roh, Seung-Eon, Yezerets, Eva, Madhyastha, Meghana, How, Javier J., Tomita, Tyler M., Dey, Jayanta, Ningyuan, null, Huang, null, Shin, Jong M., Kinfu, Kaleab Alemayehu, Chaudhari, Pratik, Baker, Ben, Schapiro, Anna, Jayaraman, Dinesh, Eaton, Eric, Platt, Michael, Ungar, Lyle, Wehbe, Leila, Kepecs, Adam, Christensen, Amy, Osuagwu, Onyema, Brunton, Bing, Mensh, Brett, Muotri, Alysson R., Silva, Gabriel, Puppo, Francesca, Engert, Florian, Hillman, Elizabeth, Brown, Julia, White, Chris, Yang, Weiwei

Research on both natural intelligence (NI) and artificial intelligence (AI) generally assumes that the future resembles the past: intelligent agents or systems (what we call 'intelligence') observe and act on the world, then use this experience to act on future experiences of the same kind. We call this 'retrospective learning'. For example, an intelligence may see a set of pictures of objects, along with their names, and learn to name them. A retrospective learning intelligence would merely be able to name more pictures of the same objects. We argue that this is not what true intelligence is about. In many real world problems, both NIs and AIs will have to learn for an uncertain future. Both must update their internal models to be useful for future tasks, such as naming fundamentally new objects and using these objects effectively in a new context or to achieve previously unencountered goals. This ability to learn for the future we call 'prospective learning'. We articulate four relevant factors that jointly define prospective learning. Continual learning enables intelligences to remember those aspects of the past which it believes will be most useful in the future. Prospective constraints (including biases and priors) facilitate the intelligence finding general solutions that will be applicable to future problems. Curiosity motivates taking actions that inform future decision making, including in previously unmet situations. Causal estimation enables learning the structure of relations that guide choosing actions for specific outcomes, even when the specific action-outcome contingencies have never been observed before. We argue that a paradigm shift from retrospective to prospective learning will enable the communities that study intelligence to unite and overcome existing bottlenecks to more effectively explain, augment, and engineer intelligences.

intelligence, learning, prospective learning, (14 more...)

2201.07372

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
(9 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games (1.00)
Education (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(5 more...)

arXiv.org Artificial IntelligenceJan-18-2022

Accelerating Representation Learning with View-Consistent Dynamics in Data-Efficient Reinforcement Learning

Huang, Tao, Wang, Jiachen, Chen, Xiao

Learning informative representations from image-based observations is of fundamental concern in deep Reinforcement Learning (RL). However, data-inefficiency remains a significant barrier to this objective. To overcome this obstacle, we propose to accelerate state representation learning by enforcing view-consistency on the dynamics. Firstly, we introduce a formalism of Multi-view Markov Decision Process (MMDP) that incorporates multiple views of the state. Following the structure of MMDP, our method, View-Consistent Dynamics (VCD), learns state representations by training a view-consistent dynamics model in the latent space, where views are generated by applying data augmentation to states. Empirical evaluation on DeepMind Control Suite and Atari-100k demonstrates VCD to be the SoTA data-efficient algorithm on visual control tasks.

dynamic model, representation, vcd, (16 more...)

2201.07016

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Taniguchi, Akira, Murakami, Hiroaki, Ozaki, Ryo, Taniguchi, Tadahiro

Unsupervised Multimodal Word Discovery based on Double Articulation Analysis with Co-occurrence cues

arXiv.org Artificial IntelligenceJan-18-2022

Human infants acquire their verbal lexicon from minimal prior knowledge of language based on the statistical properties of phonological distributions and the co-occurrence of other sensory stimuli. In this study, we propose a novel fully unsupervised learning method discovering speech units by utilizing phonological information as a distributional cue and object information as a co-occurrence cue. The proposed method can not only (1) acquire words and phonemes from speech signals using unsupervised learning, but can also (2) utilize object information based on multiple modalities (i.e., vision, tactile, and auditory) simultaneously. The proposed method is based on the Nonparametric Bayesian Double Articulation Analyzer (NPB-DAA) discovering phonemes and words from phonological features, and Multimodal Latent Dirichlet Allocation (MLDA) categorizing multimodal information obtained from objects. In the experiment, the proposed method showed higher word discovery performance than the baseline methods. In particular, words that expressed the characteristics of the object (i.e., words corresponding to nouns and adjectives) were segmented accurately. Furthermore, we examined how learning performance is affected by differences in the importance of linguistic information. When the weight of the word modality was increased, the performance was further improved compared to the fixed condition.

categorization, information, modality, (16 more...)

2201.06786

Country:

Asia > Japan (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
(3 more...)

Kerimkulov, Bekzhan, Leahy, James-Michael, Šiška, David, Szpruch, Lukasz

Convergence of policy gradient for entropy regularized MDPs with neural network approximation in the mean-field regime

arXiv.org Machine LearningJan-18-2022

We study the global convergence of policy gradient for infinite-horizon, continuous state and action space, entropy-regularized Markov decision processes (MDPs). We consider a softmax policy with (one-hidden layer) neural network approximation in a mean-field regime. Additional entropic regularization in the associated mean-field probability measure is added, and the corresponding gradient flow is studied in the 2-Wasserstein metric. We show that the objective function is increasing along the gradient flow. Further, we prove that if the regularization in terms of the mean-field measure is sufficient, the gradient flow converges exponentially fast to the unique stationary solution, which is the unique maximizer of the regularized MDP objective. Lastly, we study the sensitivity of the value function along the gradient flow with respect to regularization parameters and the initial condition. Our results rely on the careful analysis of non-linear Fokker--Planck--Kolmogorov equation and extend the pioneering work of Mei et al. 2020 and Agarwal et al. 2020, which quantify the global convergence rate of policy gradient for entropy-regularized MDPs in the tabular setting.

equation, gradient flow, theorem 3, (15 more...)

2201.07296

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)