AITopics | log pi

Collaborating Authors

log pi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Imitation-Regularized Offline Learning

Ma, Yifei, Wang, Yu-Xiang, Narayanaswamy, Balakrishnan

arXiv.org Machine Learning, Jan-15-2019

We study the problem of offline learning in automated decision systems under the contextual bandits model. We are given logged historical data consisting of contexts, (randomized) actions, and (nonnegative) rewards. A common goal is to evaluate what would happen if different actions were taken in the same contexts, so as to optimize the action policies accordingly. The typical approach to this problem, inverse probability weighted estimation (IPWE) [Bottou et al., 2013], requires logged action probabilities, which may be missing in practice due to engineering complications. Even when they are available, small action probabilities cause large uncertainty in IPWE, rendering the corresponding results statistically insignificant. To solve both problems, we show how one can use policy improvement (PIL) objectives, regularized by policy imitation (IML). We motivate and analyze PIL as an extension to Clipped-IPWE by showing that both are lower-bound surrogates for the vanilla IPWE. We also formally connect IML to IPWE variance estimation [Swaminathan and Joachims 2015] and natural policy gradients. Without probability logging, our PIL-IML interpretations justify and improve, by reward-weighting, the state-of-the-art cross-entropy (CE) loss that predicts the action items among all action candidates available in the same contexts. With probability logging, our main theoretical contribution connects IML-underfitting to the existence of either confounding variables or model misspecification. We show the value and accuracy of our insights through simulations based on Simpson's paradox, on standard UCI multiclass-to-bandit conversions, and on the Criteo counterfactual analysis challenge dataset.
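The estimators contrasted in the abstract can be sketched in a few lines. What follows is a minimal NumPy illustration, not the authors' code. It assumes the logged data are already arrays of rewards, target-policy probabilities π(a|x), and logging probabilities μ(a|x). The clip value, the toy data, and the exact form of the reward-weighted cross-entropy are illustrative assumptions.

```python
import numpy as np

def ipwe(rewards, target_probs, logging_probs):
    """Vanilla IPWE of the target policy's value:
    mean_i  r_i * pi(a_i | x_i) / mu(a_i | x_i)."""
    weights = target_probs / logging_probs      # importance weights
    return float(np.mean(weights * rewards))

def clipped_ipwe(rewards, target_probs, logging_probs, clip=10.0):
    """Clipped IPWE: each importance weight is capped at `clip` (hypothetical value).
    With nonnegative rewards every term can only shrink, so the result is a
    lower bound on the vanilla IPWE that trades bias for lower variance."""
    weights = np.minimum(target_probs / logging_probs, clip)
    return float(np.mean(weights * rewards))

def reward_weighted_ce(rewards, target_probs):
    """Illustrative reward-weighted cross-entropy loss: -mean_i r_i * log pi(a_i | x_i).
    It needs no logging probabilities mu(a_i | x_i)."""
    return float(-np.mean(rewards * np.log(target_probs)))

# Toy logged data (made up for illustration).
rewards = np.array([1.0, 0.0, 1.0])
target_probs = np.array([0.5, 0.2, 0.9])     # pi(a_i | x_i)
logging_probs = np.array([0.25, 0.5, 0.05])  # mu(a_i | x_i); the last one is small

print(ipwe(rewards, target_probs, logging_probs))          # 20/3 ≈ 6.667, dominated by the 18x weight
print(clipped_ipwe(rewards, target_probs, logging_probs))  # 4.0 after capping that weight at 10
print(reward_weighted_ce(rewards, target_probs))
```

The single small logging probability (0.05) produces an importance weight of 18, which dominates the vanilla estimate. This is the instability the abstract attributes to small action probabilities.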

ipwe, objective, probability

arXiv.org Machine Learning

1901.04723

Country: Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)


Communication-Efficient Distributed Reinforcement Learning

Chen, Tianyi, Zhang, Kaiqing, Giannakis, Georgios B., Başar, Tamer

arXiv.org Machine Learning, Dec-7-2018

This paper studies the distributed reinforcement learning (DRL) problem involving a central controller and a group of learners. Two DRL settings that find broad applications are considered: multi-agent reinforcement learning (RL) and parallel RL. In both settings, frequent information exchange between the learners and the controller is required. However, in many distributed systems, e.g., parallel machines for training deep RL algorithms and multi-robot systems for learning optimal coordination strategies, the overhead caused by frequent communication is not negligible and becomes the bottleneck of overall performance. To overcome this challenge, we develop a new policy gradient method that is amenable to efficient implementation in such communication-constrained settings. By adaptively skipping policy gradient communication, our method reduces the communication overhead without degrading learning accuracy. Analytically, we establish that i) the convergence rate of our algorithm matches that of vanilla policy gradient for the DRL tasks; and ii) if the distributed computing units are heterogeneous in terms of their reward functions and initial state distributions, the number of communication rounds needed to achieve a target learning accuracy is reduced. Numerical experiments on a popular multi-agent RL benchmark corroborate the significant communication reduction of our algorithm compared to the alternatives.
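The idea of adaptively skipping communication can be illustrated with a simplified, hypothetical trigger rule. Under this rule, a learner uploads its new policy gradient only when it differs enough from the one it last sent; otherwise the controller reuses the stale copy. This is a minimal sketch, not the paper's algorithm. The fixed `threshold` in particular is an assumption, since the paper's rule may adapt over iterations, and the function name `lazy_aggregate` is made up for illustration.

```python
import numpy as np

def lazy_aggregate(new_grads, last_sent, threshold):
    """One communication round at the central controller.

    new_grads[m]: learner m's freshly computed policy-gradient estimate.
    last_sent[m]: the gradient learner m most recently uploaded.
    threshold:    hypothetical fixed trigger on the squared change in the gradient.
    Returns (aggregate gradient, updated last_sent list, number of uploads).
    """
    sent, uploads = [], 0
    for g_new, g_old in zip(new_grads, last_sent):
        if np.sum((g_new - g_old) ** 2) > threshold:
            sent.append(g_new)   # large change: learner communicates
            uploads += 1
        else:
            sent.append(g_old)   # small change: skip, controller reuses the stale gradient
    return np.sum(sent, axis=0), sent, uploads

# Two learners; only the second one's gradient changed appreciably.
last_sent = [np.zeros(2), np.array([1.0, 1.0])]
new_grads = [np.array([0.01, 0.0]), np.array([3.0, 1.0])]
agg, last_sent, uploads = lazy_aggregate(new_grads, last_sent, threshold=0.1)

theta = np.zeros(2)
theta = theta + 0.05 * agg       # gradient-ascent step on expected return
print(agg, uploads)              # [3. 1.] 1
```

Here only one of the two learners communicates, yet the controller still takes a full policy-gradient step using the other learner's last uploaded gradient.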

artificial intelligence, machine learning, reinforcement learning

arXiv.org Machine Learning

1812.03239

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Policy Optimization with Second-Order Advantage Information

Li, Jiajin, Wang, Baoxiang

arXiv.org Artificial Intelligence, May-9-2018

Policy optimization on high-dimensional continuous control tasks is difficult because policy gradient estimators have large variance. We present the action subspace dependent gradient (ASDG) estimator, which incorporates the Rao-Blackwell theorem (RB) and control variates (CV) into a unified framework to reduce this variance. To invoke RB, our proposed algorithm (POSA) learns the underlying factorization structure of the action space from second-order advantage information. POSA captures this quadratic information explicitly and efficiently by using the wide & deep architecture. Empirical studies show that our approach improves performance on high-dimensional synthetic settings and on OpenAI Gym's MuJoCo continuous control tasks.
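Neither ASDG nor POSA is reproduced here. The sketch below only illustrates the control-variate ingredient the abstract mentions. Take a 1-D Gaussian policy a ~ N(μ, 1) and a made-up reward f(a) = 10 − (a − 1)². The score-function estimator (f(a) − b)(a − μ) is unbiased for any constant baseline b because E[a − μ] = 0, but a well-chosen b lowers its variance. The reward, μ = 0, and b = 8 (the exact E[f(a)] at μ = 0) are illustrative assumptions.

```python
import numpy as np

def score_function_grads(mu, n_samples, baseline=0.0, seed=0):
    """Per-sample score-function (REINFORCE) estimates of d/dmu E[f(a)], a ~ N(mu, 1).

    f is a hypothetical 1-D reward; `baseline` is a constant control variate.
    Because E[d/dmu log p(a)] = E[a - mu] = 0, subtracting a constant baseline
    leaves the estimator unbiased but can shrink its variance.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=mu, scale=1.0, size=n_samples)
    f = 10.0 - (a - 1.0) ** 2        # hypothetical reward function
    score = a - mu                   # d/dmu log N(a; mu, 1)
    return (f - baseline) * score

plain = score_function_grads(mu=0.0, n_samples=100_000)
with_cv = score_function_grads(mu=0.0, n_samples=100_000, baseline=8.0)
print(plain.mean(), with_cv.mean())  # both close to the true gradient 2.0
print(plain.var(), with_cv.var())    # sample variances; the baseline version is smaller
```

At μ = 0 the true gradient is 2. The population variances of the two per-sample estimators are 50 without a baseline and 18 with b = 8, so the control variate cuts the variance by more than half without changing the mean.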

artificial intelligence, estimator, machine learning

arXiv.org Artificial Intelligence

1805.03586

Genre: Research Report (0.40)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)


The Stochastic complexity of spin models: How simple are simple spin models?

Beretta, Alberto, Battistin, Claudia, de Mulatier, Clélia, Mastromatteo, Iacopo, Marsili, Matteo

arXiv.org Machine Learning, Apr-12-2017

Affiliations: (1) The Abdus Salam International Centre for Theoretical Physics (ICTP), Strada Costiera 11, I-34014 Trieste, Italy (Alberto Beretta, Clélia de Mulatier, Matteo Marsili); (2) Kavli Institute for Systems Neuroscience and Centre for Neural Computation, Olav Kyrres gate 9, 7030 Trondheim, Norway (Claudia Battistin); (3) Capital Fund Management, 23 rue de l'Université, 75007 Paris, France (Iacopo Mastromatteo).

Simple models, in information-theoretic terms, are those with a small stochastic complexity. We study the stochastic complexity of spin models with interactions of arbitrary order. Invariance with respect to bijections within the space of operators allows us to classify models into complexity classes. This invariance also shows that simplicity is not related to the order of the interactions, but rather to their mutual arrangement.
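A toy brute-force check makes the claim that simplicity does not depend on interaction order concrete for the smallest case. The sketch assumes that stochastic complexity is measured by the log normalized-maximum-likelihood (NML) normalizer, i.e. the sum of maximized likelihoods over every possible dataset. The paper's exact definitions and derivations may differ. The code compares a one-spin "field" operator s₁ with a pairwise operator s₁s₂ on two spins. The function name and the tiny sizes are illustrative choices.

```python
import itertools
import math

def nml_complexity_single_operator(n_spins, subset, n_samples):
    """Log NML normalizer (parametric complexity) of the one-operator spin model
    p(s) ∝ exp(g * phi(s)), with phi(s) = product of s_i over i in `subset` (non-empty),
    for n_samples i.i.d. samples. Brute force, so keep n_spins and n_samples tiny."""
    configs = list(itertools.product([-1, 1], repeat=n_spins))
    phi = {s: math.prod(s[i] for i in subset) for s in configs}
    n_plus = sum(1 for s in configs if phi[s] == 1)  # configurations with phi = +1
    n_minus = len(configs) - n_plus
    total = 0.0
    for data in itertools.product(configs, repeat=n_samples):
        k = sum(1 for s in data if phi[s] == 1)
        q = k / n_samples  # ML estimate of P(phi = +1), spread evenly over those configs
        likelihood = 1.0
        for s in data:
            likelihood *= q / n_plus if phi[s] == 1 else (1 - q) / n_minus
        total += likelihood  # sum of maximized likelihoods over all possible datasets
    return math.log(total)

field_only = nml_complexity_single_operator(2, [0], 4)   # operator s_1
pairwise = nml_complexity_single_operator(2, [0, 1], 4)  # operator s_1 s_2
print(field_only, pairwise)  # equal up to rounding: log(3.21875) ≈ 1.169
```

For a single operator, the maximized likelihood depends only on how many samples have φ = +1. The sum therefore collapses to Σ_k C(N,k) (k/N)^k (1 − k/N)^(N−k), whichever spins the operator involves. This is why a first-order and a second-order operator end up equally complex here.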

artificial intelligence, bayesian inference, machine learning

arXiv.org Machine Learning

1702.07549

Country:

Europe > Norway > Central Norway > Trøndelag > Trondheim (0.24)
Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.24)
Europe > France > Île-de-France > Paris > Paris (0.24)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
