AITopics

2107.08577

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.82)

Singh, Ishika, Singh, Gargi, Modi, Ashutosh

Pre-trained Language Models as Prior Knowledge for Playing Text-based Games

arXiv.org Artificial Intelligence, Jul-18-2021

Recently, text world games have been proposed to enable artificial agents to understand and reason about real-world scenarios. These text-based games are challenging for artificial agents, as they require understanding and interaction using natural language in a partially observable environment. In this paper, we improve the semantic understanding of the agent by proposing a simple RL with LM framework in which we combine transformer-based language models with deep RL models. We perform a detailed study of our framework, demonstrating that our model outperforms all existing agents on the popular game Zork1, achieving a score of 44.7, which is 1.6 higher than the state-of-the-art model. Our proposed approach also performs comparably to the state-of-the-art models on the other set of text games.

lantern, qvalue, white house

2107.08408

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Germany > Berlin (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report (0.90)

Industry:

Education (0.67)
Leisure & Entertainment > Games > Computer Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Lechiakh, Mohamed, Maurer, Alexandre

FEBR: Expert-Based Recommendation Framework for beneficial and personalized content

arXiv.org Artificial Intelligence, Jul-17-2021

So far, most research on recommender systems has focused on maintaining long-term user engagement and satisfaction by promoting relevant and personalized content. However, evaluating the quality and reliability of this content remains very challenging. In this paper, we propose FEBR (Expert-Based Recommendation Framework), an apprenticeship learning framework to assess the quality of the recommended content on online platforms. The framework exploits the demonstrated trajectories of an expert (assumed to be reliable) in a recommendation evaluation environment to recover an unknown utility function. This function is used to learn an optimal policy describing the expert's behavior, which is then used in the framework to provide high-quality and personalized recommendations. We evaluate the performance of our solution in a user interest simulation environment (using RecSim). We simulate interactions under the aforementioned expert policy for video recommendation and compare its efficiency with standard recommendation methods. The results show that our approach provides a significant gain in content quality, as evaluated by experts and watched by users, while maintaining almost the same watch time as the baseline approaches.
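The abstract does not say which inverse-RL algorithm FEBR uses to recover the unknown utility function. As a hedged illustration of the generic apprenticeship-learning ingredient only, the sketch below computes empirical discounted feature expectations from expert trajectories, the quantity that linear-utility methods in the Abbeel and Ng style try to match, plus the utility gap such methods reason about. The integer states, the one-hot feature map `phi`, and both helper names are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical types: states are ints, features are fixed-length float lists.
State = int
FeatureMap = Callable[[State], List[float]]


def feature_expectations(trajectories: Sequence[Sequence[State]],
                         phi: FeatureMap,
                         gamma: float = 0.9) -> List[float]:
    """Empirical discounted feature expectations: the mean over trajectories
    of sum_t gamma**t * phi(s_t)."""
    if not trajectories or not trajectories[0]:
        raise ValueError("need at least one non-empty trajectory")
    mu = [0.0] * len(phi(trajectories[0][0]))
    for traj in trajectories:
        discount = 1.0
        for s in traj:
            for k, f in enumerate(phi(s)):
                mu[k] += discount * f
            discount *= gamma
    return [m / len(trajectories) for m in mu]


def utility_gap(w: Sequence[float], mu_expert: Sequence[float],
                mu_policy: Sequence[float]) -> float:
    """w . (mu_expert - mu_policy): how much better the expert scores than a
    candidate policy under the linear utility u(s) = w . phi(s)."""
    return sum(wk * (e - p) for wk, e, p in zip(w, mu_expert, mu_policy))


# Usage with a hypothetical one-hot feature map over two states.
phi = lambda s: [1.0, 0.0] if s == 0 else [0.0, 1.0]
mu_E = feature_expectations([[0, 1], [0, 0]], phi, gamma=0.5)  # → [1.25, 0.25]
```

A linear utility is the simplest assumption that makes the expert's value depend on trajectories only through these averages; richer utility models would replace `utility_gap` accordingly.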

recommendation, reward function, video

2108.01455

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Luo, Haipeng, Wei, Chen-Yu, Lee, Chung-Wei

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

arXiv.org Machine Learning, Jul-17-2021

Policy optimization methods are among the most widely used methods in reinforcement learning. Their empirical success has been demonstrated in various domains such as computer games [Schulman et al., 2017] and robotics [Levine and Koltun, 2013]. However, due to their local-search nature, global optimality guarantees for policy optimization often rely on unrealistic assumptions to ensure global exploration (see e.g., [Abbasi-Yadkori et al., 2019, Agarwal et al., 2020b, Neu and Olkhovskaya, 2020, Wei et al., 2021]), making policy optimization theoretically less appealing than other methods. Motivated by this issue, a line of recent works [Cai et al., 2020, Shani et al., 2020, Agarwal et al., 2020a, Zanette et al., 2021] equips policy optimization with global exploration by adding exploration bonuses to the update and proves favorable guarantees without extra exploratory assumptions. Moreover, these works all demonstrate some robustness aspect of policy optimization (such as being able to handle adversarial losses or a certain degree of model misspecification). Despite this important progress, however, many limitations remain, including worse regret rates than the best value-based or model-based approaches [Shani et al., 2020, Agarwal et al., 2020a, Zanette et al., 2021], or the need for full-information feedback on the entire loss function (as opposed to the more realistic bandit feedback) [Cai et al., 2020]. To address these issues, in this work, we propose a new type of exploration bonus, called dilated bonuses, which satisfy a certain dilated Bellman equation and provably lead to improved exploration compared to existing works (Section 3). We apply this general idea to advance the state of the art of policy optimization for learning finite-horizon episodic MDPs with adversarial losses and bandit feedback.
More specifically, our main results are:
- First, in the tabular setting, addressing the main open question left in [Shani et al., 2020], we improve their Õ(T
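The excerpt names the dilated Bellman equation without stating it. A minimal sketch, assuming it takes the form B(s, a) = b(s, a) + (1 + 1/H) · E[B(s', a')] with s' drawn from P(·|s, a) and a' from π(·|s'), i.e. an ordinary Bellman backup of a per-step bonus b whose continuation term is inflated by the factor 1 + 1/H; Section 3 of the paper gives the exact definition. The layered dictionary encoding of the MDP and every name below are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
SA = Tuple[State, Action]


def dilated_bonus(layers: List[List[State]],
                  actions: Dict[State, List[Action]],
                  P: Dict[SA, Dict[State, float]],
                  pi: Dict[State, Dict[Action, float]],
                  b: Dict[SA, float],
                  H: int) -> Dict[SA, float]:
    """Solve B(s,a) = b(s,a) + (1 + 1/H) * E_{s'~P(.|s,a), a'~pi(.|s')}[B(s',a')]
    by backward induction over a layered (loop-free) finite-horizon MDP.
    Assumes (hypothetically) that transitions from layer h only reach layer h+1."""
    dilation = 1.0 + 1.0 / H
    B: Dict[SA, float] = {}
    for layer in reversed(layers):
        for s in layer:
            for a in actions[s]:
                # Expected next-step bonus under the kernel P and the policy pi;
                # the sum is empty in the last layer, so B = b there.
                cont = sum(
                    p * sum(pi[s2].get(a2, 0.0) * B[(s2, a2)] for a2 in actions[s2])
                    for s2, p in P.get((s, a), {}).items()
                )
                B[(s, a)] = b[(s, a)] + dilation * cont
    return B


# Usage: a two-layer chain s0 -> s1 with unit bonuses and H = 2.
layers = [["s0"], ["s1"]]
actions = {"s0": ["a"], "s1": ["a"]}
P = {("s0", "a"): {"s1": 1.0}}
pi = {"s0": {"a": 1.0}, "s1": {"a": 1.0}}
b = {("s0", "a"): 1.0, ("s1", "a"): 1.0}
print(dilated_bonus(layers, actions, P, pi, b, H=2)[("s0", "a")])  # 2.5 (undilated: 2.0)
```

Because the factor compounds at most H - 1 times and (1 + 1/H)^H < e, the dilated bonus stays within a constant factor of the plain cumulative bonus while still weighting later-layer bonuses more heavily.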

algorithm, policy optimization, probability

2107.08346

Country:

North America > United States > California (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.49)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Birmpa, Panagiota, Feng, Jinchao, Katsoulakis, Markos A., Rey-Bellet, Luc

Model Uncertainty and Correctability for Directed Graphical Models

arXiv.org Machine Learning, Jul-17-2021

Probabilistic graphical models are a fundamental tool in probabilistic modeling, machine learning and artificial intelligence. They allow us to integrate, in a natural way, expert knowledge, physical modeling, heterogeneous and correlated data, and quantities of interest. For exactly this reason, multiple sources of model uncertainty are inherent within the modular structure of the graphical model. In this paper we develop information-theoretic, robust uncertainty quantification methods and non-parametric stress tests for directed graphical models to assess how model uncertainties from multiple sources affect quantities of interest and propagate through the graph. These methods allow us to rank the different sources of uncertainty and correct the graphical model by targeting its most impactful components with respect to the quantities of interest. Thus, from a machine learning perspective, we provide a mathematically rigorous approach to correctability that guarantees a systematic selection of the components of a graphical model to improve, while controlling potential new errors created in the process in other parts of the model. We demonstrate our methods in two physico-chemical examples, namely quantum scale-informed chemical kinetics and materials screening to improve the efficiency of fuel cells.
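One standard identity behind attributing divergence to individual components of a directed graphical model is the chain rule for KL divergence: for two networks on the same DAG, KL(P || Q) = Σ_i E_{pa_i ~ P}[KL(P(X_i | pa_i) || Q(X_i | pa_i))]. The sketch below checks this on a hypothetical two-node network A → B and reads off per-factor contributions. It only illustrates the decomposition idea; the paper's robust bounds and stress tests are not reproduced here, and all numbers and names are made up.

```python
import math
from typing import Dict, Hashable

Dist = Dict[Hashable, float]


def kl(p: Dist, q: Dist) -> float:
    """KL(p || q) in nats for discrete distributions stored as {outcome: prob}."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)


def joint(pa: Dist, pb_a: Dict[Hashable, Dist]) -> Dist:
    """Joint distribution of the two-node network A -> B."""
    return {(a, b): pa[a] * pb_a[a][b] for a in pa for b in pb_a[a]}


def factor_contributions(pa: Dist, pb_a: Dict[Hashable, Dist],
                         qa: Dist, qb_a: Dict[Hashable, Dist]) -> Dict[str, float]:
    """Chain rule: KL(P_AB || Q_AB) = KL(P_A || Q_A) + E_{a~P_A} KL(P_B|a || Q_B|a)."""
    return {
        "A": kl(pa, qa),
        "B|A": sum(pa[a] * kl(pb_a[a], qb_a[a]) for a in pa),
    }


# Hypothetical binary network: the alternative model Q perturbs the root
# factor and one row (A = 0) of the child's conditional probability table.
pA, qA = {0: 0.7, 1: 0.3}, {0: 0.6, 1: 0.4}
pB = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
qB = {0: {0: 0.85, 1: 0.15}, 1: {0: 0.2, 1: 0.8}}

parts = factor_contributions(pA, pB, qA, qB)
total = kl(joint(pA, pB), joint(qA, qB))
print(parts, total)
```

Here the root factor contributes roughly 0.022 nats and the child factor roughly 0.008, so a ranking by contribution would target the root first.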

bayesian network, model sensitivity index, model uncertainty

2107.08179

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.81)

Industry:

Government > Military (0.45)
Government > Regional Government > North America Government > United States Government (0.45)
Energy > Renewable > Hydrogen (0.34)
Energy > Energy Storage (0.34)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial Intelligence, Jul-17-2021

Vision-Based Autonomous Car Racing Using Deep Imitative Reinforcement Learning

Cai, Peide, Wang, Hengli, Huang, Huaiyang, Liu, Yuxuan, Liu, Ming

Autonomous car racing is a challenging task in the robotic control area. Traditional modular methods require accurate mapping, localization and planning, which makes them computationally inefficient and sensitive to environmental changes. Recently, deep-learning-based end-to-end systems have shown promising results for autonomous driving/racing. However, they are commonly implemented by supervised imitation learning (IL), which suffers from the distribution mismatch problem, or by reinforcement learning (RL), which requires a huge amount of risky interaction data. In this work, we present a general deep imitative reinforcement learning approach (DIRL), which successfully achieves agile autonomous racing using visual inputs. The driving knowledge is acquired from both IL and model-based RL, where the agent can learn from human teachers as well as perform self-improvement by safely interacting with an offline world model. We validate our algorithm both in a high-fidelity driving simulation and on a real-world 1/20-scale RC-car with limited onboard computation. The evaluation results demonstrate that our method outperforms previous IL and RL methods in terms of sample efficiency and task performance. Demonstration videos are available at https://caipeide.github.io/autorace-dirl/

learning, prediction, world model

2107.08325

Country:

Asia > China > Hong Kong (0.04)
North America > United States > New York (0.04)
Asia > China > Guangdong Province (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)
Transportation > Passenger (0.70)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Sokota, Samuel, de Witt, Christian Schroeder, Igl, Maximilian, Zintgraf, Luisa, Torr, Philip, Whiteson, Shimon, Foerster, Jakob

Implicit Communication as Minimum Entropy Coupling

arXiv.org Artificial Intelligence, Jul-17-2021

In many common-payoff games, achieving good performance requires players to develop protocols for communicating their private information implicitly -- i.e., using actions that have non-communicative effects on the environment. Multi-agent reinforcement learning practitioners typically approach this problem using independent learning methods in the hope that agents will learn implicit communication as a byproduct of expected return maximization. Unfortunately, independent learning methods are incapable of doing this in many settings. In this work, we isolate the implicit communication problem by identifying a class of partially observable common-payoff games, which we call implicit referential games, whose difficulty can be attributed to implicit communication. Next, we introduce a principled method based on minimum entropy coupling that leverages the structure of implicit referential games, yielding a new perspective on implicit communication. Lastly, we show that this method can discover performant implicit communication protocols in settings with very large spaces of messages.
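The abstract does not say which coupling algorithm the authors use. A minimum entropy coupling of two marginals p and q is a joint distribution with exactly those marginals and the smallest possible joint entropy; low joint entropy means each variable nearly determines the other, which is the intuition for why a receiver can decode a sender's private message from its action. Computing the exact coupling is NP-hard, so the sketch below uses the common greedy heuristic (repeatedly pair the largest remaining masses). It is a standard approximation under that assumption, not necessarily the paper's method, and the helper names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def greedy_mec(p: Sequence[float], q: Sequence[float],
               tol: float = 1e-12) -> Dict[Tuple[int, int], float]:
    """Greedy approximate minimum-entropy coupling: repeatedly pair the largest
    remaining mass of p with the largest remaining mass of q."""
    p, q = list(p), list(q)
    coupling: Dict[Tuple[int, int], float] = {}
    while max(p) > tol and max(q) > tol:
        i = max(range(len(p)), key=p.__getitem__)
        j = max(range(len(q)), key=q.__getitem__)
        m = min(p[i], q[j])
        coupling[(i, j)] = coupling.get((i, j), 0.0) + m
        p[i] -= m  # at least one of these two entries drops to zero,
        q[j] -= m  # so the loop runs at most len(p) + len(q) times
    return coupling


def entropy_bits(masses: Iterable[float]) -> float:
    """Shannon entropy in bits of a collection of probability masses."""
    return -sum(v * math.log2(v) for v in masses if v > 0)


def marginals(coupling: Dict[Tuple[int, int], float],
              n: int, k: int) -> Tuple[List[float], List[float]]:
    """Row and column sums of a coupling over an n x k grid."""
    row, col = [0.0] * n, [0.0] * k
    for (i, j), v in coupling.items():
        row[i] += v
        col[j] += v
    return row, col


# Usage: couple a fair bit with a three-outcome message distribution.
c = greedy_mec([0.5, 0.5], [0.5, 0.25, 0.25])
# c == {(0, 0): 0.5, (1, 1): 0.25, (1, 2): 0.25}; joint entropy 1.5 bits
```

In this example the joint entropy equals H(q) = 1.5 bits, the lower bound max(H(p), H(q)), so the greedy coupling happens to be optimal.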

information, receiver, sender

2107.08295

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Overview (0.67)
Research Report (0.64)

Industry:

Information Technology (0.68)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.75)

Gabrié, Marylou, Rotskoff, Grant M., Vanden-Eijnden, Eric

Efficient Bayesian Sampling Using Normalizing Flows to Assist Markov Chain Monte Carlo Methods

arXiv.org Machine Learning, Jul-16-2021

Markov Chain Monte Carlo (MCMC) algorithms (Liu, 2008) are nowadays the methods of choice to sample complex posterior distributions. MCMC methods generate a sequence of configurations over which the time average of any suitable observable converges towards its ensemble average over some target distribution, here the posterior. This is achieved by proposing new samples from a proposal density that is easy to sample, then accepting or rejecting them using a criterion that guarantees that the transition kernel of the chain is in detailed balance with respect to the posterior density: a popular choice is the Metropolis-Hastings criterion.

Since no data set from the target posterior distribution is available beforehand, the flow is typically trained using the reverse Kullback-Leibler (KL) divergence, which only requires samples from a base distribution. This strategy may perform poorly when the posterior is complicated and hard to sample with an untrained normalizing flow. Here we explore a distinct training strategy, using the direct KL divergence as loss, in which samples from the posterior are generated by (i) assisting
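The passage describes drawing candidates from an easy-to-sample proposal density and accepting them with the Metropolis-Hastings criterion. When the proposal is a normalizing flow that does not depend on the current state, this becomes an independence sampler, whose acceptance ratio needs the proposal density at both the current and the proposed point. The sketch below assumes a fixed Gaussian as a stand-in for a trained flow and a hypothetical bimodal target; it illustrates the acceptance rule only and is not the paper's algorithm, whose description is cut off here.

```python
import math
import random
from typing import List, Tuple


def log_target(x: float) -> float:
    """Hypothetical unnormalized bimodal 'posterior': an equal mixture of unit
    Gaussians centred at -3 and +3, evaluated with log-sum-exp for stability."""
    a, b = -0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def independence_mh(n_steps: int, mu: float = 0.0, sigma: float = 4.0,
                    x0: float = 0.0, seed: int = 0) -> Tuple[List[float], float]:
    """Metropolis-Hastings with a state-independent proposal q = N(mu, sigma^2),
    standing in for a trained normalizing flow. Returns (samples, acceptance rate)."""
    rng = random.Random(seed)

    def log_q(x: float) -> float:
        # Proposal log-density up to an additive constant (constants cancel).
        return -0.5 * ((x - mu) / sigma) ** 2

    x, samples, accepted = x0, [], 0
    for _ in range(n_steps):
        y = rng.gauss(mu, sigma)
        # Ratio pi(y) q(x) / (pi(x) q(y)) keeps the chain in detailed balance
        # with respect to pi; min(0, .) avoids overflow in exp.
        log_alpha = log_target(y) + log_q(x) - log_target(x) - log_q(y)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, accepted = y, accepted + 1
        samples.append(x)
    return samples, accepted / n_steps


samples, rate = independence_mh(20000, seed=1)
print(f"acceptance rate {rate:.2f}, mean {sum(samples) / len(samples):.2f}")
```

Because every candidate comes from the global proposal, the chain can jump between the two modes in one step, which a small local random-walk proposal would rarely do; the price is that acceptance degrades when the proposal fits the target poorly.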

algorithm 1, efficient bayesian sampling, normalizing flow

2107.08001

Country:

North America > United States > New York > New York County > New York City (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

arXiv.org Artificial Intelligence, Jul-15-2021

High-level Decisions from a Safe Maneuver Catalog with Reinforcement Learning for Safe and Cooperative Automated Merging

Kamran, Danial, Ren, Yu, Lauer, Martin

Reinforcement learning (RL) has recently been used to solve challenging decision-making problems in the context of automated driving. However, one of the main drawbacks of existing RL-based policies is the lack of safety guarantees: they strive to reduce the expected number of collisions but still tolerate them. In this paper, we propose an efficient RL-based decision-making pipeline for safe and cooperative automated driving in merging scenarios. The RL agent is able to predict the current situation and provide high-level decisions, specifying the operation mode of the low-level planner, which is responsible for safety. In order to learn a more generic policy, we propose a scalable RL architecture for the merging scenario that is not sensitive to changes in the environment configuration. According to our experiments, the proposed RL agent can efficiently identify cooperative drivers from their vehicle state history and generate interactive maneuvers, resulting in faster and more comfortable automated driving. At the same time, thanks to the safety constraints inside the planner, all of the maneuvers are collision-free and safe.

ground transportation, upstream oil & gas, vehicle

2107.07413

Country: Europe > Germany (0.15)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.75)
Energy > Oil & Gas (0.74)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Erdmenger, Johanna, Grosvenor, Kevin T., Jefferson, Ro

Towards quantifying information flows: relative entropy in deep neural networks and the renormalization group

arXiv.org Machine Learning, Jul-14-2021

We investigate the analogy between the renormalization group (RG) and deep neural networks, wherein subsequent layers of neurons are analogous to successive steps along the RG. In particular, we quantify the flow of information by explicitly computing the relative entropy or Kullback-Leibler divergence in both the one- and two-dimensional Ising models under decimation RG, as well as in a feedforward neural network as a function of depth. We observe qualitatively identical behavior characterized by a monotonic increase to a parameter-dependent asymptotic value. On the quantum field theory side, the monotonic increase confirms the connection between the relative entropy and the c-theorem. For the neural networks, the asymptotic behavior may have implications for various information maximization methods in machine learning, as well as for disentangling compactness and generalizability. Furthermore, while both the two-dimensional Ising model and the random neural networks we consider exhibit non-trivial critical points, the relative entropy appears insensitive to the phase structure of either system. In this sense, more refined probes are required in order to fully elucidate the flow of information in these models.
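As a reminder of what one decimation step does in the one-dimensional case (a standard textbook result, not code from the paper): for a zero-field Ising chain with dimensionless coupling K, summing out a spin s2 between neighbours s1 and s3 gives 2 cosh(K(s1 + s3)) = exp(g + K' s1 s3), so exp(2K') = cosh(2K), or equivalently tanh K' = tanh² K. The coupling therefore shrinks monotonically toward the trivial fixed point K = 0. The function names are hypothetical, and the sketch does not compute the relative entropies studied in the paper.

```python
import math
from typing import List


def decimate(K: float) -> float:
    """One decimation step for the zero-field 1D Ising chain (K = J / k_B T):
    K' = 0.5 * ln cosh(2K), equivalently tanh(K') = tanh(K)**2."""
    return 0.5 * math.log(math.cosh(2.0 * K))


def rg_flow(K0: float, steps: int) -> List[float]:
    """Couplings K0, K1, ..., K_steps under repeated decimation."""
    Ks = [K0]
    for _ in range(steps):
        Ks.append(decimate(Ks[-1]))
    return Ks


for step, K in enumerate(rg_flow(1.0, 5)):
    print(step, f"{K:.6f}")
```

Since tanh K halves its logarithm's magnitude... more precisely, is squared at every step, any finite starting coupling flows to zero, which is why the 1D chain has no finite-temperature critical point.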

entropy, neural network, relative entropy

2107.06898

Country:

Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > North Carolina (0.04)

Genre:

Workflow (0.67)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)