AITopics | Fox, Roy

Plotting

Fox, Roy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Target Entropy Annealing for Discrete Soft Actor-Critic

Xu, Yaosheng, Hu, Dailin, Liang, Litian, McAleer, Stephen, Abbeel, Pieter, Fox, Roy

arXiv.org Artificial IntelligenceDec-6-2021

Soft Actor-Critic (SAC) is considered the state-of-the-art algorithm in continuous action space settings. It uses the maximum entropy framework for efficiency and stability, and applies a heuristic temperature Lagrange term to tune the temperature $\alpha$, which determines how "soft" the policy should be. It is counter-intuitive that empirical evidence shows SAC does not perform well in discrete domains. In this paper we investigate the possible explanations for this phenomenon and propose Target Entropy Scheduled SAC (TES-SAC), an annealing method for the target entropy parameter applied on SAC. Target entropy is a constant in the temperature Lagrange term and represents the target policy entropy in discrete SAC. We compare our method on Atari 2600 games with different constant target entropy SAC, and analyze on how our scheduling affects SAC.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2112.02852

Country: North America > United States > California (0.28)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)

Add feedback

Count-Based Temperature Scheduling for Maximum Entropy Reinforcement Learning

Hu, Dailin, Abbeel, Pieter, Fox, Roy

arXiv.org Artificial IntelligenceNov-28-2021

Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) and Soft Actor-Critic trade off reward and policy entropy, which has the potential to improve training stability and robustness. Most MaxEnt RL methods, however, use a constant tradeoff coefficient (temperature), contrary to the intuition that the temperature should be high early in training to avoid overfitting to noisy value estimates and decrease later in training as we increasingly trust high value estimates to truly lead to good rewards. Moreover, our confidence in value estimates is state-dependent, increasing every time we use more evidence to update an estimate. In this paper, we present a simple state-based temperature scheduling approach, and instantiate it for SQL as Count-Based Soft Q-Learning (CBSQL). We evaluate our approach on a toy domain as well as in several Atari 2600 domains and show promising results.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

arXiv.org Artificial Intelligence

2111.14204

Country: North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.61)

Add feedback

Temporal-Difference Value Estimation via Uncertainty-Guided Soft Updates

Liang, Litian, Xu, Yaosheng, McAleer, Stephen, Hu, Dailin, Ihler, Alexander, Abbeel, Pieter, Fox, Roy

arXiv.org Artificial IntelligenceOct-27-2021

Temporal-Difference (TD) learning methods, such as Q-Learning, have proven effective at learning a policy to perform control tasks. One issue with methods like Q-Learning is that the value update introduces bias when predicting the TD target of a unfamiliar state. Estimation noise becomes a bias after the max operator in the policy improvement step, and carries over to value estimations of other states, causing Q-Learning to overestimate the Q value. Algorithms like Soft Q-Learning (SQL) introduce the notion of a soft-greedy policy, which reduces the estimation bias via soft updates in early stages of training. However, the inverse temperature $\beta$ that controls the softness of an update is usually set by a hand-designed heuristic, which can be inaccurate at capturing the uncertainty in the target estimate. Under the belief that $\beta$ is closely related to the (state dependent) model uncertainty, Entropy Regularized Q-Learning (EQL) further introduces a principled scheduling of $\beta$ by maintaining a collection of the model parameters that characterizes model uncertainty. In this paper, we present Unbiased Soft Q-Learning (UQL), which extends the work of EQL from two action, finite state spaces to multi-action, infinite state space Markov Decision Processes. We also provide a principled numerical scheduling of $\beta$, extended from SQL and using model uncertainty, during the optimization process. We show the theoretical guarantees and the effectiveness of this update method in experiments on several discrete control environments.

machine learning, reinforcement learning, teaching method, (18 more...)

arXiv.org Artificial Intelligence

2110.14818

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Modular Framework for Visuomotor Language Grounding

Nottingham, Kolby, Liang, Litian, Shin, Daeyun, Fowlkes, Charless C., Fox, Roy, Singh, Sameer

arXiv.org Artificial IntelligenceSep-5-2021

Natural language instruction following tasks serve as a valuable test-bed for grounded language and robotics research. However, data collection for these tasks is expensive and end-to-end approaches suffer from data inefficiency. We propose the structuring of language, acting, and visual tasks into separate modules that can be trained independently. Using a Language, Action, and Vision (LAV) framework removes the dependence of action and vision modules on instruction following datasets, making them more efficient to train. We also present a preliminary evaluation of LAV on the ALFRED task for visual and interactive instruction following.

artificial intelligence, module, natural language, (16 more...)

arXiv.org Artificial Intelligence

2109.02161

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)

Add feedback

Improving Social Welfare While Preserving Autonomy via a Pareto Mediator

McAleer, Stephen, Lanier, John, Dennis, Michael, Baldi, Pierre, Fox, Roy

arXiv.org Artificial IntelligenceJun-7-2021

Machine learning algorithms often make decisions on behalf of agents with varied and sometimes conflicting interests. In domains where agents can choose to take their own action or delegate their action to a central mediator, an open question is how mediators should take actions on behalf of delegating agents. The main existing approach uses delegating agents to punish non-delegating agents in an attempt to get all agents to delegate, which tends to be costly for all. We introduce a Pareto Mediator which aims to improve outcomes for delegating agents without making any of them worse off. Our experiments in random normal form games, a restaurant recommendation game, and a reinforcement learning sequential social dilemma show that the Pareto Mediator greatly increases social welfare. Also, even when the Pareto Mediator is based on an incorrect model of agent utility, performance gracefully degrades to the pre-intervention level, due to the individual autonomy preserved by the voluntary mediator.

agent, artificial intelligence, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2106.03927

Country: North America > United States > California (0.47)

Genre: Research Report (0.50)

Industry:

Consumer Products & Services > Restaurants (0.87)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Add feedback

XDO: A Double Oracle Algorithm for Extensive-Form Games

McAleer, Stephen, Lanier, John, Baldi, Pierre, Fox, Roy

arXiv.org Artificial IntelligenceMar-10-2021

Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm for two-player zero-sum games that has empirically found approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to a Nash equilibrium, it may take an exponential number of iterations as the number of infostates grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations 1-2 orders of magnitude smaller than PSRO. In experiments on a modified Leduc poker game, we show that tabular XDO achieves over 11x lower exploitability than CFR and over 82x lower exploitability than PSRO and XFP in the same amount of time. We also show that NXDO beats PSRO and is competitive with NFSP on a large no-limit poker game.

artificial intelligence, game theory, iteration, (18 more...)

arXiv.org Artificial Intelligence

2103.06426

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

Agostinelli, Forest, Shmakov, Alexander, McAleer, Stephen, Fox, Roy, Baldi, Pierre

arXiv.org Artificial IntelligenceFeb-8-2021

A* search is an informed search algorithm that uses a heuristic function to guide the order in which nodes are expanded. Since the computation required to expand a node and compute the heuristic values for all of its generated children grows linearly with the size of the action space, A* search can become impractical for problems with large action spaces. This computational burden becomes even more apparent when heuristic functions are learned by general, but computationally expensive, deep neural networks. To address this problem, we introduce DeepCubeAQ, a deep reinforcement learning and search algorithm that builds on the DeepCubeA algorithm and deep Q-networks. DeepCubeAQ learns a heuristic function that, with a single forward pass through a deep neural network, computes the sum of the transition cost and the heuristic value of all of the children of a node without explicitly generating any of the children, eliminating the need for node expansions. DeepCubeAQ then uses a novel variant of A* search, called AQ* search, that uses the deep Q-network to guide search. We use DeepCubeAQ to solve the Rubik's cube when formulated with a large action space that includes 1872 meta-actions and show that this 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time when performing AQ* search and that AQ* search is orders of magnitude faster than A* search.

action space, deep learning, neural network, (20 more...)

arXiv.org Artificial Intelligence

2102.04518

Country: North America > United States > California (0.15)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Add feedback

Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games

McAleer, Stephen, Lanier, John, Fox, Roy, Baldi, Pierre

arXiv.org Artificial IntelligenceJun-15-2020

Finding approximate Nash equilibria in zero-sum imperfect-information games is challenging when the number of information states is large. Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm grounded in game theory that is guaranteed to converge to an approximate Nash equilibrium. However, PSRO requires training a reinforcement learning policy at each iteration, making it too slow for large games. We show through counterexamples and experiments that DCH and Rectified PSRO, two existing approaches to scaling up PSRO, fail to converge even in small games. We introduce Pipeline PSRO (P2SRO), the first scalable general method for finding approximate Nash equilibria in large zero-sum imperfect-information games. P2SRO is able to parallelize PSRO with convergence guarantees by maintaining a hierarchical pipeline of reinforcement learning workers, each training against the policies generated by lower levels in the hierarchy. We show that unlike existing methods, P2SRO converges to an approximate Nash equilibrium, and does so faster as the number of parallel workers increases, across a variety of imperfect information games. We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$. P2SRO is able to achieve state-of-the-art performance on Barrage Stratego and beats all existing bots.

computer game, game theory, nash equilibrium, (18 more...)

arXiv.org Artificial Intelligence

2006.08555

Country: North America > United States > California > Orange County > Irvine (0.15)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A multi-agent control framework for co-adaptation in brain-computer interfaces

Merel, Josh S., Fox, Roy, Jebara, Tony, Paninski, Liam

Neural Information Processing SystemsDec-31-2013

In a closed-loop brain-computer interface (BCI), adaptive decoders are used to learn parameters suited to decoding the user's neural response. Feedback to the user provides information which permits the neural tuning to also adapt. We present an approach to model this process of co-adaptation between the encoding model of the neural signal and the decoding algorithm as a multi-agent formulation of the linear quadratic Gaussian (LQG) control problem. In simulation we characterize how decoding performance improves as the neural encoding and adaptive decoder optimize, qualitatively resembling experimentally demonstrated closed-loop improvement. We then propose a novel, modified decoder update rule which is aware of the fact that the encoder is also changing and show it can improve simulated co-adaptation dynamics. Our modeling approach offers promise for gaining insights into co-adaptation as well as improving user learning of BCI control in practical settings.

decoder, neural network, neurology, (20 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel (0.14)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.70)

Add feedback

Bounded Planning in Passive POMDPs

Fox, Roy, Tishby, Naftali

arXiv.org Artificial IntelligenceJun-27-2012

In Passive POMDPs actions do not affect the world state, but still incur costs. When the agent is bounded by information-processing constraints, it can only keep an approximation of the belief. We present a variational principle for the problem of maintaining the information which is most useful for minimizing the cost, and introduce an efficient and simple algorithm for finding an optimum.

artificial intelligence, information, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1206.6405

Country:

North America > United States (0.28)
Asia > Middle East > Israel (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback