Langford, John
Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models
Lamb, Alex, Islam, Riashat, Efroni, Yonathan, Didolkar, Aniket, Misra, Dipendra, Foster, Dylan, Molu, Lekan, Chari, Rajan, Krishnamurthy, Akshay, Langford, John
In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information. For example, a person walking along a city street who tries to model all aspects of the world would quickly be overwhelmed by a multitude of shops, cars, and people moving in and out of view, each following their own complex and inscrutable dynamics. Is it possible to turn the agent's firehose of sensory information into a minimal latent state that is both necessary and sufficient for an agent to successfully act in the world? We formulate this question concretely, and propose the Agent Control-Endogenous State Discovery algorithm (AC-State), which has theoretical guarantees and is practically demonstrated to discover the minimal control-endogenous latent state which contains all of the information necessary for controlling the agent, while fully discarding all irrelevant information. This algorithm consists of a multi-step inverse model (predicting actions from distant observations) with an information bottleneck. AC-State enables localization, exploration, and navigation without reward or demonstrations. We demonstrate the discovery of the control-endogenous latent state in three domains: localizing a robot arm with distractions (e.g., changing lighting conditions and background), exploring a maze alongside other agents, and navigating in the Matterport house simulator.
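As an illustrative sketch only (not the authors' implementation), the core training signal can be written as a multi-step inverse model: an encoder maps observations to a small latent, and a classifier predicts the first action taken between an observation and one several steps later. Here the low-dimensional latent is a crude stand-in for AC-State's information bottleneck, and discrete actions with vector observations are assumed for brevity.

    # Hedged sketch of a multi-step inverse model with a low-dimensional
    # latent standing in for the information bottleneck; not the AC-State code.
    import torch
    import torch.nn as nn

    class MultiStepInverse(nn.Module):
        def __init__(self, obs_dim, latent_dim, n_actions):
            super().__init__()
            # encoder maps raw observations to a small latent
            self.encoder = nn.Sequential(
                nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, latent_dim))
            # inverse head predicts a_t from (z_t, z_{t+k}, k)
            self.inverse = nn.Sequential(
                nn.Linear(2 * latent_dim + 1, 128), nn.ReLU(),
                nn.Linear(128, n_actions))

        def forward(self, obs_t, obs_tk, k):
            # k is a (batch, 1) float tensor holding the step gap
            z_t, z_tk = self.encoder(obs_t), self.encoder(obs_tk)
            return self.inverse(torch.cat([z_t, z_tk, k], dim=-1))

    def train_step(model, optimizer, obs_t, obs_tk, k, a_t):
        # predict the first action a_t taken between obs_t and obs_{t+k}
        logits = model(obs_t, obs_tk, k)
        loss = nn.functional.cross_entropy(logits, a_t)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()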
Towards Data-Driven Offline Simulations for Online Reinforcement Learning
Tang, Shengpu, Frujeri, Felipe Vieira, Misra, Dipendra, Lamb, Alex, Langford, John, Mineiro, Paul, Kochman, Sebastian
Modern decision-making systems, from robots to web recommendation engines, are expected to adapt: to user preferences, changing circumstances or even new tasks. Yet, it is still uncommon to deploy a dynamically learning agent (rather than a fixed policy) to a production system, as it's perceived as unsafe. Using historical data to reason about learning algorithms, similar to offline policy evaluation (OPE) applied to fixed policies, could help practitioners evaluate and ultimately deploy such adaptive agents to production. In this work, we formalize offline learner simulation (OLS) for reinforcement learning (RL) and propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation. For environments with complex high-dimensional observations, we propose a semi-parametric approach that leverages recent advances in latent state discovery in order to achieve accurate and efficient offline simulations. In preliminary experiments, we show the advantage of our approach compared to fully non-parametric baselines.
Interaction-Grounded Learning
Xie, Tengyang, Langford, John, Mineiro, Paul, Momennejad, Ida
Consider a prosthetic arm, learning to adapt to its user's control signals. We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies. Such a problem evades common RL solutions which require an explicit reward. The learning agent observes a multidimensional context vector, takes an action, and then observes a multidimensional feedback vector. This multidimensional feedback vector has no explicit reward information. In order to succeed, the algorithm must learn how to evaluate the feedback vector to discover a latent reward signal, with which it can ground its policies without supervision. We show that in an Interaction-Grounded Learning setting, with certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction. We provide theoretical guarantees and a proof-of-concept empirical evaluation to demonstrate the effectiveness of our proposed approach.
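The interaction protocol itself is simple to state; the following is a minimal sketch of that loop only, not the paper's algorithm. The env, policy, and reward_decoder interfaces are hypothetical, and reward_decoder stands for the learned function that grounds the feedback vector into a scalar reward.

    # Sketch of the Interaction-Grounded Learning protocol (interfaces assumed).
    def interaction_grounded_loop(env, policy, reward_decoder, n_rounds=1000):
        history = []
        for _ in range(n_rounds):
            x = env.observe_context()        # multidimensional context vector
            a = policy.act(x)                # action from the current policy
            y = env.feedback(x, a)           # multidimensional feedback, no reward
            r_hat = reward_decoder(x, a, y)  # latent reward decoded from feedback
            policy.update(x, a, r_hat)       # ground the policy without supervision
            history.append((x, a, y, r_hat))
        return history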
Resonance: Replacing Software Constants with Context-Aware Models in Real-time Communication
Gupchup, Jayant, Aazami, Ashkan, Fan, Yaran, Filipi, Senja, Finley, Tom, Inglis, Scott, Asteborg, Marcus, Caroll, Luke, Chari, Rajan, Cozowicz, Markus, Gopal, Vishak, Prakash, Vinod, Bendapudi, Sasikanth, Gerrits, Jack, Lau, Eric, Liu, Huazhou, Rossi, Marco, Slobodianyk, Dima, Birjukov, Dmitri, Cooper, Matty, Javar, Nilesh, Perednya, Dmitriy, Srinivasan, Sriram, Langford, John, Cutler, Ross, Gehrke, Johannes
Large software systems tune hundreds of 'constants' to optimize their runtime performance. These values are commonly derived through intuition, lab tests, or A/B tests. A 'one-size-fits-all' approach is often sub-optimal as the best value depends on runtime context. In this paper, we provide an experimental approach to replace constants with learned contextual functions for Skype, a widely used real-time communication (RTC) application. We present Resonance, a system based on contextual bandits (CB). We describe experiences from three real-world experiments: applying it to the audio, video, and transport components in Skype. We surface a unique and practical challenge of performing machine learning (ML) inference in large software systems written using encapsulation principles. Finally, we open-source FeatureBroker, a library to reduce the friction in adopting ML models in such development environments.
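As a toy illustration of the idea (not the production Resonance system), a tuned constant can be replaced by a contextual bandit that picks among a few candidate values given runtime context; the candidate values, context features, and quality signal below are assumed for the example.

    # Hedged sketch: an epsilon-greedy contextual bandit with per-arm linear
    # reward models choosing among candidate values for a former constant.
    import numpy as np

    class ConstantAsBandit:
        def __init__(self, candidates, n_features, epsilon=0.1):
            self.candidates = candidates                          # e.g. [20, 40, 80]
            self.epsilon = epsilon
            self.A = [np.eye(n_features) for _ in candidates]     # per-arm X^T X + I
            self.b = [np.zeros(n_features) for _ in candidates]   # per-arm X^T r

        def choose(self, context):
            if np.random.rand() < self.epsilon:
                return np.random.randint(len(self.candidates))    # explore
            scores = [context @ np.linalg.solve(A, b)             # exploit
                      for A, b in zip(self.A, self.b)]
            return int(np.argmax(scores))

        def update(self, arm, context, reward):
            self.A[arm] += np.outer(context, context)
            self.b[arm] += reward * context

    # Usage: choose the value for this call, observe quality, then learn.
    # bandit = ConstantAsBandit([20, 40, 80], n_features=8)
    # arm = bandit.choose(context); value = bandit.candidates[arm]
    # bandit.update(arm, context, observed_quality)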
Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting
Chen, Keyi, Langford, John, Orabona, Francesco
Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance. In practical applications, however, there remains an empirical gap between tuned stochastic gradient descent (SGD) and PFSGD. In this paper, we close the empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on truncated models. The new update is derived from the solution of an Ordinary Differential Equation (ODE) and admits a closed form. We show empirically that this new parameter-free algorithm outperforms algorithms with the "best default" learning rates and almost matches the performance of finely tuned baselines without anything to tune.
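For orientation, the sketch below shows the standard coordinate-wise Krichevsky-Trofimov coin-betting update from which such parameter-free methods start; the paper's ODE-based update on truncated models refines this. Per-coordinate gradients clipped to [-1, 1] and the initial wealth of 1.0 are illustrative assumptions.

    # Hedged sketch of a basic coin-betting (KT) parameter-free update;
    # not the paper's ODE-based algorithm.
    import numpy as np

    class CoinBettingSGD:
        def __init__(self, dim, initial_wealth=1.0):
            self.t = 0
            self.wealth = np.full(dim, initial_wealth)  # money available to bet
            self.theta = np.zeros(dim)                  # running sum of -gradients
            self.w = np.zeros(dim)                      # current iterate (the bet)

        def step(self, grad):
            c = -np.clip(grad, -1.0, 1.0)
            self.wealth += c * self.w        # settle the bet made for this round
            self.theta += c
            self.t += 1
            # bet a fraction theta/(t+1) of the current wealth on each coordinate
            self.w = (self.theta / (self.t + 1)) * self.wealth
            return self.w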
Learning the Linear Quadratic Regulator from Nonlinear Observations
Mhammedi, Zakaria, Foster, Dylan J., Simchowitz, Max, Misra, Dipendra, Sun, Wen, Krishnamurthy, Akshay, Rakhlin, Alexander, Langford, John
We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs, but the agent operates on high-dimensional, nonlinear observations such as images from a camera. To enable sample-efficient learning, we assume that the learner has access to a class of decoder functions (e.g., neural networks) that is flexible enough to capture the mapping from observations to latent states. We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class. RichID is oracle-efficient and accesses the decoder class only through calls to a least-squares regression oracle. Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.
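To make the oracle model concrete, the toy below shows what a single least-squares regression oracle call looks like for a linear decoder class; the linear class is an illustrative simplification, since the paper allows flexible classes such as neural networks.

    # Toy least-squares regression oracle over a linear decoder class.
    import numpy as np

    def least_squares_oracle(observations, targets):
        # return the decoder in the linear class minimizing squared error
        W, *_ = np.linalg.lstsq(observations, targets, rcond=None)
        return lambda obs: obs @ W

    # Usage: the algorithm supplies regression targets, receives a decoder,
    # and never inspects the class in any other way.
    # decoder = least_squares_oracle(X_obs, Y_targets)
    # z_hat = decoder(new_obs)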
Efficient Contextual Bandits with Continuous Actions
Majzoubi, Maryam, Zhang, Chicheng, Chari, Rajan, Krishnamurthy, Akshay, Langford, John, Slivkins, Aleksandrs
We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure. Our reduction-style algorithm composes with most supervised learning representations. We prove that it works in a general sense and verify the new functionality with large-scale experiments.
Off-policy evaluation for slate recommendation
Swaminathan, Adith, Krishnamurthy, Akshay, Agarwal, Alekh, Dudik, Miro, Langford, John, Jose, Damien, Zitouni, Imed
This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context---a common scenario in web search, ads, and recommendation. We build on techniques from combinatorial bandits to introduce a new practical estimator that uses logged data to estimate a policy's performance. A thorough empirical evaluation on real-world data reveals that our estimator is accurate in a variety of settings, including as a subroutine in a learning-to-rank task, where it achieves competitive performance. We derive conditions under which our estimator is unbiased---these conditions are weaker than prior heuristics for slate evaluation---and experimentally demonstrate a smaller bias than parametric approaches, even when these conditions are violated. Finally, our theory and experiments also show exponential savings in the amount of required data compared with general unbiased estimators.
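For contrast, the sketch below is a generic whole-slate inverse-propensity-scoring baseline that treats each complete slate as a single action; it is exactly the kind of general unbiased estimator whose data requirements the paper's estimator improves on. It assumes logged propensities for whole slates and a deterministic target policy.

    # Hedged baseline sketch, not the paper's estimator: whole-slate IPS.
    import numpy as np

    def slate_ips(logged, target_policy):
        # logged: iterable of (context, slate, propensity, reward) tuples
        # target_policy(context): slate the evaluated policy would show
        estimates = []
        for context, slate, propensity, reward in logged:
            match = 1.0 if target_policy(context) == slate else 0.0
            estimates.append(match * reward / propensity)
        return float(np.mean(estimates))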
On Oracle-Efficient PAC RL with Rich Observations
Dann, Christoph, Jiang, Nan, Krishnamurthy, Akshay, Agarwal, Alekh, Langford, John, Schapire, Robert E.
We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden state dynamics, we prove that the only known sample-efficient algorithm, OLIVE, cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.
Efficient Second Order Online Learning by Sketching
Luo, Haipeng, Agarwal, Alekh, Cesa-Bianchi, Nicolò, Langford, John
We propose Sketched Online Newton (SON), an online second-order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data. SON is an enhanced version of the Online Newton Step which, via sketching techniques, enjoys a running time linear in the dimension and sketch size. We further develop sparse forms of the sketching methods (such as Oja's rule), making the computation linear in the sparsity of features. Together, these improvements eliminate all computational obstacles of previous second-order online learning approaches.
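As a rough illustration of the sketching idea (not the paper's full SON algorithm), a second-order update can keep a rank-m Frequent-Directions sketch of past gradients instead of the full d x d matrix and invert it through the Woodbury identity; the step size eta and regularizer alpha below are illustrative choices, and d > m is assumed.

    # Hedged sketch of a sketched second-order online update.
    import numpy as np

    class SketchedNewton:
        def __init__(self, dim, m=10, eta=0.1, alpha=1.0):
            self.w = np.zeros(dim)
            self.S = np.zeros((m, dim))   # Frequent-Directions sketch of gradients
            self.m, self.eta, self.alpha = m, eta, alpha

        def _fd_update(self, grad):
            # append the new gradient, then shrink singular values (Frequent Directions)
            B = np.vstack([self.S, grad])
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            s = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
            self.S = s[:self.m, None] * Vt[:self.m]

        def step(self, grad):
            self._fd_update(grad)
            # solve (alpha*I + S^T S) p = grad with the Woodbury identity,
            # so the cost stays linear in the dimension for a fixed sketch size m
            M = self.alpha * np.eye(self.S.shape[0]) + self.S @ self.S.T
            p = (grad - self.S.T @ np.linalg.solve(M, self.S @ grad)) / self.alpha
            self.w -= self.eta * p
            return self.w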