 Krishnamurthy, Akshay


Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation

arXiv.org Machine Learning

We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data. Offline RL -- particularly when coupled with (value) function approximation to allow for generalization in large or continuous state spaces -- is becoming increasingly relevant in practice, because it avoids costly and time-consuming online data collection and is well suited to safety-critical domains. Existing sample complexity guarantees for offline value function approximation methods typically require both (1) distributional assumptions (i.e., good coverage) and (2) representational assumptions (i.e., ability to represent some or all $Q$-value functions) stronger than what is required for supervised learning. However, the necessity of these conditions and the fundamental limits of offline RL are not well understood in spite of decades of research. This led Chen and Jiang (2019) to conjecture that concentrability (the most standard notion of coverage) and realizability (the weakest representation condition) alone are not sufficient for sample-efficient offline RL. We resolve this conjecture in the positive by proving that in general, even if both concentrability and realizability are satisfied, any algorithm requires sample complexity polynomial in the size of the state space to learn a non-trivial policy. Our results show that sample-efficient offline reinforcement learning requires either restrictive coverage conditions or representation conditions that go beyond supervised learning, and highlight a phenomenon called over-coverage which serves as a fundamental barrier for offline value function approximation methods. A consequence of our results for reinforcement learning with linear function approximation is that the separation between online and offline RL can be arbitrarily large, even in constant dimension.
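
For concreteness, the two conditions at issue admit the following standard formalization (notation here is illustrative and may differ from the paper's). Concentrability asks that the logging distribution $\mu$ cover the occupancy measure of every policy up to a multiplicative constant, while realizability asks only that the function class contain the relevant $Q$-functions:

$$
\sup_{\pi} \sup_{(s,a)} \frac{d^{\pi}(s,a)}{\mu(s,a)} \le C < \infty
\qquad \text{and} \qquad
Q^{\pi} \in \mathcal{F} \ \text{for the policies under consideration},
$$

where $d^{\pi}$ denotes the state-action occupancy measure of policy $\pi$ and $\mathcal{F}$ is the class used for value function approximation. The lower bound above says that even these two conditions together do not rule out sample complexity scaling polynomially with the number of states.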


Universal and data-adaptive algorithms for model selection in linear contextual bandits

arXiv.org Machine Learning

Model selection in contextual bandits is an important complementary problem to regret minimization with respect to a fixed model class. We consider the simplest non-trivial instance of model selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem. Even in this instance, current state-of-the-art methods explore in a suboptimal manner and require strong "feature-diversity" conditions. In this paper, we introduce new algorithms that a) explore in a data-adaptive manner, and b) provide model selection guarantees of the form $\mathcal{O}(d^{\alpha} T^{1- \alpha})$ with no feature diversity conditions whatsoever, where $d$ denotes the dimension of the linear model and $T$ denotes the total number of rounds. The first algorithm enjoys a "best-of-both-worlds" property, simultaneously recovering two prior results that hold under distinct distributional assumptions. The second removes distributional assumptions altogether, expanding the scope for tractable model selection. Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
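
To unpack the form of this guarantee: the exponent $\alpha$ trades the dependence on the dimension $d$ against the dependence on the number of rounds $T$, and the regret stays sublinear in $T$ for any $\alpha > 0$. For example,

$$
\mathcal{O}(d^{\alpha} T^{1-\alpha}) =
\begin{cases}
\mathcal{O}(\sqrt{d\,T}) & \text{if } \alpha = 1/2, \\
\mathcal{O}(d^{1/3}\, T^{2/3}) & \text{if } \alpha = 1/3.
\end{cases}
$$

The particular values of $\alpha$ attained by the two algorithms are stated in the paper; the display above is only meant to illustrate the trade-off encoded by the exponent.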


Efficient First-Order Contextual Bandits: Prediction, Allocation, and Triangular Discrimination

arXiv.org Machine Learning

In the contextual bandit problem, a learning agent repeatedly makes decisions based on contextual information, with the goal of learning a decision-making policy that minimizes their total loss over time. This model captures simple reinforcement learning tasks in which the agent must learn to make high-quality decisions in an uncertain environment, but does not need to engage in long-term planning or credit assignment. Owing to the availability of high-quality engineered reward metrics, contextual bandit algorithms are now routinely deployed in production for online personalization systems (Agarwal et al., 2016; Tewari and Murphy, 2017). Contextual bandits encompass both the general problem of statistical learning with function approximation (specifically, cost-sensitive classification) and the classical multi-armed bandit problem, yet present algorithmic challenges greater than the sum of both parts. In spite of these difficulties, extensive research effort over the past decade has resulted in efficient, general-purpose algorithms, as well as a sharp understanding of the optimal worst-case sample complexity (Auer et al., 2002a; Beygelzimer et al., 2011; Agarwal et al., 2014; Foster and Rakhlin, 2020; Simchi-Levi and Xu, 2020). While the algorithmic and statistical foundations for contextual bandits are beginning to take shape, we still lack an understanding of adaptive or data-dependent algorithms that can go beyond the worst case and exploit nice properties of real-world instances for better performance. This is in stark contrast to supervised statistical learning, where adaptivity has substantial theory, and where standard algorithms (e.g., empirical risk minimization) are known to automatically adapt to nice data (Bousquet et al., 2003). For contextual bandits, adaptivity poses new challenges that seem to require algorithmic innovation, and a major research frontier is to develop algorithmic principles for adaptivity and an understanding of the fundamental limits. To highlight the lack of understanding for adaptive and data-dependent algorithms, a COLT 2017 open problem posed by Agarwal, Krishnamurthy, Langford, Luo, and Schapire (Agarwal et al., 2017) asks whether first-order regret bounds (regret bounds that scale with the cumulative loss of the best policy rather than with the total number of rounds) are achievable by computationally efficient contextual bandit algorithms.
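
As a point of reference for the interaction protocol described above, here is a minimal, self-contained Python sketch of a contextual bandit loop with an epsilon-greedy learner. The toy environment and all names are illustrative placeholders rather than code from the paper, and epsilon-greedy is used only because it is the simplest exploration strategy, not because it is the method studied here.

    # Minimal contextual bandit protocol with a toy environment and an
    # epsilon-greedy learner (illustrative sketch only).
    import random

    NUM_ACTIONS, NUM_ROUNDS, EPS = 3, 1000, 0.1

    def get_context():
        # A one-dimensional context in [0, 1); stands in for richer contextual information.
        return random.random()

    def observe_loss(context, action):
        # Bandit feedback: only the loss of the chosen action is revealed.
        # In this toy instance, action 0 is best for small contexts, action 2 for large ones.
        best = 0 if context < 0.5 else 2
        return 0.0 if action == best else 1.0

    # Running loss estimates per (coarse context bucket, action).
    counts = [[0] * NUM_ACTIONS for _ in range(2)]
    totals = [[0.0] * NUM_ACTIONS for _ in range(2)]

    total_loss = 0.0
    for t in range(NUM_ROUNDS):
        x = get_context()
        bucket = 0 if x < 0.5 else 1
        if random.random() < EPS:          # explore uniformly with probability EPS
            a = random.randrange(NUM_ACTIONS)
        else:                              # otherwise act greedily on current loss estimates
            est = [totals[bucket][i] / counts[bucket][i] if counts[bucket][i] else 0.0
                   for i in range(NUM_ACTIONS)]
            a = min(range(NUM_ACTIONS), key=lambda i: est[i])
        loss = observe_loss(x, a)
        counts[bucket][a] += 1
        totals[bucket][a] += loss
        total_loss += loss

    print("average loss:", total_loss / NUM_ROUNDS)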


Bayesian decision-making under misspecified priors with applications to meta-learning

arXiv.org Machine Learning

Thompson sampling and other Bayesian sequential decision-making algorithms are among the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The choice of prior in these algorithms offers flexibility to encode domain knowledge but can also lead to poor performance when misspecified. In this paper, we demonstrate that performance degrades gracefully with misspecification. We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most $\tilde{\mathcal{O}}(H^2 \epsilon)$ from TS with a well specified prior, where $\epsilon$ is the total-variation distance between priors and $H$ is the learning horizon. Our bound does not require the prior to have any parametric form. For priors with bounded support, our bound is independent of the cardinality or structure of the action space, and we show that it is tight up to universal constants in the worst case. Building on our sensitivity analysis, we establish generic PAC guarantees for algorithms in the recently studied Bayesian meta-learning setting and derive corollaries for various families of priors. Our results generalize along two axes: (1) they apply to a broader family of Bayesian decision-making algorithms, including a Monte-Carlo implementation of the knowledge gradient algorithm (KG), and (2) they apply to Bayesian POMDPs, the most general Bayesian decision-making setting, encompassing contextual bandits as a special case. Through numerical simulations, we illustrate how prior misspecification and the deployment of one-step look-ahead (as in KG) can impact the convergence of meta-learning in multi-armed and contextual bandits with structured and correlated priors.
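
As a rough illustration of the sensitivity being quantified (a toy experiment, not the paper's analysis or construction), one can run Beta-Bernoulli Thompson sampling on the same two-armed instance twice, once with a weak symmetric prior and once with a prior that concentrates on the wrong arm, and compare the reward accrued:

    # Beta-Bernoulli Thompson sampling with a supplied (possibly misspecified) prior.
    # Toy illustration only; not the construction from the paper.
    import random

    def thompson_sampling(true_means, prior_params, horizon):
        """Run TS with independent Beta priors given by prior_params;
        return the total reward accrued over `horizon` rounds."""
        post = [list(p) for p in prior_params]   # (alpha, beta) per arm, initialized at the prior
        total = 0.0
        for _ in range(horizon):
            samples = [random.betavariate(a, b) for a, b in post]
            arm = max(range(len(post)), key=lambda i: samples[i])
            reward = 1.0 if random.random() < true_means[arm] else 0.0
            post[arm][0] += reward               # posterior update for the pulled arm only
            post[arm][1] += 1.0 - reward
            total += reward
        return total

    true_means = [0.3, 0.6]
    weak_prior = [(2.0, 2.0), (2.0, 2.0)]        # weak, symmetric prior
    wrong_prior = [(8.0, 2.0), (1.0, 8.0)]       # prior that favors the worse arm
    print("TS, weak prior:        ", thompson_sampling(true_means, weak_prior, 2000))
    print("TS, misspecified prior:", thompson_sampling(true_means, wrong_prior, 2000))

The paper's result bounds, in expectation, how much such a discrepancy can cost, in terms of the total-variation distance between the two priors and the horizon.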


Investigating the Role of Negatives in Contrastive Representation Learning

arXiv.org Machine Learning

Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning, where given a notion of semantic similarity, the learner tries to distinguish a similar (positive) example from a collection of random (negative) examples. The success of modern contrastive learning pipelines relies on many parameters such as the choice of data augmentation, the number of negative examples, and the batch size; however, there is limited understanding as to how these parameters interact and affect downstream performance. We focus on disambiguating the role of one of these parameters: the number of negative examples. Theoretically, we show the existence of a collision-coverage trade-off suggesting that the optimal number of negative examples should scale with the number of underlying concepts in the data. Empirically, we scrutinize the role of the number of negatives in both NLP and vision tasks. In the NLP task, we find that the results broadly agree with our theory, while our vision experiments are murkier with performance sometimes even being insensitive to the number of negatives. We discuss plausible explanations for this behavior and suggest future directions to better align theory and practice.
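
To make the role of the negatives concrete, here is a generic noise-contrastive (InfoNCE-style) objective in which one positive is scored against N random negatives. This is a standard formulation used purely for illustration, not necessarily the exact objective analyzed in the paper.

    # Generic InfoNCE-style contrastive loss with N negatives (illustrative only).
    import numpy as np

    def info_nce_loss(anchor, positive, negatives, temperature=0.1):
        """Negative log-probability of the positive among the positive and the
        negatives, scored by temperature-scaled cosine similarity with the anchor."""
        def score(u, v):
            return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) * temperature)
        logits = np.array([score(anchor, positive)] + [score(anchor, n) for n in negatives])
        logits -= logits.max()                          # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        return -np.log(probs[0])

    rng = np.random.default_rng(0)
    d, num_negatives = 16, 8
    anchor = rng.normal(size=d)
    positive = anchor + 0.1 * rng.normal(size=d)        # a semantically similar example
    negatives = rng.normal(size=(num_negatives, d))     # random negatives; a "collision" occurs
                                                        # when a negative shares the anchor's concept
    print("contrastive loss:", info_nce_loss(anchor, positive, negatives))

Roughly, the collision-coverage trade-off refers to the tension visible in this setup: more negatives cover more of the underlying concepts, but also make it more likely that some negative collides with (shares the concept of) the positive.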


Model-free Representation Learning and Exploration in Low-rank MDPs

arXiv.org Machine Learning

The low rank MDP has emerged as an important model for studying representation learning and exploration in reinforcement learning. With a known representation, several model-free exploration strategies exist. In contrast, all algorithms for the unknown representation setting are model-based, thereby requiring the ability to model the full dynamics. In this work, we present the first model-free representation learning algorithms for low rank MDPs. The key algorithmic contribution is a new minimax representation learning objective, for which we provide variants with differing tradeoffs in their statistical and computational properties. We interleave this representation learning step with an exploration strategy to cover the state space in a reward-free manner. The resulting algorithms are provably sample efficient and can accommodate general function approximation to scale to complex environments.
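
For reference, a low-rank MDP is defined by a factorization of the transition operator (standard definition; notation may differ from the paper's):

$$
T(s' \mid s, a) = \langle \phi(s, a), \mu(s') \rangle, \qquad \phi(s, a),\ \mu(s') \in \mathbb{R}^d,
$$

where the embedding $\phi$ plays the role of the representation. With $\phi$ known, the problem reduces to the well-studied linear setting; roughly speaking, the model-free algorithms described above aim to learn $\phi$ directly, without also fitting $\mu$, which is what modeling the full dynamics would require.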


Learning the Linear Quadratic Regulator from Nonlinear Observations

arXiv.org Machine Learning

We introduce a new problem setting for continuous control called the LQR with Rich Observations, or RichLQR. In our setting, the environment is summarized by a low-dimensional continuous latent state with linear dynamics and quadratic costs, but the agent operates on high-dimensional, nonlinear observations such as images from a camera. To enable sample-efficient learning, we assume that the learner has access to a class of decoder functions (e.g., neural networks) that is flexible enough to capture the mapping from observations to latent states. We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class. RichID is oracle-efficient and accesses the decoder class only through calls to a least-squares regression oracle. Our results constitute the first provable sample complexity guarantee for continuous control with an unknown nonlinearity in the system model and general function approximation.
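
Schematically, the setting can be written as follows (notation illustrative and simplified): a low-dimensional latent state with linear dynamics and quadratic costs, observed only through a nonlinear map,

$$
x_{t+1} = A\,x_t + B\,u_t + w_t, \qquad o_t = q(x_t), \qquad \mathrm{cost}_t = x_t^\top Q\, x_t + u_t^\top R\, u_t,
$$

where $x_t$ is the latent state, $u_t$ the control input, $w_t$ process noise, and the learner observes only $o_t$ (e.g., an image) produced by the unknown nonlinear map $q$. The decoder class is assumed to be rich enough to recover $x_t$ from $o_t$, mirroring the assumption stated above that it can capture the mapping from observations to latent states.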


Private Reinforcement Learning with PAC and Regret Guarantees

arXiv.org Machine Learning

Motivated by high-stakes decision-making domains like personalized medicine where user information is inherently sensitive, we design privacy preserving exploration policies for episodic reinforcement learning (RL). We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP)--a strong variant of differential privacy for settings where each user receives their own sets of output (e.g., policy recommendations). We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee. Our algorithm only pays for a moderate privacy cost on exploration: in comparison to the non-private bounds, the privacy parameter only appears in lower-order terms. Finally, we present lower bounds on sample complexity and regret for reinforcement learning subject to JDP.
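
For intuition about what privatizing an exploration algorithm involves (a generic building block, not the paper's specific mechanism), the statistics that drive optimism-based exploration, such as visit counts, can be released with calibrated noise. A minimal Python sketch of a Laplace-noised count follows.

    # Releasing a visit count with Laplace noise: a standard differential-privacy
    # building block (illustrative; not the mechanism from the paper).
    import math, random

    def private_count(true_count, epsilon):
        """Add Laplace(1/epsilon) noise to a count. Assuming a single user changes
        this count by at most 1 (sensitivity 1), the released value is
        epsilon-differentially private."""
        u = random.random() - 0.5
        noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
        return true_count + noise

    # e.g., a noised state-action visit count, as might feed an optimistic exploration bonus
    print(private_count(42, epsilon=1.0))

In the paper, the privacy notion is joint differential privacy over users, and the extra cost of privatization shows up only in lower-order terms of the PAC and regret bounds.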


Contrastive learning, multi-view redundancy, and linear models

arXiv.org Machine Learning

Self-supervised learning is an empirically successful approach to unsupervised learning based on creating artificial supervised learning problems. A popular self-supervised approach to representation learning is contrastive learning, which leverages naturally occurring pairs of similar and dissimilar data points, or multiple views of the same data. This work provides a theoretical analysis of contrastive learning in the multi-view setting, where two views of each datum are available. The main result is that linear functions of the learned representations are nearly optimal on downstream prediction tasks whenever the two views provide redundant information about the label.
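
The statement about linear functions can be made operational with a standard linear probe: freeze the learned representation and fit a linear model on top of it for the downstream task. The sketch below uses synthetic features as a stand-in for a learned representation; it is a generic recipe, not code from the paper.

    # Generic "linear probe" recipe: fit a linear (ridge) model on frozen features.
    import numpy as np

    def fit_linear_probe(features, labels, reg=1e-3):
        """Ridge regression of downstream labels on frozen representation features."""
        d = features.shape[1]
        return np.linalg.solve(features.T @ features + reg * np.eye(d), features.T @ labels)

    rng = np.random.default_rng(0)
    n, d = 500, 32
    reps = rng.normal(size=(n, d))                      # stand-in for learned representations
    w_true = rng.normal(size=d)
    labels = reps @ w_true + 0.1 * rng.normal(size=n)   # downstream prediction target
    w_hat = fit_linear_probe(reps, labels)
    print("probe mean squared error:", np.mean((reps @ w_hat - labels) ** 2))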


FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs

arXiv.org Machine Learning

The ability to learn effective transformations of complex data sources, sometimes called representation learning, is an essential primitive in modern machine learning, leading to remarkable achievements in language modeling and vision, and serving as a partial explanation for the success of deep learning more broadly (Bengio et al., 2013). In Reinforcement Learning (RL), several works have shown empirically that learning succinct representations of perceptual inputs can accelerate the search for decision-making policies (Pathak et al., 2017; Tang et al., 2017; Oord et al., 2018; Srinivas et al., 2020). However, representation learning for RL is far more subtle than it is for supervised learning (Du et al., 2019a; Van Roy and Dong, 2019; Lattimore and Szepesvari, 2019), and the theoretical foundations of representation learning for RL are nascent. The first question that arises in this context is: what is a good representation? Intuitively, a good representation should help us achieve greater sample efficiency on downstream tasks.