Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
Jin, Chi, Kakade, Sham M., Krishnamurthy, Akshay, Liu, Qinghua
In many sequential decision-making settings, the agent lacks complete information about the underlying state of the system, a phenomenon known as partial observability. Partial observability significantly complicates the tasks of reinforcement learning and planning, because the non-Markovian nature of the observations forces the agent to maintain memory and reason about beliefs over the system state, all while exploring to collect information about the environment. For example, a robot may not be able to perceive all objects in the environment due to occlusions, and it must reason about how these objects may move to avoid collisions [Cassandra et al., 1996]. Similar reasoning problems arise in imperfect-information games [Brown and Sandholm, 2018], medical diagnosis [Hauskrecht and Fraser, 2000], and elsewhere [Rafferty et al., 2011]. Furthermore, from a theoretical perspective, well-known complexity-theoretic results show that learning and planning in partially observable environments is statistically and computationally intractable in general [Papadimitriou and Tsitsiklis, 1987, Mundhenk et al., 2000, Vlassis et al., 2012, Mossel and Roch, 2005]. The standard formulation for reinforcement learning with partial observability is the Partially Observable Markov Decision Process (POMDP), in which an agent operating on noisy observations makes decisions that influence the evolution of a latent state. These complexity barriers apply to this model, but they are worst-case in nature and do not preclude efficient algorithms for interesting subclasses of POMDPs. Thus we ask: Can we develop efficient algorithms for reinforcement learning in large classes of POMDPs?
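The belief over the latent state that this abstract refers to is computed by Bayesian filtering: predict the next state under the chosen action, then reweight by the observation likelihood. Below is a minimal sketch of one belief update in a finite POMDP; the two-state transition and observation tensors are hypothetical toy values, not from the paper.

```python
import numpy as np

def belief_update(b, T, O, action, obs):
    """One step of Bayesian filtering in a finite POMDP.

    b: current belief over latent states, shape (S,)
    T: transition tensor, T[a, s, s2] = P(s2 | s, a)
    O: observation tensor, O[a, s2, o] = P(o | s2, a)
    """
    predicted = b @ T[action]               # P(s2) = sum_s b(s) P(s2 | s, a)
    unnorm = predicted * O[action][:, obs]  # reweight by observation likelihood
    return unnorm / unnorm.sum()

# Toy usage: two states, one action, two observations (hypothetical numbers).
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
b = np.array([0.5, 0.5])
b = belief_update(b, T, O, action=0, obs=1)
print(b)
```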
Information Theoretic Regret Bounds for Online Nonlinear Control
Kakade, Sham, Krishnamurthy, Akshay, Lowrey, Kendall, Ohnishi, Motoya, Sun, Wen
This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space. This framework yields a general setting that permits discrete and continuous control inputs as well as non-smooth, non-differentiable dynamics. Our main result, the Lower Confidence-based Continuous Control ($LC^3$) algorithm, enjoys a near-optimal $O(\sqrt{T})$ regret bound against the optimal controller in episodic settings, where $T$ is the number of episodes. The bound has no explicit dependence on the dimension of the system dynamics, which could be infinite, but instead depends only on information-theoretic quantities. We empirically evaluate the algorithm on a number of nonlinear control tasks and demonstrate the benefit of exploration for learning model dynamics.
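The RKHS assumption on the dynamics can be made concrete with kernel ridge regression on (state, control) pairs. This sketch shows only that model-fitting ingredient, not $LC^3$ itself, which additionally maintains confidence bounds and plans optimistically; the RBF kernel, bandwidth, and regularization are illustrative assumptions.

```python
import numpy as np

def rbf(X, Y, bandwidth=1.0):
    """RBF kernel matrix between the rows of X and Y."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

class KernelDynamicsModel:
    """Fit x_next ~ f(x, u) with f in an RKHS, via kernel ridge regression."""

    def __init__(self, lam=1e-3):
        self.lam = lam

    def fit(self, Z, X_next):
        """Z stacks (state, control) rows; X_next holds the observed next states."""
        self.Z = Z
        K = rbf(Z, Z)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(Z)), X_next)
        return self

    def predict(self, z):
        return rbf(np.atleast_2d(z), self.Z) @ self.alpha

# Toy usage: scalar state and control, hypothetical nonlinear ground truth.
rng = np.random.default_rng(0)
Z = rng.uniform(size=(50, 2))                     # columns: (x_t, u_t)
X_next = np.sin(3 * Z[:, :1]) + 0.5 * Z[:, 1:]
model = KernelDynamicsModel().fit(Z, X_next)
print(model.predict([0.3, 0.1]))
```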
Provably adaptive reinforcement learning in metric spaces
Cao, Tongyi, Krishnamurthy, Akshay
We study reinforcement learning in continuous state and action spaces endowed with a metric. We provide a refined analysis of the algorithm of Sinclair, Banerjee, and Yu (2019) and show that its regret scales with the \emph{zooming dimension} of the instance. This parameter, which originates in the bandit literature, captures the size of the subsets of near optimal actions and is always smaller than the covering dimension used in previous analyses. As such, our results are the first provably adaptive guarantees for reinforcement learning in metric spaces.
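For reference, one common formulation of the zooming dimension from the bandit literature is below; the constants and the exact definition of the near-optimal set vary across papers, so this is indicative rather than the paper's precise definition. Writing $\mathrm{gap}(x,a)$ for the suboptimality of action $a$ at state $x$ and $N_r(\cdot)$ for the $r$-covering number:

```latex
d_z \;=\; \min\Big\{ d \ge 0 \;:\; N_r\big(\{(x,a) : \mathrm{gap}(x,a) \le c\,r\}\big) \le C\, r^{-d}
\ \text{ for all } r \in (0,1] \Big\}
```

The covering dimension replaces the near-optimal set with the whole space, which is why the zooming dimension is never larger.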
Open Problem: Model Selection for Contextual Bandits
Foster, Dylan J., Krishnamurthy, Akshay, Luo, Haipeng
Efficient Contextual Bandits with Continuous Actions
Majzoubi, Maryam, Zhang, Chicheng, Chari, Rajan, Krishnamurthy, Akshay, Langford, John, Slivkins, Aleksandrs
We create a computationally tractable algorithm for contextual bandits with continuous actions whose structure is unknown. Our reduction-style algorithm composes with most supervised learning representations. We prove regret guarantees that hold in considerable generality and verify the new functionality with large-scale experiments.
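One standard device for making reduction-style learning work with continuous actions is smoothing: play a draw from a narrow band around the policy's chosen action, so that action densities, and hence importance weights, are well defined. Below is a hedged sketch of smoothed off-policy evaluation, not the paper's algorithm; the boxcar kernel, bandwidth h, and policies are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
h = 0.1  # smoothing bandwidth over the action interval [0, 1] (hypothetical)

def smooth(action):
    """Play a uniform draw from a width-h band around the action; return (action, density)."""
    lo = np.clip(action - h / 2, 0.0, 1.0 - h)
    return rng.uniform(lo, lo + h), 1.0 / h

def smoothed_density(pi, x, a):
    """Density of action a under the smoothed version of policy pi at context x."""
    lo = np.clip(pi(x) - h / 2, 0.0, 1.0 - h)
    return (1.0 / h) if lo <= a <= lo + h else 0.0

def smoothed_ips(logged, pi):
    """Importance-weighted value estimate; unbiased when the logging bands cover pi's bands."""
    return np.mean([smoothed_density(pi, x, a) / dens * r for x, a, dens, r in logged])

# Sanity check: evaluating the logging policy itself recovers its on-policy value.
behaviour = lambda x: 0.5 * x
logged = []
for _ in range(5000):
    x = rng.uniform()
    a, dens = smooth(behaviour(x))
    logged.append((x, a, dens, 1.0 - abs(a - 0.6)))   # hypothetical reward
print(smoothed_ips(logged, behaviour))
```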
Off-policy evaluation for slate recommendation
Swaminathan, Adith, Krishnamurthy, Akshay, Agarwal, Alekh, Dudik, Miro, Langford, John, Jose, Damien, Zitouni, Imed
This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context---a common scenario in web search, ads, and recommendation. We build on techniques from combinatorial bandits to introduce a new practical estimator that uses logged data to estimate a policy's performance. A thorough empirical evaluation on real-world data reveals that our estimator is accurate in a variety of settings, including as a subroutine in a learning-to-rank task, where it achieves competitive performance. We derive conditions under which our estimator is unbiased---these conditions are weaker than those required by prior heuristics for slate evaluation---and experimentally demonstrate a smaller bias than parametric approaches, even when these conditions are violated. Finally, our theory and experiments also show exponential savings in the amount of required data compared with general unbiased estimators.
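The unbiasedness condition at work here is an additive (linear) decomposition of the slate's value over slot-item pairs. Under that condition, a pseudoinverse-style estimator can be written down directly; the toy sketch below fixes a single context, keeps slates enumerable, and makes up the policies, sizes, and reward model purely for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
K, L = 2, 3   # slots and items, kept tiny so all slates are enumerable
slates = list(itertools.product(range(L), repeat=K))

def one_hot(slate):
    """Encode a slate as an indicator vector over (slot, item) pairs."""
    v = np.zeros(K * L)
    for k, item in enumerate(slate):
        v[k * L + item] = 1.0
    return v

ones = np.stack([one_hot(s) for s in slates])   # (num_slates, K*L)

# Hypothetical logging and target policies: distributions over whole slates.
mu = rng.dirichlet(np.ones(len(slates)))
pi = rng.dirichlet(np.ones(len(slates)))

# Linearity condition: E[reward | slate] = <1_slate, phi> for some unknown phi.
phi = rng.uniform(size=K * L)

Gamma = ones.T @ (ones * mu[:, None])   # E_mu[1_s 1_s^T]
q_pi = pi @ ones                        # E_pi[1_s]

# Logged data: slates drawn from mu, noisy rewards obeying the linear model.
n = 20000
idx = rng.choice(len(slates), size=n, p=mu)
rewards = ones[idx] @ phi + rng.normal(scale=0.1, size=n)

# Pseudoinverse-style estimate: average of (q_pi^T Gamma^+ 1_{s_i}) * r_i.
weights = ones[idx] @ np.linalg.pinv(Gamma) @ q_pi
print("estimate:", np.mean(weights * rewards), "truth:", pi @ (ones @ phi))
```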
Corrupted Multidimensional Binary Search: Learning in the Presence of Irrational Agents
Krishnamurthy, Akshay, Lykouris, Thodoris, Podimata, Chara
Standard game-theoretic formulations for settings like contextual pricing and security games assume that agents act in accordance with a specific behavioral model. In practice, however, some agents may not subscribe to the dominant behavioral model or may act in ways that are arbitrarily inconsistent. Existing algorithms heavily depend on the model being (approximately) accurate for all agents and have poor performance in the presence of even a few such arbitrarily irrational agents. How do we design learning algorithms that are robust to the presence of arbitrarily irrational agents? We address this question for a number of canonical game-theoretic applications by designing a robust algorithm for the fundamental problem of multidimensional binary search. The performance of our algorithm degrades gracefully with the number of corrupted rounds, which correspond to irrational agents and need not be known in advance. As binary search is the key primitive in algorithms for contextual pricing, Stackelberg Security Games, and other game-theoretic applications, we immediately obtain robust algorithms for these settings. Our techniques draw inspiration from learning theory, game theory, high-dimensional geometry, and convex analysis, and may be of independent algorithmic interest.
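The corruption-free primitive underlying these applications is easy to sketch for contextual pricing: keep a polytope of parameter vectors consistent with past buy/no-buy feedback, and bisect the induced uncertainty interval for each context. The paper's contribution is a variant that stays robust when some feedback is arbitrary; the sketch below is the clean version, with dimensions and contexts as toy assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
d, T = 3, 50
theta = rng.uniform(size=d)   # hidden parameter; buyer's value for context x is <theta, x>

A, b = [], []                 # knowledge set {v in [0,1]^d : A v <= b}

def extent(c):
    """Minimize <c, v> over the current knowledge set (a small LP)."""
    res = linprog(c,
                  A_ub=np.array(A) if A else None,
                  b_ub=np.array(b) if b else None,
                  bounds=[(0.0, 1.0)] * d)
    return res.fun

for _ in range(T):
    x = rng.uniform(size=d)               # contexts are adversarial in the papers
    lo, hi = extent(x), -extent(-x)       # plausible range of the buyer's value
    price = (lo + hi) / 2.0               # bisect the uncertainty interval
    if theta @ x >= price:                # sale: learn <theta, x> >= price
        A.append(-x); b.append(-price)
    else:                                 # no sale: learn <theta, x> <= price
        A.append(x); b.append(price)

x = rng.uniform(size=d)
print("residual uncertainty on a fresh context:", -extent(-x) - extent(x))
```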
Contextual bandits with surrogate losses: Margin bounds and efficient algorithms
Foster, Dylan J., Krishnamurthy, Akshay
We use surrogate losses to obtain several new regret bounds and new algorithms for contextual bandit learning. Using the ramp loss, we derive a new margin-based regret bound in terms of standard sequential complexity measures of a benchmark class of real-valued regression functions. Using the hinge loss, we derive an efficient algorithm with a $\sqrt{dT}$-type mistake bound against benchmark policies induced by $d$-dimensional regressors. Under realizability assumptions, our results also yield classical regret bounds.
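For reference, with $z$ denoting the margin of a real-valued score, the two surrogates are the standard ones below (scaling conventions in the paper may differ):

```latex
\ell_{\mathrm{hinge}}(z) = \max\{0,\, 1 - z\}, \qquad
\ell_{\mathrm{ramp}}(z) = \min\{1,\, \max\{0,\, 1 - z\}\}
```

The ramp loss is a clipped hinge: both upper-bound the 0-1 loss, but the ramp is in addition bounded.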
On Oracle-Efficient PAC RL with Rich Observations
Dann, Christoph, Jiang, Nan, Krishnamurthy, Akshay, Agarwal, Alekh, Langford, John, Schapire, Robert E.
We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden-state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden-state dynamics, we prove that the only known sample-efficient algorithm, OLIVE, cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.
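The "oracle model" mentioned here can be illustrated with a hypothetical interface: the learner interacts with the policy class only through an optimization primitive, such as empirical cost-sensitive minimization. In the sketch below, enumeration stands in for the solver a real oracle would provide; all names and the toy policy class are illustrative.

```python
import numpy as np

def erm_oracle(policy_class, contexts, action_costs):
    """Return the policy with smallest empirical cost-sensitive loss.
    Enumeration stands in for the optimization a real oracle performs."""
    def empirical_cost(pi):
        return np.mean([action_costs[i][pi(x)] for i, x in enumerate(contexts)])
    return min(policy_class, key=empirical_cost)

# Toy usage: threshold policies over 1-D contexts, two actions.
rng = np.random.default_rng(3)
policy_class = [lambda x, t=t: int(x > t) for t in np.linspace(0, 1, 11)]
contexts = rng.uniform(size=100)
costs = [(x, 1.0 - x) for x in contexts]   # hypothetical per-action costs
best = erm_oracle(policy_class, contexts, costs)
```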
Robust Dynamic Assortment Optimization in the Presence of Outlier Customers
Chen, Xi, Krishnamurthy, Akshay, Wang, Yining
We consider the dynamic assortment optimization problem under the multinomial logit model (MNL) with unknown utility parameters. The central issue investigated in this paper is model misspecification under the $\varepsilon$-contamination model, a fundamental model in robust statistics and machine learning. In particular, throughout a selling horizon of length $T$, we assume that customers make purchases according to a well-specified underlying multinomial logit choice model in a $(1-\varepsilon)$-fraction of the time periods, and make arbitrary purchasing decisions instead in the remaining $\varepsilon$-fraction of the time periods. In this model, we develop a new robust online assortment optimization policy via an active elimination strategy. We establish both upper and lower bounds on the regret, and show that our policy is optimal up to logarithmic factors in $T$ when the assortment capacity is constant. Furthermore, we develop a fully adaptive policy that does not require any prior knowledge of the contamination parameter $\varepsilon$. Our simulation study shows that our policy outperforms existing policies based on upper confidence bounds (UCB) and Thompson sampling.
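The data-generating process in this abstract is straightforward to simulate: multinomial logit choices in a $(1-\varepsilon)$-fraction of periods and arbitrary choices otherwise. A minimal sketch; the utilities, fixed assortment, and outlier behavior are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def mnl_choice_probs(utilities, assortment):
    """MNL: P(buy i | S) = exp(u_i) / (1 + sum_{j in S} exp(u_j)).
    The leading 1 is the no-purchase option."""
    w = np.exp(utilities[assortment])
    denom = 1.0 + w.sum()
    return w / denom, 1.0 / denom

N, eps, T = 5, 0.05, 10000
utilities = rng.normal(size=N)
assortment = np.array([0, 2, 3])
outcomes = np.append(assortment, -1)   # -1 encodes no purchase

purchases = []
for _ in range(T):
    if rng.uniform() < eps:
        purchases.append(assortment[0])   # arbitrary outlier choice
    else:
        p, p0 = mnl_choice_probs(utilities, assortment)
        purchases.append(rng.choice(outcomes, p=np.append(p, p0)))

counts = {int(i): purchases.count(i) / T for i in outcomes}
print(counts)   # empirical purchase frequencies, skewed toward item 0 by outliers
```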