AITopics | Muthukumar, Vidya

Collaborating Authors

Muthukumar, Vidya

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Online Model Selection for Reinforcement Learning with Function Approximation

Lee, Jonathan N., Pacchiano, Aldo, Muthukumar, Vidya, Kong, Weihao, Brunskill, Emma

arXiv.org Machine LearningNov-19-2020

Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and early theoretical results on linear Markov decision processes provide regret bounds that scale with the dimension of the linear approximation. Ideally, we would like to automatically identify the minimal dimension of the approximation that is sufficient to encode an optimal policy. Towards this end, we consider the problem of model selection in RL with function approximation, given a set of candidate RL algorithms with known regret guarantees. The learner's goal is to adapt to the complexity of the optimal algorithm without knowing it \textit{a priori}. We present a meta-algorithm that successively rejects increasingly complex models using a simple statistical test. Given at least one candidate that satisfies realizability, we prove the meta-algorithm adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret compared to the optimal candidate's $\tilde{O}(\sqrt T)$ regret, where $T$ is the number of episodes and $L$ is the number of algorithms. The dimension and horizon dependencies remain optimal with respect to the best candidate, and our meta-algorithmic approach is flexible to incorporate multiple candidate algorithms and models. Finally, we show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds that depend on the gaps between the maximal values attainable by the candidates.

algorithm, artificial intelligence, fuzzy logic, (17 more...)

arXiv.org Machine Learning

2011.0975

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)

Add feedback

On the proliferation of support vectors in high dimensions

Hsu, Daniel, Muthukumar, Vidya, Xu, Ji

arXiv.org Machine LearningSep-22-2020

The support vector machine (SVM) is a well-established classification method whose name refers to the particular training examples, called support vectors, that determine the maximum margin separating hyperplane. The SVM classifier is known to enjoy good generalization properties when the number of support vectors is small compared to the number of training examples. However, recent research has shown that in sufficiently high-dimensional linear classification problems, the SVM can generalize well despite a proliferation of support vectors where all training examples are support vectors. In this paper, we identify new deterministic equivalences for this phenomenon of support vector proliferation, and use them to (1) substantially broaden the conditions under which the phenomenon occurs in high-dimensional settings, and (2) prove a nearly matching converse result.

artificial intelligence, machine learning, probability, (15 more...)

arXiv.org Machine Learning

2009.1067

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Add feedback

Classification vs regression in overparameterized regimes: Does the loss function matter?

Muthukumar, Vidya, Narang, Adhyyan, Subramanian, Vignesh, Belkin, Mikhail, Hsu, Daniel, Sahai, Anant

arXiv.org Machine LearningMay-16-2020

Paradigmatic problems in supervised machine learning (ML) involve predicting an output response from an input, based on patterns extracted from a (training) dataset. In classification, the output response is (finitely) discrete and we need to classify input data into one of these discrete categories. In regression, the output is continuous, typically a real number or a vector. Owing to this important distinction in output response, the two tasks are typically treated differently. The differences in treatment manifest in two phases of modern ML: optimization (training), which consists of an algorithmic procedure to extract a predictor from the training data, typically by minimizing the training loss (also called empirical risk); and generalization (testing), which consists of an evaluation of the obtained predictor on a separate test, or validation, dataset. Traditionally, the choice of loss functions for both phases is starkly different across classification and regression tasks. The squared-loss function is typically used both for the training and the testing phases in regression. In contrast, the hinge or logistic (cross-entropy for multi-class problems) loss functions are typically used in the training phase of classification, while the very different 0-1 loss function is used for testing.

artificial intelligence, equation, machine learning, (18 more...)

arXiv.org Machine Learning

2005.08054

Country:

North America > United States (0.45)
Europe (0.45)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits

Chatterji, Niladri S., Muthukumar, Vidya, Bartlett, Peter L.

arXiv.org Machine LearningJun-11-2019

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden \textit{simple multi-armed bandit} structure in which the rewards are independent of the contextual information. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for their alternate regime. We design a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret rates in the simple multi-armed bandit regime and minimax optimal regret rates in the linear contextual bandit regime, without knowing a priori which of the two models generates the rewards. These results are proved under the condition of stochasticity of contextual information over multiple rounds. Our results should be viewed as a step towards principled data-dependent policy class selection for contextual bandits.

algorithm, artificial intelligence, big data, (21 more...)

arXiv.org Machine Learning

1905.1004

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)

Add feedback

Harmless interpolation of noisy data in regression

Muthukumar, Vidya, Vodrahalli, Kailas, Sahai, Anant

arXiv.org Machine LearningMar-21-2019

In classification problems (i.e. when the labels Y are discrete), the scaling of the test error with respect to n is determined by characterizations of the VC-dimension [2]/Rademacher complexity [3] of the function class, which in the worst case increases with its number of parameters. In regression (i.e. when the labels Y are continuous), the mean-squared error of the ordinary least-squares estimate is characterized by the condition number of the regression matrix, which is reasonable for appropriate ratios of d/n but tends to increase astronomically as d approaches n. The qualitative fear is the same: if the function class is too complex, it starts to overfit noise and can generalize poorly to unseen test data. But there is a gap between "can" and "will" -- and indeed this conventional wisdom has been challenged by the recent advent of deeper and deeper neural networks. In particular, a thought-provoking paper [4] noted that several deep neural networks generalize well despite achieving zero or close to zero training error, and being so expressive that they even have the ability to fit pure noise. As they put it, "understanding deep learning requires rethinking generalization". How can we reconcile the fact that good interpolative solutions exist with the classical bias-variance tradeoff? These phenomena are being actively investigated in a statistical sense [5,6] and a computational sense [7-9] in classification problems and/or noiseless models.

deep learning, neural network, noise, (19 more...)

arXiv.org Machine Learning

1903.09139

Genre: Research Report (0.82)

Industry:

Materials > Chemicals > Industrial Gases > Liquified Gas (0.69)
Materials > Chemicals > Commodity Chemicals > Petrochemicals > LNG (0.69)
Energy > Oil & Gas > Midstream (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

Worst-case vs Average-case Design for Estimation from Fixed Pairwise Comparisons

Pananjady, Ashwin, Mao, Cheng, Muthukumar, Vidya, Wainwright, Martin J., Courtade, Thomas A.

arXiv.org Machine LearningJul-19-2017

Pairwise comparison data arises in many domains, including tournament rankings, web search, and preference elicitation. Given noisy comparisons of a fixed subset of pairs of items, we study the problem of estimating the underlying comparison probabilities under the assumption of strong stochastic transitivity (SST). We also consider the noisy sorting subclass of the SST model. We show that when the assignment of items to the topology is arbitrary, these permutation-based models, unlike their parametric counterparts, do not admit consistent estimation for most comparison topologies used in practice. We then demonstrate that consistent estimation is possible when the assignment of items to the topology is randomized, thus establishing a dichotomy between worst-case and average-case designs. We propose two estimators in the average-case setting and analyze their risk, showing that it depends on the comparison topology only through the degree sequence of the topology. The rates achieved by these estimators are shown to be optimal for a large class of graphs. Our results are corroborated by simulations on multiple comparison topologies.

artificial intelligence, estimator, machine learning, (18 more...)

arXiv.org Machine Learning

1707.06217

Country: North America > United States (0.45)

Genre: Research Report > New Finding (0.34)

Industry: Government (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback