Collaborating Authors

 Mania, Horia


Time varying regression with hidden linear dynamics

arXiv.org Machine Learning

The distribution of labels given the covariates changes over time in a variety of applications of regression. Example domains where such problems arise include economics, marketing, fashion, and supply chain optimization, where market properties evolve over time. Motivated by such problems, we revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system. One way to account for distribution change in linear regression is to assume that the unknown model parameters change slowly with time [2, 15, 37]. While this assumption simplifies the problem and makes it tractable, it misses the opportunity to exploit additional available structure and fails to model the periodicity (e.g., due to seasonality) present in some problems. As an alternative, we are interested in a dynamic model previously studied by Chow [7], Carraro [5], and Shumway et al. [26].
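The model class in this abstract can be illustrated with a minimal simulation. The sketch below is not the paper's method; it only shows, under assumed toy values, how parameters driven by a linear dynamical system (here a rotation, a hypothetical choice) produce exactly the kind of periodicity that slow-drift models cannot capture:

```python
import numpy as np

rng = np.random.default_rng(0)
period = 12                        # hypothetical seasonal period
theta = np.array([1.0, 0.0])       # initial hidden parameters
angle = 2 * np.pi / period
A = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle),  np.cos(angle)]])  # rotation: periodic dynamics

thetas, xs, ys = [], [], []
for t in range(3 * period):
    x = rng.normal(size=2)                 # covariates
    y = x @ theta + 0.01 * rng.normal()    # label under current parameters
    xs.append(x); ys.append(y); thetas.append(theta.copy())
    theta = A @ theta                      # hidden linear dynamics (noise-free here)

thetas = np.array(thetas)
# With a pure rotation, the parameters repeat every `period` steps.
print(np.allclose(thetas[0], thetas[period]))  # prints True
```

A slow-drift assumption would have to approximate this trajectory with small local steps, whereas the linear-dynamics model represents the seasonality exactly.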


Why do classifier accuracies show linear trends under distribution shift?

arXiv.org Machine Learning

Several recent studies observed that when classification models are evaluated on two different data distributions, the models' accuracies on one distribution are approximately a linear function of their accuracies on another distribution. We offer an explanation for these observations based on two assumptions that can be assessed empirically: (1) certain events have similar probabilities under the two distributions; (2) the probability that a lower accuracy model correctly classifies a data point sampled from one distribution when a higher accuracy model classifies it incorrectly is small.
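A deliberately stylized numerical construction (not from the paper) can show how an extreme form of assumption (2) produces an exactly linear trend. Here models are totally ordered with nested correct sets, and the second distribution's weights are a hypothetical choice made so that the probabilities of those sets are an affine function of their probabilities under the first distribution:

```python
import numpy as np

N = 100
# Distribution D1: uniform over N test points.
p1 = np.full(N, 1.0 / N)
# Distribution D2: half the mass uniform, half concentrated on point 0,
# chosen so probabilities of the nested correct sets below are affine in D1's.
p2 = np.full(N, 0.5 / N)
p2[0] += 0.5

# Strongest form of assumption (2): model k classifies exactly the
# first k points correctly, so correct sets are nested across models.
acc1 = np.array([p1[:k].sum() for k in range(1, N + 1)])
acc2 = np.array([p2[:k].sum() for k in range(1, N + 1)])

# The accuracy pairs lie on a line: acc2 = 0.5 * acc1 + 0.5.
slope, intercept = np.polyfit(acc1, acc2, 1)
```

Real benchmarks only satisfy weakened versions of these assumptions, which is why the observed trends are approximately rather than exactly linear.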


Bandit Learning in Decentralized Matching Markets

arXiv.org Machine Learning

A fundamental question at the intersection of learning theory and game theory is as follows: how should individually rational agents act when they have to learn about the consequences of their actions in the same uncertain environment? The emerging research area of multiplayer learning--which explores this basic question and is motivated by a broad range of modern applications, from modeling competition among firms [Mansour et al., 2018, Aridor et al., 2019] to implementing protocols for wireless networks [Liu and Zhao, 2010, Cesa-Bianchi et al., 2016, Shahrampour et al., 2017]--has been of increasing interest. A particularly salient application is the online marketplace, which can often be modeled as a two-sided matching market with uncertainty. Examples include online labor markets (Upwork and TaskRabbit for freelancing, Handy for housecleaning), online crowdsourcing platforms (Amazon Mechanical Turk), and peer-to-peer platforms (Airbnb). The multi-armed bandit is a core learning problem in which a player is faced with a choice among K actions, each of which is associated with a reward distribution, and the goal is to learn which action has the highest reward, doing so as quickly as possible so as to be able to reap rewards even while the learning process is underway. To introduce a game-theoretic aspect into the bandit problem, it is natural to place the problem into the context of a two-sided matching market, where the choices faced by the players are identified with the entities on the other side of the market, and where the need to realize a matching imposes economic constraints and incentives. Such a blend of bandit learning with two-sided matching markets was introduced by Liu et al. [2020], who formulated a problem in which the players and the arms form the two sides of the market, and each side has preferences over the other side.
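The core single-player loop the abstract describes is the standard multi-armed bandit; a minimal UCB1-style sketch (not the paper's decentralized algorithm, which layers arm-side preferences and matching conflicts on top of this loop) with hypothetical arm means looks like:

```python
import numpy as np

def ucb(means, T, seed=0):
    """UCB1: pull each arm once, then the arm with the highest upper
    confidence bound on its mean reward."""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(T):
        if t < K:
            a = t                                   # initialize: try each arm once
        else:
            bonus = np.sqrt(2 * np.log(t) / counts) # exploration bonus
            a = int(np.argmax(sums / counts + bonus))
        r = rng.normal(means[a], 1.0)               # stochastic reward
        counts[a] += 1
        sums[a] += r
    return counts

# Hypothetical 3-arm instance; the best arm (index 2) should dominate the pulls.
counts = ucb(means=[0.2, 0.5, 0.9], T=5000)
```

In the matching-market version, an arm may reject a pull when it prefers another player who chose it in the same round, which is what creates the new exploration-exploitation trade-offs.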


Active Learning for Nonlinear System Identification with Guarantees

arXiv.org Machine Learning

While the identification of nonlinear dynamical systems is a fundamental building block of model-based reinforcement learning and feedback control, its sample complexity is only understood for systems that either have discrete states and actions or for systems that can be identified from data generated by i.i.d. random inputs. Nonetheless, many interesting dynamical systems have continuous states and actions and can only be identified through a judicious choice of inputs. Motivated by practical settings, we study a class of nonlinear dynamical systems whose state transitions depend linearly on a known feature embedding of state-action pairs. To estimate such systems in finite time, identification methods must explore all directions in feature space. We propose an active learning approach that achieves this by repeating three steps: trajectory planning, trajectory tracking, and re-estimation of the system from all available data. We show that our method estimates nonlinear dynamical systems at a parametric rate, similar to the statistical rate of standard linear regression.
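The three-step loop can be caricatured in a scalar toy problem. This is a heavily simplified sketch under assumed dynamics, with a trivial feature map and a one-step stand-in for planning (steer toward the least-explored direction of the Gram matrix), not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.array([0.8, 1.0])        # unknown system: x+ = 0.8*x + u + w

def phi(x, u):
    # Known feature embedding; here simply (x, u) as a minimal stand-in.
    return np.array([x, u])

x = 0.0
feats, nexts, min_eigs = [], [], []
Gram = 1e-6 * np.eye(2)
for t in range(200):
    # "Planning" in miniature: excite the least-explored feature direction.
    w_dir = np.linalg.eigh(Gram)[1][:, 0]         # smallest-eigenvalue direction
    u = float(np.sign(w_dir[1])) if w_dir[1] != 0 else 1.0
    f = phi(x, u)
    x_next = theta_true @ f + 0.01 * rng.normal() # "tracking": apply the input
    feats.append(f)
    nexts.append(x_next)
    Gram += np.outer(f, f)
    min_eigs.append(np.linalg.eigvalsh(Gram)[0])
    x = x_next
    # "Re-estimation": least squares on all data collected so far.
    theta_hat = np.linalg.lstsq(np.array(feats), np.array(nexts), rcond=None)[0]
```

The growing minimum eigenvalue of the Gram matrix is the finite-time expression of "exploring all directions in feature space," and it is what drives the parametric estimation rate.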


Model Similarity Mitigates Test Set Overuse

Neural Information Processing Systems

Excessive reuse of test data has become commonplace in today's machine learning workflows. Popular benchmarks, competitions, industrial scale tuning, among other applications, all involve test data reuse beyond guidance by statistical confidence bounds. Nonetheless, recent replication studies give evidence that popular benchmarks continue to support progress despite years of extensive reuse. We proffer a new explanation for the apparent longevity of test data: Many proposed models are similar in their predictions and we prove that this similarity mitigates overfitting. Specifically, we show empirically that models proposed for the ImageNet ILSVRC benchmark agree in their predictions well beyond what we can conclude from their accuracy levels alone.


Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator

Neural Information Processing Systems

We consider adaptive control of the Linear Quadratic Regulator (LQR), where an unknown linear system is controlled subject to quadratic costs. Leveraging recent developments in the estimation of linear systems and in robust controller synthesis, we present the first provably polynomial time algorithm that provides high probability guarantees of sub-linear regret on this problem. We further study the interplay between regret minimization and parameter estimation by proving a lower bound on the expected regret in terms of the exploration schedule used by any algorithm. Finally, we conduct a numerical study comparing our robust adaptive algorithm to other methods from the adaptive LQR literature, and demonstrate the flexibility of our proposed method by extending it to a demand forecasting problem subject to state constraints.
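The quantity being bounded, regret, is easy to make concrete in a scalar LQR instance. The sketch below (hypothetical system values, not the paper's algorithm) computes the optimal gain via fixed-point iteration on the scalar Riccati equation and measures the excess cost a suboptimal gain pays on the same noise sequence:

```python
import numpy as np

def dare_scalar(a, b, q, r, iters=200):
    """Fixed-point iteration for the scalar discrete algebraic Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p

a, b, q, r = 0.9, 1.0, 1.0, 1.0       # hypothetical scalar system and costs
p = dare_scalar(a, b, q, r)
K_opt = a * b * p / (r + b * b * p)   # optimal LQR gain
K_sub = 0.9                           # a deliberately suboptimal (deadbeat) gain

def rollout_cost(K, T=4000, seed=0):
    rng = np.random.default_rng(seed)
    x, cost = 0.0, 0.0
    for _ in range(T):
        u = -K * x
        cost += q * x * x + r * u * u
        x = a * x + b * u + rng.normal()
    return cost

# Regret: excess cost of the suboptimal gain on the same noise realization.
regret = rollout_cost(K_sub) - rollout_cost(K_opt)
```

An adaptive algorithm does not know (a, b) and must trade exploration (to estimate them) against the per-step cost of deviating from K_opt, which is exactly the tension the lower bound in the abstract formalizes.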


Competing Bandits in Matching Markets

arXiv.org Machine Learning

Stable matching, a classical model for two-sided markets, has long been studied with little consideration for how each side's preferences are learned. With the advent of massive online markets powered by data-driven matching platforms, it has become necessary to better understand the interplay between learning and market objectives. We propose a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards. Our model extends the standard multi-armed bandits framework to multiple players, with the added feature that arms have preferences over players. We study both centralized and decentralized approaches to this problem and show surprising exploration-exploitation trade-offs compared to the single player multi-armed bandits setting.
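The classical baseline this abstract builds on is deferred acceptance with fully known preferences. A minimal player-proposing Gale-Shapley sketch on a hypothetical 2x2 instance follows; in the paper's setting, the player-side preference lists would instead have to be learned from stochastic rewards:

```python
def gale_shapley(player_prefs, arm_prefs):
    """Player-proposing deferred acceptance for equal-sized sides."""
    n = len(player_prefs)
    next_prop = [0] * n          # index of the next arm each player proposes to
    match = {}                   # arm -> player currently holding it
    free = list(range(n))
    while free:
        p = free.pop()
        a = player_prefs[p][next_prop[p]]
        next_prop[p] += 1
        if a not in match:
            match[a] = p                       # arm was free: tentatively accept
        else:
            q = match[a]
            if arm_prefs[a].index(p) < arm_prefs[a].index(q):
                match[a] = p                   # arm prefers the new proposer
                free.append(q)
            else:
                free.append(p)                 # arm rejects the proposal
    return {p: a for a, p in match.items()}

# Toy instance: both players prefer arm 0, but arm 0 prefers player 1.
matching = gale_shapley(
    player_prefs=[[0, 1], [0, 1]],
    arm_prefs=[[1, 0], [0, 1]],
)
# Stable outcome: player 1 gets arm 0, player 0 settles for arm 1.
```

Replacing the known player preferences with bandit estimates is what couples exploration with market stability and produces the trade-offs the abstract highlights.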


Model Similarity Mitigates Test Set Overuse

arXiv.org Machine Learning

Excessive reuse of test data has become commonplace in today's machine learning workflows. Popular benchmarks, competitions, industrial scale tuning, among other applications, all involve test data reuse beyond guidance by statistical confidence bounds. Nonetheless, recent replication studies give evidence that popular benchmarks continue to support progress despite years of extensive reuse. We proffer a new explanation for the apparent longevity of test data: Many proposed models are similar in their predictions and we prove that this similarity mitigates overfitting. Specifically, we show empirically that models proposed for the ImageNet ILSVRC benchmark agree in their predictions well beyond what we can conclude from their accuracy levels alone. Likewise, models created by large scale hyperparameter search enjoy high levels of similarity. Motivated by these empirical observations, we give a non-asymptotic generalization bound that takes similarity into account, leading to meaningful confidence bounds in practical settings.
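The key empirical quantity, agreement beyond what accuracies alone imply, is simple to compute. The sketch below uses a synthetic pair of models with nested errors (an assumed extreme of similarity, not ImageNet data) and compares their agreement to the independent-errors baseline:

```python
import numpy as np

n = 10_000
correct_a = np.zeros(n, dtype=bool); correct_a[:9000] = True   # 90% accuracy
correct_b = np.zeros(n, dtype=bool); correct_b[:8500] = True   # 85% accuracy
# Nested errors: B is wrong everywhere A is wrong (high model similarity).

acc_a, acc_b = correct_a.mean(), correct_b.mean()
agreement = (correct_a == correct_b).mean()
# Agreement if the two models erred independently:
indep = acc_a * acc_b + (1 - acc_a) * (1 - acc_b)
```

Here agreement is 0.95 against an independence baseline of 0.78; it is this kind of excess agreement that shrinks the effective number of distinct models and, per the bound in the abstract, mitigates overfitting to a reused test set.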


Certainty Equivalent Control of LQR is Efficient

arXiv.org Machine Learning

One of the most straightforward methods for controlling a dynamical system with unknown transitions is based on the certainty equivalence principle: a model of the system is fit by observing its time evolution, and a control policy is then designed by treating the fitted model as the truth [8]. Despite the simplicity of this method, it is challenging to guarantee its efficiency because small modeling errors may propagate to large, undesirable behaviors on long time horizons. As a result, most work on controlling systems with unknown dynamics has explicitly incorporated robustness against model uncertainty [11, 12, 20, 25, 35, 36]. In this work, we show that for the standard baseline of controlling an unknown linear dynamical system with a quadratic objective function, known as the Linear Quadratic Regulator (LQR), certainty equivalent control synthesis achieves better cost than prior methods that account for model uncertainty. In the case of offline control, where one collects some data and then designs a fixed control policy to be run on an infinite time horizon, we show that the gap between the performance of the certainty equivalent controller and the optimal control policy scales quadratically with the error in the model parameters.
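The certainty equivalence pipeline in the first sentence can be sketched end to end in a scalar toy example (hypothetical system values; the paper's analysis covers general linear systems): fit the model by least squares from observed transitions, then design the LQR gain as if the fit were exact.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 0.9, 1.0                        # true (unknown) scalar system

# Step 1: observe the system's time evolution under exploratory inputs.
T = 500
xs = np.zeros(T + 1)
us = rng.normal(size=T)
for t in range(T):
    xs[t + 1] = a * xs[t] + b * us[t] + 0.1 * rng.normal()

# Step 2: least-squares fit of the model parameters (a, b).
Z = np.column_stack([xs[:-1], us])
a_hat, b_hat = np.linalg.lstsq(Z, xs[1:], rcond=None)[0]

# Step 3: design the controller treating the fitted model as the truth.
def lqr_gain(a, b, q=1.0, r=1.0, iters=200):
    p = q                              # scalar Riccati fixed-point iteration
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)

K_ce, K_opt = lqr_gain(a_hat, b_hat), lqr_gain(a, b)
```

With accurate estimates the certainty equivalent gain K_ce lands close to K_opt, and the paper's quadratic-gap result explains why the residual cost penalty is second order in the parameter error.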