uniform concentration
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- Asia > Middle East > Jordan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Michigan > Wayne County > Detroit (0.04)
Dimension-free uniform concentration bound for logistic regression
We provide a novel dimension-free uniform concentration bound for the empirical risk function of constrained logistic regression. Our bound yields a milder sufficient condition for a uniform law of large numbers than conditions derived by the Rademacher complexity argument and McDiarmid's inequality. The derivation is based on the PAC-Bayes approach with second-order expansion and Rademacher-complexity-based bounds for the residual term of the expansion.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Research Report > New Finding (0.87)
- Research Report > Experimental Study (0.86)
Provably Efficient Model-Free Constrained RL with Linear Function Approximation
Ghosh, Arnob, Zhou, Xingyu, Shroff, Ness
We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied by a "simulator", we aim to develop the first model-free, simulator-free algorithm that achieves sublinear regret and sublinear constraint violation even in large-scale systems. To this end, we consider episodic constrained Markov decision processes with linear function approximation, where the transition dynamics and the reward function can be represented as linear functions of some known feature mapping. We show that $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret and $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ constraint violation bounds can be achieved, where $d$ is the dimension of the feature mapping, $H$ is the length of the episode, and $T$ is the total number of steps. Our bounds are attained without explicitly estimating the unknown transition model or requiring a simulator, and they depend on the state space only through the dimension of the feature mapping. Hence our bounds hold even when the number of states goes to infinity. Our main results are achieved via novel adaptations of the standard LSVI-UCB algorithm. In particular, we first introduce primal-dual optimization into the LSVI-UCB algorithm to balance regret and constraint violation. More importantly, we replace the standard greedy selection with respect to the state-action function in LSVI-UCB with a soft-max policy. This turns out to be key in establishing uniform concentration for the constrained case via its approximation-smoothness trade-off. We also show that one can even achieve zero constraint violation while still maintaining the same order with respect to $T$.
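The contrast between greedy selection and a soft-max policy mentioned in the abstract above can be illustrated with a minimal sketch. The function names, the temperature parameter `alpha`, and the toy state-action values are assumptions made for illustration; they are not taken from the paper, and the sketch omits the LSVI-UCB estimates and the primal-dual update entirely.

```python
import math


def softmax_policy(q_values, alpha):
    """Action probabilities proportional to exp(alpha * q).

    `alpha` is a hypothetical temperature parameter: a larger alpha gives a
    policy closer to greedy, a smaller alpha gives a smoother policy.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(alpha * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def greedy_policy(q_values):
    """Put all probability mass on one maximizing action."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]


# Two nearly tied actions: swapping the values flips the greedy policy
# completely, while the soft-max probabilities barely move.
q_a = [1.000, 1.001]
q_b = [1.001, 1.000]
greedy_shift = abs(greedy_policy(q_a)[0] - greedy_policy(q_b)[0])
soft_shift = abs(softmax_policy(q_a, 1.0)[0] - softmax_policy(q_b, 1.0)[0])
```

Here `greedy_shift` is 1 while `soft_shift` is tiny, which is one way to read the approximation-smoothness trade-off: the soft-max policy changes continuously with the value estimates, at the cost of only approximating the greedy choice.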
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- Asia > Middle East > Jordan (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Michigan > Wayne County > Detroit (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Improving Computational Complexity in Statistical Models with Second-Order Information
Ren, Tongzheng, Zhuo, Jiacheng, Sanghavi, Sujay, Ho, Nhat
It is known that when the statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes a polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory in applications. To improve that computational complexity, we consider the use of second-order information in the design of optimization algorithms. Specifically, we study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models, which is a variant of the gradient descent algorithm whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function of the statistical model. When the population loss function, i.e., the limit of the empirical loss function as $n$ goes to infinity, is homogeneous in all directions, we demonstrate that the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius. This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau > 1$, to reach the same statistical radius. We illustrate our general theory under two statistical models: generalized linear models and mixture models, and experimental results support the predictions of our general theory.
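The NormGD update described above, a gradient step whose size is divided by the maximum eigenvalue of the Hessian, can be sketched on a toy problem. The loss $f(\theta) = \|\theta\|^4$ is an assumption chosen for illustration because it is homogeneous and has a degenerate Hessian at its minimizer; it is not one of the statistical models studied in the paper, and all function names are hypothetical.

```python
import math


def squared_norm(theta):
    return sum(x * x for x in theta)


def grad(theta):
    """Gradient of the toy loss f(theta) = ||theta||^4, i.e. 4 ||theta||^2 theta."""
    sq = squared_norm(theta)
    return [4.0 * sq * x for x in theta]


def max_hessian_eig(theta):
    """Largest eigenvalue of the Hessian 4 ||theta||^2 I + 8 theta theta^T.

    For this toy loss it has the closed form 12 ||theta||^2.
    """
    return 12.0 * squared_norm(theta)


def norm_gd(theta, eta, steps):
    """Gradient descent with the step size divided by the max Hessian eigenvalue."""
    for _ in range(steps):
        lam = max_hessian_eig(theta)
        if lam == 0.0:  # already at the minimizer
            break
        g = grad(theta)
        theta = [x - (eta / lam) * gx for x, gx in zip(theta, g)]
    return theta


def fixed_gd(theta, eta, steps):
    """Plain gradient descent with a fixed step size, for comparison."""
    for _ in range(steps):
        g = grad(theta)
        theta = [x - eta * gx for x, gx in zip(theta, g)]
    return theta


start = [1.0, 1.0]
dist_norm_gd = math.sqrt(squared_norm(norm_gd(start, 1.0, 50)))
dist_fixed_gd = math.sqrt(squared_norm(fixed_gd(start, 0.01, 50)))
```

On this toy loss each NormGD step shrinks the iterate by the constant factor $1 - \eta/3$, so the distance to the minimizer decays geometrically, whereas the fixed-step iterates slow down as the gradient flattens near the minimizer. For a general loss the largest eigenvalue has no closed form and would need a numerical routine such as `numpy.linalg.eigvalsh`.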
- Asia > Middle East > Jordan (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control
Angelopoulos, Anastasios N., Bates, Stephen, Candès, Emmanuel J., Jordan, Michael I., Lei, Lihua
We introduce Learn then Test, a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees regardless of the underlying model and (unknown) data-generating distribution. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. To accomplish this, we solve a key technical challenge: the control of arbitrary risks that are not necessarily monotonic. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision.
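The reframing of risk control as multiple hypothesis testing can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: losses are assumed to lie in $[0, 1]$, the p-value for the null hypothesis "the risk at $\lambda$ exceeds $\alpha$" comes from Hoeffding's inequality, and the family-wise error rate is controlled with a plain Bonferroni correction. The function names and the calibration data are hypothetical.

```python
import math


def hoeffding_p_value(emp_risk, n, alpha):
    """P-value for the null 'true risk > alpha' from n losses in [0, 1].

    Based on Hoeffding's inequality: exp(-2 n (alpha - emp_risk)_+^2).
    """
    gap = max(alpha - emp_risk, 0.0)
    return math.exp(-2.0 * n * gap * gap)


def learn_then_test(losses_by_lambda, alpha, delta):
    """Return the parameters whose null hypothesis is rejected.

    `losses_by_lambda` maps each candidate parameter to its list of
    calibration losses. With the Bonferroni threshold delta / m, every
    returned parameter has risk at most alpha simultaneously with
    probability at least 1 - delta.
    """
    m = len(losses_by_lambda)
    selected = []
    for lam, losses in losses_by_lambda.items():
        n = len(losses)
        emp_risk = sum(losses) / n
        if hoeffding_p_value(emp_risk, n, alpha) <= delta / m:
            selected.append(lam)
    return sorted(selected)


# Hypothetical calibration losses for two candidate thresholds.
calibration = {
    0.1: [1.0] * 50 + [0.0] * 950,   # empirical risk 0.05
    0.5: [1.0] * 300 + [0.0] * 700,  # empirical risk 0.30
}
valid = learn_then_test(calibration, alpha=0.2, delta=0.1)
```

Because each parameter is tested separately, nothing in this procedure requires the risk to be monotone in $\lambda$, which is the point the abstract emphasizes. Tighter p-values or a less conservative multiple-testing correction could be substituted without changing the overall structure.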
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)