Sellke, Mark
No Free Prune: Information-Theoretic Barriers to Pruning at Initialization
Kumar, Tanishq, Luo, Kevin, Sellke, Mark
The existence of "lottery tickets" arXiv:1803.03635 at or near initialization raises the tantalizing question of whether large models are necessary in deep learning, or whether sparse networks can be quickly identified and trained without ever training the dense models that contain them. However, efforts to find these sparse subnetworks without training the dense model ("pruning at initialization") have been broadly unsuccessful arXiv:2009.08576. We put forward a theoretical explanation for this, based on the model's effective parameter count, $p_\text{eff}$, given by the sum of the number of non-zero weights in the final network and the mutual information between the sparsity mask and the data. We show that the Law of Robustness of arXiv:2105.12806 extends to sparse networks with the usual parameter count replaced by $p_\text{eff}$, meaning that a sparse neural network which robustly interpolates noisy data requires a heavily data-dependent mask. We posit that pruning during and after training outputs masks with higher mutual information than those produced by pruning at initialization. Thus two networks may have the same sparsity, but differ in effective parameter count based on how they were trained. This suggests that pruning near initialization may be infeasible, and explains why lottery tickets exist but cannot be found quickly (i.e., without training the full network). Experiments on neural networks confirm that information gained during training may indeed affect model capacity.
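As a rough illustration of the mask's data-dependence (a minimal sketch, not the paper's experimental setup; the magnitude-pruning criterion, the toy linear model, and the least-squares stand-in for "training" are assumptions made for illustration): a mask chosen at initialization is a function of the random init alone, while a mask chosen after training is a function of the data, so only the latter can contribute mutual information to $p_\text{eff}$.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, sparsity = 100, 50, 0.9          # samples, dimension, fraction of weights to prune

    X = rng.normal(size=(n, d))            # toy data
    y = rng.normal(size=n)                 # toy (noisy) labels
    w_init = rng.normal(size=d)            # random initialization

    def magnitude_mask(w, sparsity):
        """Keep the largest-magnitude weights; zero out the rest."""
        k = int((1 - sparsity) * len(w))
        keep = np.argsort(np.abs(w))[-k:]
        mask = np.zeros_like(w)
        mask[keep] = 1.0
        return mask

    # Pruning at initialization: the mask depends only on w_init, not on (X, y),
    # so in this sketch I(mask; data) = 0 and p_eff is just the number of surviving weights.
    mask_at_init = magnitude_mask(w_init, sparsity)

    # Pruning after training: the mask depends on the trained weights, hence on the data,
    # so I(mask; data) can be large and p_eff can exceed the nominal non-zero count.
    w_trained = np.linalg.lstsq(X, y, rcond=None)[0]   # stand-in for "training"
    mask_after_training = magnitude_mask(w_trained, sparsity)

    print("nonzeros (both masks):", int(mask_at_init.sum()), int(mask_after_training.sum()))

Both masks have the same sparsity, but only the second one carries information about the data, which is exactly the distinction $p_\text{eff}$ is meant to capture.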
On Size-Independent Sample Complexity of ReLU Networks
Sellke, Mark
We study the sample complexity of learning ReLU neural networks from the point of view of generalization. Given norm constraints on the weight matrices, a common approach is to estimate the Rademacher complexity of the associated function class. Previously, Golowich, Rakhlin, and Shamir (2020) obtained a bound independent of the network size (scaling with a product of Frobenius norms), except for a factor of the square root of the depth. We give a refinement which often has no explicit depth dependence at all.
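Schematically, the type of bound being refined (an informal paraphrase of the cited Golowich-Rakhlin-Shamir result, with constants and lower-order terms suppressed; $B$ is an assumed bound on the input norm): for a depth-$L$ ReLU network with weight matrices $W_1,\dots,W_L$, the Rademacher complexity on $n$ samples satisfies
$$\mathfrak{R}_n \;\lesssim\; \frac{B\,\sqrt{L}\,\prod_{i=1}^{L}\|W_i\|_F}{\sqrt{n}},$$
and the refinement in this paper removes the explicit $\sqrt{L}$ factor in many settings.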
Asymptotically Optimal Pure Exploration for Infinite-Armed Bandits
Gong, Xiao-Yue, Sellke, Mark
We study pure exploration with infinitely many bandit arms generated i.i.d. from an unknown distribution. Our goal is to efficiently select a single high-quality arm whose average reward is, with probability $1-\delta$, within $\varepsilon$ of being among the top $\eta$-fraction of arms; this is a natural adaptation of the classical PAC guarantee for infinite action sets. We consider both the fixed confidence and fixed budget settings, aiming respectively for minimal expected and fixed sample complexity. For fixed confidence, we give an algorithm with expected sample complexity $O\left(\frac{\log (1/\eta)\log (1/\delta)}{\eta\varepsilon^2}\right)$. This is optimal except for the $\log (1/\eta)$ factor, and the $\delta$-dependence closes a quadratic gap in the literature. For fixed budget, we show that the asymptotically optimal sample complexity as $\delta\to 0$ is $c^{-1}\log(1/\delta)\big(\log\log(1/\delta)\big)^2$ to leading order. Equivalently, the optimal failure probability given exactly $N$ samples decays as $\exp\big(-cN/\log^2 N\big)$, up to a factor $1\pm o_N(1)$ inside the exponent. The constant $c$ depends explicitly on the problem parameters (including the unknown arm distribution) through a certain Fisher information distance. Even the strictly super-linear dependence on $\log(1/\delta)$ was not previously known; establishing it resolves a question of Grossman and Moshkovitz (FOCS 2016, SIAM Journal on Computing 2020).
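For intuition, here is a naive fixed-confidence baseline (an assumed illustrative strategy, not the algorithm of the paper, and its guarantee is looser than the bound stated above; the helper names draw_arm and pull are hypothetical): draw enough fresh arms so that, with probability at least $1-\delta/2$, at least one lies in the top $\eta$-fraction, then pull each drawn arm enough times that all empirical means are $\varepsilon/2$-accurate and return the empirical best.

    import numpy as np

    def naive_pac_infinite_arms(draw_arm, pull, eta, eps, delta):
        """Naive baseline: draw K fresh arms, estimate each, return the empirical best.

        draw_arm() -> an arm drawn i.i.d. from the unknown arm distribution.
        pull(arm)  -> one stochastic reward in [0, 1] from that arm.
        """
        # With K = ceil(log(2/delta)/eta) draws, P(no drawn arm in top eta-fraction) <= (1-eta)^K <= delta/2.
        K = int(np.ceil(np.log(2.0 / delta) / eta))
        arms = [draw_arm() for _ in range(K)]
        # Hoeffding + union bound: m pulls per arm make every empirical mean eps/2-accurate
        # simultaneously with probability >= 1 - delta/2 (constant chosen conservatively).
        m = int(np.ceil(8.0 * np.log(4.0 * K / delta) / eps ** 2))
        means = [np.mean([pull(a) for _ in range(m)]) for a in arms]
        return arms[int(np.argmax(means))]

    # Toy usage: arm means are Uniform(0, 1) and rewards are Bernoulli(mean).
    rng = np.random.default_rng(0)
    best = naive_pac_infinite_arms(
        draw_arm=lambda: rng.uniform(),                  # an "arm" here is just its mean
        pull=lambda mu: float(rng.random() < mu),
        eta=0.05, eps=0.1, delta=0.05,
    )
    print("selected arm mean:", best)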
Incentivizing Exploration with Linear Contexts and Combinatorial Actions
Sellke, Mark
We advance the study of incentivized bandit exploration, in which arm choices are viewed as recommendations and are required to be Bayesian incentive compatible. Recent work has shown that, under certain independence assumptions, the popular Thompson sampling algorithm becomes incentive compatible after enough initial samples have been collected. We give an analog of this result for linear bandits, where the independence of the prior is replaced by a natural convexity condition. This opens up the possibility of efficient and regret-optimal incentivized exploration in high-dimensional action spaces. In the semibandit model, we also improve the sample complexity for the pre-Thompson sampling phase of initial data collection.
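For reference, a minimal sketch of the linear Thompson sampling recommendation policy that the result concerns (the Gaussian prior, Gaussian noise, and finite candidate action set are assumptions made for illustration; the incentive-compatibility analysis and the initial data-collection phase are not shown):

    import numpy as np

    def linear_thompson_sampling(actions, get_reward, T, d, prior_var=1.0, noise_var=1.0, seed=0):
        """Thompson sampling for linear bandits with a Gaussian prior on theta.

        actions: (K, d) array of feasible action vectors.
        get_reward(x): noisy reward with mean <theta*, x>.
        """
        rng = np.random.default_rng(seed)
        precision = np.eye(d) / prior_var        # posterior precision under N(0, prior_var I) prior
        b = np.zeros(d)                          # running sum of r_t * x_t
        chosen = []
        for _ in range(T):
            cov = np.linalg.inv(precision)
            mean = cov @ b / noise_var
            theta_sample = rng.multivariate_normal(mean, cov)   # sample from the posterior
            x = actions[np.argmax(actions @ theta_sample)]      # recommended action
            r = get_reward(x)
            precision += np.outer(x, x) / noise_var             # Bayesian linear-regression update
            b += r * x
            chosen.append(x)
        return chosen

    # Toy usage with a random action set and a hidden theta*.
    rng = np.random.default_rng(1)
    d, K = 5, 20
    theta_star = rng.normal(size=d)
    actions = rng.normal(size=(K, d))
    linear_thompson_sampling(actions, lambda x: x @ theta_star + rng.normal(), T=100, d=d)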
When Does Adaptivity Help for Quantum State Learning?
Chen, Sitan, Huang, Brice, Li, Jerry, Liu, Allen, Sellke, Mark
We consider the classic question of state tomography: given copies of an unknown quantum state $\rho\in\mathbb{C}^{d\times d}$, output $\widehat{\rho}$ which is close to $\rho$ in some sense, e.g., trace distance or fidelity. When one is allowed to make coherent measurements entangled across all copies, $\Theta(d^2/\epsilon^2)$ copies are necessary and sufficient to get trace distance $\epsilon$. Unfortunately, the protocols achieving this rate incur large quantum memory overheads that preclude implementation on near-term devices. On the other hand, the best known protocol using incoherent (single-copy) measurements uses $O(d^3/\epsilon^2)$ copies, and multiple papers have posed it as an open question to understand whether or not this rate is tight. In this work, we fully resolve this question by showing that any protocol using incoherent measurements, even if they are chosen adaptively, requires $\Omega(d^3/\epsilon^2)$ copies, matching the best known upper bound. We do so via a new proof technique which directly bounds the "tilt" of the posterior distribution after measurements; this yields a surprisingly short proof of our lower bound, and we believe it may be of independent interest. While this implies that adaptivity does not help for tomography with respect to trace distance, we show that it actually does help for tomography with respect to infidelity. We give an adaptive algorithm that outputs a state which is $\gamma$-close in infidelity to $\rho$ using only $\tilde{O}(d^3/\gamma)$ copies, which is optimal for incoherent measurements. In contrast, it is known that any nonadaptive algorithm requires $\Omega(d^3/\gamma^2)$ copies. While it is folklore that in $2$ dimensions one can achieve a scaling of $O(1/\gamma)$, to the best of our knowledge, our algorithm is the first to achieve the optimal rate in all dimensions.
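As a toy illustration of incoherent (single-copy) measurements in the simplest case $d=2$ (a standard nonadaptive Pauli-basis scheme, assumed here as textbook background rather than the paper's adaptive algorithm; the specific test state is arbitrary): measure $X$, $Y$, and $Z$ on separate copies, estimate the Bloch vector, and reconstruct $\widehat{\rho}$.

    import numpy as np

    I2 = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    Z = np.array([[1, 0], [0, -1]], dtype=complex)

    def pauli_tomography(rho, copies_per_basis, rng):
        """Estimate a single-qubit state from incoherent Pauli measurements."""
        bloch_hat = []
        for P in (X, Y, Z):
            p_plus = (1 + np.real(np.trace(rho @ P))) / 2     # Born rule: P(outcome +1)
            outcomes = rng.random(copies_per_basis) < p_plus  # simulate single-copy measurements
            bloch_hat.append(2 * outcomes.mean() - 1)         # empirical expectation of P
        rx, ry, rz = bloch_hat
        return (I2 + rx * X + ry * Y + rz * Z) / 2            # may need projection onto PSD states

    rng = np.random.default_rng(0)
    rho_true = np.array([[0.7, 0.2 + 0.1j], [0.2 - 0.1j, 0.3]])   # a valid density matrix
    rho_hat = pauli_tomography(rho_true, copies_per_basis=20_000, rng=rng)
    print("trace distance:", 0.5 * np.abs(np.linalg.eigvalsh(rho_true - rho_hat)).sum())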
Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments
Chen, Yining, Rosenfeld, Elan, Sellke, Mark, Ma, Tengyu, Risteski, Andrej
Domain generalization aims to perform well on unseen environments using labeled data from a limited number of training environments [Blanchard et al., 2011]. In contrast to transfer learning or domain adaptation, domain generalization assumes that neither labeled nor unlabeled data from the test environments is available at training time. For example, a medical diagnostic system may have access to training datasets from only a few hospitals, but will be deployed on test cases from many other hospitals [Choudhary et al., 2020]; a traffic scene semantic segmentation system may be trained on data from some specific weather conditions, but will need to perform well under other conditions [Yue et al., 2019]. There are many algorithms for domain generalization, including Invariant Risk Minimization (IRM) [Arjovsky et al., 2019] and several variants. IRM is inspired by the principle of invariance of causal mechanisms [Pearl, 2009], which, under sufficiently strong assumptions, allows for provable identifiability of the features that achieve minimax domain generalization [Peters et al., 2016, Heinze-Deml et al., 2018]. However, empirical results for these algorithms are mixed; Gulrajani and Lopez-Paz [2021] and Aubin et al. [2021] present experimental evidence that these methods do not consistently outperform ERM on either realistic or simple linear data models.
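For concreteness, the IRM objective of Arjovsky et al. [2019] (reproduced here as background in their notation; $\mathcal{E}_{\mathrm{tr}}$ denotes the training environments, $R^e$ the risk in environment $e$, $\Phi$ a feature map, and $w$ a predictor on top of $\Phi$):
$$\min_{\Phi,\,w}\ \sum_{e\in\mathcal{E}_{\mathrm{tr}}} R^e(w\circ\Phi) \quad\text{subject to}\quad w\in\operatorname*{arg\,min}_{\bar w}\,R^e(\bar w\circ\Phi)\ \text{ for all } e\in\mathcal{E}_{\mathrm{tr}}.$$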
A Universal Law of Robustness via Isoperimetry
Bubeck, Sébastien, Sellke, Mark
Classically, data interpolation with a parametrized model class is possible as long as the number of parameters is larger than the number of equations to be satisfied. A puzzling phenomenon in deep learning is that models are trained with many more parameters than this classical theory would suggest. We propose a theoretical explanation for this phenomenon. We prove that for a broad class of data distributions and model classes, overparametrization is necessary if one wants to interpolate the data smoothly. Namely, we show that smooth interpolation requires $d$ times more parameters than mere interpolation, where $d$ is the ambient data dimension. We prove this universal law of robustness for any smoothly parametrized function class with polynomial-size weights, and any covariate distribution satisfying isoperimetry. In the case of two-layer neural networks and Gaussian covariates, this law was conjectured in prior work by Bubeck, Li, and Nagaraj. We also give an interpretation of our result as an improved generalization bound for model classes consisting of smooth functions.
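Informally, and with constants and logarithmic factors suppressed (this display is a paraphrase of the main theorem, not its precise statement): if $\mathcal{F}$ is smoothly parametrized by $p$ parameters with polynomially bounded weights, the $n$ covariates in dimension $d$ satisfy isoperimetry, and the labels carry independent noise, then any $f\in\mathcal{F}$ fitting the data below the noise level must satisfy
$$\mathrm{Lip}(f)\ \gtrsim\ \sqrt{\frac{nd}{p}},$$
so an $O(1)$-Lipschitz interpolant requires $p\gtrsim nd$ parameters, versus $p\gtrsim n$ for mere interpolation.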
Cooperative and Stochastic Multi-Player Multi-Armed Bandit: Optimal Regret With Neither Communication Nor Collisions
Bubeck, Sébastien, Budzinski, Thomas, Sellke, Mark
We consider the cooperative multi-player version of the stochastic multi-armed bandit problem. We study the regime where the players cannot communicate but have access to shared randomness. In prior work by the first two authors, a strategy for this regime was constructed for two players and three arms, with regret $\tilde{O}(\sqrt{T})$, and with no collisions at all between the players (with very high probability). In this paper, we show that these properties (near-optimal regret and no collisions at all) are achievable for any number of players and arms. At a high level, the previous strategy heavily relied on a $2$-dimensional geometric intuition that is difficult to generalize to higher dimensions, whereas here we take a more combinatorial route to build the new strategy.
Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without
Bubeck, Sébastien, Li, Yuanzhi, Peres, Yuval, Sellke, Mark
We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication at all between the players, and furthermore that when two (or more) players select the same action, this results in a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem, under the feedback model where collisions are announced to the colliding players. Such a bound was not known even for the simpler stochastic version. We also prove the first sublinear guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$, where $m$ is the number of players.
First-Order Regret Analysis of Thompson Sampling
Bubeck, Sébastien, Sellke, Mark
We address online combinatorial optimization when the player has a prior over the adversary's sequence of losses. In this framework, Russo and Van Roy proposed an information-theoretic analysis of Thompson Sampling based on the information ratio, resulting in optimal worst-case regret bounds. In this paper, we introduce three novel ideas to this line of work. First, we propose a new quantity, the scale-sensitive information ratio, which allows us to obtain more refined first-order regret bounds (i.e., bounds of the form $\sqrt{L^*}$ where $L^*$ is the loss of the best combinatorial action). Second, we replace the entropy over combinatorial actions by a coordinate entropy, which allows us to obtain the first optimal worst-case bound for Thompson Sampling in the combinatorial setting. Finally, we introduce a novel link between Bayesian agents and frequentist confidence intervals. Combining these ideas, we show that the classical multi-armed bandit first-order regret bound $\tilde{O}(\sqrt{d L^*})$ still holds true in the more challenging and more general semi-bandit scenario. This latter result improves the previous state-of-the-art bound $\tilde{O}(\sqrt{(d+m^3)L^*})$ by Lykouris, Sridharan, and Tardos.
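As background on the framework being refined (an assumed paraphrase of the Russo-Van Roy analysis in a loss-based formulation; notation: $A^*$ is the optimal action, $\ell_t$ the loss vector at round $t$, and $\mathbb{E}_t$, $I_t$ denote expectation and mutual information conditioned on the history), the information ratio at round $t$ is
$$\Gamma_t \;=\; \frac{\big(\mathbb{E}_t[\ell_t(A_t)-\ell_t(A^*)]\big)^2}{I_t\big(A^*;\,(A_t,\ell_t(A_t))\big)},$$
and a uniform bound $\Gamma_t\le\bar\Gamma$ yields Bayesian regret at most $\sqrt{\bar\Gamma\,H(A^*)\,T}$, where $H(A^*)$ is the entropy of the optimal action. The scale-sensitive variant introduced in this paper is designed so that the resulting bound scales with $\sqrt{L^*}$ rather than $\sqrt{T}$.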