AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.40)

Neural Information Processing SystemsOct-7-2024, 06:45:28 GMT

Reviews: Learning convex polytopes with margin

The paper considers the problem of PAC learning of fat convex polytopes in the realizable case. This hypothesis class is given by intersections of t fat hyperplanes, i.e., hyperplanes with margin gamma. Using standard results, the authors derive that the VC dimension of this class is quadratic in the inverse margin size and thus that the sample complexity is polylogerithmic in this quantity. As their main result, they provide two algorithms for finding with high probability a consistent fat polytope: one with exponential runtime in t and one polynomial greedy algorithm that, however, only guarantees to find a (t log n)-polytope. Complementary, the paper states two hardness of approximation results: one for finding an approximately consistent fat hyperplane, i.e., one with the minimum number of negative points on wrong side (and all positive correctly classified), and one for finding a consistent fat polytope with the minimum number of hyperplanes. Finally, the authors also show how their results relate to the alternative class of polytopes that separate points outside of their gamma-envelope (area with Euclidean distance less or equal to gamma from the polytope boundary).

algorithm, hyperplane, polytope, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.59)

arXiv.org Machine LearningOct-4-2024

Sequential Probability Assignment with Contexts: Minimax Regret, Contextual Shtarkov Sums, and Contextual Normalized Maximum Likelihood

Liu, Ziyi, Attias, Idan, Roy, Daniel M.

We study the fundamental problem of sequential probability assignment, also known as online learning with logarithmic loss, with respect to an arbitrary, possibly nonparametric hypothesis class. Our goal is to obtain a complexity measure for the hypothesis class that characterizes the minimax regret and to determine a general, minimax optimal algorithm. Notably, the sequential $\ell_{\infty}$ entropy, extensively studied in the literature (Rakhlin and Sridharan, 2015, Bilodeau et al., 2020, Wu et al., 2023), was shown to not characterize minimax risk in general. Inspired by the seminal work of Shtarkov (1987) and Rakhlin, Sridharan, and Tewari (2010), we introduce a novel complexity measure, the \emph{contextual Shtarkov sum}, corresponding to the Shtarkov sum after projection onto a multiary context tree, and show that the worst case log contextual Shtarkov sum equals the minimax regret. Using the contextual Shtarkov sum, we derive the minimax optimal strategy, dubbed \emph{contextual Normalized Maximum Likelihood} (cNML). Our results hold for sequential experts, beyond binary labels, which are settings rarely considered in prior work. To illustrate the utility of this characterization, we provide a short proof of a new regret upper bound in terms of sequential $\ell_{\infty}$ entropy, unifying and sharpening state-of-the-art bounds by Bilodeau et al. (2020) and Wu et al. (2023).

contextual shtarkov sum, minimax regret, shtarkov sum, (15 more...)

2410.03849

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.61)

Neural Information Processing SystemsOct-2-2024, 22:59:11 GMT

Online Learning of Optimal Bidding Strategy in Repeated Multi-Commodity Auctions

M. Sevi Baltaoglu, Lang Tong, Qing Zhao

We study the online learning problem of a bidder who participates in repeated auctions. With the goal of maximizing his T-period payoff, the bidder determines the optimal allocation of his budget among his bids for K goods at each period. As a bidding strategy, we propose a polynomial-time algorithm, inspired by the dynamic programming approach to the knapsack problem. The proposed algorithm, referred to as dynamic programming on discrete set (DPDS), achieves a regret order of O( T log T). By showing that the regret is lower bounded by Ω( T) for any strategy, we conclude that DPDS is order optimal up to a log T term. We evaluate the performance of DPDS empirically in the context of virtual trading in wholesale electricity markets by using historical data from the New York market. Empirical results show that DPDS consistently outperforms benchmark heuristic methods that are derived from machine learning and online learning approaches.

algorithm, auction, payoff function, (14 more...)

Country:

North America > United States > New York > Tompkins County > Ithaca (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.34)

Industry:

Banking & Finance > Trading (1.00)
Education > Educational Setting > Online (0.91)
Energy > Power Industry (0.70)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.68)

Omer Ben-Porat, Moshe Tennenholtz

Best Response Regression

Neural Information Processing SystemsOct-2-2024, 21:22:11 GMT

Neural Information Processing Systems http://nips.cc/

agent, best response, opponent, (17 more...)

Country:

Asia > Middle East > Israel > Haifa District > Haifa (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Neural Information Processing SystemsOct-2-2024, 20:32:57 GMT

Collaborative PAC Learning

Avrim Blum, Nika Haghtalab, Ariel D. Procaccia, Mingda Qiao

We consider a collaborative PAC learning model, in which k players attempt to learn the same underlying concept. We ask how much more information is required to learn an accurate classifier for all players simultaneously. We refer to the ratio between the sample complexity of collaborative PAC learning and its non-collaborative (single-player) counterpart as the overhead.

algorithm, classifier, sample complexity, (13 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)

Kalavasis, Alkis, Mehrotra, Anay, Zampetakis, Manolis

Smaller Confidence Intervals From IPW Estimators via Data-Dependent Coarsening

arXiv.org Machine LearningOct-2-2024

Inverse propensity-score weighted (IPW) estimators are prevalent in causal inference for estimating average treatment effects in observational studies. Under unconfoundedness, given accurate propensity scores and $n$ samples, the size of confidence intervals of IPW estimators scales down with $n$, and, several of their variants improve the rate of scaling. However, neither IPW estimators nor their variants are robust to inaccuracies: even if a single covariate has an $\varepsilon>0$ additive error in the propensity score, the size of confidence intervals of these estimators can increase arbitrarily. Moreover, even without errors, the rate with which the confidence intervals of these estimators go to zero with $n$ can be arbitrarily slow in the presence of extreme propensity scores (those close to 0 or 1). We introduce a family of Coarse IPW (CIPW) estimators that captures existing IPW estimators and their variants. Each CIPW estimator is an IPW estimator on a coarsened covariate space, where certain covariates are merged. Under mild assumptions, e.g., Lipschitzness in expected outcomes and sparsity of extreme propensity scores, we give an efficient algorithm to find a robust estimator: given $\varepsilon$-inaccurate propensity scores and $n$ samples, its confidence interval size scales with $\varepsilon+1/\sqrt{n}$. In contrast, under the same assumptions, existing estimators' confidence interval sizes are $\Omega(1)$ irrespective of $\varepsilon$ and $n$. Crucially, our estimator is data-dependent and we show that no data-independent CIPW estimator can be robust to inaccuracies.

estimator, partition, propensity score, (11 more...)

2410.01658

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Strength Medium (0.48)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Epidemiology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)

Lee, Jane H., Mehrotra, Anay, Zampetakis, Manolis

Efficient Statistics With Unknown Truncation, Polynomial Time Algorithms, Beyond Gaussians

arXiv.org Machine LearningOct-2-2024

We study the estimation of distributional parameters when samples are shown only if they fall in some unknown set $S \subseteq \mathbb{R}^d$. Kontonis, Tzamos, and Zampetakis (FOCS'19) gave a $d^{\mathrm{poly}(1/\varepsilon)}$ time algorithm for finding $\varepsilon$-accurate parameters for the special case of Gaussian distributions with diagonal covariance matrix. Recently, Diakonikolas, Kane, Pittas, and Zarifis (COLT'24) showed that this exponential dependence on $1/\varepsilon$ is necessary even when $S$ belongs to some well-behaved classes. These works leave the following open problems which we address in this work: Can we estimate the parameters of any Gaussian or even extend beyond Gaussians? Can we design $\mathrm{poly}(d/\varepsilon)$ time algorithms when $S$ is a simple set such as a halfspace? We make progress on both of these questions by providing the following results: 1. Toward the first question, we give a $d^{\mathrm{poly}(\ell/\varepsilon)}$ time algorithm for any exponential family that satisfies some structural assumptions and any unknown set $S$ that is $\varepsilon$-approximable by degree-$\ell$ polynomials. This result has two important applications: 1a) The first algorithm for estimating arbitrary Gaussian distributions from samples truncated to an unknown $S$; and 1b) The first algorithm for linear regression with unknown truncation and Gaussian features. 2. To address the second question, we provide an algorithm with runtime $\mathrm{poly}(d/\varepsilon)$ that works for a set of exponential families (containing all Gaussians) when $S$ is a halfspace or an axis-aligned rectangle. Along the way, we develop tools that may be of independent interest, including, a reduction from PAC learning with positive and unlabeled samples to PAC learning with positive and negative samples that is robust to certain covariate shifts.

algorithm, assumption 3, poly, (13 more...)

2410.01656

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
(6 more...)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

Patil, Pratik, Du, Jin-Hong, Tibshirani, Ryan J.

Revisiting Optimism and Model Complexity in the Wake of Overparameterized Machine Learning

arXiv.org Machine LearningOct-2-2024

Common practice in modern machine learning involves fitting a large number of parameters relative to the number of observations. These overparameterized models can exhibit surprising generalization behavior, e.g., ``double descent'' in the prediction error curve when plotted against the raw number of model parameters, or another simplistic notion of complexity. In this paper, we revisit model complexity from first principles, by first reinterpreting and then extending the classical statistical concept of (effective) degrees of freedom. Whereas the classical definition is connected to fixed-X prediction error (in which prediction error is defined by averaging over the same, nonrandom covariate points as those used during training), our extension of degrees of freedom is connected to random-X prediction error (in which prediction error is averaged over a new, random sample from the covariate distribution). The random-X setting more naturally embodies modern machine learning problems, where highly complex models, even those complex enough to interpolate the training data, can still lead to desirable generalization performance under appropriate conditions. We demonstrate the utility of our proposed complexity measures through a mix of conceptual arguments, theory, and experiments, and illustrate how they can be used to interpret and compare arbitrary prediction models.

freedom, random-x degree, regression, (17 more...)

2410.01259

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre:

Research Report (0.50)
Overview (0.45)

Industry: Education (0.34)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Pabbaraju, Chirag, Sarmasarkar, Sahasrajit

A Characterization of List Regression

arXiv.org Machine LearningSep-27-2024

There has been a recent interest in understanding and characterizing the sample complexity of list learning tasks, where the learning algorithm is allowed to make a short list of $k$ predictions, and we simply require one of the predictions to be correct. This includes recent works characterizing the PAC sample complexity of standard list classification and online list classification. Adding to this theme, in this work, we provide a complete characterization of list PAC regression. We propose two combinatorial dimensions, namely the $k$-OIG dimension and the $k$-fat-shattering dimension, and show that they optimally characterize realizable and agnostic $k$-list regression respectively. These quantities generalize known dimensions for standard regression. Our work thus extends existing list learning characterizations from classification to regression.

dimension, hypothesis class, lemma 3, (16 more...)

2409.19218

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Education > Educational Setting > Online (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)