AITopics | isotropic position


isotropic position


Forster Decomposition and Learning Halfspaces with Noise

Neural Information Processing Systems | Apr-25-2026, 14:24:29 GMT

A Forster transform is an operation that turns a distribution into one with good anticoncentration properties. While a Forster transform does not always exist, we show that any distribution can be efficiently decomposed as a disjoint mixture of few distributions for which a Forster transform exists and can be computed efficiently. As the main application of this result, we obtain the first polynomial-time algorithm for distribution-independent PAC learning of halfspaces in the Massart noise model with strongly polynomial sample complexity, i.e., independent of the bit complexity of the examples. Previous algorithms for this learning problem incurred sample complexity scaling polynomially with the bit complexity, even though such a dependence is not information-theoretically necessary.
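
As a rough illustration of what a Forster transform looks like on a finite point set, the sketch below assumes the standard formulation: find a linear map A such that the rescaled points Ax/||Ax|| have second-moment matrix I/d (radial isotropic position). It uses a simple fixed-point iteration of the kind used for Tyler's M-estimator, which is a heuristic chosen here for brevity and is not the algorithm of the paper. The function names (`forster_transform`, `radial_second_moment`) and the test data are hypothetical.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S, for symmetric positive-definite S."""
    d = len(S)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, x):
    """Return y = L^{-1} x for lower-triangular L (forward substitution)."""
    y = []
    for i in range(len(x)):
        y.append((x[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def forster_transform(points, iters=200):
    """Heuristic fixed-point iteration (assumption: points in general position).

    Returns lower-triangular L; the transform itself is A = L^{-1}.
    """
    d, n = len(points[0]), len(points)
    S = [[float(i == j) for j in range(d)] for i in range(d)]
    for _ in range(iters):
        L = cholesky(S)
        T = [[0.0] * d for _ in range(d)]
        for x in points:
            y = forward_solve(L, x)
            # weight d / (n * x^T S^{-1} x), using x^T S^{-1} x = ||L^{-1} x||^2
            w = d / (n * sum(c * c for c in y))
            for i in range(d):
                for j in range(d):
                    T[i][j] += w * x[i] * x[j]
        # Fix the scale (trace = d); the fixed point is only defined up to scaling.
        trace = sum(T[i][i] for i in range(d))
        S = [[T[i][j] * d / trace for j in range(d)] for i in range(d)]
    return cholesky(S)


def radial_second_moment(L, points):
    """(1/n) * sum of y y^T over y = A x / ||A x||, with A = L^{-1}."""
    d, n = len(points[0]), len(points)
    M = [[0.0] * d for _ in range(d)]
    for x in points:
        y = forward_solve(L, x)
        norm_sq = sum(c * c for c in y)
        for i in range(d):
            for j in range(d):
                M[i][j] += y[i] * y[j] / (norm_sq * n)
    return M


# Illustrative data: 100 anisotropic Gaussian points in 3 dimensions.
random.seed(0)
pts = [[random.gauss(0, s) for s in (1.0, 3.0, 0.5)] for _ in range(100)]
L = forster_transform(pts)
M = radial_second_moment(L, pts)
max_dev = max(abs(M[i][j] - (1 / 3 if i == j else 0.0))
              for i in range(3) for j in range(3))
print("max deviation from I/d:", max_dev)
```

The iteration is only expected to converge when such a transform exists; the decomposition result in the abstract addresses exactly the case where it does not.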

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)


Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures

Anderson, Prashanti, Bafna, Mitali, Buhai, Rares-Darius, Kothari, Pravesh K., Steurer, David

arXiv.org Machine Learning | Nov-19-2024

We develop a new approach for clustering non-spherical (i.e., arbitrary component covariances) Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data. Our method gives a non-spherical analog of the classical dimension reduction, based on singular value decomposition, that forms a key component of the celebrated spherical clustering algorithm of Vempala and Wang [VW04] (in addition to several other applications). As applications, we obtain an algorithm to (1) cluster an arbitrary total-variation separated mixture of $k$ centered (i.e., zero-mean) Gaussians with $n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples and $\operatorname{poly}(n)$ time, and (2) cluster an arbitrary total-variation separated mixture of $k$ Gaussians with identical but arbitrary unknown covariance with $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples and $n^{O(\log w_{\min}^{-1})}$ time. Here, $w_{\min}$ is the minimum mixing weight of the input mixture, and $f$ does not depend on the dimension $d$. Our algorithms naturally extend to tolerating a dimension-independent fraction of arbitrary outliers. Before this work, the techniques in the state-of-the-art non-spherical clustering algorithms needed $d^{O(k)} f(w_{\min}^{-1})$ time and samples for clustering such mixtures. Our results may come as a surprise in the context of the $d^{\Omega(k)}$ statistical query lower bound [DKS17] for clustering non-spherical Gaussian mixtures. While this result is usually thought to rule out $d^{o(k)}$ cost algorithms for the problem, our results show that the lower bounds can in fact be circumvented for a remarkably general class of Gaussian mixtures.
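
For context, the classical SVD-based dimension reduction that the abstract refers to can be sketched in its simplest setting: an equal-weight mixture of two spherical Gaussians with means +mu and -mu, where projecting onto the top singular direction of the data preserves the separation. This is only the spherical baseline, not the sum-of-squares subroutine of the paper, and all names and parameters below (`top_singular_direction`, `sample_two_component_mixture`, the choice of mu) are hypothetical.

```python
import math
import random


def top_singular_direction(points, iters=30, seed=0):
    """Power iteration for the top right singular vector of the data matrix."""
    rng = random.Random(seed)
    d = len(points[0])
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        # w = X^T (X v), computed without forming the d x d matrix X^T X
        w = [0.0] * d
        for x in points:
            dot = sum(a * b for a, b in zip(x, v))
            for i in range(d):
                w[i] += dot * x[i]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def sample_two_component_mixture(n, mu, rng):
    """Equal-weight mixture of N(mu, I) and N(-mu, I); returns (points, labels)."""
    points, labels = [], []
    for _ in range(n):
        sign = rng.choice((-1, 1))
        points.append([sign * m + rng.gauss(0, 1) for m in mu])
        labels.append(sign)
    return points, labels


rng = random.Random(1)
mu = [4.0] + [0.0] * 9          # means differ only along the first coordinate
points, labels = sample_two_component_mixture(1000, mu, rng)

v = top_singular_direction(points)
projections = [sum(a * b for a, b in zip(x, v)) for x in points]

# Cluster by the sign of the 1-D projection; the orientation of v is arbitrary,
# so accuracy is measured up to swapping the two cluster names.
guess = [1 if p > 0 else -1 for p in projections]
agree = sum(g == l for g, l in zip(guess, labels)) / len(labels)
accuracy = max(agree, 1 - agree)
print("clustering accuracy after projection:", accuracy)
```

With non-spherical components the top singular directions can be dominated by the covariance rather than by the mean separation, which is the gap a separation-preserving projection is meant to close.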

algorithm, isotropic position, subspace, (17 more...)

arXiv.org Machine Learning

2411.12438

Country:

North America > United States > New York (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.74)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for Non-Spherical Gaussian Mixtures

Buhai, Rares-Darius, Steurer, David

arXiv.org Machine Learning | Dec-10-2021

We consider mixtures of $k\geq 2$ Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated, i.e., distinct components have statistical overlap at most $k^{-C}$ for a large enough constant $C\ge 1$. Previous statistical-query lower bounds [DKS17] give formal evidence that even distinguishing such mixtures from (pure) Gaussians may be exponentially hard (in $k$). We show that this kind of hardness can only appear if mixing weights are allowed to be exponentially small, and that for polynomially lower bounded mixing weights non-trivial algorithmic guarantees are possible in quasi-polynomial time. Concretely, we develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight. The algorithm can reliably distinguish between a mixture of $k\ge 2$ well-separated Gaussian components and a (pure) Gaussian distribution. As a certificate, the algorithm computes a bipartition of the input sample that separates a pair of mixture components, i.e., both sides of the bipartition contain most of the sample points of at least one component. For the special case of collinear means, our algorithm outputs a $k$-clustering of the input sample that is approximately consistent with the components of the mixture. A significant challenge for our results is that they appear to be inherently sensitive to small fractions of adversarial outliers, unlike most previous results for Gaussian mixtures. The reason is that such outliers can simulate exponentially small mixing weights even for mixtures with polynomially lower bounded mixing weights. A key technical ingredient is a characterization of separating directions for well-separated Gaussian components in terms of ratios of polynomials that correspond to moments of two carefully chosen orders logarithmic in the minimum mixing weight.

algorithm, lemma, theorem 5, (15 more...)

arXiv.org Machine Learning

2112.05445

Country:

North America > United States > New York (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(4 more...)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)


Forster Decomposition and Learning Halfspaces with Noise

Diakonikolas, Ilias, Kane, Daniel M., Tzamos, Christos

arXiv.org Machine Learning | Jul-12-2021

The motivating application for this paper is the problem of (distribution-independent) PAC learning of halfspaces in the presence of label noise, and more specifically in the Massart (or bounded noise) model. Recent work [DGT19] obtained the first computationally efficient learning algorithm with non-trivial error guarantee for this problem. Interestingly, the sample complexity of the [DGT19] algorithm scales polynomially with the bit complexity of the examples (in addition, of course, to the dimension and the inverse of desired accuracy). This bit-complexity dependence in the sample complexity is an artifact of the algorithmic approach in [DGT19]. Information-theoretically, no such dependence is needed -- alas, the standard VC-dimension-based sample upper bound [MN06] is non-constructive. Motivated by this qualitative gap in our understanding, here we develop a methodology that leads to a computationally efficient learning algorithm for Massart halfspaces (matching the error guarantee of [DGT19]) with "strongly polynomial" sample complexity, i.e., sample complexity completely independent of the bit complexity of the examples.

Halfspaces and Efficient Learnability

We study the binary classification setting, where the goal is to learn a Boolean function from random labeled examples with noisy labels. Our focus is on the problem of learning halfspaces in Valiant's PAC learning model [Val84] when the labels have been corrupted by Massart noise [MN06].
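
To make the Massart (bounded) noise model concrete, here is a minimal data-generation sketch. It assumes the usual definition: each clean halfspace label is flipped independently with a point-dependent probability eta(x) that never exceeds a bound eta_max < 1/2. The Gaussian marginal and the particular eta(x) used below are illustrative assumptions only, since the model is distribution-independent and the flip probabilities may be chosen adversarially; `massart_sample` is a hypothetical name.

```python
import random


def massart_sample(w, n, eta_max, rng):
    """Draw n examples labeled by sign(<w, x>) and corrupted by Massart noise.

    Returns a list of (x, noisy_label, clean_label) triples.
    """
    assert 0.0 <= eta_max < 0.5
    data = []
    for _ in range(n):
        # Assumed marginal: standard Gaussian (any distribution is allowed in the model).
        x = [rng.gauss(0, 1) for _ in w]
        clean = 1 if sum(a * b for a, b in zip(w, x)) >= 0 else -1
        # Hypothetical flip probability eta(x) <= eta_max; an adversary could
        # pick any other function with the same upper bound.
        eta_x = eta_max * min(1.0, abs(x[0]))
        noisy = -clean if rng.random() < eta_x else clean
        data.append((x, noisy, clean))
    return data


rng = random.Random(0)
sample = massart_sample([1.0, -2.0, 0.5], n=2000, eta_max=0.3, rng=rng)
flip_rate = sum(noisy != clean for _, noisy, clean in sample) / len(sample)
print("empirical flip rate:", flip_rate)
```

Because eta(x) varies with x, the noise is not the uniform random classification noise of simpler models, which is what makes the learning problem harder.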

algorithm, complexity, halfspace, (16 more...)

arXiv.org Machine Learning

2107.05582

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)


Faster algorithms for polytope rounding, sampling, and volume computation via a sublinear "Ball Walk"

Mangoubi, Oren, Vishnoi, Nisheeth K.

arXiv.org Machine Learning | May-5-2019

We study the problem of "isotropically rounding" a polytope $K\subseteq\mathbb{R}^n$, that is, computing a linear transformation which makes the uniform distribution on the polytope have roughly identity covariance matrix. We assume that $K$ is defined by $m$ linear inequalities, with the guarantee that $rB\subseteq K\subseteq RB$, where $B$ is the unit ball. We introduce a new variant of the ball walk Markov chain and show that, roughly, the expected number of arithmetic operations per step of this Markov chain is $O(m)$, which is sublinear in the input size $mn$, the per-step time of all prior Markov chains. Subsequently, we give a rounding algorithm that succeeds with probability $1-\varepsilon$ in $\tilde{O}(mn^{4.5}\mathrm{polylog}(\frac{1}{\varepsilon},\frac{R}{r}))$ arithmetic operations. This gives a factor of $\sqrt{n}$ improvement on the previous bound of $\tilde{O}(mn^{5} \mathrm{polylog}(\frac{1}{\varepsilon},\frac{R}{r}))$ for rounding, which uses the hit-and-run algorithm. Since the cost of the rounding preprocessing step is in many cases the bottleneck in improving sampling or volume computation, our results imply these tasks can also be achieved in roughly $\tilde{O}(mn^{4.5}\mathrm{polylog}(\frac{1}{\varepsilon},\frac{R}{r})+mn^4\delta^{-2})$ operations for computing the volume of $K$ up to a factor $1+\delta$ and $\tilde{O}(m n^{4.5}\mathrm{polylog}(\frac{1}{\varepsilon},\frac{R}{r}))$ for uniformly sampling on $K$ with TV error $\varepsilon$. This improves on the previous bounds of $\tilde{O}(mn^{5}\mathrm{polylog}(\frac{1}{\varepsilon},\frac{R}{r})+mn^4\delta^{-2})$ for volume computation and $\tilde{O}(mn^{5}\mathrm{polylog}(\frac{1}{\varepsilon},\frac{R}{r}))$ for sampling. We achieve this improvement by a novel method of computing polytope membership, where one avoids checking inequalities which are estimated to have a very low probability of being violated.
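
The two ingredients named in the abstract, a ball walk on a polytope and a linear map that brings samples to identity covariance, can be sketched as follows. This is the textbook ball walk with a full membership check of all m inequalities at every step (so O(mn) work per step, not the sublinear variant of the paper), followed by a one-shot whitening of the walk's samples by their empirical covariance. A real rounding procedure interleaves sampling and transformation over several phases; the polytope, step size, and function names here are hypothetical.

```python
import math
import random


def in_polytope(A, b, x):
    """Check every inequality a_i . x <= b_i."""
    return all(sum(a * c for a, c in zip(row, x)) <= bi for row, bi in zip(A, b))


def ball_walk(A, b, x0, delta, steps, rng):
    """Propose a uniform point in the delta-ball around x; move only if it stays in K."""
    n = len(x0)
    x, samples = list(x0), []
    for _ in range(steps):
        g = [rng.gauss(0, 1) for _ in range(n)]
        norm = math.sqrt(sum(c * c for c in g))
        r = delta * rng.random() ** (1.0 / n)   # radius of a uniform point in the ball
        y = [xi + r * gi / norm for xi, gi in zip(x, g)]
        if in_polytope(A, b, y):
            x = y
        samples.append(list(x))
    return samples


def covariance(samples):
    """Empirical mean and covariance matrix of a list of points."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    C = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / n
          for j in range(d)] for i in range(d)]
    return mean, C


def cholesky(S):
    """Lower-triangular L with L L^T = S, for symmetric positive-definite S."""
    d = len(S)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, x):
    """Return y = L^{-1} x for lower-triangular L."""
    y = []
    for i in range(len(x)):
        y.append((x[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def round_samples(samples):
    """Apply x -> L^{-1} (x - mean), where L L^T is the empirical covariance."""
    mean, C = covariance(samples)
    L = cholesky(C)
    return [forward_solve(L, [si - mi for si, mi in zip(s, mean)]) for s in samples]


# Illustrative polytope: the elongated box [-1, 1] x [-10, 10], written as A x <= b.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [1.0, 1.0, 10.0, 10.0]
rng = random.Random(0)
samples = ball_walk(A, b, [0.0, 0.0], delta=0.5, steps=5000, rng=rng)
rounded = round_samples(samples)
_, C_rounded = covariance(rounded)
print("covariance after rounding:", C_rounded)
```

The whitening step makes the empirical covariance of these particular samples the identity by construction; how close the transformed polytope is to isotropic position depends on how well the walk has mixed.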

artificial intelligence, machine learning, polytope, (19 more...)

arXiv.org Machine Learning

1905.01745

Genre:

Research Report (1.00)
Workflow (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

Bassily, Raef, Smith, Adam, Thakurta, Abhradeep

arXiv.org Machine Learning | Oct-17-2014

In this paper, we initiate a systematic investigation of differentially private algorithms for convex empirical risk minimization. Various instantiations of this problem have been studied before. We provide new algorithms and matching lower bounds for private ERM assuming only that each data point's contribution to the loss function is Lipschitz bounded and that the domain of optimization is bounded. We provide a separate set of algorithms and matching lower bounds for the setting in which the loss functions are known to also be strongly convex. Our algorithms run in polynomial time, and in some cases even match the optimal non-private running time (as measured by oracle complexity). We give separate algorithms (and lower bounds) for $(\epsilon,0)$- and $(\epsilon,\delta)$-differential privacy; perhaps surprisingly, the techniques used for designing optimal algorithms in the two cases are completely different. Our lower bounds apply even to very simple, smooth function families, such as linear and quadratic functions. This implies that algorithms from previous work can be used to obtain optimal error rates, under the additional assumption that the contribution of each data point to the loss function is smooth. We show that simple approaches to smoothing arbitrary loss functions (in order to apply previous techniques) do not yield optimal error rates. In particular, optimal algorithms were not previously known for problems such as training support vector machines and the high-dimensional median.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1405.7085

Country: North America > United States (0.27)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)


On Zeroth-Order Stochastic Convex Optimization via Random Walks

Liang, Tengyuan, Narayanan, Hariharan, Rakhlin, Alexander

arXiv.org Machine Learning | Feb-11-2014

We propose a method for zeroth order stochastic convex optimization that attains the suboptimality rate of $\tilde{\mathcal{O}}(n^{7}T^{-1/2})$ after $T$ queries for a convex bounded function $f:{\mathbb R}^n\to{\mathbb R}$. The method is based on a random walk (the Ball Walk) on the epigraph of the function. The randomized approach circumvents the problem of gradient estimation, and appears to be less sensitive to noisy function evaluations compared to noiseless zeroth order methods.

algorithm, artificial intelligence, convex body, (12 more...)

arXiv.org Machine Learning

1402.2667

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (0.68)
