AITopics | sum-of-square proof

Collaborating Authors

sum-of-square proof

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures

Anderson, Prashanti, Bafna, Mitali, Buhai, Rares-Darius, Kothari, Pravesh K., Steurer, David

arXiv.org Machine LearningNov-19-2024

We develop a new approach for clustering non-spherical (i.e., arbitrary component covariances) Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data. Our method gives a non-spherical analog of the classical dimension reduction, based on singular value decomposition, that forms a key component of the celebrated spherical clustering algorithm of Vempala and Wang [VW04] (in addition to several other applications). As applications, we obtain an algorithm to (1) cluster an arbitrary total-variation separated mixture of $k$ centered (i.e., zero-mean) Gaussians with $n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples and $\operatorname{poly}(n)$ time, and (2) cluster an arbitrary total-variation separated mixture of $k$ Gaussians with identical but arbitrary unknown covariance with $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples and $n^{O(\log w_{\min}^{-1})}$ time. Here, $w_{\min}$ is the minimum mixing weight of the input mixture, and $f$ does not depend on the dimension $d$. Our algorithms naturally extend to tolerating a dimension-independent fraction of arbitrary outliers. Before this work, the techniques in the state-of-the-art non-spherical clustering algorithms needed $d^{O(k)} f(w_{\min}^{-1})$ time and samples for clustering such mixtures. Our results may come as a surprise in the context of the $d^{\Omega(k)}$ statistical query lower bound [DKS17] for clustering non-spherical Gaussian mixtures. While this result is usually thought to rule out $d^{o(k)}$ cost algorithms for the problem, our results show that the lower bounds can in fact be circumvented for a remarkably general class of Gaussian mixtures.

algorithm, isotropic position, subspace, (17 more...)

arXiv.org Machine Learning

2411.12438

Country:

North America > United States > New York (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.74)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Private Edge Density Estimation for Random Graphs: Optimal, Efficient and Robust

Chen, Hongjie, Ding, Jingqiu, Hua, Yiding, Steurer, David

arXiv.org Machine LearningJun-3-2024

We give the first polynomial-time, differentially node-private, and robust algorithm for estimating the edge density of Erd\H{o}s-R\'enyi random graphs and their generalization, inhomogeneous random graphs. We further prove information-theoretical lower bounds, showing that the error rate of our algorithm is optimal up to logarithmic factors. Previous algorithms incur either exponential running time or suboptimal error rates. Two key ingredients of our algorithm are (1) a new sum-of-squares algorithm for robust edge density estimation, and (2) the reduction from privacy to robustness based on sum-of-squares exponential mechanisms due to Hopkins et al. (STOC 2023).

algorithm, graph, random graph, (15 more...)

arXiv.org Machine Learning

2405.16663

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.61)

Add feedback

Efficient Certificates of Anti-Concentration Beyond Gaussians

Bakshi, Ainesh, Kothari, Pravesh, Rajendran, Goutham, Tulsiani, Madhur, Vijayaraghavan, Aravindan

arXiv.org Machine LearningMay-23-2024

A set of high dimensional points $X=\{x_1, x_2,\ldots, x_n\} \subset R^d$ in isotropic position is said to be $\delta$-anti concentrated if for every direction $v$, the fraction of points in $X$ satisfying $|\langle x_i,v \rangle |\leq \delta$ is at most $O(\delta)$. Motivated by applications to list-decodable learning and clustering, recent works have considered the problem of constructing efficient certificates of anti-concentration in the average case, when the set of points $X$ corresponds to samples from a Gaussian distribution. Their certificates played a crucial role in several subsequent works in algorithmic robust statistics on list-decodable learning and settling the robust learnability of arbitrary Gaussian mixtures, yet remain limited to rotationally invariant distributions. This work presents a new (and arguably the most natural) formulation for anti-concentration. Using this formulation, we give quasi-polynomial time verifiable sum-of-squares certificates of anti-concentration that hold for a wide class of non-Gaussian distributions including anti-concentrated bounded product distributions and uniform distributions over $L_p$ balls (and their affine transformations). Consequently, our method upgrades and extends results in algorithmic robust statistics e.g., list-decodable learning and clustering, to such distributions. Our approach constructs a canonical integer program for anti-concentration and analysis a sum-of-squares relaxation of it, independent of the intended application. We rely on duality and analyze a pseudo-expectation on large subsets of the input points that take a small value in some direction. Our analysis uses the method of polynomial reweightings to reduce the problem to analyzing only analytically dense or sparse directions.

algorithm, certificate, polynomial, (14 more...)

arXiv.org Machine Learning

2405.15084

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(3 more...)

Genre:

Research Report (0.63)
Workflow (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Modeling & Simulation (0.68)

Add feedback

Learning quantum Hamiltonians at any temperature in polynomial time

Bakshi, Ainesh, Liu, Allen, Moitra, Ankur, Tang, Ewin

arXiv.org Artificial IntelligenceOct-3-2023

We study the problem of learning a local quantum Hamiltonian $H$ given copies of its Gibbs state $\rho = e^{-\beta H}/\textrm{tr}(e^{-\beta H})$ at a known inverse temperature $\beta>0$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (arXiv:2004.07266) gave an algorithm to learn a Hamiltonian on $n$ qubits to precision $\epsilon$ with only polynomially many copies of the Gibbs state, but which takes exponential time. Obtaining a computationally efficient algorithm has been a major open problem [Alhambra'22 (arXiv:2204.08349)], [Anshu, Arunachalam'22 (arXiv:2204.08349)], with prior work only resolving this in the limited cases of high temperature [Haah, Kothari, Tang'21 (arXiv:2108.04842)] or commuting terms [Anshu, Arunachalam, Kuwahara, Soleimanifar'21]. We fully resolve this problem, giving a polynomial time algorithm for learning $H$ to precision $\epsilon$ from polynomially many copies of the Gibbs state at any constant $\beta > 0$. Our main technical contribution is a new flat polynomial approximation to the exponential function, and a translation between multi-variate scalar polynomials and nested commutators. This enables us to formulate Hamiltonian learning as a polynomial system. We then show that solving a low-degree sum-of-squares relaxation of this polynomial system suffices to accurately learn the Hamiltonian.

constraint, hamiltonian, polynomial, (17 more...)

arXiv.org Artificial Intelligence

2310.02243

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.45)

Add feedback

Higher degree sum-of-squares relaxations robust against oblivious outliers

d'Orsi, Tommaso, Nasser, Rajai, Novikov, Gleb, Steurer, David

arXiv.org Artificial IntelligenceNov-14-2022

We consider estimation models of the form $Y=X^*+N$, where $X^*$ is some $m$-dimensional signal we wish to recover, and $N$ is symmetrically distributed noise that may be unbounded in all but a small $\alpha$ fraction of the entries. We introduce a family of algorithms that under mild assumptions recover the signal $X^*$ in all estimation problems for which there exists a sum-of-squares algorithm that succeeds in recovering the signal $X^*$ when the noise $N$ is Gaussian. This essentially shows that it is enough to design a sum-of-squares algorithm for an estimation problem with Gaussian noise in order to get the algorithm that works with the symmetric noise model. Our framework extends far beyond previous results on symmetric noise models and is even robust to adversarial perturbations. As concrete examples, we investigate two problems for which no efficient algorithms were known to work for heavy-tailed noise: tensor PCA and sparse PCA. For the former, our algorithm recovers the principal component in polynomial time when the signal-to-noise ratio is at least $\tilde{O}(n^{p/4}/\alpha)$, that matches (up to logarithmic factors) current best known algorithmic guarantees for Gaussian noise. For the latter, our algorithm runs in quasipolynomial time and matches the state-of-the-art guarantees for quasipolynomial time algorithms in the case of Gaussian noise. Using a reduction from the planted clique problem, we provide evidence that the quasipolynomial time is likely to be necessary for sparse PCA with symmetric noise. In our proofs we use bounds on the covering numbers of sets of pseudo-expectations, which we obtain by certifying in sum-of-squares upper bounds on the Gaussian complexities of sets of solutions. This approach for bounding the covering numbers of sets of pseudo-expectations may be interesting in its own right and may find other application in future works.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2211.07327

Country:

Europe > Switzerland > Zürich > Zürich (0.04)
North America > United States > Arizona > Maricopa County > Phoenix (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)
(8 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Beyond Parallel Pancakes: Quasi-Polynomial Time Guarantees for Non-Spherical Gaussian Mixtures

Buhai, Rares-Darius, Steurer, David

arXiv.org Machine LearningDec-10-2021

We consider mixtures of $k\geq 2$ Gaussian components with unknown means and unknown covariance (identical for all components) that are well-separated, i.e., distinct components have statistical overlap at most $k^{-C}$ for a large enough constant $C\ge 1$. Previous statistical-query lower bounds [DKS17] give formal evidence that even distinguishing such mixtures from (pure) Gaussians may be exponentially hard (in $k$). We show that this kind of hardness can only appear if mixing weights are allowed to be exponentially small, and that for polynomially lower bounded mixing weights non-trivial algorithmic guarantees are possible in quasi-polynomial time. Concretely, we develop an algorithm based on the sum-of-squares method with running time quasi-polynomial in the minimum mixing weight. The algorithm can reliably distinguish between a mixture of $k\ge 2$ well-separated Gaussian components and a (pure) Gaussian distribution. As a certificate, the algorithm computes a bipartition of the input sample that separates a pair of mixture components, i.e., both sides of the bipartition contain most of the sample points of at least one component. For the special case of colinear means, our algorithm outputs a $k$ clustering of the input sample that is approximately consistent with the components of the mixture. A significant challenge for our results is that they appear to be inherently sensitive to small fractions of adversarial outliers unlike most previous results for Gaussian mixtures. The reason is that such outliers can simulate exponentially small mixing weights even for mixtures with polynomially lower bounded mixing weights. A key technical ingredient is a characterization of separating directions for well-separated Gaussian components in terms of ratios of polynomials that correspond to moments of two carefully chosen orders logarithmic in the minimum mixing weight.

algorithm, lemma, theorem 5, (15 more...)

arXiv.org Machine Learning

2112.05445

Country:

North America > United States > New York (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(4 more...)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Add feedback

Conditional Linear Regression for Heterogeneous Covariances

Juba, Brendan, Liang, Leda

arXiv.org Machine LearningNov-15-2021

Linear regression is a technique frequently used in statistical and data analysis. The task for standard linear regression is to fit a linear relationship among variables in a data set. Often, the goal is to find the most parsimonious model that can describe the majority of the data. In this work, we consider the situation where only a small portion of the data can be accurately modeled using linear regression. More generally, in many kinds of real-world data, portions of the data of significant size can be predicted significantly more accurately than by the best linear model for the overall data distribution: Rosenfeld et al. (2015) showed that there are attributes that are significant risk factors for gastrointestinal cancer in certain subpopulations, but not in the overall population. Hainline et al. (2019) demonstrated that a variety of standard (real-world) regression benchmarks have portions that are fit significantly better by a different linear model than the best model for the overall data set; Calderon et al. (2020) presented further, similar findings. We will consider cases where linear regression fits well when the data set is conditioned on a simple condition, which is unknown to us. We study the task of finding such a linear model, together with a formula on the data attributes describing the condition, i.e., the portion of the data for which the linear model is accurate. This problem was introduced by Juba (2017), who gave an algorithm for conditional sparse linear regression, using the maximum residual as the objective.

algorithm, bakshi and kothari, probability, (14 more...)

arXiv.org Machine Learning

2111.07834

Country:

Africa > South Sudan > Equatoria > Central Equatoria > Juba (0.25)
North America > United States > New York > New York County > New York City (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

SoS Degree Reduction with Applications to Clustering and Robust Moment Estimation

Steurer, David, Tiegel, Stefan

arXiv.org Machine LearningJan-5-2021

The Sum-of-Squares hierarchy is a hierarchy of semidefinite programs which has proven to be a powerful tool in the theory of approximation algorithms [GW95, ARV09]. More recently it has also given rise to a flurry of algorithms for estimation problems such as various tensor [BKS15, BM16, MSS16, HSS15], clustering [HL18], and robust estimation problems [KSS18, KKK19, KKM18], often yielding significant improvements over existing algorithms and in some cases even the first efficient ones. The hierarchy is based on the sum-of-squares proof system which on a high-level allows to argue about non-negativity of polynomials by manipulating a set of polynomial inequalities. Most importantly, it can be algorithmically exploited in the sense that certain proofs in this proof system directly certify approximation guarantees of algorithms based on the hierarchy. The running time of these algorithms depends mainly on the number of variables involved and the maximum degree of the polynomials occurring in the inequalities mentioned above. In general using a higher degree often leads to more accurate solutions but also requires more time. In this work, we show how we can significantly reduce the degree of a wide range of sum-of-squares proofs in an almost black-box manner while still certifying similar guarantees and thus giving a direct speedup for concrete algorithms. As two examples we will consider estimation algorithms for clustering and outlier-robust moment estimation. We hope that this technique can inform future algorithms based on the sum-of-squares hierarchy.

algorithm, polynomial, sum-of-square proof, (14 more...)

arXiv.org Machine Learning

2101.01509

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Germany (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
(7 more...)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Outlier-Robust Clustering of Non-Spherical Mixtures

Bakshi, Ainesh, Kothari, Pravesh

arXiv.org Machine LearningMay-13-2020

We give the first outlier-robust efficient algorithm for clustering a mixture of $k$ statistically separated d-dimensional Gaussians (k-GMMs). Concretely, our algorithm takes input an $\epsilon$-corrupted sample from a $k$-GMM and whp in $d^{\text{poly}(k/\eta)}$ time, outputs an approximate clustering that misclassifies at most $k^{O(k)}(\epsilon+\eta)$ fraction of the points whenever every pair of mixture components are separated by $1-\exp(-\text{poly}(k/\eta)^k)$ in total variation (TV) distance. Such a result was not previously known even for $k=2$. TV separation is the statistically weakest possible notion of separation and captures important special cases such as mixed linear regression and subspace clustering. Our main conceptual contribution is to distill two simple analytic properties - (certifiable) hypercontractivity and anti-concentration - that are necessary and sufficient for mixture models to be (efficiently) clusterable. As a consequence, our results extend to clustering mixtures of arbitrary affine transforms of the uniform distribution on the $d$-dimensional unit sphere. Even the information theoretic clusterability of separated distributions satisfying these two analytic assumptions was not known prior to our work and is likely to be of independent interest. Our algorithms build on the recent sequence of works relying on certifiable anti-concentration first introduced in [KKK'19,RY'20]. Our techniques expand the sum-of-squares toolkit to show robust certifiability of TV-separated Gaussian clusters in data. This involves giving a low-degree sum-of-squares proof of statements that relate parameter (i.e. mean and covariances) distance to total variation distance by relying only on hypercontractivity and anti-concentration.

artificial intelligence, lemma 4, machine learning, (15 more...)

arXiv.org Machine Learning

2005.0297

Country:

North America > United States > New York (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(9 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

List-Decodable Linear Regression

Karmalkar, Sushrut, Klivans, Adam R., Kothari, Pravesh K.

arXiv.org Machine LearningMay-30-2019

We give the first polynomial-time algorithm for robust regression in the list-decodable setting where an adversary can corrupt a greater than $1/2$ fraction of examples. For any $\alpha < 1$, our algorithm takes as input a sample $\{(x_i,y_i)\}_{i \leq n}$ of $n$ linear equations where $\alpha n$ of the equations satisfy $y_i = \langle x_i,\ell^*\rangle +\zeta$ for some small noise $\zeta$ and $(1-\alpha)n$ of the equations are {\em arbitrarily} chosen. It outputs a list $L$ of size $O(1/\alpha)$ - a fixed constant - that contains an $\ell$ that is close to $\ell^*$. Our algorithm succeeds whenever the inliers are chosen from a \emph{certifiably} anti-concentrated distribution $D$. In particular, this gives a $(d/\alpha)^{O(1/\alpha^8)}$ time algorithm to find a $O(1/\alpha)$ size list when the inlier distribution is standard Gaussian. For discrete product distributions that are anti-concentrated only in \emph{regular} directions, we give an algorithm that achieves similar guarantee under the promise that $\ell^*$ has all coordinates of the same magnitude. To complement our result, we prove that the anti-concentration assumption on the inliers is information-theoretically necessary. Our algorithm is based on a new framework for list-decodable learning that strengthens the `identifiability to algorithms' paradigm based on the sum-of-squares method. In an independent and concurrent work, Raghavendra and Yau also used the Sum-of-Squares method to give a similar result for list-decodable regression.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1905.05679

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(9 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.51)

Add feedback