AITopics | Bandeira, Afonso S.

Collaborating Authors

Bandeira, Afonso S.

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Injectivity of ReLU networks: perspectives from statistical physics

Maillard, Antoine, Bandeira, Afonso S., Belius, David, Dokmanić, Ivan, Nakajima, Shuta

arXiv.org Artificial Intelligence, Feb-27-2023

When can the input of a ReLU neural network be inferred from its output? In other words, when is the network injective? We consider a single layer, $x \mapsto \mathrm{ReLU}(Wx)$, with a random Gaussian $m \times n$ matrix $W$, in a high-dimensional setting where $n, m \to \infty$. Recent work connects this problem to spherical integral geometry, giving rise to a conjectured sharp injectivity threshold for $\alpha = \frac{m}{n}$ by studying the expected Euler characteristic of a certain random set. We adopt a different perspective and show that injectivity is equivalent to a property of the ground state of the spherical perceptron, an important spin glass model in statistical physics. By leveraging the (non-rigorous) replica symmetry-breaking theory, we derive analytical equations for the threshold whose solution is at odds with that from the Euler characteristic. Furthermore, we use Gordon's min-max theorem to prove that a replica-symmetric upper bound refutes the Euler characteristic prediction. Along the way we aim to give a tutorial-style introduction to key ideas from statistical physics in an effort to make the exposition accessible to a broad audience. Our analysis establishes a connection between spin glasses and integral geometry but leaves open the problem of explaining the discrepancies.
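
A small Monte Carlo sketch can make the injectivity question concrete. It relies on a characterization from earlier work on globally injective ReLU networks, assumed here rather than taken from this abstract: x ↦ ReLU(Wx) is injective exactly when, for every direction x, the rows w_i of W with ⟨w_i, x⟩ ≥ 0 span ℝⁿ. Random sampling of directions can therefore only exhibit a witness of non-injectivity; returning no witness is weak heuristic evidence, not a proof, because failing directions can be rare near the threshold. All function names are hypothetical, and pure Python is used to avoid dependencies.

```python
import random

def matrix_rank(rows, tol=1e-9):
    """Rank of a list of row vectors via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    if not m:
        return 0
    ncols, rank = len(m[0]), 0
    for c in range(ncols):
        if rank == len(m):
            break
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            f = m[r][c] / m[rank][c]
            for k in range(c, ncols):
                m[r][k] -= f * m[rank][k]
        rank += 1
    return rank

def gaussian_matrix(m, n, seed=0):
    """Random m x n matrix with i.i.d. N(0, 1) entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

def find_witness(W, n_samples=500, seed=1):
    """Search random directions x whose active rows fail to span R^n.

    Returns such an x (so ReLU(W .) is NOT injective), or None if none was found.
    """
    rng = random.Random(seed)
    n = len(W[0])
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # isotropic random direction
        active = [w for w in W if sum(a * b for a, b in zip(w, x)) >= 0]
        if len(active) < n or matrix_rank(active) < n:
            return x
    return None
```

With α = m/n below 2, a typical direction activates fewer than n rows, so a witness tends to appear quickly (for example with `gaussian_matrix(30, 20)`); for large α, such as `gaussian_matrix(200, 5)`, the search usually comes back empty.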

artificial intelligence, machine learning, prediction, (15 more...)

arXiv.org Artificial Intelligence

2302.14112

Country:

North America > United States (0.45)
Europe > United Kingdom > England (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre:

Research Report (0.64)
Instructional Material > Course Syllabus & Notes (0.48)

Industry: Energy > Oil & Gas > Upstream (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.92)

On free energy barriers in Gaussian priors and failure of cold start MCMC for high-dimensional unimodal distributions

Bandeira, Afonso S., Maillard, Antoine, Nickl, Richard, Wang, Sven

arXiv.org Machine Learning, Nov-19-2022

Markov Chain Monte Carlo (MCMC) methods are the workhorse of Bayesian computation when closed formulas for estimators or probability distributions are not available. For this reason they have been central to the development and success of high-dimensional Bayesian statistics over the last decades, where one attempts to generate samples from some posterior distribution Π(·|data) arising from a prior Π on D-dimensional Euclidean space and the observed data vector. MCMC methods tend to perform well in a large variety of problems, are very flexible and user-friendly, and enjoy many theoretical guarantees. Under mild assumptions, they are known to converge to their stationary 'target' distributions as a consequence of the ergodic theorem, albeit perhaps at a slow speed, requiring a large number of iterations to provide numerically accurate results. When the target distribution is log-concave, MCMC algorithms are known to mix rapidly, even in high dimensions.
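
For readers new to MCMC, a minimal random-walk Metropolis sampler illustrates the kind of algorithm described above. It is a generic textbook sketch, not one of the samplers analysed in the paper; the standard Gaussian target (log-concave, hence the easy case mentioned above), the step size and the function names are assumptions chosen for illustration.

```python
import math
import random

def random_walk_metropolis(log_density, x0, step, n_iter, seed=0):
    """Propose y = x + step * N(0, I) and accept with probability min(1, pi(y) / pi(x))."""
    rng = random.Random(seed)
    x = list(x0)
    log_px = log_density(x)
    samples, accepted = [], 0
    for _ in range(n_iter):
        y = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
        log_py = log_density(y)
        # The Gaussian proposal is symmetric, so the Hastings correction cancels.
        if rng.random() < math.exp(min(0.0, log_py - log_px)):
            x, log_px = y, log_py
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_iter

def std_gaussian_log_density(x):
    """Unnormalised log-density of N(0, I_D); the normalising constant cancels in the ratio."""
    return -0.5 * sum(xi * xi for xi in x)
```

A call such as `random_walk_metropolis(std_gaussian_log_density, [0.0, 0.0], 1.0, 20000)` returns the chain and its acceptance rate. Starting far from the mode corresponds to a "cold start"; the paper's title concerns such starts for high-dimensional unimodal targets that need not be log-concave.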

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

2209.02001

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.50)

The Franz-Parisi Criterion and Computational Trade-offs in High Dimensional Statistics

Bandeira, Afonso S., Alaoui, Ahmed El, Hopkins, Samuel B., Schramm, Tselil, Wein, Alexander S., Zadik, Ilias

arXiv.org Machine Learning, Oct-13-2022

Many high-dimensional statistical inference problems are believed to possess inherent computational hardness. Various frameworks have been proposed to give rigorous evidence for such hardness, including lower bounds against restricted models of computation (such as low-degree functions), as well as methods rooted in statistical physics that are based on free energy landscapes. This paper aims to make a rigorous connection between the seemingly different low-degree and free-energy based approaches. We define a free-energy based criterion for hardness and formally connect it to the well-established notion of low-degree hardness for a broad class of statistical problems, namely all Gaussian additive models and certain models with a sparse planted signal. By leveraging these rigorous connections we are able to: establish that for Gaussian additive models the "algebraic" notion of low-degree hardness implies failure of "geometric" local MCMC algorithms, and provide new low-degree lower bounds for sparse linear regression which seem difficult to prove directly. These results provide both conceptual insights into the connections between different notions of hardness, as well as concrete technical tools such as new methods for proving low-degree lower bounds.

artificial intelligence, gaussian additive model, machine learning, (14 more...)

arXiv.org Machine Learning

2205.09727

Country:

North America > United States > California (0.27)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)

Community Detection with a Subsampled Semidefinite Program

Abdalla, Pedro, Bandeira, Afonso S.

arXiv.org Machine Learning, Feb-3-2021

Semidefinite programming is an important tool for tackling several problems in data science and signal processing, including clustering and community detection. However, semidefinite programs are often slow in practice, so speed-up techniques such as sketching are commonly considered. In the context of community detection in the stochastic block model, Mixon and Xie [9] have recently proposed a sketching framework in which a semidefinite program is solved only on a subsampled subgraph of the network, giving rise to significant computational savings. In this short paper, we provide a positive answer to a conjecture of Mixon and Xie about the statistical limits of this technique for the stochastic block model with two balanced communities.
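
To make the sketching idea concrete, the following sketch samples a balanced two-community stochastic block model and extracts the induced subgraph on a random vertex subset, which is the smaller object an SDP would then be solved on. It assumes uniform vertex subsampling and illustrative edge probabilities; it does not implement the SDP itself or Mixon and Xie's procedure for labelling the remaining vertices, and all function names are hypothetical.

```python
import random

def two_community_sbm(n, p_in, p_out, seed=0):
    """Adjacency matrix of a balanced two-community SBM plus the +/-1 community labels."""
    rng = random.Random(seed)
    labels = [1] * (n // 2) + [-1] * (n - n // 2)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                A[i][j] = A[j][i] = 1
    return A, labels

def subsample(A, labels, m, seed=1):
    """Induced subgraph on m vertices chosen uniformly at random without replacement."""
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(A)), m))
    A_sub = [[A[i][j] for j in keep] for i in keep]
    return A_sub, [labels[i] for i in keep], keep

def edge_densities(A, labels):
    """Empirical within-community and between-community edge densities."""
    within = between = n_within = n_between = 0
    for i in range(len(A)):
        for j in range(i + 1, len(A)):
            if labels[i] == labels[j]:
                within += A[i][j]
                n_within += 1
            else:
                between += A[i][j]
                n_between += 1
    return within / max(n_within, 1), between / max(n_between, 1)
```

The SDP on the sketch has size m × m instead of n × n; the gap between the two densities returned by `edge_densities` on the subgraph shows that the community signal survives subsampling.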

artificial intelligence, data mining, semidefinite program, (16 more...)

arXiv.org Machine Learning

2102.01419

Country: Europe (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)

The Average-Case Time Complexity of Certifying the Restricted Isometry Property

Ding, Yunzi, Kunisky, Dmitriy, Wein, Alexander S., Bandeira, Afonso S.

arXiv.org Machine Learning, Jun-13-2020

In compressed sensing, the restricted isometry property (RIP) on $M \times N$ sensing matrices (where $M < N$) guarantees efficient reconstruction of sparse vectors. A matrix has the $(s,\delta)$-$\mathsf{RIP}$ property if it behaves as a $\delta$-approximate isometry on $s$-sparse vectors. It is well known that an $M\times N$ matrix with i.i.d. $\mathcal{N}(0,1/M)$ entries is $(s,\delta)$-$\mathsf{RIP}$ with high probability as long as $s\lesssim \delta^2 M/\log N$. On the other hand, most prior works aiming to deterministically construct $(s,\delta)$-$\mathsf{RIP}$ matrices have failed when $s \gg \sqrt{M}$. An alternative way to find an RIP matrix could be to draw a random Gaussian matrix and certify that it is indeed RIP. However, there is evidence that this certification task is computationally hard when $s \gg \sqrt{M}$, both in the worst case and the average case. In this paper, we investigate the exact average-case time complexity of certifying the RIP property for $M\times N$ matrices with i.i.d. $\mathcal{N}(0,1/M)$ entries, in the "possible but hard" regime $\sqrt{M} \ll s\lesssim M/\log N$, assuming that $M$ scales proportionally to $N$. Based on analysis of the low-degree likelihood ratio, we give rigorous evidence that subexponential runtime $N^{\tilde\Omega(s^2/N)}$ is required, demonstrating a smooth tradeoff between the maximum tolerated sparsity and the required computational power. The lower bound is essentially tight, matching the runtime of an existing algorithm due to Koiran and Zouzias. Our hardness result allows $\delta$ to take any constant value in $(0,1)$, which captures the relevant regime for compressed sensing. This improves upon the existing average-case hardness result of Wang, Berthet, and Plan, which is limited to $\delta = o(1)$.
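
The definition can be made concrete with a brute-force certifier. Under the common squared-norm convention (assumed here), A is (s, δ)-RIP when (1 − δ)‖x‖² ≤ ‖Ax‖² ≤ (1 + δ)‖x‖² for every s-sparse x, which is equivalent to every eigenvalue of A_Sᵀ A_S lying in [1 − δ, 1 + δ] for every set S of s columns. The sketch checks all C(N, s) column subsets, which is exactly the exponential cost that makes certification expensive; it is an illustration only, not the Koiran and Zouzias algorithm or any method from the paper, and the function names are hypothetical.

```python
import itertools
import math
import random

def symmetric_eigenvalues(S, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a small symmetric matrix by the cyclic Jacobi rotation method."""
    a = [row[:] for row in S]
    k = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(k) for j in range(k) if i != j)
        if off < tol:
            break
        for p in range(k - 1):
            for q in range(p + 1, k):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(k):  # update columns p and q
                    arp, arq = a[r][p], a[r][q]
                    a[r][p], a[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(k):  # update rows p and q
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * apr - s * aqr, s * apr + c * aqr
    return [a[i][i] for i in range(k)]

def rip_constant(A, s):
    """Smallest delta such that A is (s, delta)-RIP, by exhaustive search over supports."""
    M, N = len(A), len(A[0])
    delta = 0.0
    for S in itertools.combinations(range(N), s):
        gram = [[sum(A[r][i] * A[r][j] for r in range(M)) for j in S] for i in S]
        eig = symmetric_eigenvalues(gram)
        delta = max(delta, max(eig) - 1.0, 1.0 - min(eig))
    return delta

def gaussian_sensing_matrix(M, N, seed=0):
    """M x N matrix with i.i.d. N(0, 1/M) entries, as in the abstract."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / math.sqrt(M)) for _ in range(N)] for _ in range(M)]
```

For example, `rip_constant(gaussian_sensing_matrix(20, 30), 2)` loops over 435 supports; doubling s to 4 already requires 27,405, which is why exhaustive certification is only feasible for tiny s.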

artificial intelligence, machine learning, matrix, (16 more...)

arXiv.org Machine Learning

2005.1127

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Computationally efficient sparse clustering

Löffler, Matthias, Wein, Alexander S., Bandeira, Afonso S.

arXiv.org Machine Learning, May-25-2020

We study statistical and computational limits of clustering when the means of the centres are sparse and their dimension is possibly much larger than the sample size. Our theoretical analysis focuses on the simple model $X_i = z_i \theta + \varepsilon_i$, $z_i \in \{-1,1\}$, $\varepsilon_i \thicksim \mathcal{N}(0, I)$, which has two clusters with centres $\theta$ and $-\theta$. We provide a finite sample analysis of a new sparse clustering algorithm based on sparse PCA and show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \rightarrow \infty$, matching asymptotically the Bayes error. Our results require the sparsity to grow slower than the square root of the sample size. Using a recent framework for computational lower bounds---the low-degree likelihood ratio---we give evidence that this condition is necessary for any polynomial-time clustering algorithm to succeed below the BBP threshold. This complements existing evidence based on reductions and statistical query lower bounds. Compared to these existing results, we cover a wider set of parameter regimes and give a more precise understanding of the runtime required and the misclustering error achievable. We also discuss extensions of our results to more than two clusters.
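
The model $X_i = z_i \theta + \varepsilon_i$ and the idea of clustering through sparse PCA can be sketched in a few lines. The sketch below uses diagonal thresholding (keep the k coordinates with the largest empirical second moment, since E[x_j²] = 1 + θ_j²) followed by the sign of the projection onto the top eigenvector; this is a simplified stand-in assumed for illustration, not the paper's algorithm, and it assumes the sparsity k is known. Function names are hypothetical.

```python
import math
import random

def simulate(n, theta, seed=0):
    """Draw X_i = z_i * theta + eps_i with z_i uniform on {-1, 1} and eps_i ~ N(0, I)."""
    rng = random.Random(seed)
    z = [rng.choice((-1, 1)) for _ in range(n)]
    X = [[zi * t + rng.gauss(0.0, 1.0) for t in theta] for zi in z]
    return X, z

def sparse_cluster(X, k, iters=100, seed=1):
    """Two-cluster labels via diagonal thresholding and a top-eigenvector sign split."""
    n, p = len(X), len(X[0])
    # Step 1: keep the k coordinates with the largest empirical second moment.
    second_moment = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    keep = sorted(range(p), key=lambda j: second_moment[j], reverse=True)[:k]
    Xs = [[row[j] for j in keep] for row in X]
    # Step 2: power iteration for the top eigenvector of the restricted second-moment matrix.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]
    for _ in range(iters):
        w = [0.0] * k
        for row in Xs:
            proj = sum(a * b for a, b in zip(row, v))
            for j in range(k):
                w[j] += proj * row[j]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    # Step 3: assign each point by the sign of its projection onto v.
    return [1 if sum(a * b for a, b in zip(row, v)) >= 0 else -1 for row in Xs]

def accuracy_up_to_sign(labels, truth):
    """Fraction correctly clustered, allowing a global swap of the two labels."""
    agree = sum(l == t for l, t in zip(labels, truth)) / len(truth)
    return max(agree, 1.0 - agree)
```

Labels are only identifiable up to a global sign flip (θ and −θ play symmetric roles), which is why the accuracy helper takes the better of the two matchings.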

artificial intelligence, bandeira sparse, machine learning, (16 more...)

arXiv.org Machine Learning

2005.10817

Country:

North America > United States (0.14)
Europe (0.14)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.86)

Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio

Kunisky, Dmitriy, Wein, Alexander S., Bandeira, Afonso S.

arXiv.org Machine Learning, Jul-26-2019

These notes survey and explore an emerging method, which we call the low-degree method, for predicting and understanding statistical-versus-computational tradeoffs in high-dimensional inference problems. In short, the method posits that a certain quantity -- the second moment of the low-degree likelihood ratio -- gives insight into how much computational time is required to solve a given hypothesis testing problem, which can in turn be used to predict the computational hardness of a variety of statistical inference tasks. While this method originated in the study of the sum-of-squares (SoS) hierarchy of convex programs, we present a self-contained introduction that does not require knowledge of SoS. In addition to showing how to carry out predictions using the method, we include a discussion investigating both rigorous and conjectural consequences of these predictions. These notes include some new results, simplified proofs, and refined conjectures. For instance, we point out a formal connection between spectral methods and the low-degree likelihood ratio, and we give a sharp low-degree lower bound against subexponential-time algorithms for tensor PCA.
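
The central quantity can be illustrated in the simplest setting. For an additive Gaussian model Y = X + Z with Z ~ N(0, I) independent of a signal X drawn from a prior, a standard identity from the low-degree literature (assumed here, not stated in this abstract) gives ‖L^{≤D}‖² = E[Σ_{d=0}^{D} ⟨X, X′⟩^d / d!], where X′ is an independent copy of X. The Monte Carlo sketch below estimates this expectation; the prior sampler and all function names are hypothetical. Roughly, if this norm stays bounded as the dimension grows (with D growing suitably), the method predicts that degree-D tests fail.

```python
import math
import random

def truncated_exp(t, D):
    """exp^{<=D}(t) = sum_{d=0}^{D} t^d / d!."""
    return sum(t ** d / math.factorial(d) for d in range(D + 1))

def low_degree_norm_sq(sample_prior, D, trials=20000, seed=0):
    """Monte Carlo estimate of E[exp^{<=D}(<X, X'>)] for independent prior draws X, X'."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x, x_prime = sample_prior(rng), sample_prior(rng)
        total += truncated_exp(sum(a * b for a, b in zip(x, x_prime)), D)
    return total / trials

def rademacher_spike(n, lam):
    """Hypothetical prior: X = lam * u / sqrt(n) with u uniform on {-1, 1}^n."""
    def sample(rng):
        return [lam * rng.choice((-1, 1)) / math.sqrt(n) for _ in range(n)]
    return sample
```

For instance, `low_degree_norm_sq(rademacher_spike(50, 0.5), D=4)` stays close to 1, since ⟨X, X′⟩ concentrates near zero for this weak prior.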

artificial intelligence, polynomial, scientific discovery, (18 more...)

arXiv.org Machine Learning

1907.11636

Country:

North America > United States (0.46)
Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.60)

Subexponential-Time Algorithms for Sparse PCA

Ding, Yunzi, Kunisky, Dmitriy, Wein, Alexander S., Bandeira, Afonso S.

arXiv.org Machine Learning, Jul-26-2019

We study the computational cost of recovering a unit-norm sparse principal component $x \in \mathbb{R}^n$ planted in a random matrix, in either the Wigner or Wishart spiked model (observing either $W + \lambda xx^\top$ with $W$ drawn from the Gaussian orthogonal ensemble, or $N$ independent samples from $\mathcal{N}(0, I_n + \beta xx^\top)$, respectively). Prior work has shown that when the signal-to-noise ratio ($\lambda$ or $\beta\sqrt{N/n}$, respectively) is a small constant and the fraction of nonzero entries in the planted vector is $\|x\|_0 / n = \rho$, it is possible to recover $x$ in polynomial time if $\rho \lesssim 1/\sqrt{n}$. While it is possible to recover $x$ in exponential time under the weaker condition $\rho \ll 1$, it is believed that polynomial-time recovery is impossible unless $\rho \lesssim 1/\sqrt{n}$. We investigate the precise amount of time required for recovery in the "possible but hard" regime $1/\sqrt{n} \ll \rho \ll 1$ by exploring the power of subexponential-time algorithms, i.e., algorithms running in time $\exp(n^\delta)$ for some constant $\delta \in (0,1)$. For any $1/\sqrt{n} \ll \rho \ll 1$, we give a recovery algorithm with runtime roughly $\exp(\rho^2 n)$, demonstrating a smooth tradeoff between sparsity and runtime. Our family of algorithms interpolates smoothly between two existing algorithms: the polynomial-time diagonal thresholding algorithm and the $\exp(\rho n)$-time exhaustive search algorithm. Furthermore, by analyzing the low-degree likelihood ratio, we give rigorous evidence suggesting that the tradeoff achieved by our algorithms is optimal.
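
To illustrate how one parameter can interpolate between the two endpoint algorithms named above, the sketch computes the statistic max vᵀYv over vectors v with exactly ℓ nonzero entries, each equal to ±1/√ℓ. Its cost grows like C(n, ℓ)·2^(ℓ−1): with ℓ = 1 it only reads the diagonal of Y, and with ℓ equal to the true sparsity it is exhaustive search. This is a simplified illustration assumed for exposition, not necessarily the exact algorithm of the paper, and the function name is hypothetical.

```python
import itertools
import math

def sparse_scan(Y, ell):
    """Maximise v^T Y v over v with exactly ell nonzero entries, each +/- 1/sqrt(ell).

    Returns (best value, (support, signs)). Cost is C(n, ell) * 2**(ell - 1)
    quadratic forms, so runtime grows smoothly with ell.
    """
    n = len(Y)
    best_value, best = -math.inf, None
    for S in itertools.combinations(range(n), ell):
        # v and -v give the same value, so fix the sign of the first entry.
        for tail in itertools.product((1, -1), repeat=ell - 1):
            signs = (1,) + tail
            value = sum(
                signs[a] * signs[b] * Y[S[a]][S[b]]
                for a in range(ell)
                for b in range(ell)
            ) / ell  # the 1/sqrt(ell) scaling enters twice in v^T Y v
            if value > best_value:
                best_value, best = value, (S, signs)
    return best_value, best
```

On a noiseless rank-one input Y = 10·xxᵀ with x supported on the first two coordinates, the scan with ℓ = 2 returns that support.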

artificial intelligence, machine learning, null 1, (14 more...)

arXiv.org Machine Learning

1907.11635

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Optimality and Sub-optimality of PCA I: Spiked Random Matrix Models

Perry, Amelia, Wein, Alexander S., Bandeira, Afonso S., Moitra, Ankur

arXiv.org Machine Learning, Jul-2-2018

A central problem of random matrix theory is to understand the eigenvalues of spiked random matrix models, introduced by Johnstone, in which a prominent eigenvector (or "spike") is planted into a random matrix. These distributions form natural statistical models for principal component analysis (PCA) problems throughout the sciences. Baik, Ben Arous and Péché showed that the spiked Wishart ensemble exhibits a sharp phase transition asymptotically: when the spike strength is above a critical threshold, it is possible to detect the presence of a spike based on the top eigenvalue, and below the threshold the top eigenvalue provides no information. Such results form the basis of our understanding of when PCA can detect a low-rank signal in the presence of noise. However, under structural assumptions on the spike, not all information is necessarily contained in the spectrum. We study the statistical limits of tests for the presence of a spike, including non-spectral tests. Our results leverage Le Cam's notion of contiguity, and include: i) For the Gaussian Wigner ensemble, we show that PCA achieves the optimal detection threshold for certain natural priors for the spike. ii) For any non-Gaussian Wigner ensemble, PCA is sub-optimal for detection. However, an efficient variant of PCA achieves the optimal threshold (for natural priors) by pre-transforming the matrix entries. iii) For the Gaussian Wishart ensemble, the PCA threshold is optimal for positive spikes (for natural priors) but this is not always the case for negative spikes.
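
The spectral phase transition can be observed numerically in the Gaussian Wigner case. The sketch assumes a common normalisation, Y = W + λxxᵀ with W drawn from the Gaussian orthogonal ensemble so that its spectrum fills [−2, 2] and ‖x‖ = 1; under it, the top eigenvalue is known to converge to λ + 1/λ when λ > 1 and to stay at the bulk edge 2 when λ ≤ 1. Matrix size, seeds, the shift and the function names are illustrative assumptions; pure Python keeps the sketch dependency-free but slow, so n is small.

```python
import math
import random

def spiked_wigner(n, lam, seed=0):
    """Y = W + lam * x x^T with W ~ GOE (off-diagonal variance 1/n) and x uniform on {+-1/sqrt(n)}^n."""
    rng = random.Random(seed)
    G = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    x = [rng.choice((-1, 1)) / math.sqrt(n) for _ in range(n)]
    return [[(G[i][j] + G[j][i]) / math.sqrt(2 * n) + lam * x[i] * x[j]
             for j in range(n)] for i in range(n)]

def top_eigenvalue(Y, iters=150, shift=3.0, seed=1):
    """Power iteration on Y + shift * I (shift makes the spectrum positive), then a Rayleigh quotient."""
    n = len(Y)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(Y[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    Yv = [sum(Y[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, Yv))
```

With λ = 3 the top eigenvalue should land near 3 + 1/3, well outside the bulk, while with λ = 0.5 it stays near 2 and carries essentially no information about the spike, which is the detection threshold behaviour described above.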

artificial intelligence, machine learning, threshold, (17 more...)

arXiv.org Machine Learning

1807.00891

Country:

Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.24)
North America > United States > Massachusetts > Middlesex County (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Notes on computational-to-statistical gaps: predictions using statistical physics

Bandeira, Afonso S., Perry, Amelia, Wein, Alexander S.

arXiv.org Machine Learning, Mar-29-2018

In these notes we describe heuristics to predict computational-to-statistical gaps in certain statistical problems. These are regimes in which the underlying statistical problem is information-theoretically possible although no efficient algorithm exists, rendering the problem essentially unsolvable for large instances. The methods we describe here are based on mature, albeit non-rigorous, tools from statistical physics. These notes are based on a lecture series given by the authors at the Courant Institute of Mathematical Sciences in New York City, on May 16th, 2017.

algorithm, artificial intelligence, bayesian inference, (14 more...)

arXiv.org Machine Learning

1803.11132

Country: North America > United States > New York (0.24)

Genre:

Research Report (0.64)
Instructional Material > Course Syllabus & Notes (0.54)

Industry:

Energy > Oil & Gas (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
