AITopics | sigma

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization

Neural Information Processing Systems, Dec-23-2025, 18:11:49 GMT

This is a canonical problem that poses two difficulties in optimization: 1) non-convexity and 2) non-smoothness (due to unbalancedness of $\mathbf{U}$ and $\mathbf{V}$). This is also a prototype for more complex problems such as asymmetric matrix sensing and matrix completion. Despite the problem being non-convex and non-smooth, it has been observed empirically that the randomly initialized gradient descent algorithm can solve it in polynomial time. Existing theories to explain this phenomenon all require artificial modifications of the algorithm, such as adding noise in each iteration or adding a balancing regularizer to balance $\mathbf{U}$ and $\mathbf{V}$. This paper presents the first proof that randomly initialized gradient descent converges to a global minimum of the asymmetric low-rank factorization problem at a polynomial rate. For the proof, we develop 1) a new symmetrization technique to capture the magnitudes of the symmetry and asymmetry, and 2) a quantitative perturbation analysis to approximate matrix derivatives. We believe both are useful for other related non-convex problems.
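As a rough illustration of the setting described in this abstract, the sketch below runs plain gradient descent from a small random initialization on the objective $\frac{1}{2}\|\mathbf{U}\mathbf{V}^\top - \mathbf{\Sigma}\|_F^2$, with no added noise and no balancing regularizer. The exact objective scaling, the step size, the initialization scale, the iteration count, and the helper names (`factorize_gd`, `make_target`, `relative_error`) are all assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def factorize_gd(sigma, rank, step=0.1, init_scale=1e-2, iters=2000, seed=0):
    """Plain gradient descent on f(U, V) = 0.5 * ||U V^T - sigma||_F^2.

    Hypothetical hyperparameters: step, init_scale and iters are chosen
    for this toy problem only.
    """
    rng = np.random.default_rng(seed)
    m, n = sigma.shape
    # Small random (unbalanced) initialization of both factors.
    U = init_scale * rng.standard_normal((m, rank))
    V = init_scale * rng.standard_normal((n, rank))
    for _ in range(iters):
        R = U @ V.T - sigma          # residual
        # Simultaneous update: grad_U = R V, grad_V = R^T U.
        U, V = U - step * (R @ V), V - step * (R.T @ U)
    return U, V

def make_target(m=8, n=6, singular_values=(1.0, 0.5), seed=1):
    """Build a low-rank target with prescribed singular values (toy data)."""
    rng = np.random.default_rng(seed)
    r = len(singular_values)
    Q1, _ = np.linalg.qr(rng.standard_normal((m, r)))  # orthonormal columns
    Q2, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return Q1 @ np.diag(singular_values) @ Q2.T

def relative_error(U, V, sigma):
    """Relative Frobenius error of the factorization."""
    return np.linalg.norm(U @ V.T - sigma) / np.linalg.norm(sigma)

if __name__ == "__main__":
    sigma = make_target()
    U, V = factorize_gd(sigma, rank=2)
    print("relative error:", relative_error(U, V, sigma))
```

On this small, well-conditioned target the relative error should end up very small, which matches the empirical observation quoted above; the sketch says nothing about the convergence rate that the paper proves.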

global convergence, gradient descent, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.82)

Dimension-free Private Mean Estimation for Anisotropic Distributions

Neural Information Processing Systems, May-27-2025, 19:02:28 GMT

Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality, as they require $\Omega(d^{1/2})$ samples to achieve non-trivial error, even in cases where $O(1)$ samples suffice without privacy. This rate is unavoidable when the distribution is isotropic, namely, when the covariance is a multiple of the identity matrix. Yet, real-world data is often highly anisotropic, with signals concentrated on a small number of principal components. We develop estimators that are appropriate for such signals: our estimators are $(\varepsilon,\delta)$-differentially private and have sample complexity that is dimension-independent for anisotropic subgaussian distributions. We show that this is the optimal sample complexity for this task up to logarithmic factors.

anisotropic distribution, dimension-free private mean estimation, sample complexity, (5 more...)

Country: Asia > Middle East > Jordan (0.08)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)

Oja's Algorithm for Streaming Sparse PCA

Neural Information Processing Systems, May-27-2025, 07:44:31 GMT

Oja's algorithm for Streaming Principal Component Analysis (PCA) for $n$ data points in a $d$-dimensional space achieves the same sin-squared error $O(r_{\mathsf{eff}}/n)$ as the offline algorithm in $O(d)$ space and $O(nd)$ time and a single pass through the data points. Here $r_{\mathsf{eff}}$ is the effective rank (ratio of the trace and the principal eigenvalue of the population covariance matrix $\Sigma$). Under this computational budget, we consider the problem of sparse PCA, where the principal eigenvector of $\Sigma$ is $s$-sparse, and $r_{\mathsf{eff}}$ can be large. In this setting, to our knowledge, *there are no known single-pass algorithms* that achieve the minimax error bound in $O(d)$ space and $O(nd)$ time without either requiring strong initialization conditions or assuming further structure (e.g., spiked) of the covariance matrix. We show that a simple single-pass procedure that thresholds the output of Oja's algorithm (the Oja vector) can achieve the minimax error bound under some regularity conditions in $O(d)$ space and $O(nd)$ time. We present a nontrivial and novel analysis of the entries of the unnormalized Oja vector, which involves the projection of a product of independent random matrices on a random initial vector.
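The single-pass procedure described in this abstract can be sketched roughly as follows: run Oja's update over the stream while keeping only a $d$-dimensional vector, then threshold the result. This is a minimal sketch under stated assumptions, not the authors' procedure: it normalizes after every step (the paper analyzes the unnormalized Oja vector), it uses a keep-the-top-$k$ rule as the thresholding step, and the constant learning rate, the spiked toy covariance in `demo`, and all function names are hypothetical choices for illustration.

```python
import numpy as np

def oja_vector(stream, d, eta, seed=0):
    """One pass of Oja's rule: O(d) state, O(d) work per sample."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for x in stream:
        w += eta * x * (x @ w)   # w <- w + eta * x x^T w, without forming x x^T
        w /= np.linalg.norm(w)   # normalization each step is an assumption here
    return w

def hard_threshold(w, k):
    """Keep the k largest-magnitude entries of w, zero the rest, renormalize.

    The top-k rule is an assumed stand-in for the paper's thresholding step.
    """
    keep = np.argpartition(np.abs(w), -k)[-k:]
    out = np.zeros_like(w)
    out[keep] = w[keep]
    return out / np.linalg.norm(out)

def demo(d=50, s=5, n=5000, lam=10.0, eta=1e-3, seed=0):
    """Toy check on covariance I + lam * v v^T with an s-sparse unit vector v."""
    rng = np.random.default_rng(seed)
    v = np.zeros(d)
    v[:s] = 1.0 / np.sqrt(s)
    # The samples are stored in an array only for convenience in this demo;
    # a real stream would be consumed one sample at a time.
    X = rng.standard_normal((n, d)) + np.sqrt(lam) * rng.standard_normal((n, 1)) * v
    w = oja_vector(X, d, eta)
    w_sparse = hard_threshold(w, s)
    sin2 = lambda u: 1.0 - float(u @ v) ** 2   # sin-squared error against v
    support = {int(i) for i in np.flatnonzero(w_sparse)}
    return sin2(w), sin2(w_sparse), support

if __name__ == "__main__":
    raw, sparse, support = demo()
    print("sin^2 (Oja):", raw)
    print("sin^2 (thresholded):", sparse)
    print("recovered support:", sorted(support))
```

The toy covariance is spiked purely to keep the example short, even though the abstract stresses that the result does not need that structure.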

oja, streaming sparse pca, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Feb-6-2025, 12:52:09 GMT

I have read the paper "fast classification rates for high-dimensional conditional Gaussian models". The paper studies the problem of binary classification using a Gaussian model and provides some theoretical results on the convergence of the classification error rates (compared to the Bayes classifier). The paper presents some nice theoretical results and is interesting to some extent. I am generally positive about the paper but I have the following concerns. The first concern is its practical relevance.

author feedback and meta-review, classifier, convergence rate, (9 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Review for NeurIPS paper: Reinforced Molecular Optimization with Neighborhood-Controlled Grammars

Neural Information Processing Systems, Jan-24-2025, 21:19:15 GMT

Additional Feedback: I thank the authors for addressing most of my concerns. I have updated my score from 4 to 6. i) For the empirical evaluation, I understand that the proposed method performs better than the method I found, when compared in fair settings. I think the experimental setting is sound enough, because the evaluation score is independent of the classifier. I wish the authors would mention the existence of such benchmark environments in the main text so that subsequent papers can use them. I would like the authors to clarify that the valency-preserving property comes from the inference algorithm rather than the definition of the molecular NCE grammar, because Definition 1 does not specify much about the embedding function phi. For example, if we add phi(1, 6) "..." in the production rule shown at the top of Figure 2, this production rule does not preserve the degree of node 1, while the embedding function with phi(1, 6) "..." is still legal.

inference algorithm, production rule, reinforced molecular optimization, (6 more...)

Technology: Information Technology > Artificial Intelligence (1.00)

Reviews: Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Nonconvex Stochastic Optimization

Neural Information Processing Systems, Oct-8-2024, 10:28:12 GMT

The fundamental claim [line 101 & 239] is that asymptotically, for streaming PCA, the delay tau is allowed to scale as (1 - mu)^2 / sqrt(eta), where eta is the step size and mu the momentum parameter. Major Comments: Before we discuss the proof, I think the introduction is somewhat misleading. In line 76, the authors point out that previous works all focus on analyzing convergence to a first-order optimal solution. Readers may be misled into thinking that this paper improves on the results of previous work. However, the problems studied in those papers and streaming PCA are different.

acceleration tradeoff, momentum and asynchrony, nonconvex stochastic optimization, (8 more...)

Genre: Summary/Review (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)

Reviews: Neural Tangent Kernel: Convergence and Generalization in Neural Networks

Neural Information Processing Systems, Oct-7-2024, 10:49:35 GMT

The authors prove that networks of infinite width trained with SGD and (infinitely) small step size evolve according to a differential equation, the solution of which depends only on the covariance kernel of the data and, in the case of L2 regression, on the eigenspectrum of the kernel. I believe this is a breakthrough result in the field of neural network theory. It elevates the analysis of infinitely wide networks from the study of the static initial function to closely predicting the entire training path. There are a plethora of powerful consequences for infinitely wide, fully-connected networks:

- They cannot learn information not contained in the covariance matrix.
- Changes to the latent representation and parameters tend to zero as width goes to infinity.

Therefore choosing nonlinearities in all layers reduces to choosing a single 1d function.

covariance sigma, sigma, variance, (12 more...)

Genre: Research Report > New Finding (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)

Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

Arora, Sanjeev, Ge, Rong, Moitra, Ankur, Sachdeva, Sushant

Neural Information Processing Systems, Feb-14-2020, 23:43:49 GMT

We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees. In particular, suppose we are given samples of the form $y = Ax + \eta$ where $A$ is an unknown $n \times n$ matrix, $x$ is chosen uniformly at random from $\{+1, -1\}^n$, and $\eta$ is an $n$-dimensional Gaussian random variable with unknown covariance $\Sigma$. We give an algorithm that provably recovers $A$ and $\Sigma$ up to an additive $\epsilon$ and whose running time and sample complexity are polynomial in $n$ and $1 / \epsilon$. To accomplish this, we introduce a novel "quasi-whitening" step that may be useful in other contexts in which the covariance of Gaussian noise is not known in advance. We also give a general framework for finding all local optima of a function (given an oracle for approximately finding just one). This is a crucial step in our algorithm, one that has been overlooked in previous attempts, and it allows us to control the accumulation of error when we find the columns of $A$ one by one via local search. Papers published at the Neural Information Processing Systems Conference.
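To make the data model in this abstract concrete, the sketch below draws samples $y = Ax + \eta$ with $x$ uniform on $\{+1, -1\}^n$ and Gaussian noise $\eta$, and compares the empirical second moment of $y$ with $AA^\top + \Sigma$. That identity follows because the coordinates of $x$ are independent with unit variance, and it shows why the second moment alone cannot separate $A$ from an unknown $\Sigma$. The code covers only the sampling model: it does not implement the paper's quasi-whitening or local-search steps, and the function name, dimensions, and random choices of $A$ and $\Sigma$ are assumptions for illustration.

```python
import numpy as np

def sample_noisy_ica(n=4, num_samples=200_000, seed=0):
    """Draw samples y = A x + eta (hypothetical helper, toy parameters).

    x is uniform on {+1, -1}^n and eta is Gaussian with covariance Sigma.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, n))            # unknown mixing matrix (toy choice)
    L = rng.standard_normal((n, n)) / np.sqrt(n)
    Sigma = L @ L.T                            # noise covariance (toy choice)
    X = rng.choice([-1.0, 1.0], size=(num_samples, n))
    Eta = rng.standard_normal((num_samples, n)) @ L.T   # rows ~ N(0, L L^T)
    Y = X @ A.T + Eta                          # each row is one sample y
    return A, Sigma, Y

if __name__ == "__main__":
    A, Sigma, Y = sample_noisy_ica()
    empirical = Y.T @ Y / Y.shape[0]           # estimate of E[y y^T]
    expected = A @ A.T + Sigma
    print("max deviation:", np.max(np.abs(empirical - expected)))
```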

gaussian mixture and autoencoder, provable ica, unknown gaussian noise, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

Two Sigma's Siegel Says Artificial Intelligence Lacks Smarts

#artificialintelligence, Sep-28-2016, 16:50:15 GMT

David Siegel, a quantitative hedge fund pioneer, issued a warning to investors: Artificial intelligence lacks common sense. Siegel, who has used AI to build his Two Sigma Investments into a $37 billion hedge fund firm, said algorithms are limited by the scant amount of training data available to instruct them on how to identify everything from objects in images to trading opportunities. Hedge funds are embracing a form of AI called machine learning years after Two Sigma deployed the technology and as stock and bond pickers struggle to outperform markets. A unit of the firm, called Two Sigma Ventures, seeks to invest in companies focused on data science, machine learning, artificial intelligence and advanced hardware.

artificial intelligence, big data, Siegel, (17 more...)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.57)
