AITopics | Sachdeva, Sushant

Sachdeva, Sushant

PREM: Privately Answering Statistical Queries with Relative Error

Ghazi, Badih, Guzmán, Cristóbal, Kamath, Pritish, Knop, Alexander, Kumar, Ravi, Manurangsi, Pasin, Sachdeva, Sushant

arXiv.org Artificial Intelligence, Feb-20-2025

We introduce $\mathsf{PREM}$ (Private Relative Error Multiplicative weight update), a new framework for generating synthetic data that achieves a relative error guarantee for statistical queries under $(\varepsilon, \delta)$ differential privacy (DP). Namely, for a domain ${\cal X}$, a family ${\cal F}$ of queries $f : {\cal X} \to \{0, 1\}$, and $\zeta > 0$, our framework yields a mechanism that on input dataset $D \in {\cal X}^n$ outputs a synthetic dataset $\widehat{D} \in {\cal X}^n$ such that all statistical queries in ${\cal F}$ on $D$, namely $\sum_{x \in D} f(x)$ for $f \in {\cal F}$, are within a $1 \pm \zeta$ multiplicative factor of the corresponding value on $\widehat{D}$ up to an additive error that is polynomial in $\log |{\cal F}|$, $\log |{\cal X}|$, $\log n$, $\log(1/\delta)$, $1/\varepsilon$, and $1/\zeta$. In contrast, any $(\varepsilon, \delta)$-DP mechanism is known to require worst-case additive error that is polynomial in at least one of $n, |{\cal F}|$, or $|{\cal X}|$. We complement our algorithm with nearly matching lower bounds.
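To make the shape of this guarantee concrete, the sketch below checks whether a candidate synthetic dataset answers every query in a small family to within a $1 \pm \zeta$ factor plus an additive slack. It is not the PREM mechanism and provides no privacy; the function names, the toy data, and the explicit slack parameter `tau` are illustrative assumptions.

```python
from typing import Callable, Iterable, Sequence

Query = Callable[[int], int]  # f : X -> {0, 1}; X is assumed to be a set of integers


def count(query: Query, dataset: Iterable[int]) -> int:
    """Statistical (counting) query: the sum of f(x) over the dataset."""
    return sum(query(x) for x in dataset)


def within_relative_error(
    queries: Sequence[Query],
    data: Sequence[int],
    synthetic: Sequence[int],
    zeta: float,
    tau: float,
) -> bool:
    """Return True if, for every query, the answer on `data` lies in
    [(1 - zeta) * s - tau, (1 + zeta) * s + tau], where s is the answer
    on `synthetic`."""
    for f in queries:
        true_answer = count(f, data)
        s = count(f, synthetic)
        if not ((1 - zeta) * s - tau <= true_answer <= (1 + zeta) * s + tau):
            return False
    return True


# Toy example with hypothetical data: two queries over the integers 0..9.
queries = [lambda x: int(x >= 5), lambda x: int(x % 2 == 0)]
data = [0, 1, 2, 5, 6, 7, 8, 9, 9, 9]
synthetic = [0, 2, 4, 5, 6, 7, 8, 9, 9, 9]

ok_with_slack = within_relative_error(queries, data, synthetic, zeta=0.1, tau=1.0)
ok_without_slack = within_relative_error(queries, data, synthetic, zeta=0.1, tau=0.0)
```

In this toy instance the parity query has answer 4 on the input and 5 on the synthetic data, so the check passes only when some additive slack is allowed.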

arXiv:2502.14809

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

A Provably Convergent and Practical Algorithm for Min-Max Optimization with Applications to GANs

Mangoubi, Oren, Sachdeva, Sushant, Vishnoi, Nisheeth K.

arXiv.org Machine Learning, Nov-2-2020

We present a first-order algorithm for nonconvex-nonconcave min-max optimization problems such as those that arise in training GANs. Our algorithm provably converges in time polynomial in the dimension and smoothness parameters of the loss function. To achieve convergence, we 1) give a novel approximation to the global strategy of the max-player based on first-order algorithms such as gradient ascent, and 2) empower the min-player to look ahead and simulate the max-player's response for arbitrarily many steps, but restrict the min-player to move according to updates sampled from a stochastic gradient oracle. Our algorithm, when used to train GANs on synthetic and real-world datasets, does not cycle, results in GANs that seem to avoid mode collapse, and achieves a training time per iteration and memory requirement similar to gradient descent-ascent.

arXiv:2006.12376

Country:

North America > United States > California (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

Faster Graph Embeddings via Coarsening

Fahrbach, Matthew, Goranci, Gramoz, Peng, Richard, Sachdeva, Sushant, Wang, Chi

arXiv.org Machine Learning, Oct-22-2020

Graph embeddings are a ubiquitous tool for machine learning tasks, such as node classification and link prediction, on graph-structured data. However, computing the embeddings for large-scale graphs is prohibitively inefficient even if we are interested only in a small subset of relevant vertices. To address this, we present an efficient graph coarsening approach, based on Schur complements, for computing the embedding of the relevant vertices. We prove that these embeddings are preserved exactly by the Schur complement graph that is obtained via Gaussian elimination on the non-relevant vertices. As computing Schur complements is expensive, we give a nearly-linear time algorithm that generates a coarsened graph on the relevant vertices that provably matches the Schur complement in expectation in each iteration. Our experiments involving prediction tasks on graphs demonstrate that computing embeddings on the coarsened graph, rather than the entire graph, leads to significant time savings without sacrificing accuracy.
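As an illustration of the exact Schur complement mentioned above, the sketch below performs Gaussian elimination directly on a weighted undirected graph: removing a vertex adds, between every pair of its neighbors, an edge of weight equal to the product of the two incident weights divided by the vertex's weighted degree. This is the expensive exact procedure, not the nearly-linear time randomized algorithm from the paper, and the adjacency-map representation and function names are assumptions made for the example (no self-loops, positive weights).

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]  # adjacency map: u -> {v: weight}


def build_graph(edges: Iterable[Tuple[Hashable, Hashable, float]]) -> Graph:
    """Build a symmetric adjacency map, summing weights of parallel edges."""
    graph: Graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return graph


def eliminate_vertex(graph: Graph, v: Hashable) -> None:
    """Remove v in place and connect its neighbors pairwise.

    Each neighbor pair (a, b) gains weight w(a, v) * w(v, b) / deg(v),
    which is one step of Gaussian elimination on the graph Laplacian.
    """
    neighbors = graph.pop(v)
    degree = sum(neighbors.values())
    for u in neighbors:
        del graph[u][v]
    items = list(neighbors.items())
    for i, (a, weight_a) in enumerate(items):
        for b, weight_b in items[i + 1:]:
            extra = weight_a * weight_b / degree
            graph[a][b] = graph[a].get(b, 0.0) + extra
            graph[b][a] = graph[b].get(a, 0.0) + extra


def schur_complement(graph: Graph, relevant: Set[Hashable]) -> Graph:
    """Eliminate every vertex outside `relevant` (modifies `graph` in place)."""
    for v in [u for u in list(graph) if u not in relevant]:
        eliminate_vertex(graph, v)
    return graph


# Path a - b - c with unit weights; keep only the endpoints.
coarse = schur_complement(
    build_graph([("a", "b", 1.0), ("b", "c", 1.0)]), relevant={"a", "c"}
)
```

For the unit-weight path, eliminating the middle vertex leaves a single edge of weight 0.5 between the endpoints, matching two unit resistors in series. Exact elimination can create a dense graph on the remaining vertices, which is the cost the abstract refers to.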

arXiv:2007.02817

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)

Regularized linear autoencoders recover the principal components, eventually

Bao, Xuchan, Lucas, James, Sachdeva, Sushant, Grosse, Roger

arXiv.org Machine Learning, Jul-13-2020

Our understanding of learning input-output relationships with neural nets has improved rapidly in recent years, but little is known about the convergence of the underlying representations, even in the simple case of linear autoencoders (LAEs). We show that when trained with proper regularization, LAEs can directly learn the optimal representation -- ordered, axis-aligned principal components. We analyze two such regularization schemes: non-uniform $\ell_2$ regularization and a deterministic variant of nested dropout [Rippel et al., ICML 2014]. Though both regularization schemes converge to the optimal representation, we show that this convergence is slow due to ill-conditioning that worsens with increasing latent dimension. We show that the inefficiency of learning the optimal representation is not inevitable -- we present a simple modification to the gradient descent update that greatly speeds up convergence empirically.

arXiv:2007.06731

Country: North America > Canada > Ontario > Toronto (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model

Zhang, Guodong, Li, Lala, Nado, Zachary, Martens, James, Sachdeva, Sushant, Dahl, George E., Shallue, Christopher J., Grosse, Roger

arXiv.org Machine Learning, Jul-9-2019

Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns. In this work, we study how the critical batch size changes based on properties of the optimization algorithm, including acceleration and preconditioning, through two different lenses: large scale experiments, and analysis of a simple noisy quadratic model (NQM). We experimentally demonstrate that optimization algorithms that employ preconditioning, specifically Adam and K-FAC, result in much larger critical batch sizes than stochastic gradient descent with momentum. We also demonstrate that the NQM captures many of the essential features of real neural network training, despite being drastically simpler to work with. The NQM predicts our results with preconditioned optimizers, previous results with accelerated gradient descent, and other results around optimal learning rates and large batch training, making it a useful tool to generate testable predictions about neural network optimization.
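The abstract does not spell out the NQM, so the following is only a minimal sketch under stated assumptions: a one-dimensional quadratic loss $\frac{1}{2} h \theta^2$, plain SGD with a constant learning rate, and zero-mean gradient noise whose variance is $c / B$ for batch size $B$. Under those assumptions the second moment of $\theta$ follows a simple recursion, and the steady-state loss can be written in closed form. The parameter names `h`, `c`, `lr`, and `batch_size` are hypothetical.

```python
from typing import List


def expected_loss_trajectory(
    h: float, c: float, lr: float, batch_size: float, theta0_sq: float, steps: int
) -> List[float]:
    """Expected loss 0.5 * h * E[theta^2] under SGD on a 1-D noisy quadratic.

    One SGD step is theta <- (1 - lr * h) * theta - lr * noise, with
    Var(noise) = c / batch_size, so the second moment evolves as
    E[theta^2] <- (1 - lr * h)^2 * E[theta^2] + lr^2 * c / batch_size.
    """
    second_moment = theta0_sq
    losses = [0.5 * h * second_moment]
    for _ in range(steps):
        second_moment = (1 - lr * h) ** 2 * second_moment + lr ** 2 * c / batch_size
        losses.append(0.5 * h * second_moment)
    return losses


def steady_state_loss(h: float, c: float, lr: float, batch_size: float) -> float:
    """Fixed point of the recursion above, valid for 0 < lr * h < 2."""
    return lr * c / (2 * batch_size * (2 - lr * h))


losses = expected_loss_trajectory(
    h=1.0, c=1.0, lr=0.1, batch_size=1, theta0_sq=1.0, steps=200
)
floor_small_batch = steady_state_loss(h=1.0, c=1.0, lr=0.1, batch_size=1)
floor_large_batch = steady_state_loss(h=1.0, c=1.0, lr=0.1, batch_size=2)
```

In this simplified setting, doubling the batch size at a fixed learning rate halves the noise floor but does not change the contraction factor $(1 - \text{lr} \cdot h)^2$, which gives some intuition for why larger batches eventually stop helping.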

arXiv:1907.04164

Genre: Research Report > New Finding (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

Iterative Refinement for $\ell_p$-norm Regression

Adil, Deeksha, Kyng, Rasmus, Peng, Richard, Sachdeva, Sushant

arXiv.org Machine Learning, Jan-20-2019

We give improved algorithms for the $\ell_{p}$-regression problem, $\min_{x} \|x\|_{p}$ such that $A x=b,$ for all $p \in (1,2) \cup (2,\infty).$ Our algorithms obtain a high accuracy solution in $\tilde{O}_{p}(m^{\frac{|p-2|}{2p + |p-2|}}) \le \tilde{O}_{p}(m^{\frac{1}{3}})$ iterations, where each iteration requires solving an $m \times m$ linear system, $m$ being the dimension of the ambient space. By maintaining an approximate inverse of the linear systems that we solve in each iteration, we give algorithms for solving $\ell_{p}$-regression to $1 / \text{poly}(n)$ accuracy that run in time $\tilde{O}_p(m^{\max\{\omega, 7/3\}}),$ where $\omega$ is the matrix multiplication constant. For the current best value of $\omega > 2.37$, we can thus solve $\ell_{p}$ regression as fast as $\ell_{2}$ regression, for all constant $p$ bounded away from $1.$ Our algorithms can be combined with fast graph Laplacian linear equation solvers to give minimum $\ell_{p}$-norm flow / voltage solutions to $1 / \text{poly}(n)$ accuracy on an undirected graph with $m$ edges in $\tilde{O}_{p}(m^{1 + \frac{|p-2|}{2p + |p-2|}}) \le \tilde{O}_{p}(m^{\frac{4}{3}})$ time. For sparse graphs and for matrices with similar dimensions, our iteration counts and running times improve on the $p$-norm regression algorithm by [Bubeck-Cohen-Lee-Li STOC'18] and general-purpose convex optimization algorithms. At the core of our algorithms is an iterative refinement scheme for $\ell_{p}$-norms, using the smoothed $\ell_{p}$-norms introduced in the work of Bubeck et al. Given an initial solution, we construct a problem that seeks to minimize a quadratically-smoothed $\ell_{p}$ norm over a subspace, such that a crude solution to this problem allows us to improve the initial solution by a constant factor, leading to algorithms with fast convergence.
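The exponents in these bounds are easy to evaluate for specific values of $p$. The short sketch below computes $\frac{|p-2|}{2p + |p-2|}$ exactly with rational arithmetic, which makes it visible that the value stays below $\frac{1}{3}$ throughout $(1,2) \cup (2,\infty)$ and approaches $\frac{1}{3}$ only as $p \to 1$ or $p \to \infty$. The function names are illustrative, and the sketch ignores the polylogarithmic and $p$-dependent factors hidden by $\tilde{O}_p$.

```python
from fractions import Fraction
from typing import Union

Number = Union[int, str, Fraction]


def iteration_exponent(p: Number) -> Fraction:
    """Exponent of m in the iteration bound: |p - 2| / (2p + |p - 2|)."""
    p = Fraction(p)
    if p <= 1 or p == 2:
        raise ValueError("p must lie in (1, 2) or (2, infinity)")
    gap = abs(p - 2)
    return gap / (2 * p + gap)


def graph_time_exponent(p: Number) -> Fraction:
    """Exponent of m in the running time for l_p-norm flows on graphs."""
    return 1 + iteration_exponent(p)


exponent_p4 = iteration_exponent(4)          # 2 / (8 + 2)
exponent_p32 = iteration_exponent("3/2")     # (1/2) / (3 + 1/2)
exponent_large_p = iteration_exponent(1000)  # close to, but below, 1/3
```

For example, $p = 4$ gives an exponent of $\frac{1}{5}$ and $p = \frac{3}{2}$ gives $\frac{1}{7}$, both well under the worst-case $\frac{1}{3}$.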

arXiv:1901.06764

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Fast, Provable Algorithms for Isotonic Regression in all L_p-norms

Kyng, Rasmus, Rao, Anup, Sachdeva, Sushant

Neural Information Processing Systems, Dec-31-2015

Given a directed acyclic graph $G,$ and a set of values $y$ on the vertices, the Isotonic Regression of $y$ is a vector $x$ that respects the partial order described by $G,$ and minimizes $\|x-y\|,$ for a specified norm. This paper gives improved algorithms for computing the Isotonic Regression for all weighted $\ell_{p}$-norms with rigorous performance guarantees. Our algorithms are quite practical, and their variants can be implemented to run fast in practice.
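For intuition, the simplest special case of this problem is a DAG that is a single path (a total order) with a weighted $\ell_2$ norm, where the classical pool-adjacent-violators procedure applies. The sketch below implements that special case only; it is not the algorithm from the paper, which handles general DAGs and all weighted $\ell_p$-norms. The function name is illustrative and weights are assumed to be positive.

```python
from typing import List, Optional, Sequence


def isotonic_regression_chain(
    y: Sequence[float], weights: Optional[Sequence[float]] = None
) -> List[float]:
    """Weighted least-squares fit x with x[0] <= x[1] <= ... <= x[n-1].

    Uses pool-adjacent-violators: scan left to right and merge neighboring
    blocks into their weighted mean whenever they violate the ordering.
    """
    if weights is None:
        weights = [1.0] * len(y)
    blocks: List[List[float]] = []  # each block is [weighted mean, total weight, size]
    for value, weight in zip(y, weights):
        blocks.append([float(value), float(weight), 1])
        # Merge backwards while the last two blocks are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean_b, weight_b, size_b = blocks.pop()
            mean_a, weight_a, size_a = blocks.pop()
            total = weight_a + weight_b
            merged_mean = (mean_a * weight_a + mean_b * weight_b) / total
            blocks.append([merged_mean, total, size_a + size_b])
    fitted: List[float] = []
    for mean, _, size in blocks:
        fitted.extend([mean] * int(size))
    return fitted


fit = isotonic_regression_chain([1, 3, 2, 4])
```

Here the out-of-order pair (3, 2) is pooled into its mean, while the already ordered values 1 and 4 are left unchanged.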

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders

Arora, Sanjeev, Ge, Rong, Moitra, Ankur, Sachdeva, Sushant

Neural Information Processing Systems, Dec-31-2012

We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees. In particular, suppose we are given samples of the form $y = Ax + \eta$, where $A$ is an unknown $n \times n$ matrix, $x$ is chosen uniformly at random from $\{+1, -1\}^n$, and $\eta$ is an $n$-dimensional Gaussian random variable with unknown covariance $\Sigma$. We give an algorithm that provably recovers $A$ and $\Sigma$ up to an additive $\epsilon$ and whose running time and sample complexity are polynomial in $n$ and $1 / \epsilon$. To accomplish this, we introduce a novel "quasi-whitening" step that may be useful in other contexts in which the covariance of Gaussian noise is not known in advance. We also give a general framework for finding all local optima of a function (given an oracle for approximately finding just one). This is a crucial step in our algorithm, one that has been overlooked in previous attempts, and it allows us to control the accumulation of error when we find the columns of $A$ one by one via local search.

Country:

North America > United States > California (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
