AITopics | Jadbabaie, Ali

Plotting

Jadbabaie, Ali

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Unifying Epidemic Models with Mixtures

Sarker, Arnab, Jadbabaie, Ali, Shah, Devavrat

arXiv.org Machine LearningJan-7-2022

The COVID-19 pandemic has emphasized the need for a robust understanding of epidemic models. Current models of epidemics are classified as either mechanistic or non-mechanistic: mechanistic models make explicit assumptions on the dynamics of disease, whereas non-mechanistic models make assumptions on the form of observed time series. Here, we introduce a simple mixture-based model which bridges the two approaches while retaining benefits of both. The model represents time series of cases and fatalities as a mixture of Gaussian curves, providing a flexible function class to learn from data compared to traditional mechanistic models. Although the model is non-mechanistic, we show that it arises as the natural outcome of a stochastic process based on a networked SIR framework. This allows learned parameters to take on a more meaningful interpretation compared to similar non-mechanistic models, and we validate the interpretations using auxiliary mobility data collected during the COVID-19 pandemic. We provide a simple learning algorithm to identify model parameters and establish theoretical results which show the model can be efficiently learned from data. Empirically, we find the model to have low prediction error. The model is available live at covidpredictions.mit.edu. Ultimately, this allows us to systematically understand the impacts of interventions on COVID-19, which is critical in developing data-driven solutions to controlling epidemics.

artificial intelligence, health & medicine, machine learning, (19 more...)

arXiv.org Machine Learning

2201.0496

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.24)
Europe > United Kingdom > England (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Federated Optimization of Smooth Loss Functions

Jadbabaie, Ali, Makur, Anuran, Shah, Devavrat

arXiv.org Machine LearningJan-6-2022

In this work, we study empirical risk minimization (ERM) within a federated learning framework, where a central server minimizes an ERM objective function using training data that is stored across $m$ clients. In this setting, the Federated Averaging (FedAve) algorithm is the staple for determining $\epsilon$-approximate solutions to the ERM problem. Similar to standard optimization algorithms, the convergence analysis of FedAve only relies on smoothness of the loss function in the optimization parameter. However, loss functions are often very smooth in the training data too. To exploit this additional smoothness, we propose the Federated Low Rank Gradient Descent (FedLRGD) algorithm. Since smoothness in data induces an approximate low rank structure on the loss function, our method first performs a few rounds of communication between the server and clients to learn weights that the server can use to approximate clients' gradients. Then, our method solves the ERM problem at the server using inexact gradient descent. To show that FedLRGD can have superior performance to FedAve, we present a notion of federated oracle complexity as a counterpart to canonical oracle complexity. Under some assumptions on the loss function, e.g., strong convexity in parameter, $\eta$-H\"older smoothness in data, etc., we prove that the federated oracle complexity of FedLRGD scales like $\phi m(p/\epsilon)^{\Theta(d/\eta)}$ and that of FedAve scales like $\phi m(p/\epsilon)^{3/4}$ (neglecting sub-dominant factors), where $\phi\gg 1$ is a "communication-to-computation ratio," $p$ is the parameter dimension, and $d$ is the data dimension. Then, we show that when $d$ is small and the loss function is sufficiently smooth in the data, FedLRGD beats FedAve in federated oracle complexity. Finally, in the course of analyzing FedLRGD, we also establish a result on low rank approximation of latent variable models.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2201.01954

Country:

Europe (1.00)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Time varying regression with hidden linear dynamics

Jadbabaie, Ali, Mania, Horia, Shah, Devavrat, Sra, Suvrit

arXiv.org Machine LearningDec-29-2021

The distribution of labels given the covariates changes over time in a variety of applications of regression. Some example domains where such problems arise include economics, marketing, fashion, and supply chain optimization, where market properties evolve over time. Motivated by such problems, we revisit a model for time-varying linear regression that assumes the unknown parameters evolve according to a linear dynamical system. One way to account for distribution change in linear regression is to assume that the unknown model parameters change slowly with time [2, 15, 37]. While this assumption simplifies the problem and makes it tractable, it misses on exploiting additional structure available and it also fails to model periodicity (e.g., due to seasonality) present in some problems. As an alternative, we are interested in a dynamic model previously studied by Chow [7], Carraro [5], and Shumway et al. [26].

artificial intelligence, banking & finance, machine learning, (19 more...)

arXiv.org Machine Learning

2112.14862

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.82)

Industry: Banking & Finance > Economy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

On Convergence of Training Loss Without Reaching Stationary Points

Zhang, Jingzhao, Li, Haochuan, Sra, Suvrit, Jadbabaie, Ali

arXiv.org Machine LearningOct-12-2021

It is a well-known fact that nonconvex optimization is computationally intractable in the worst case. As a result, theoretical analysis of optimization algorithms such as gradient descent often focuses on local convergence to stationary points where the gradient norm is zero or negligible. In this work, we examine the disconnect between the existing theoretical analysis of gradient-based algorithms and actual practice. Specifically, we provide numerical evidence that in large-scale neural network training, such as in ImageNet, ResNet, and WT103 TransformerXL models, the Neural Network weight variables do not converge to stationary points where the gradient of the loss function vanishes. Remarkably, however, we observe that while weights do not converge to stationary points, the value of the loss function converges. Inspired by this observation, we propose a new perspective based on ergodic theory of dynamical systems. We prove convergence of the distribution of weight values to an approximate invariant measure (without smoothness and assumptions) that explains how the training loss can stabilize without weights necessarily converging to stationary points. We further discuss how this perspective can better align the theory with empirical observations.

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Machine Learning

2110.06256

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)

Add feedback

Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max Optimization

Li, Haochuan, Tian, Yi, Zhang, Jingzhao, Jadbabaie, Ali

arXiv.org Machine LearningApr-18-2021

We provide a first-order oracle complexity lower bound for finding stationary points of min-max optimization problems where the objective function is smooth, nonconvex in the minimization variable, and strongly concave in the maximization variable. We establish a lower bound of $\Omega\left(\sqrt{\kappa}\epsilon^{-2}\right)$ for deterministic oracles, where $\epsilon$ defines the level of approximate stationarity and $\kappa$ is the condition number. Our analysis shows that the upper bound achieved in (Lin et al., 2020b) is optimal in the $\epsilon$ and $\kappa$ dependence up to logarithmic factors. For stochastic oracles, we provide a lower bound of $\Omega\left(\sqrt{\kappa}\epsilon^{-2} + \kappa^{1/3}\epsilon^{-4}\right)$. It suggests that there is a significant gap between the upper bound $\mathcal{O}(\kappa^3 \epsilon^{-4})$ in (Lin et al., 2020a) and our lower bound in the condition number dependence.

artificial intelligence, optimization, optimization problem, (17 more...)

arXiv.org Machine Learning

2104.08708

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

A General Framework for Distributed Inference with Uncertain Models

Hare, James Z., Uribe, Cesar A., Kaplan, Lance, Jadbabaie, Ali

arXiv.org Artificial IntelligenceNov-20-2020

This paper studies the problem of distributed classification with a network of heterogeneous agents. The agents seek to jointly identify the underlying target class that best describes a sequence of observations. The problem is first abstracted to a hypothesis-testing framework, where we assume that the agents seek to agree on the hypothesis (target class) that best matches the distribution of observations. Non-Bayesian social learning theory provides a framework that solves this problem in an efficient manner by allowing the agents to sequentially communicate and update their beliefs for each hypothesis over the network. Most existing approaches assume that agents have access to exact statistical models for each hypothesis. However, in many practical applications, agents learn the likelihood models based on limited data, which induces uncertainty in the likelihood function parameters. In this work, we build upon the concept of uncertain models to incorporate the agents' uncertainty in the likelihoods by identifying a broad set of parametric distribution that allows the agents' beliefs to converge to the same result as a centralized approach. Furthermore, we empirically explore extensions to non-parametric models to provide a generalized framework of uncertain models in non-Bayesian social learning.

agent, artificial intelligence, bayesian inference, (18 more...)

arXiv.org Artificial Intelligence

2011.10669

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.84)

Industry:

Education (0.57)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Gradient-Based Empirical Risk Minimization using Local Polynomial Regression

Jadbabaie, Ali, Makur, Anuran, Shah, Devavrat

arXiv.org Machine LearningNov-4-2020

In this paper, we consider the problem of empirical risk minimization (ERM) of smooth, strongly convex loss functions using iterative gradient-based methods. A major goal of this literature has been to compare different algorithms, such as gradient descent (GD) or stochastic gradient descent (SGD), by analyzing their rates of convergence to $\epsilon$-approximate solutions. For example, the oracle complexity of GD is $O(n\log(\epsilon^{-1}))$, where $n$ is the number of training samples. When $n$ is large, this can be expensive in practice, and SGD is preferred due to its oracle complexity of $O(\epsilon^{-1})$. Such standard analyses only utilize the smoothness of the loss function in the parameter being optimized. In contrast, we demonstrate that when the loss function is smooth in the data, we can learn the oracle at every iteration and beat the oracle complexities of both GD and SGD in important regimes. Specifically, at every iteration, our proposed algorithm performs local polynomial regression to learn the gradient of the loss function, and then estimates the true gradient of the ERM objective function. We establish that the oracle complexity of our algorithm scales like $\tilde{O}((p \epsilon^{-1})^{d/(2\eta)})$ (neglecting sub-dominant factors), where $d$ and $p$ are the data and parameter space dimensions, respectively, and the gradient of the loss function belongs to a $\eta$-H\"{o}lder class with respect to the data. Our proof extends the analysis of local polynomial regression in non-parametric statistics to provide interpolation guarantees in multivariate settings, and also exploits tools from the inexact GD literature. Unlike GD and SGD, the complexity of our method depends on $d$ and $p$. However, when $d$ is small and the loss function exhibits modest smoothness in the data, our algorithm beats GD and SGD in oracle complexity for a very broad range of $p$ and $\epsilon$.

artificial intelligence, health & medicine, oracle complexity, (12 more...)

arXiv.org Machine Learning

2011.02522

Country:

Europe (0.92)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

A Distributed Cubic-Regularized Newton Method for Smooth Convex Optimization over Networks

Uribe, César A., Jadbabaie, Ali

arXiv.org Machine LearningJul-7-2020

We propose a distributed, cubic-regularized Newton method for large-scale convex optimization over networks. The proposed method requires only local computations and communications and is suitable for federated learning applications over arbitrary network topologies. We show a $O(k^{{-}3})$ convergence rate when the cost function is convex with Lipschitz gradient and Hessian, with $k$ being the number of iterations. We further provide network-dependent bounds for the communication required in each step of the algorithm. We provide numerical experiments that validate our theoretical results.

artificial intelligence, optimization, optimization problem, (18 more...)

arXiv.org Machine Learning

2007.03562

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.62)

Add feedback

GAT-GMM: Generative Adversarial Training for Gaussian Mixture Models

Farnia, Farzan, Wang, William, Das, Subhro, Jadbabaie, Ali

arXiv.org Machine LearningJun-18-2020

Generative adversarial networks (GANs) learn the distribution of observed samples through a zero-sum game between two machine players, a generator and a discriminator. While GANs achieve great success in learning the complex distribution of image, sound, and text data, they perform suboptimally in learning multi-modal distribution-learning benchmarks including Gaussian mixture models (GMMs). In this paper, we propose Generative Adversarial Training for Gaussian Mixture Models (GAT-GMM), a minimax GAN framework for learning GMMs. Motivated by optimal transport theory, we design the zero-sum game in GAT-GMM using a random linear generator and a softmax-based quadratic discriminator architecture, which leads to a non-convex concave minimax optimization problem. We show that a Gradient Descent Ascent (GDA) method converges to an approximate stationary minimax point of the GAT-GMM optimization problem. In the benchmark case of a mixture of two symmetric, well-separated Gaussians, we further show this stationary point recovers the true parameters of the underlying GMM. We numerically support our theoretical findings by performing several experiments, which demonstrate that GAT-GMM can perform as well as the expectation-maximization algorithm in learning mixtures of two Gaussians.

artificial intelligence, exp, machine learning, (13 more...)

arXiv.org Machine Learning

2006.10293

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Add feedback

Robust Federated Learning: The Case of Affine Distribution Shifts

Reisizadeh, Amirhossein, Farnia, Farzan, Pedarsani, Ramtin, Jadbabaie, Ali

arXiv.org Machine LearningJun-15-2020

Federated learning is a distributed paradigm that aims at training models using samples distributed across multiple users in a network while keeping the samples on users' devices with the aim of efficiency and protecting users privacy. In such settings, the training data is often statistically heterogeneous and manifests various distribution shifts across users, which degrades the performance of the learnt model. The primary goal of this paper is to develop a robust federated learning algorithm that achieves satisfactory performance against distribution shifts in users' samples. To achieve this goal, we first consider a structured affine distribution shift in users' data that captures the device-dependent data heterogeneity in federated settings. This perturbation model is applicable to various federated learning problems such as image classification where the images undergo device-dependent imperfections, e.g. different intensity, contrast, and brightness. To address affine distribution shifts across users, we propose a Federated Learning framework Robust to Affine distribution shifts (FLRA) that is provably robust against affine Wasserstein shifts to the distribution of observed samples. To solve the FLRA's distributed minimax problem, we propose a fast and efficient optimization method and provide convergence guarantees via a gradient Descent Ascent (GDA) method. We further prove generalization error bounds for the learnt classifier to show proper generalization from empirical distribution of samples to the true underlying distribution. We perform several numerical experiments to empirically support FLRA. We show that an affine distribution shift indeed suffices to significantly decrease the performance of the learnt classifier in a new test user, and our proposed algorithm achieves a significant gain in comparison to standard federated learning and adversarial training methods.

arxiv preprint arxiv, deep learning, neural network, (15 more...)

arXiv.org Machine Learning

2006.08907

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Add feedback