Jadbabaie, Ali
Estimation of Skill Distributions
Jadbabaie, Ali, Makur, Anuran, Shah, Devavrat
In this paper, we study the problem of learning the skill distribution of a population of agents from observations of pairwise games in a tournament. These games are played among randomly drawn agents from the population. The agents in our model can be individuals, sports teams, or Wall Street fund managers. Formally, we postulate that the likelihoods of game outcomes are governed by the Bradley-Terry-Luce (or multinomial logit) model, where the probability of an agent beating another is the ratio between its skill level and the pairwise sum of skill levels, and the skill parameters are drawn from an unknown skill density of interest. The problem is, in essence, to learn a distribution from noisy, quantized observations. We propose a simple and tractable algorithm that learns the skill density with near-optimal minimax mean squared error scaling as $n^{-1+\varepsilon}$, for any $\varepsilon>0$, when the density is smooth. Our approach brings together prior work on learning skill parameters from pairwise comparisons with kernel density estimation from non-parametric statistics. Furthermore, we prove minimax lower bounds which establish minimax optimality of the skill parameter estimation technique used in our algorithm. These bounds utilize a continuum version of Fano's method along with a covering argument. We apply our algorithm to various soccer leagues and world cups, cricket world cups, and mutual funds. We find that the entropy of a learnt distribution provides a quantitative measure of skill, yielding rigorous explanations for popular beliefs about perceived qualities of sporting events, e.g., soccer league rankings. Finally, we apply our method to assess the skill distributions of mutual funds. Our results shed light on the abundance of low quality funds prior to the Great Recession of 2008, and the domination of the industry by more skilled funds after the financial crisis.
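A minimal illustrative sketch of the two ingredients named in the abstract: simulating pairwise games under the Bradley-Terry-Luce model, where an agent with skill $w_i$ beats an agent with skill $w_j$ with probability $w_i/(w_i+w_j)$, and then applying kernel density estimation to recovered skill parameters. The MM (Zermelo) iteration and Gaussian kernel used below are standard stand-ins, not necessarily the paper's exact estimator.

```python
# A minimal sketch (not the paper's exact algorithm): simulate BTL games,
# recover skills with the classical MM/Zermelo iteration, then apply a
# Gaussian kernel density estimate to the recovered skill parameters.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
n_agents, n_games = 200, 20000

# Unknown skill density (a Beta(2, 5) shifted away from zero, purely illustrative).
skills = rng.beta(2, 5, size=n_agents) + 0.05

# Simulate pairwise games: P(i beats j) = w_i / (w_i + w_j).
wins = np.zeros((n_agents, n_agents))
for _ in range(n_games):
    i, j = rng.choice(n_agents, size=2, replace=False)
    if rng.random() < skills[i] / (skills[i] + skills[j]):
        wins[i, j] += 1
    else:
        wins[j, i] += 1

games = wins + wins.T                      # number of games between each pair

# Classical MM iteration for the BTL maximum-likelihood skill estimates.
w = np.ones(n_agents)
for _ in range(200):
    denom = (games / (w[:, None] + w[None, :] + 1e-12)).sum(axis=1)
    w = wins.sum(axis=1) / np.maximum(denom, 1e-12)
    w /= w.sum()                           # fix the scale ambiguity

# Kernel density estimate of the (normalized) skill parameters.
kde = gaussian_kde(w)
grid = np.linspace(w.min(), w.max(), 200)
density = kde(grid)                        # estimated skill density on a grid
```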
Online Learning of Dynamic Parameters in Social Networks
Shahrampour, Shahin, Rakhlin, Sasha, Jadbabaie, Ali
This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state.
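A toy simulation of the setting, not the paper's two update mechanisms: a scalar state following a geometric (multiplicative) random walk, agents receiving noisy private signals, and a simple consensus-plus-innovation update over a ring network. The weight matrix, noise levels, and step size below are assumptions made for illustration only.

```python
# Toy illustration (not the paper's algorithm): a dynamic hidden state evolving
# multiplicatively, agents averaging neighbors' estimates and correcting with
# their own private signals.
import numpy as np

rng = np.random.default_rng(1)
n_agents, horizon = 10, 50

# Row-stochastic weight matrix of a ring network (each agent averages with
# its two neighbors and itself) -- purely illustrative.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, [i, (i - 1) % n_agents, (i + 1) % n_agents]] = 1.0 / 3.0

state = 1.0
estimates = np.ones(n_agents)
step = 0.5                                  # innovation step size (assumed)

for t in range(horizon):
    state *= np.exp(rng.normal(0.0, 0.05))  # geometric random walk on the state
    signals = state + rng.normal(0.0, 0.2, size=n_agents)  # private signals
    # Consensus on neighbors' estimates, then correction toward own signal.
    estimates = W @ estimates + step * (signals - estimates)

print(f"true state: {state:.3f}, mean estimate: {estimates.mean():.3f}")
```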
FedPAQ: A Communication-Efficient Federated Learning Method with Periodic Averaging and Quantization
Reisizadeh, Amirhossein, Mokhtari, Aryan, Hassani, Hamed, Jadbabaie, Ali, Pedarsani, Ramtin
Federated Learning is a novel paradigm that aims to train a statistical model at the "edge" nodes as opposed to the traditional distributed computing systems such as data centers [Konečný et al., 2016, Li et al., 2019a]. The main objective of federated learning is to fit a model to data generated from network devices without continuous transfer of the massive amount of collected data from the edge of the network to back-end servers for processing. Federated learning has been deployed by major technology companies with the goal of providing privacy-preserving services using users' data [Bonawitz et al., 2019]. Examples of such applications are learning from wearable devices [Huang et al., 2018], learning sentiment [Smith et al., 2017], and location-based services [Samarakoon et al., 2018]. While federated learning is a promising paradigm for such applications, there are several challenges that remain to be resolved. In this paper, we focus on two significant challenges of federated learning, and propose a novel federated learning algorithm that addresses the following two challenges: (i) Communication bottleneck. Communication bandwidth is a major bottleneck in federated learning as a large number of devices attempt to communicate their local updates to a central parameter server. Thus, at a high level, for a communication-efficient federated learning algorithm, it is crucial that such updates are sent in a compressed manner and infrequently.
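A minimal sketch of the two ingredients advertised in the method's name, periodic averaging and quantization, on a toy quadratic problem: sampled clients run several local gradient steps between communications and send quantized model differences to the server. The QSGD-style quantizer, client sampling rate, and step sizes below are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n_clients, dim = 20, 10
targets = rng.normal(size=(n_clients, dim))      # client i minimizes ||x - targets[i]||^2 / 2

def local_grad(i, x):
    return x - targets[i]

def quantize(v, levels=4):
    """Simple unbiased stochastic quantizer (QSGD-style), assumed for illustration."""
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    scaled = np.abs(v) / norm * levels
    lower = np.floor(scaled)
    q = lower + (rng.random(v.shape) < scaled - lower)
    return np.sign(v) * q * norm / levels

x = np.zeros(dim)                                # global model at the server
rounds, local_steps, lr, sample_size = 50, 5, 0.1, 8

for _ in range(rounds):
    picked = rng.choice(n_clients, size=sample_size, replace=False)
    updates = []
    for i in picked:
        xi = x.copy()
        for _ in range(local_steps):             # periodic averaging: several local steps
            xi -= lr * local_grad(i, xi)
        updates.append(quantize(xi - x))         # send a quantized model difference
    x += np.mean(updates, axis=0)                # server averages the quantized updates

print("distance to average target:", np.linalg.norm(x - targets.mean(axis=0)))
```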
Non-Bayesian Social Learning with Uncertain Models
Hare, James Z., Uribe, Cesar A., Kaplan, Lance, Jadbabaie, Ali
Non-Bayesian social learning theory provides a framework that models distributed inference for a group of agents interacting over a social network. In this framework, each agent iteratively forms and communicates beliefs about an unknown state of the world with their neighbors using a learning rule. Existing approaches assume agents have access to precise statistical models (in the form of likelihoods) for the state of the world. However, in many situations, such models must be learned from finite data. We propose a social learning rule that takes into account uncertainty in the statistical models using second-order probabilities. Therefore, beliefs derived from uncertain models are sensitive to the amount of past evidence collected for each hypothesis. We characterize how well each hypothesis can be tested on a social network as being consistent or inconsistent with the state of the world. We explicitly show the dependency of the generated beliefs on the amount of prior evidence. Moreover, as the amount of prior evidence goes to infinity, learning occurs and is consistent with traditional social learning theory.
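A minimal sketch of a standard non-Bayesian social learning update (geometric averaging of neighbors' beliefs followed by a Bayesian step with the agent's own likelihood). The paper's contribution is to replace the known likelihoods assumed below with uncertain models estimated from finite prior evidence; that refinement is not reproduced here, and the network and signal model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_agents, n_hyps, horizon = 5, 3, 200
true_hyp = 1

# Likelihood of a "success" signal under each hypothesis, per agent
# (assumed known here; "uncertain models" would estimate these from data).
p_success = np.array([[0.3, 0.5, 0.7]] * n_agents)

# Doubly stochastic mixing matrix of a ring network (illustrative).
A = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    A[i, [i, (i - 1) % n_agents, (i + 1) % n_agents]] = 1.0 / 3.0

beliefs = np.full((n_agents, n_hyps), 1.0 / n_hyps)

for t in range(horizon):
    signals = rng.random(n_agents) < p_success[:, true_hyp]       # private signals
    log_beliefs = A @ np.log(beliefs)                             # geometric averaging
    likelihood = np.where(signals[:, None], p_success, 1 - p_success)
    log_beliefs += np.log(likelihood)                             # own Bayesian step
    beliefs = np.exp(log_beliefs - log_beliefs.max(axis=1, keepdims=True))
    beliefs /= beliefs.sum(axis=1, keepdims=True)

print("belief in the true hypothesis:", beliefs[:, true_hyp].round(3))
```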
Are deep ResNets provably better than linear predictors?
Yun, Chulhee, Sra, Suvrit, Jadbabaie, Ali
Recently, a residual network (ResNet) with a single residual block has been shown to outperform linear predictors, in the sense that all its local minima are at least as good as the best linear predictor. We take a step towards extending this result to deep ResNets. As motivation, we first show that there exist datasets for which all local minima of a fully-connected ReLU network are no better than the best linear predictor, while a ResNet can have strictly better local minima. Second, we show that even at its global minimum, the representation obtained from the residual blocks of a 2-block ResNet does not necessarily improve monotonically as more blocks are added, highlighting a fundamental difficulty in analyzing deep ResNets. Our main result on deep ResNets shows that (under some geometric conditions) any critical point is either (i) at least as good as the best linear predictor; or (ii) the Hessian at this critical point has a strictly negative eigenvalue. Finally, we complement our results by analyzing near-identity regions of deep ResNets, obtaining size-independent upper bounds for the risk attained at critical points as well as the Rademacher complexity.
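A minimal sketch of the architecture under discussion: a deep ResNet whose blocks compute $x + f_i(x)$, followed by a linear output layer. Zeroing every residual branch recovers a plain linear predictor, which is why the comparison "at least as good as the best linear predictor" is meaningful; the exact block parameterization in the paper may differ.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, dim, hidden):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x):
        return x + self.f(x)          # identity skip connection

class DeepResNet(nn.Module):
    def __init__(self, dim, hidden, n_blocks, out_dim=1):
        super().__init__()
        self.blocks = nn.Sequential(*[ResidualBlock(dim, hidden) for _ in range(n_blocks)])
        self.head = nn.Linear(dim, out_dim)

    def forward(self, x):
        return self.head(self.blocks(x))

model = DeepResNet(dim=8, hidden=16, n_blocks=4)
# Zeroing all residual branches reduces the model to the linear predictor `head`.
with torch.no_grad():
    for block in model.blocks:
        for p in block.f.parameters():
            p.zero_()
x = torch.randn(5, 8)
assert torch.allclose(model(x), model.head(x))
```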
Escaping Saddle Points in Constrained Optimization
Mokhtari, Aryan, Ozdaglar, Asuman, Jadbabaie, Ali
In this paper, we study the problem of escaping from saddle points in smooth nonconvex optimization problems subject to a convex set $\mathcal{C}$. We propose a generic framework that yields convergence to a second-order stationary point of the problem, if the convex set $\mathcal{C}$ is simple for a quadratic objective function. Specifically, our results hold if one can find a $\rho$-approximate solution of a quadratic program subject to $\mathcal{C}$ in polynomial time, where $\rho<1$ is a positive constant that depends on the structure of the set $\mathcal{C}$. Under this condition, we show that the sequence of iterates generated by the proposed framework reaches an $(\epsilon,\gamma)$-second-order stationary point (SOSP) in at most $\mathcal{O}(\max\{\epsilon^{-2},\rho^{-3}\gamma^{-3}\})$ iterations. We further characterize the overall complexity of reaching an SOSP when the convex set $\mathcal{C}$ can be written as a set of quadratic constraints and the objective function Hessian has a specific structure over the convex set $\mathcal{C}$. Finally, we extend our results to the stochastic setting and characterize the number of stochastic gradient and Hessian evaluations to reach an $(\epsilon,\gamma)$-SOSP.
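A heavily simplified sketch of the framework's structure on a box constraint (a "simple" set): take projected gradient steps while the gradient mapping is large, and at near-stationary points minimize the constrained quadratic model of the objective to probe for second-order descent. The toy objective, step sizes, tolerances, and the use of a generic solver in place of the paper's $\rho$-approximate QP oracle are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def f(x):          # toy nonconvex objective with a saddle at the origin
    return 0.5 * x[0] ** 2 - 0.5 * x[1] ** 2 + 0.25 * x[1] ** 4

def grad(x):
    return np.array([x[0], -x[1] + x[1] ** 3])

def hess(x):
    return np.array([[1.0, 0.0], [0.0, -1.0 + 3.0 * x[1] ** 2]])

lo, hi = -2.0, 2.0                      # the convex set C is the box [lo, hi]^2
project = lambda x: np.clip(x, lo, hi)

x = np.array([0.5, 0.0])                # start on the saddle's stable manifold
eta, eps = 0.1, 1e-4

for _ in range(200):
    g_map = (x - project(x - eta * grad(x))) / eta       # gradient mapping
    if np.linalg.norm(g_map) > eps:
        x = project(x - eta * grad(x))                   # first-order step
        continue
    # Near-stationary: minimize the quadratic model of f over C to probe
    # for second-order descent (a stand-in for the paper's QP oracle).
    H, g = hess(x), grad(x)
    model = lambda u: g @ (u - x) + 0.5 * (u - x) @ H @ (u - x)
    res = minimize(model, x + 1e-3 * np.random.default_rng(4).normal(size=2),
                   bounds=[(lo, hi), (lo, hi)])
    if model(res.x) < -eps:
        x = res.x                                        # escape the saddle
    else:
        break                                            # approximate SOSP

print("final point:", x.round(3), "objective:", round(f(x), 4))
```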
Direct Runge-Kutta Discretization Achieves Acceleration
Zhang, Jingzhao, Mokhtari, Aryan, Sra, Suvrit, Jadbabaie, Ali
We study gradient-based optimization methods obtained by directly discretizing a second-order ordinary differential equation (ODE) related to the continuous limit of Nesterov's accelerated gradient method. When the function is smooth enough, we show that acceleration can be achieved by a stable discretization of this ODE using standard Runge-Kutta integrators. Specifically, we prove that under Lipschitz-gradient, convexity and order-$(s+2)$ differentiability assumptions, the sequence of iterates generated by discretizing the proposed second-order ODE converges to the optimal solution at a rate of $\mathcal{O}({N^{-2\frac{s}{s+1}}})$, where $s$ is the order of the Runge-Kutta numerical integrator. Furthermore, we introduce a new local flatness condition on the objective, under which rates even faster than $\mathcal{O}(N^{-2})$ can be achieved with low-order integrators and only gradient information. Notably, this flatness condition is satisfied by several standard loss functions used in machine learning. We provide numerical experiments that verify the theoretical rates predicted by our results.
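A minimal sketch of the idea of direct discretization: integrate the second-order ODE associated with Nesterov acceleration, $\ddot{x} + \tfrac{3}{t}\dot{x} + \nabla f(x) = 0$, with a Runge-Kutta integrator on a simple convex quadratic and observe the decay of $f$. The paper analyzes fixed-step explicit Runge-Kutta schemes of a given order with carefully chosen step sizes; scipy's adaptive RK45 and this particular ODE are used here purely for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.diag([1.0, 10.0])                      # a simple convex quadratic objective
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

def ode(t, z):
    x, v = z[:2], z[2:]                       # z stacks position and velocity
    return np.concatenate([v, -3.0 / t * v - grad(x)])

x0 = np.array([5.0, 5.0])
sol = solve_ivp(ode, t_span=(1.0, 50.0), y0=np.concatenate([x0, np.zeros(2)]),
                method='RK45', t_eval=np.linspace(1.0, 50.0, 6))

for t, z in zip(sol.t, sol.y.T):
    print(f"t = {t:5.1f}   f(x(t)) = {f(z[:2]):.2e}")   # roughly O(1/t^2) decay
```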
Finite sample expressive power of small-width ReLU networks
Yun, Chulhee, Sra, Suvrit, Jadbabaie, Ali
We study universal finite sample expressivity of neural networks, defined as the capability to perfectly memorize arbitrary datasets. For scalar outputs, existing results require a hidden layer as wide as $N$ to memorize $N$ data points. In contrast, we prove that a 3-layer (2-hidden-layer) ReLU network with $4\sqrt{N}$ hidden nodes can perfectly fit arbitrary datasets. For $K$-class classification, we prove that a 4-layer ReLU network with $4\sqrt{N} + 4K$ hidden neurons can memorize arbitrary datasets. For example, a 4-layer ReLU network with only 8,000 hidden nodes can memorize datasets with $N$ = 1M and $K$ = 1k (e.g., ImageNet). Our results show that even small networks already have tremendous overfitting capability, admitting zero empirical risk for any dataset. We also extend our results to deeper and narrower networks, and prove converse results showing necessity of $\Omega(N)$ parameters for shallow networks.
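An empirical illustration only, not the paper's construction (which is an explicit existence proof): train a 2-hidden-layer ReLU network with roughly $4\sqrt{N}$ hidden units on $N$ random points with random scalar labels and report the fit. The even split of units across the two layers and the training procedure are assumptions made for this sketch; gradient training is not guaranteed to reach the memorizing solution whose existence the paper establishes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, d = 100, 10
width = int(2 * N ** 0.5)                         # ~2*sqrt(N) units per hidden layer
X, y = torch.randn(N, d), torch.randn(N, 1)       # arbitrary data and labels

model = nn.Sequential(nn.Linear(d, width), nn.ReLU(),
                      nn.Linear(width, width), nn.ReLU(),
                      nn.Linear(width, 1))

opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(5000):
    loss = ((model(X) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final mean squared error on {N} random points: {loss.item():.2e}")
```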