 Oceania


Federated Learning over Wireless Networks: Convergence Analysis and Resource Allocation

arXiv.org Machine Learning

There is increasing interest in a fast-growing machine learning technique called Federated Learning (FL), in which model training is distributed over mobile user equipments (UEs), exploiting the UEs' local computation and training data. Despite its advantages in preserving data privacy, FL still faces challenges from heterogeneity across users' data and UEs' characteristics. We first address the heterogeneous data challenge by proposing an FL algorithm that can bypass the independent and identically distributed (i.i.d.) UEs' data assumption for strongly convex and smooth problems. We provide the convergence rate characterizing the tradeoff between the local computation rounds each UE uses to update its local model and the global communication rounds used to update the global model. We then employ the proposed FL algorithm in wireless networks as a resource allocation optimization problem that captures various tradeoffs between computation and communication latencies as well as between the Federated Learning time and UE energy consumption. Even though the wireless resource allocation problem of FL is non-convex, we exploit this problem's structure to decompose it into three sub-problems and analyze their closed-form solutions as well as insights to problem design. Finally, we illustrate the theoretical analysis for the new algorithm with TensorFlow experiments and extensive numerical results for the wireless resource allocation sub-problems. The experiment results not only verify the theoretical convergence but also show that our proposed algorithm converges significantly faster than the existing baseline approach. Index Terms: Distributed Machine Learning over Wireless Networks, Federated Learning, Optimization Decomposition. The significant increase in the number of cutting-edge mobile and Internet of Things (IoT) devices results in phenomenal growth of the data volume generated at the network edge.
It has been predicted that in 2025 there will be 80 billion devices connected to the Internet and that global data will reach 180 trillion gigabytes [2]. However, most of this data is privacy-sensitive in nature. It is not only risky to store this data in data centers but also costly in terms of communication. For example, location-based services such as the app Waze [3] can help users avoid heavy-traffic roads and thus reduce congestion.
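The tradeoff between local computation rounds and global communication rounds can be sketched with a generic federated-averaging loop. This is a minimal illustration under stated assumptions, not the algorithm proposed in the paper: the one-parameter least-squares model, the function names `local_update` and `federated_average`, the toy client data, and all hyperparameters are hypothetical.

```python
# Generic federated-averaging sketch (NOT the paper's algorithm).
# Assumption: each client fits a one-parameter model y = w * x by
# gradient descent on its own data, and a server averages the results.

def local_update(w, data, lr, local_rounds):
    """Run `local_rounds` gradient steps on one client's mean squared error."""
    for _ in range(local_rounds):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(clients, w0=0.0, lr=0.05, local_rounds=5, global_rounds=20):
    """Alternate local training with a data-size-weighted global average."""
    w = w0
    total = sum(len(data) for data in clients)
    for _ in range(global_rounds):
        # Each client starts from the current global model.
        local_models = [local_update(w, data, lr, local_rounds) for data in clients]
        # The server aggregates, weighting each client by its sample count.
        w = sum(len(data) * wk for data, wk in zip(clients, local_models)) / total
    return w


# Hypothetical clients whose inputs cover different ranges (non-i.i.d. x values),
# all generated from y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]
w = federated_average(clients)
print(w)
```

Raising `local_rounds` moves more work onto the devices per communication round, while raising `global_rounds` increases communication; the paper's convergence rate characterizes this tradeoff for its own algorithm.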


Scalable Inference for Nonparametric Hawkes Process Using Pólya-Gamma Augmentation

arXiv.org Machine Learning

In this paper, we consider the sigmoid Gaussian Hawkes process model: the baseline intensity and triggering kernel of the Hawkes process are both modeled as the sigmoid transformation of random trajectories drawn from Gaussian processes (GP). By introducing auxiliary latent random variables (branching structure, Pólya-Gamma random variables and latent marked Poisson processes), the likelihood is converted to two decoupled components with a Gaussian form, which allows for efficient conjugate analytical inference. Using the augmented likelihood, we derive an expectation-maximization (EM) algorithm to obtain the maximum a posteriori (MAP) estimate. Furthermore, we extend the EM algorithm to an efficient approximate Bayesian inference algorithm: mean-field variational inference. We demonstrate the performance of the two algorithms on simulated fictitious data. Experiments on real data show that our proposed inference algorithms can efficiently recover the underlying prompting characteristics.


Efficiently avoiding saddle points with zero order methods: No gradients required

arXiv.org Machine Learning

We consider the case of derivative-free algorithms for non-convex optimization, also known as zero order algorithms, that use only function evaluations rather than gradients. For a wide variety of gradient approximators based on finite differences, we establish asymptotic convergence to second order stationary points using a carefully tailored application of the Stable Manifold Theorem. Regarding efficiency, we introduce a noisy zero-order method that converges to second order stationary points, i.e., avoids saddle points. Our algorithm uses only $\tilde{\mathcal{O}}(1 / \epsilon^2)$ approximate gradient calculations and, thus, it matches the convergence rate guarantees of its exact gradient counterparts up to constants. In contrast to previous work, our convergence rate analysis avoids imposing additional dimension dependent slowdowns in the number of iterations required for non-convex zero order optimization.
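A finite-difference gradient approximator of the kind mentioned above can be sketched as follows. This is a deterministic illustration only, not the noisy method introduced in the paper: the names `fd_gradient` and `zero_order_descent`, the test function `f`, the step size, and the starting point are all assumptions chosen for the example.

```python
# Zero-order descent sketch using central finite differences.
# Assumption: this is plain gradient descent with an approximated gradient,
# NOT the paper's noisy zero-order algorithm.

def fd_gradient(f, point, h=1e-5):
    """Approximate the gradient of f at `point` from function values only."""
    grad = []
    for i in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[i] += h
        minus[i] -= h
        grad.append((f(plus) - f(minus)) / (2.0 * h))
    return grad


def zero_order_descent(f, start, lr=0.1, steps=500):
    """Descend along the finite-difference gradient estimate."""
    p = list(start)
    for _ in range(steps):
        g = fd_gradient(f, p)
        p = [pi - lr * gi for pi, gi in zip(p, g)]
    return p


def f(p):
    """Hypothetical test function: saddle at (0, 0), minima at (0, +1) and (0, -1)."""
    x, y = p
    return x ** 2 + y ** 4 / 4.0 - y ** 2 / 2.0


# Starting slightly off the saddle's stable direction, the iterates
# leave the saddle and settle near the minimum at (0, 1).
result = zero_order_descent(f, [1.0, 0.01])
print(result)
```

Each gradient estimate costs two function evaluations per coordinate, which is the kind of dimension dependence that zero-order analyses have to account for.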


Poincaré Recurrence, Cycles and Spurious Equilibria in Gradient-Descent-Ascent for Non-Convex Non-Concave Zero-Sum Games

arXiv.org Machine Learning

We study a wide class of non-convex non-concave min-max games that generalizes over standard bilinear zero-sum games. In this class, players control the inputs of a smooth function whose output is being applied to a bilinear zero-sum game. This class of games is motivated by the indirect nature of the competition in Generative Adversarial Networks, where players control the parameters of a neural network while the actual competition happens between the distributions that the generator and discriminator capture. We establish theoretically that, depending on the specific instance of the problem, gradient-descent-ascent dynamics can exhibit a variety of behaviors antithetical to convergence to the game-theoretically meaningful min-max solution. Specifically, different forms of recurrent behavior (including periodicity and Poincaré recurrence) are possible, as well as convergence to spurious (non-min-max) equilibria for a positive measure of initial conditions. At the technical level, our analysis combines tools from optimization theory, game theory and dynamical systems.
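The failure of gradient-descent-ascent to converge can already be seen on the standard bilinear game f(x, y) = x * y, which is the base case the paper generalizes. The sketch below is only an illustration of that base case with simultaneous discrete-time updates; the function name `gda_bilinear`, the step size, and the starting point are assumptions, and the paper's hidden-game setting is not modeled here.

```python
import math

# Simultaneous gradient-descent-ascent on f(x, y) = x * y.
# Assumption: x is the minimizing player, y the maximizing player.
# The unique min-max solution is (0, 0), but the iterates rotate around it.

def gda_bilinear(x, y, lr=0.1, steps=100):
    """Return the trajectory of distances from the equilibrium (0, 0)."""
    distances = [math.hypot(x, y)]
    for _ in range(steps):
        grad_x = y  # df/dx
        grad_y = x  # df/dy
        # Both players update from the same current point.
        x, y = x - lr * grad_x, y + lr * grad_y
        distances.append(math.hypot(x, y))
    return distances


distances = gda_bilinear(1.0, 0.0)
print(distances[0], distances[-1])
```

With these discrete updates the squared distance to the equilibrium is multiplied by (1 + lr^2) at every step, so the iterates spiral outward instead of approaching the min-max point.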


The Power of Graph Convolutional Networks to Distinguish Random Graph Models

arXiv.org Machine Learning

Graph convolutional networks (GCNs) are a widely used method for graph representation learning. We investigate the power of GCNs, as a function of their number of layers, to distinguish between different random graph models on the basis of the embeddings of their sample graphs. In particular, the graph models that we consider arise from graphons, which are the most general possible parameterizations of infinite exchangeable graph models and which are the central objects of study in the theory of dense graph limits. We exhibit an infinite class of graphons that are well-separated in terms of cut distance and are indistinguishable by a GCN with nonlinear activation functions coming from a certain broad class if its depth is at least logarithmic in the size of the sample graph, and furthermore show that, for this application, ReLU activation functions and non-identity weight matrices with non-negative entries do not help in terms of distinguishing power. These results theoretically match empirical observations of several prior works. Finally, we show that for pairs of graphons satisfying a degree profile separation property, a very simple GCN architecture suffices for distinguishability. To prove our results, we exploit a connection to random walks on graphs.


FD-Net with Auxiliary Time Steps: Fast Prediction of PDEs using Hessian-Free Trust-Region Methods

arXiv.org Machine Learning

Discovering the underlying physical behavior of complex systems is a crucial, but less well-understood topic in many engineering disciplines. This study proposes a finite-difference inspired convolutional neural network framework to learn hidden partial differential equations from given data and iteratively estimate future dynamical behavior. The methodology designs the filter sizes such that they mimic the finite difference between the neighboring points. By learning the governing equation, the network predicts the future evolution of the solution by using only a few trainable parameters. In this paper, we provide numerical results to compare the efficiency of the second-order Trust-Region Conjugate Gradient (TRCG) method with the first-order ADAM optimizer.
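The idea of a filter that mimics a finite difference between neighboring points can be sketched with a fixed three-point stencil applied as a 1D convolution. This is an illustration under assumptions, not the FD-Net architecture: in the paper the filters are learned, whereas here the kernel is hard-coded, and the helper `conv1d_valid`, the grid spacing, and the test signal are hypothetical.

```python
# Second-derivative stencil [1, -2, 1] / dx^2 applied as a "valid" 1D filter.
# Assumption: a fixed kernel stands in for what a network would learn.

def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` without padding (cross-correlation)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


dx = 0.1
xs = [i * dx for i in range(11)]
u = [x ** 2 for x in xs]  # samples of u(x) = x^2, whose second derivative is 2

kernel = [1.0 / dx ** 2, -2.0 / dx ** 2, 1.0 / dx ** 2]
d2u = conv1d_valid(u, kernel)
print(d2u)
```

Because the kernel is symmetric, cross-correlation and convolution coincide here; a three-point filter has only three weights, which is consistent with the abstract's point about needing only a few trainable parameters.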


On the Global Convergence of (Fast) Incremental Expectation Maximization Methods

arXiv.org Machine Learning

The EM algorithm is one of the most popular algorithms for inference in latent data models. The original formulation of the EM algorithm does not scale to large data sets, because the whole data set is required at each iteration of the algorithm. To alleviate this problem, Neal and Hinton have proposed an incremental version of the EM (iEM) in which at each iteration the conditional expectation of the latent data (E-step) is updated only for a mini-batch of observations. Another approach has been proposed by Cappé and Moulines in which the E-step is replaced by a stochastic approximation step, closely related to stochastic gradient. In this paper, we analyze incremental and stochastic versions of the EM algorithm as well as the variance-reduced version of Chen et al. in a common unifying framework. We also introduce a new incremental version, inspired by the SAGA algorithm by Defazio et al. We establish non-asymptotic convergence bounds for global convergence. Numerical applications are presented in this article to illustrate our findings.


Australia lags on AI, automation

#artificialintelligence

Australian real estate companies lag their global counterparts in adopting productivity-boosting measures such as automation, saying the hurdles are greater and expressing more doubt that they will make a difference. While an estimated 50 per cent of all job tasks will be impacted by automation by 2030, Australian respondents to a global survey conducted this week said they were hiring fewer AI experts such as data scientists and were less convinced it would benefit them than respondents from Europe, the Gulf States and the US. The sample space was limited – Australians accounted for just one-quarter of the 400 global respondents to an online poll conducted by consultancy EY and the Massachusetts Institute of Technology (MIT) Real Estate Innovation Lab – but the problem pointed to a lack of competitiveness, said Selena Scott, EY's global real estate and construction innovation leader. "Australian real estate companies are slightly behind their global counterparts when it comes to hiring AI and automation specialists," Ms Short said. "The fact that 23 per cent of Australian companies were unsure whether automation or AI would change their businesses is a worry."


Abbott Announces New Data That Shows Artificial Intelligence Technology Can Help Doctors Better Determine Which Patients are Having a Heart Attack - Sep 10, 2019

#artificialintelligence

Abbott (NYSE: ABT) announced today that new research, published in the journal Circulation, found its algorithm could help doctors in hospital emergency rooms more accurately determine if someone is having a heart attack or not, so that they can receive faster treatments or be safely discharged [1]. In this study, researchers from the U.S., Germany, U.K., Switzerland, Australia and New Zealand looked at more than 11,000 patients to determine if Abbott's technology developed using artificial intelligence (AI) could provide a faster, more accurate determination that someone is having a heart attack or not. The study found that the algorithm provided doctors a more comprehensive analysis of the probability that a patient was having a heart attack or not, particularly for those who entered the hospital within the first three hours of when their symptoms started. "With machine learning technology, you can go from a one-size-fits-all approach for diagnosing heart attacks to an individualized and more precise risk assessment that looks at how all the variables interact at that moment in time," said Fred Apple, Ph.D., Hennepin HealthCare/Hennepin County Medical Center, professor of Laboratory Medicine and Pathology at the University of Minnesota, and one of the study authors. "This could give doctors in the ER more personalized, timely and accurate information to determine if their patient is having a heart attack or not." A team of physicians and statisticians at Abbott developed the algorithm* using AI tools to analyze extensive data sets and identify the variables most predictive for determining a cardiac event, such as age, sex and a person's specific troponin levels (using a high sensitivity troponin-I blood test**) and blood sample timing.


Deep convolutional autoencoder for cryptocurrency market analysis

arXiv.org Machine Learning

This study attempts to analyze patterns in cryptocurrency markets using a special type of deep neural network, namely a convolutional autoencoder. The method extracts the dominant features of market behavior and classifies the 40 studied cryptocurrencies into several classes for twelve 6-month periods starting from 15th May 2013. Transitions from one class to another over time are related to the maturation of cryptocurrencies. In speculative cryptocurrency markets, these findings have potential implications for investment and trading strategies. Introduction. Cryptocurrencies have recently emerged as a digital alternative to traditional government-issued paper monies, as a secure electronic payment system, and as financial and speculative assets.