 Deep Learning


Unified Convergence Analysis of Stochastic Momentum Methods for Convex and Non-convex Optimization

arXiv.org Machine Learning

Recently, stochastic momentum methods have been widely adopted in training deep neural networks. However, their convergence analysis is still underexplored, in particular for non-convex optimization. This paper fills the gap between practice and theory by developing a basic convergence analysis of two stochastic momentum methods, namely the stochastic heavy-ball method and the stochastic variant of Nesterov's accelerated gradient method. We hope that the basic convergence results developed in this paper can serve as a reference for the convergence of stochastic momentum methods and as baselines for comparison in the future development of stochastic momentum methods. The novelty of the convergence analysis presented in this paper is a unified framework, revealing more insights about the similarities and differences between different stochastic momentum methods and the stochastic gradient method. The unified framework exhibits a continuous change from the gradient method to Nesterov's accelerated gradient method and finally the heavy-ball method induced by a free parameter, which can help explain a similar change observed in the testing error convergence behavior for deep learning. Furthermore, our empirical results for optimizing deep neural networks demonstrate that the stochastic variant of Nesterov's accelerated gradient method achieves a good tradeoff (between speed of convergence in training error and robustness of convergence in testing error) among the three stochastic methods.
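The "free parameter" idea in this abstract can be illustrated with a small sketch. The abstract does not spell out the update rule, so the recursion below is an assumption: one parametrization in which a scalar `s` interpolates between the methods (s = 0 gives heavy-ball, s = 1 gives Nesterov's method, s = 1/(1 - beta) gives plain gradient descent with step alpha/(1 - beta)). The function names, the quadratic test objective, and the use of an exact gradient in place of a stochastic one are all illustrative choices, not taken from the paper.

```python
def unified_momentum(grad, x0, alpha, beta, s, steps):
    """Assumed unified recursion; s selects the member of the family."""
    x = x0
    ys_prev = x0  # auxiliary sequence y^s, initialised at the start point
    for _ in range(steps):
        g = grad(x)                 # a stochastic gradient would go here
        y = x - alpha * g           # ordinary gradient step
        ys = x - s * alpha * g      # gradient step scaled by s
        x = y + beta * (ys - ys_prev)
        ys_prev = ys
    return x


def heavy_ball(grad, x0, alpha, beta, steps):
    """Classical heavy-ball: x+ = x - alpha*g + beta*(x - x_prev)."""
    x_prev, x = x0, x0
    for _ in range(steps):
        x, x_prev = x - alpha * grad(x) + beta * (x - x_prev), x
    return x


def gradient_descent(grad, x0, step, steps):
    """Plain gradient descent with a fixed step size."""
    x = x0
    for _ in range(steps):
        x = x - step * grad(x)
    return x


def grad(x):
    """Gradient of the toy objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


alpha, beta, steps = 0.02, 0.9, 50
hb_unified = unified_momentum(grad, 0.0, alpha, beta, 0.0, steps)
hb_direct = heavy_ball(grad, 0.0, alpha, beta, steps)
gd_unified = unified_momentum(grad, 0.0, alpha, beta, 1.0 / (1.0 - beta), steps)
gd_direct = gradient_descent(grad, 0.0, alpha / (1.0 - beta), steps)
```

Under these assumptions, `hb_unified` should agree with `hb_direct`, and `gd_unified` with `gd_direct`, up to floating-point rounding.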


Classical Statistics and Statistical Learning in Imaging Neuroscience

arXiv.org Machine Learning

Neuroimaging research has predominantly drawn conclusions based on classical statistics, including null-hypothesis testing, t-tests, and ANOVA. In recent years, statistical learning methods, including cross-validation, pattern classification, and sparsity-inducing regression, have enjoyed increasing popularity. These two methodological families used for neuroimaging data analysis can be viewed as two extremes of a continuum. Yet, they originated from different historical contexts, build on different theories, rest on different assumptions, evaluate different outcome metrics, and permit different conclusions. This paper portrays commonalities and differences between classical statistics and statistical learning with their relation to neuroimaging research. The conceptual implications are illustrated in three common analysis scenarios. The paper thus attempts to resolve possible confusion between classical hypothesis testing and data-guided model estimation by discussing their ramifications for the neuroimaging access to neurobiology.


IoT and Machine Learning Experts Gather in Boston

#artificialintelligence

RE•WORK will host its annual East Coast events on Deep Learning and the Internet of Things in Boston on 12 & 13 May. Over 300 machine learning and IoT enthusiasts and experts will come together to hear keynote presentations, panel discussions, and fireside chats, and to explore the startup showcase area. The Deep Learning Summit brings together leaders from industry, academia and startups to explore advances in deep learning methods and techniques, as well as their business applications in areas including finance, manufacturing, healthcare & transportation. The Connected Home Summit is the fifth installment in RE•WORK's Internet of Things series, following the Connected City Summit held in London earlier in 2016 and previous IoT Summits in San Francisco, London and Boston as well as dinners and meetups in 2015. The Summits are a unique opportunity to meet and interact with CTOs, founders, data scientists, engineers, designers and industry experts leading the connected home and deep learning revolutions.


Cadence DSP Targets Neural Network Development EE Times

#artificialintelligence

SAN FRANCISCO--Neural networks--artificial intelligence processing systems inspired by the human brain--are a hot topic in technology, as large companies like Facebook, Google and Microsoft are developing them and putting them into use. Most neural network technology in place today runs on graphics processing units (GPUs) from Nvidia Corp. and others. EDA and intellectual property vendor Cadence Design Systems Inc. stepped into the fray on Monday (May 2), rolling out a new version of its Tensilica Vision processing core optimized specifically for vision/deep learning applications. "Everybody is spending a lot of time developing a lot of research and producing a lot of technology," said Pulin Desai, director of product marketing for Cadence's Imaging/Vision Group, in an interview with EE Times. "The market is very hot. Maybe it's hot because everything is being run on GPUs."


Qualcomm announces new deep learning SDK with support for Snapdragon 820, heterogeneous compute ExtremeTech

#artificialintelligence

The answers to these questions determine how you respond to the situation. If there are people moving in and out of the house and loud music playing, it's probably a party. If no one is visible and the house is dark, you might be witnessing a break-in -- or someone may simply have forgotten to latch the door properly. We assign "weights" to these probabilities and evaluate the situation accordingly -- and we do it unconsciously and at extraordinary speed compared with a conventional computer. Conventional neural networks try to duplicate this process.


New 'deep learning' technique enables robot mastery of skills via trial and error

#artificialintelligence

UC Berkeley researchers have developed algorithms that enable robots to learn motor tasks through trial and error using a process that more closely approximates the way humans learn, marking a major milestone in the field of artificial intelligence. They demonstrated their technique, a type of reinforcement learning, by having a robot complete various tasks -- putting a clothes hanger on a rack, assembling a toy plane, screwing a cap on a water bottle, and more -- without pre-programmed details about its surroundings. "What we're reporting on here is a new approach to empowering a robot to learn," said Professor Pieter Abbeel of UC Berkeley's Department of Electrical Engineering and Computer Sciences. "The key is that when a robot is faced with something new, we won't have to reprogram it. The exact same software, which encodes how the robot can learn, was used to allow the robot to learn all the different tasks we gave it."


An evaluation of randomized machine learning methods for redundant data: Predicting short and medium-term suicide risk from administrative records and risk assessments

arXiv.org Machine Learning

Accurate prediction of suicide risk in mental health patients remains an open problem. Existing methods, including clinician judgments, have acceptable sensitivity but yield many false positives. Exploiting administrative data has great potential, but the data has high dimensionality and redundancies in the recording processes. We investigate the efficacy of three of the most effective randomized machine learning techniques - random forests, gradient boosting machines, and deep neural nets with dropout - in predicting suicide risk. Using a cohort of mental health patients from a regional Australian hospital, we compare the predictive performance with popular traditional approaches - clinician judgments based on a checklist, sparse logistic regression and decision trees. The randomized methods demonstrated robustness against data redundancies and superior predictive performance on AUC and F-measure.

Keywords: Suicide risk, Electronic medical record, Predictive models, Randomized machine learning, Deep learning

1. Introduction

Every year, about 2000 Australians die by suicide, causing huge trauma to families, friends, workplaces and communities [1].


Optimizing Neural Networks with Kronecker-factored Approximate Curvature

arXiv.org Machine Learning

We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-Factored Approximate Curvature (K-FAC). K-FAC is based on an efficiently invertible approximation of a neural network's Fisher information matrix which is neither diagonal nor low-rank, and in some cases is completely non-sparse. It is derived by approximating various large blocks of the Fisher (corresponding to entire layers) as being the Kronecker product of two much smaller matrices. While only several times more expensive to compute than the plain stochastic gradient, the updates produced by K-FAC make much more progress optimizing the objective, which results in an algorithm that can be much faster than stochastic gradient descent with momentum in practice. And unlike some previously proposed approximate natural-gradient/Newton methods which use high-quality non-diagonal curvature matrices (such as Hessian-free optimization), K-FAC works very well in highly stochastic optimization regimes. This is because the cost of storing and inverting K-FAC's approximation to the curvature matrix does not depend on the amount of data used to estimate it, which is a feature typically associated only with diagonal or low-rank approximations to the curvature matrix.
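Why a Kronecker-product block is "efficiently invertible" can be seen from a standard identity: the inverse of A ⊗ G is A⁻¹ ⊗ G⁻¹, so a linear system with the large matrix can be solved using only the two small factors. The sketch below checks this on 2×2 factors in plain Python. It illustrates the identity only and is not the K-FAC algorithm itself; the matrices `A`, `G`, `V` and all helper names are made up for the example, and a row-major flattening convention is assumed.

```python
def matmul(A, B):
    """Dense matrix product for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def kron(A, B):
    """Kronecker product A (x) B as a dense list of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def flatten(M):
    """Row-major flattening of a matrix into a vector."""
    return [v for row in M for v in row]


# Two small symmetric positive-definite factors and a right-hand side,
# all chosen arbitrarily for illustration.
A = [[2.0, 0.5], [0.5, 1.0]]
G = [[3.0, 1.0], [1.0, 2.0]]
V = [[1.0, -2.0], [0.5, 4.0]]

# Factored solve of (A (x) G) vec(U) = vec(V): only 2x2 inverses are needed.
# With row-major flattening this gives U = A^{-1} V G^{-T}.
U = matmul(matmul(inv2(A), V), transpose(inv2(G)))

# Check by applying the full 4x4 Kronecker matrix to vec(U).
K = kron(A, G)
u = flatten(U)
recovered = [sum(k * x for k, x in zip(row, u)) for row in K]
```

Here `recovered` should match `flatten(V)` up to rounding. The point of the factored form is that its cost scales with the sizes of the two factors rather than with the size of their product.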


Qualcomm's deep learning SDK will mean more AI on your smartphone – Digital Media Wire

#artificialintelligence

The Verge reports: "The benefits of machine learning continue to trickle down to smartphones and gadgets, and chipmaker Qualcomm wants to help speed up the process. The company is launching a new software development kit for its 'machine intelligence platform' Zeroth. This SDK will make it easier for companies to run deep learning programs directly on devices like smartphones and drones -- if they're powered by one of Qualcomm's chips, of course."


Scikit Flow: Easy Deep Learning with TensorFlow and Scikit-learn

#artificialintelligence

Google's TensorFlow has been publicly available since November 2015, and there is no disputing that, in a few short months, it has made an impact on machine learning in general, and on deep learning specifically. There is evidence of widespread acceptance via blog posts, academic papers, and tutorials all over the web. It is, of course, difficult to estimate true adoption rates, but TensorFlow's GitHub repository has nearly twice the number of stars of both the next most-starred machine learning project, Scikit-learn, and the closest deep learning project, Berkeley Vision and Learning Center's Caffe. While not concretely indicative of TensorFlow having become the leader in the space, it is fairly easy to surmise that, given its fairly recent release, there has been considerable interest in, and use of, Google's deep learning library. For the most part, TensorFlow is relatively straightforward to use, and neural network aficionados without experience using the library could look at a given network's code and get an intuitive sense of what is going on.