

[Figure 1: Projecting 50-dimensional … obtained by training a simple neural network without SSE (left) and …]

Neural Information Processing Systems

We thank the reviewers for their insightful feedback. In the following, we address their concerns and questions. It is indeed a great suggestion to examine concrete examples beyond the quantitative evaluation to build intuition. That is likely due to the use of the item graph. As shown in Theorem 1, SSE can 'smooth' the Rademacher complexity; SSE-SE resembles dropout, and perhaps we can further study how this is related to dropout in theory.



Review for NeurIPS paper: Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

Neural Information Processing Systems

Weaknesses: One of the main assumptions used in characterizing the "threshold states" at which the gradient flow dynamics appear to get trapped is that the Hessian is positive semidefinite. On the other hand, in Figure 2, as the training loss crosses the threshold energy, the minimal eigenvalues of the Hessian appear clearly negative, unless I am misunderstanding the figure. The authors do not appear to address this point. Could the soundness of this assumption also account for the inaccuracy of the computed value of alpha at which the phase transition occurs, which can be seen in Figure 4? The main result of the paper, the computation of the relative sample size alpha at which the phase transition occurs, does not seem very accurate when compared to the experiments in Figures 3 and 5. It would also have been helpful to plot this value in the figures in order to make this point clear. The discrepancy could be a result of finite-size effects, as the authors claim, but could also stem from, say, the assumption made about the Hessian at the threshold states, or from the accuracy of the 1RSB ansatz.


Review for NeurIPS paper: Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

Neural Information Processing Systems

The paper makes interesting contributions towards understanding non-convex optimization by studying a problem that is simple enough to allow for analytical calculations. Overall, there is a decent, well-supported agreement between theory and experiment (in particular, between the leading moments of the distribution of the threshold states as evaluated empirically and the computed moments). This paper is a valuable contribution to NeurIPS and should be accepted. We do, however, recommend various lines along which the paper could be improved further to reach a wider audience, and we recommend that the authors revisit the author feedback before they submit their final version. First, the paper's presentation is unusually difficult to follow from the perspective of a machine learning audience and could be improved by providing more background on known results that are used in the paper (e.g., the BBP transition or replica theory), if necessary in the appendix.


Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

Neural Information Processing Systems

Despite the widespread use of gradient-based algorithms for optimising high-dimensional non-convex functions, understanding their ability to find good minima instead of being trapped in spurious ones remains, to a large extent, an open problem. Here we focus on gradient flow dynamics for phase retrieval from random measurements. When the ratio of the number of measurements to the input dimension is small, the dynamics remain trapped in spurious minima with large basins of attraction. We find analytically that above a critical ratio those critical points become unstable, developing a negative direction toward the signal. By numerical experiments we show that in this regime the gradient flow algorithm is not trapped; it drifts away from the spurious critical points along the unstable direction and succeeds in finding the global minimum.
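The dynamics described in the abstract can be sketched numerically. Below is a minimal, hypothetical illustration (not the authors' code) of gradient descent on the standard phase retrieval loss L(x) = (1/4m) Σᵢ ((aᵢ·x)² − yᵢ)² with Gaussian random measurements; the dimension n, ratio alpha, step size, and iteration count are illustrative choices, not values from the paper.

```python
import numpy as np

# Sketch of gradient descent on the phase retrieval loss with
# random Gaussian measurements. All constants are illustrative.
rng = np.random.default_rng(0)
n = 50
alpha = 6.0                                # measurement ratio m / n
m = int(alpha * n)
x_star = rng.standard_normal(n)            # hidden signal
A = rng.standard_normal((m, n))            # random measurement vectors
y = (A @ x_star) ** 2                      # phaseless measurements

def loss(x):
    return np.mean(((A @ x) ** 2 - y) ** 2) / 4

x = rng.standard_normal(n)                 # random initialisation
loss_init = loss(x)
lr = 0.1 / m                               # small step mimicking gradient flow
for _ in range(5000):
    z = A @ x
    x -= lr * (A.T @ ((z ** 2 - y) * z) / m)   # gradient of the quartic loss
loss_final = loss(x)
```

Since the loss is invariant under x → −x, success is usually measured by the absolute overlap |x·x*|/(‖x‖‖x*‖) rather than by x itself.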


Automatic Change-Point Detection in Time Series via Deep Learning

Li, Jie, Fearnhead, Paul, Fryzlewicz, Piotr, Wang, Tengyao

arXiv.org Machine Learning

Detecting change-points in data is challenging because of the range of possible types of change and types of behaviour of data when there is no change. Statistically efficient methods for detecting a change will depend on both of these features, and it can be difficult for a practitioner to develop an appropriate detection method for their application of interest. We show how to automatically generate new offline detection methods based on training a neural network. Our approach is motivated by many existing tests for the presence of a change-point being representable by a simple neural network, and thus a neural network trained with sufficient data should have performance at least as good as these methods. We present theory that quantifies the error rate for such an approach, and how it depends on the amount of training data. Empirical results show that, even with limited training data, its performance is competitive with the standard CUSUM-based classifier for detecting a change in mean when the noise is independent and Gaussian, and can substantially outperform it in the presence of auto-correlated or heavy-tailed noise. Our method also shows strong results in detecting and localising changes in activity based on accelerometer data.
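The CUSUM baseline mentioned in the abstract can be sketched briefly. The following is a minimal, hedged illustration of the classical CUSUM statistic for a single change in mean, not the authors' implementation; the simulated data and the function name `cusum` are assumptions for demonstration.

```python
import numpy as np

def cusum(x):
    """Return (max CUSUM statistic, estimated change location).

    Standardised difference between the mean before and after each
    candidate change point k, maximised over k.
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    S = np.cumsum(x)
    k = np.arange(1, T)                  # candidate change points
    stat = np.abs(S[k - 1] - (k / T) * S[-1]) * np.sqrt(T / (k * (T - k)))
    khat = int(k[np.argmax(stat)])
    return float(stat.max()), khat

# Simulated example: mean shifts from 0 to 2 at time 100.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])
stat, khat = cusum(x)                    # khat should land near 100
```

Under independent Gaussian noise this statistic is near-optimal, which is exactly the regime where the paper finds the trained network merely matches it; the gains appear under auto-correlated or heavy-tailed noise, where the CUSUM assumptions fail.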


The deep learning project which led me to burnout

#artificialintelligence

In this article, I will present you the deep learning project that I wanted to perform, then I'll present the techniques and approach that I used to tacle this. And I will end up that article with some meaningful reflections, that I hope would help some of you. I wanted to build a smartphone app which can recognize flower from taken picture. Basically the app is splitted into two parts, the front-end part which is basically the mobile development. I wanted to build from scratch a deep learning model without deep learning framework, to help me understand the inner working process of image classification (I know it sounds crazy).


Neural Networks. Article explaining each and everything…

#artificialintelligence

A neuron is a tiny but important cell in our brains that helps us think, feel, and move. It's like a little switch that turns on and off in response to signals from other neurons, allowing us to process and analyze information. Neurons work together to create the complex networks that make up our brains. When a neuron receives a signal from another neuron, it sends an electrical impulse down its long, thin tail-like structure called an axon. At the end of the axon, there are tiny structures called synapses that release chemicals called neurotransmitters.


Simple Neural Networks Can Precisely Control Robotic Prostheses

#artificialintelligence

Artificial neural networks that are inspired by natural nerve circuits in the human body give primates faster, more accurate control of brain-controlled prosthetic hands and fingers, researchers at the University of Michigan have shown. The finding could lead to more natural control over advanced prostheses for those dealing with the loss of a limb or paralysis. The team of engineers and doctors found that a feed-forward neural network improved peak finger velocity by 45% during control of robotic fingers when compared to traditional algorithms not using neural networks. This overturned an assumption that more complex neural networks, like those used in other fields of machine learning, would be needed to achieve this level of performance improvement. "This feed-forward network represents an older, simpler architecture--with information moving only in one direction, from input to output," said Cindy Chestek, Ph.D., an associate professor of biomedical engineering at U-M and corresponding author of the paper in Nature Communications.
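The "feed-forward" architecture Chestek describes can be made concrete with a tiny sketch: activations flow strictly from input to output, with no recurrence. The layer sizes and the interpretation of inputs/outputs below are illustrative assumptions, not details from the Nature Communications paper.

```python
import numpy as np

# Minimal feed-forward network: information moves in one direction only.
# Sizes are arbitrary stand-ins (8 input features, 16 hidden units,
# 4 outputs, e.g. finger velocities).
rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))    # input -> hidden weights
W2 = rng.standard_normal((4, 16))    # hidden -> output weights

def forward(x):
    h = np.tanh(W1 @ x)              # single hidden layer, no feedback loops
    return W2 @ h

x = rng.standard_normal(8)           # e.g. neural firing-rate features
y = forward(x)                       # predicted control outputs
```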


What are parametric and Non-Parametric Machine Learning Models?

#artificialintelligence

Machine Learning algorithms are basically mathematical functions that try to find a relationship between input and output variables. If we have tabular data with columns 'Experience' (input) and 'Salary' (target), we are trying to find a relationship between input and target: as experience changes, salary also changes. The function y = f(x) tries to capture the relationship between the input feature x and the target y. But sometimes we may or may not know the nature of the function.
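The distinction can be sketched with the Experience/Salary example above. In the parametric case we assume a fixed functional form (here linear) and learn only its parameters; in the non-parametric case (here k-nearest neighbours) no form is assumed and the training data itself acts as the model. The numbers below are made up for illustration.

```python
import numpy as np

# Toy Experience (years) vs Salary (in $1000s) data -- illustrative only.
experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
salary = np.array([30.0, 35.0, 42.0, 48.0, 55.0, 61.0])

# Parametric: assume y = f(x) is linear, y = a*x + b, and learn just (a, b).
a, b = np.polyfit(experience, salary, deg=1)
pred_param = a * 3.5 + b

# Non-parametric: k-nearest neighbours assumes nothing about f's form;
# prediction averages the salaries of the k closest experience values.
def knn_predict(x, k=2):
    idx = np.argsort(np.abs(experience - x))[:k]
    return salary[idx].mean()

pred_knn = knn_predict(3.5)   # neighbours are x=3 and x=4
```

Note the trade-off: the parametric model compresses the data into two numbers but is wrong if the true relationship is not linear, while the k-NN predictor must keep all training points around to make a prediction.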