Reviews: Learning ReLUs via Gradient Descent

Neural Information Processing Systems 

The analysis assumes the input samples are iid Gaussian and the output is realizable, i.e. the target value y_k for input x_k is constructed via y_k ReLU(w *.x_k). The paper studies a regression setting and uses least squares as the loss function.