
### Learning Sparse Neural Networks through $L_0$ Regularization

We propose a practical method for $L_0$ norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero. Such regularization is interesting since (1) it can greatly speed up training and inference, and (2) it can improve generalization. AIC and BIC, well-known model selection criteria, are special cases of $L_0$ regularization. However, since the $L_0$ norm of weights is non-differentiable, we cannot incorporate it directly as a regularization term in the objective function. We propose a solution through the inclusion of a collection of non-negative stochastic gates, which collectively determine which weights to set to zero. We show that, somewhat surprisingly, for certain distributions over the gates, the expected $L_0$ norm of the resulting gated weights is differentiable with respect to the distribution parameters. We further propose the \emph{hard concrete} distribution for the gates, which is obtained by "stretching" a binary concrete distribution and then transforming its samples with a hard-sigmoid. The parameters of the distribution over the gates can then be jointly optimized with the original network parameters. As a result, our method allows straightforward and efficient learning of model structures with stochastic gradient descent and enables conditional computation in a principled way. We perform various experiments to demonstrate the effectiveness of the resulting approach and regularizer.
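The gate construction in the abstract can be sketched concretely. The snippet below is a minimal NumPy illustration, assuming the stretch interval $(\gamma, \zeta) = (-0.1, 1.1)$ and temperature $\beta = 2/3$ commonly suggested for the hard concrete distribution; the function names are hypothetical:

```python
import numpy as np

# Hard concrete gate sampler -- a minimal sketch. A binary concrete
# sample in (0, 1) is "stretched" to (GAMMA, ZETA) and clipped with a
# hard-sigmoid, so exact zeros and ones occur with positive probability.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0

def sample_hard_concrete(log_alpha, rng):
    """Sample gates z in [0, 1] given per-weight location parameters log_alpha."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=np.shape(log_alpha))
    # binary concrete sample in (0, 1), reparameterized in log_alpha
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1.0 - u) + log_alpha) / BETA))
    # stretch to (GAMMA, ZETA), then clip to [0, 1]
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def expected_l0(log_alpha):
    """Differentiable expected L0 penalty: sum over gates of P(gate != 0)."""
    return np.sum(1.0 / (1.0 + np.exp(-(log_alpha - BETA * np.log(-GAMMA / ZETA)))))
```

Because `expected_l0` is a smooth function of `log_alpha`, it can be added to the training loss and optimized jointly with the network weights by stochastic gradient descent, while the clipping makes sampled gates exactly zero for pruned weights.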

### The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

The reparameterization trick enables optimizing large scale stochastic computation graphs via gradient descent. The essence of the trick is to refactor each stochastic node into a differentiable function of its parameters and a random variable with fixed distribution. After refactoring, the gradients of the loss propagated by the chain rule through the graph are low variance unbiased estimators of the gradients of the expected loss. While many continuous random variables have such reparameterizations, discrete random variables lack useful reparameterizations due to the discontinuous nature of discrete states. In this work we introduce Concrete random variables---continuous relaxations of discrete random variables. The Concrete distribution is a new family of distributions with closed form densities and a simple reparameterization. Whenever a discrete stochastic node of a computation graph can be refactored into a one-hot bit representation that is treated continuously, Concrete stochastic nodes can be used with automatic differentiation to produce low-variance biased gradients of objectives (including objectives that depend on the log-probability of latent stochastic nodes) on the corresponding discrete graph. We demonstrate the effectiveness of Concrete relaxations on density estimation and structured prediction tasks using neural networks.
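The relaxation described above can be illustrated with a short sketch. The code below draws a Concrete sample over $K$ classes via the standard Gumbel reparameterization: the sample lies on the probability simplex, is differentiable in the logits, and approaches a one-hot vector as the temperature goes to zero. Function and variable names are illustrative, not from the paper:

```python
import numpy as np

def sample_concrete(logits, temperature, rng):
    """Draw one Concrete sample: softmax((logits + Gumbel noise) / temperature)."""
    u = rng.uniform(1e-10, 1.0, size=np.shape(logits))
    gumbel = -np.log(-np.log(u))          # Gumbel(0, 1) noise via inverse CDF
    y = (logits + gumbel) / temperature   # reparameterized, differentiable in logits
    y = y - y.max()                       # subtract max for a numerically stable softmax
    e = np.exp(y)
    return e / e.sum()
```

At high temperature the samples spread over the simplex; at low temperature they concentrate near the vertices, recovering (in the limit) the discrete Gumbel-max sample, which is why gradients through this node are low-variance but biased with respect to the discrete objective.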

### Concrete Truck Driver Charged in Fatal La Vista Crash

Investigators say Holloway was driving east on Giles Road in La Vista when he made a sharp right turn. That caused the loaded concrete truck to tip and land on a northbound car stopped at a traffic light, killing driver Michael Dearden and passenger Phillip Hertel, both 23.

### Concrete Dream: Some Envision New I-14 Route Through South

The goal is to draw attention to the proposed new interstate and the belief by its backers that it would help reduce poverty, create businesses and improve the lives of residents in communities along its route, the Columbus newspaper reported.

### City Considers Bridge Sound After Falling Concrete Death

The St. Louis Post-Dispatch reports that 58-year-old Janet Torrisi-Mokwa died Monday when another woman drove into the bridge, causing the concrete to fall. City operations director Todd Waelterman says the 57-year-old bridge wasn't compromised and has been reopened.