Minecraft Earth is coming – it will change the way you see your town
Six of us are huddled together in Cavendish Square Gardens in central London, fighting a horde of warrior skeletons. To passersby, however, we must look like a bunch of adults pointing our smartphones at nothing while shouting about incoming monsters. What we're doing is playing a beta version of Minecraft Earth, an augmented reality (AR) spinoff from the multimillion-selling block-building game – and very soon, parks all over the world will be filled with people just like us. This month, Minecraft is launching an early-access version of the game in a select few territories around the world, ahead of a global roll-out. Microsoft has yet to reveal exactly when and where, but soon thousands of fans used to playing on their console, PC or tablet are going to be taking their creations to the streets.
Truth or Backpropaganda? An Empirical Investigation of Deep Learning Theory
Goldblum, Micah, Geiping, Jonas, Schwarzschild, Avi, Moeller, Michael, Goldstein, Tom
ABSTRACT We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. We study the prevalence of local minima in loss landscapes, whether small-norm parameter vectors generalize better (and whether this explains the advantages of weight decay), whether wide-network theories (like the neural tangent kernel) describe the behaviors of classifiers, and whether the rank of weight matrices can be linked to generalization and robustness in real-world networks. In statistical learning, principled kernel methods have vastly improved the performance of SVMs and PCA (Suykens & Vandewalle, 1999; Schölkopf et al., 1997), and boosting theory has enabled weak learners to generate strong classifiers (Schapire, 1990). Optimizers in deep learning are borrowed from the field of convex optimization, where momentum optimizers (Nesterov, 1983) and conjugate gradient methods provably solve ill-conditioned problems with high efficiency (Hestenes & Stiefel, 1952). Deep learning harnesses foundational tools from these mature parent fields. Despite its rigorous roots, deep learning has driven a wedge between theory and practice. Recent theoretical work has certainly made impressive strides towards understanding optimization and generalization in neural networks. But doing so has required researchers to make strong assumptions and study restricted model classes. In this paper, we seek to understand whether deep learning theories accurately capture the behaviors and network properties that make realistic deep networks work. Following a line of previous work, such as Swirszcz et al. (2016), Zhang et al. (2016), Balduzzi et al. (2017) and Santurkar et al. (2018), we put the assumptions and conclusions of deep learning theory to the test using experiments with both toy networks and realistic ones.
We focus on the following important theoretical issues:
- Local minima: Numerous theoretical works argue that all local minima of neural loss functions are globally optimal or that all local minima are nearly optimal. In practice, we find
Authors contributed equally. arXiv:1910.00359v1
Yet for neural networks, it is not at all clear which form of ℓ2-regularization is optimal.
The asymptotic spectrum of the Hessian of DNN throughout training
Jacot, Arthur, Gabriel, Franck, Hongler, Clément
The dynamics of DNNs during gradient descent is described by the so-called Neural Tangent Kernel (NTK). In this article, we show that the NTK allows one to gain precise insight into the Hessian of the cost of DNNs: we obtain a full characterization of the asymptotics of the spectrum of the Hessian, at initialization and during training.
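As a rough illustration of the kernel the abstract refers to, the sketch below computes an empirical NTK, meaning the Gram matrix of parameter gradients, for a toy one-hidden-layer tanh network. The architecture, the 1/sqrt(m) output scaling, the standard normal initialization and the name `empirical_ntk` are all assumptions chosen for this example; it does not reproduce the paper's Hessian analysis.

```python
import numpy as np

def empirical_ntk(x, w, a):
    """Empirical NTK of f(x) = a . tanh(W x) / sqrt(m) with respect to (W, a).

    x: (n, d) inputs, w: (m, d) hidden weights, a: (m,) output weights.
    Returns the (n, n) matrix K[i, j] = <grad f(x_i), grad f(x_j)>.
    """
    m = w.shape[0]
    act = np.tanh(x @ w.T)          # (n, m) hidden activations
    dact = 1.0 - act ** 2           # derivative of tanh at the preactivations
    # Gradient w.r.t. a is act / sqrt(m).
    k_a = act @ act.T / m
    # Gradient w.r.t. row i of W is a_i * tanh'(w_i . x) * x / sqrt(m).
    scaled = dact * a               # (n, m)
    k_w = (scaled @ scaled.T) * (x @ x.T) / m
    return k_a + k_w

# Hypothetical toy data and initialization (assumptions for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
w = rng.normal(size=(64, 3))
a = rng.normal(size=64)
ntk = empirical_ntk(x, w, a)
```

The resulting matrix is symmetric and positive semi-definite by construction, since it is a Gram matrix of gradient vectors.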
Distance-Based Approaches to Repair Semantics in Ontology-based Data Access
Prouté, César, Yun, Bruno, Croitoru, Madalina
In the presence of inconsistencies, repair techniques strive to restore consistency by reasoning with several repairs. However, since the number of repairs can be large, standard inconsistency-tolerant semantics usually yield few answers. In this paper, we use the notion of syntactic distance between repairs, following the intuition that it can allow us to cluster repairs that are "close" to each other. In this way, we propose a generic framework to answer queries in a more personalised fashion.
Armed with artificial intelligence, scientists take on climate change
Science needs to understand and predict how climate change--and the growing onslaught of hurricanes, fires, and floods it's bringing--affects tropical forests. Will the forests respond to the assault with shorter trees? Will they store less carbon, or support less tree and plant diversity and fewer wildlife species? To better understand the effects a changing climate will have on tropical forests, Maria Uriarte, Columbia University professor of ecology, evolution, and environmental biology, needs to analyze images of forests. These bird's-eye view images are the size of a postage stamp.
A New Framework for Distance and Kernel-based Metrics in High Dimensions
Chakraborty, Shubhadeep, Zhang, Xianyang
The paper presents new metrics to quantify and test for (i) the equality of distributions and (ii) the independence between two high-dimensional random vectors. We show that the energy distance based on the usual Euclidean distance cannot completely characterize the homogeneity of two high-dimensional distributions in the sense that it only detects the equality of means and the traces of covariance matrices in the high-dimensional setup. We propose a new class of metrics which inherits the desirable properties of the energy distance and maximum mean discrepancy/(generalized) distance covariance and the Hilbert-Schmidt Independence Criterion in the low-dimensional setting and is capable of detecting the homogeneity of/completely characterizing independence between the low-dimensional marginal distributions in the high dimensional setup. We further propose t-tests based on the new metrics to perform high-dimensional two-sample testing/independence testing and study their asymptotic behavior under both high dimension low sample size (HDLSS) and high dimension medium sample size (HDMSS) setups. The computational complexity of the t-tests only grows linearly with the dimension and thus is scalable to very high dimensional data. We demonstrate the superior power behavior of the proposed tests for homogeneity of distributions and independence via both simulated and real datasets.
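For readers unfamiliar with the baseline quantity, here is a minimal sketch of the classical energy distance with the usual Euclidean distance, estimated as a V-statistic: 2 E|X-Y| - E|X-X'| - E|Y-Y'|. It illustrates only the standard statistic that the abstract argues is insufficient in high dimensions, not the new metrics proposed in the paper. The function names and the synthetic Gaussian samples are assumptions.

```python
import numpy as np

def pairwise_euclidean(a, b):
    """Matrix of Euclidean distances between the rows of a and the rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def energy_distance(x, y):
    """V-statistic estimate of 2 E|X-Y| - E|X-X'| - E|Y-Y'|."""
    return (2.0 * pairwise_euclidean(x, y).mean()
            - pairwise_euclidean(x, x).mean()
            - pairwise_euclidean(y, y).mean())

# Hypothetical samples: two from the same distribution, one mean-shifted.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
y_same = rng.normal(size=(200, 5))
y_shift = rng.normal(loc=1.0, size=(200, 5))

d_same = energy_distance(x, y_same)    # expected to be close to zero
d_shift = energy_distance(x, y_shift)  # expected to be clearly larger
```

The all-pairs distance matrices make this sketch quadratic in the sample size, which is acceptable for small illustrations like this one.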
Non-Gaussian processes and neural networks at finite widths
Gaussian processes are ubiquitous in nature and engineering. A case in point is a class of neural networks in the infinite-width limit, whose priors correspond to Gaussian processes. Here we perturbatively extend this correspondence to finite-width neural networks, yielding non-Gaussian processes as priors. The methodology developed herein allows us to track the flow of preactivation distributions by progressively integrating out random variables from lower to higher layers, reminiscent of renormalization-group flow. We further develop a perturbative procedure to perform Bayesian inference with weakly non-Gaussian priors.
Natural representation of composite data with replicated autoencoders
Negri, Matteo, Bergamini, Davide, Baldassi, Carlo, Zecchina, Riccardo, Feinauer, Christoph
Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features. Here we present an unsupervised method based on autoencoders for inferring these basic features of data. The main novelty in our approach is that the training is based on the optimization of the 'local entropy' rather than the standard loss, resulting in a more robust inference, and enhancing the performance on this type of data considerably. Algorithmically, this is realized by training an interacting system of replicated autoencoders. We apply this method to synthetic and protein sequence data, and show that it is able to infer a hidden representation that correlates well with the underlying generative process, without requiring any prior knowledge.
A New Covariance Estimator for Sufficient Dimension Reduction in High-Dimensional and Undersized Sample Problems
Olorede, Kabir Opeyemi, Yahya, Waheed Babatunde
The application of standard sufficient dimension reduction methods for reducing the dimension space of predictors without losing regression information requires inverting the covariance matrix of the predictors. This has posed a number of challenges, especially when analyzing high-dimensional data sets in which the number of predictors $p$ is much larger than the number of samples $n$ ($n \ll p$). This work proposes a new covariance estimator, called the Maximum Entropy Covariance (MEC), that addresses the loss of covariance information when similar covariance matrices are linearly combined using the Maximum Entropy (ME) principle. By benefitting naturally from slicing or discretizing the range of the response variable $y$ into $H$ non-overlapping categories $h_{1},\ldots,h_{H}$, MEC first combines the covariance matrices arising from samples in each $y$ slice $h \in H$ and then selects the one that maximizes entropy under the principle of maximum uncertainty. The MEC estimator is then formed from a convex mixture of this entropy-maximizing sample covariance estimate $S_{\mathrm{mec}}$ and the pooled sample covariance estimate $\mathbf{S}_{p}$ across the $H$ slices, without requiring time-consuming covariance optimization procedures. MEC deals directly with the singularity and instability of sample group covariance estimates in both regression and classification problems. The efficiency of the MEC estimator is studied with existing sufficient dimension reduction methods such as Sliced Inverse Regression (SIR) and the Sliced Average Variance Estimator (SAVE), as demonstrated on classification and regression problems using real-life leukemia cancer data and customers' electricity load profiles from smart meter data sets, respectively.
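The singularity problem mentioned above can be seen in a few lines. The sketch below shows that a sample covariance with fewer samples than predictors is rank-deficient, and that a convex mixture with a well-conditioned target becomes invertible. This is generic shrinkage toward a scaled identity, swapped in for illustration; it is not the MEC estimator, whose construction is not detailed here. The function name, the target and the mixing weight `alpha` are assumptions.

```python
import numpy as np

def shrink_covariance(s, alpha):
    """Convex mixture (1 - alpha) * s + alpha * target.

    The target is a scaled identity matching the average variance of s
    (an assumption for this sketch, not the MEC construction).
    """
    p = s.shape[0]
    target = (np.trace(s) / p) * np.eye(p)
    return (1.0 - alpha) * s + alpha * target

# Hypothetical undersized sample: n = 20 observations of p = 50 predictors.
rng = np.random.default_rng(0)
n, p = 20, 50
x = rng.normal(size=(n, p))

s = np.cov(x, rowvar=False)              # (p, p) sample covariance
rank_s = np.linalg.matrix_rank(s)        # at most n - 1, so s is singular
s_shrunk = shrink_covariance(s, alpha=0.2)
min_eig_shrunk = np.linalg.eigvalsh(s_shrunk).min()  # strictly positive
```

Because the mixture adds a positive multiple of the identity to a positive semi-definite matrix, `s_shrunk` can be inverted where `s` cannot.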
Impact of Low-bitwidth Quantization on the Adversarial Robustness for Embedded Neural Networks
Bernhard, Rémi, Moellic, Pierre-Alain, Dutertre, Jean-Max
As the will to deploy neural network models on embedded systems grows, and considering the related memory footprint and energy consumption issues, finding lighter solutions to store neural networks, such as weight quantization and more efficient inference methods, has become a major research topic. In parallel, adversarial machine learning has recently attracted significant attention, unveiling some critical flaws of machine learning models, especially neural networks. In particular, perturbed inputs called adversarial examples have been shown to fool a model into making incorrect predictions. In this article, we investigate the adversarial robustness of quantized neural networks under different threat models for a classical supervised image classification task. We show that quantization does not offer any robust protection and results in a severe form of gradient masking, and we advance some hypotheses to explain it. However, we experimentally observe poor transferability capacities, which we explain by a quantization value shift phenomenon and gradient misalignment, and we explore how these results can be exploited with an ensemble-based defense.
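To make "low-bitwidth weight quantization" concrete, here is a minimal sketch of symmetric uniform quantization applied to a weight array after training. It is a generic scheme assumed for illustration and is not necessarily one of the quantization methods evaluated in the article. The function name and the random weights are hypothetical.

```python
import numpy as np

def quantize_weights(w, bits):
    """Symmetric uniform quantization of w, returned in dequantized float form.

    Weights are mapped to integers in [-levels, levels] with
    levels = 2**(bits - 1) - 1, then rescaled back to floats.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    scale = max_abs / levels                     # step between adjacent levels
    q = np.clip(np.round(w / scale), -levels, levels)
    return q * scale

# Hypothetical weight tensor quantized to 4 bits.
rng = np.random.default_rng(0)
w = rng.normal(size=1000)
w_q = quantize_weights(w, bits=4)
max_error = np.max(np.abs(w - w_q))              # bounded by half a step
```

With 4 bits the quantized weights take at most 15 distinct values, so small input perturbations may leave rounded quantities unchanged; this kind of value shift is one intuition the abstract points to when discussing poor transferability.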