Optimal Resampling for Learning Small Models

arXiv.org Machine Learning

Models often need to be constrained to a certain size for them to be considered interpretable; e.g., a decision tree of depth 5 is much easier to make sense of than one of depth 30. This suggests a trade-off between interpretability and accuracy. Our work tries to minimize this trade-off by suggesting the optimal distribution of the data to learn from, which, surprisingly, may be different from the original distribution. We use an Infinite Beta Mixture Model (IBMM) to represent a specific set of sampling schemes. The parameters of the IBMM are learned using a Bayesian Optimizer (BO). Even under simplistic assumptions, a distribution in the original $d$-dimensional space would need to optimize for $O(d)$ variables, which is cumbersome for most real-world data; our technique lowers this number significantly to a fixed set of 8 variables at the cost of some additional preprocessing. The proposed technique is model-agnostic; it can be applied to any classifier. It also admits a general notion of model size. We demonstrate its effectiveness using multiple real-world datasets to construct decision trees, linear probability models and gradient boosted models.


Learning to Denoise Distantly-Labeled Data for Entity Typing

arXiv.org Artificial Intelligence

Distantly-labeled data can be used to scale up training of statistical models, but it is typically noisy and that noise can vary with the distant labeling technique. In this work, we propose a two-stage procedure for handling this type of data: denoise it with a learned model, then train our final model on clean and denoised distant data with standard supervised training. Our denoising approach consists of two parts. First, a filtering function discards examples from the distantly labeled data that are wholly unusable. Second, a relabeling function repairs noisy labels for the retained examples. Each of these components is a model trained on synthetically-noised examples generated from a small manually-labeled set. We investigate this approach on the ultra-fine entity typing task of Choi et al. (2018). Our baseline model is an extension of their model with pre-trained ELMo representations, which already achieves state-of-the-art performance. Adding distant data that has been denoised with our learned models gives further performance gains over this base model, outperforming models trained on raw distant data or heuristically-denoised distant data.
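The two-part denoising step described above (filter, then relabel) can be sketched roughly as follows. This is only an illustration: `keep` and `relabel` are hypothetical stand-ins for the learned filtering and relabeling models, and the toy examples are invented for the sketch.

```python
def denoise(examples, keep, relabel):
    """Drop unusable examples, then repair the labels of the retained ones.

    `keep` and `relabel` are hypothetical callables standing in for the
    learned filtering and relabeling models.
    """
    return [(x, relabel(x, y)) for x, y in examples if keep(x, y)]

# Toy stand-ins (assumptions, not the models from the abstract).
def toy_keep(mention, labels):
    return mention != "???"

def toy_relabel(mention, labels):
    return ["person"] if mention == "Obama" else labels

distant = [("Paris", ["city"]), ("???", ["person"]), ("Obama", ["location"])]
cleaned = denoise(distant, toy_keep, toy_relabel)
```

The cleaned list would then be combined with the manually labeled data for standard supervised training.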


Best Practices for Preparing and Augmenting Image Data for Convolutional Neural Networks

#artificialintelligence

It is challenging to know how to best prepare image data when training a convolutional neural network. This involves both scaling the pixel values and use of augmentation techniques during both the training and evaluation of the model. Instead of testing a wide range of options, a useful shortcut is to consider the types of data preparation, train-time augmentation, and test-time augmentation used by state-of-the-art models that notably achieve the best performance on a challenging computer vision dataset, namely the Large Scale Visual Recognition Challenge, or ILSVRC, that uses the ImageNet dataset. In this tutorial, you will discover best practices for preparing and augmenting photographs for image classification tasks with convolutional neural networks.


Here are the 7 requirements for building ethical AI, according to the EU commission

#artificialintelligence

In October, Amazon had to discontinue an artificial intelligence-powered recruiting tool after it discovered the system was biased against female applicants. In 2016, a ProPublica investigation revealed a recidivism assessment tool that used machine learning was biased against black defendants. More recently, the US Department of Housing and Urban Development sued Facebook because its ad-serving algorithms enabled advertisers to discriminate based on characteristics like gender and race. And Google refrained from renewing its AI contract with the Department of Defense after employees raised ethical concerns. Those are just a few of the many ethical controversies surrounding artificial intelligence algorithms in the past few years.


Formal Specification and Verification of Autonomous Robotic Systems: A Survey

arXiv.org Artificial Intelligence

An autonomous system is an artificially intelligent entity that makes decisions in response to input, independent of human interaction. Robotic systems are physical entities that interact with the physical world. Thus, we consider an autonomous robotic system as a machine that uses Artificial Intelligence (AI), has a physical presence in, and interacts with, the real world. They are complex, inherently hybrid systems, combining both hardware and software; they often require close safety, legal, and ethical consideration. Autonomous robotics are increasingly being used in commonplace scenarios, such as driverless cars [68], pilotless aircraft [176], and domestic assistants [174, 60]. While for many engineered systems, testing, either through real deployment or via simulation, is deemed sufficient, the unique challenges of autonomous robotics, their dependence on sophisticated software control and decision-making, and their increasing deployment in safety-critical scenarios, require a stronger form of verification. This leads us towards using formal methods, which are mathematically-based techniques for the specification and verification of software systems, to ensure the correctness of, and provide sufficient evidence for the certification of, robotic systems. We contribute an overview and analysis of the state-of-the-art in formal specification and verification of autonomous robotics.


Automatic Emotion Recognition (AER) System based on Two-Level Ensemble of Lightweight Deep CNN Models

arXiv.org Machine Learning

Emotions play a crucial role in human interaction, health care and security investigations and monitoring. Automatic emotion recognition (AER) using electroencephalogram (EEG) signals is an effective method for decoding real emotions, which are independent of body gestures, but it is a challenging problem. Several automatic emotion recognition systems have been proposed that are based on traditional hand-engineered approaches, and their performance is very poor. Motivated by the outstanding performance of deep learning (DL) in many recognition tasks, we introduce an AER system (Deep-AER) based on EEG brain signals using DL. A DL model involves a large number of learnable parameters, and its training needs a large dataset of EEG signals, which is difficult to acquire for the AER problem. To overcome this problem, we propose a lightweight pyramidal one-dimensional convolutional neural network (LP-1D-CNN) model, which involves a small number of learnable parameters. Using LP-1D-CNN, we build a two-level ensemble model. In the first level of the ensemble, each channel is scanned incrementally by LP-1D-CNN to generate predictions, which are fused using majority vote. The second level of the ensemble combines the predictions of all channels of an EEG signal using majority vote for detecting the emotion state. We validated the effectiveness and robustness of Deep-AER using DEAP, a benchmark dataset for emotion recognition research. The results indicate that the FRONT region plays a dominant role in AER, and over this region Deep-AER achieved accuracies of 98.43% and 97.65% for two AER problems, i.e., high valence vs low valence (HV vs LV) and high arousal vs low arousal (HA vs LA), respectively. The comparison reveals that Deep-AER outperforms the state-of-the-art systems by a large margin. The Deep-AER system will be helpful in monitoring for health care and security investigations.
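The two-level majority vote described above can be sketched roughly as follows. The per-window labels would come from LP-1D-CNN in the actual system; here they are hypothetical hard-coded values, and the tie-breaking rule (first label seen wins) is an assumption of this sketch, not something the abstract specifies.

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def two_level_vote(window_predictions):
    """window_predictions[c] holds the per-window labels of channel c.

    Level 1 fuses the windows within each channel; level 2 fuses the
    resulting channel-level labels into one decision for the signal.
    """
    channel_labels = [majority_vote(windows) for windows in window_predictions]
    return majority_vote(channel_labels)

# Hypothetical predictions: 3 channels, 3 windows each (1 = high, 0 = low).
example = [[1, 1, 0], [0, 0, 1], [1, 0, 1]]
decision = two_level_vote(example)
```

In this toy input the channel-level labels are 1, 0 and 1, so the second-level vote yields 1.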


RadiX-Net: Structured Sparse Matrices for Deep Neural Networks

arXiv.org Machine Learning

The sizes of deep neural networks (DNNs) are rapidly outgrowing the capacity of hardware to store and train them. Research over the past few decades has explored the prospect of sparsifying DNNs before, during, and after training by pruning edges from the underlying topology. The resulting neural network is known as a sparse neural network. More recent work has demonstrated the remarkable result that certain sparse DNNs can train to the same precision as dense DNNs at lower runtime and storage cost. An intriguing class of these sparse DNNs is the X-Nets, which are initialized and trained upon a sparse topology with neither reference to a parent dense DNN nor subsequent pruning. We present an algorithm that deterministically generates RadiX-Nets: sparse DNN topologies that, as a whole, are much more diverse than X-Net topologies, while preserving X-Nets' desired characteristics. We further present a functional-analytic conjecture based on the longstanding observation that sparse neural network topologies can attain the same expressive power as dense counterparts.


Constraint-Aware Neural Networks for Riemann Problems

arXiv.org Machine Learning

Neural networks are increasingly used in complex (data-driven) simulations as surrogates or for accelerating the computation of classical surrogates. In many applications physical constraints, such as mass or energy conservation, must be satisfied to obtain reliable results. However, standard machine learning algorithms are generally not tailored to respect such constraints. We propose two different strategies to generate constraint-aware neural networks. We test their performance in the context of front-capturing schemes for strongly nonlinear wave motion in compressible fluid flow. Precisely, in this context so-called Riemann problems have to be solved as surrogates. Their solution describes the local dynamics of the captured wave front in numerical simulations. Three model problems are considered: a cubic flux model problem, an isothermal two-phase flow model, and the Euler equations. We demonstrate that a decrease in the constraint deviation correlates with low discretization errors for all model problems, in addition to the structural advantage of fulfilling the constraint.


Improving Image-Based Localization with Deep Learning: The Impact of the Loss Function

arXiv.org Artificial Intelligence

This work formulates a novel loss term which can be appended to an RGB-only image localization network's loss function to improve its performance. A common technique used when regressing a camera's pose from an image is to formulate the loss as a linear combination of positional and rotational error (using tuned hyperparameters as coefficients). In this work we observe that changes to rotation and position mutually affect the captured image, and in order to improve performance, a network's loss function should include a term which combines error in both position and rotation. To that end we design a geometric loss term which considers the similarity between the predicted and ground truth poses using both position and rotation, and use it to augment the existing image localization network PoseNet. The loss term is simply appended to the loss function of the already existing image localization network. We achieve improvements in the localization accuracy of the network for indoor scenes, with decreases of up to 9.64% and 2.99% in the median positional and rotational error, respectively, when compared to similar pipelines.
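For orientation, the common baseline loss mentioned above (a linear combination of positional and rotational error) might look like the sketch below. It does not implement the geometric loss term proposed in the abstract, whose form is not given here. The quaternion representation, the normalization of the predicted quaternion, and the weight name `beta` are assumptions of this sketch.

```python
import numpy as np

def weighted_pose_loss(pos_pred, pos_true, quat_pred, quat_true, beta=1.0):
    """Positional error plus beta times rotational error.

    `beta` is a hypothetical name for the tuned coefficient; rotations are
    assumed to be quaternions, and the prediction is normalized first.
    """
    pos_err = np.linalg.norm(np.asarray(pos_true, dtype=float)
                             - np.asarray(pos_pred, dtype=float))
    q = np.asarray(quat_pred, dtype=float)
    q = q / np.linalg.norm(q)  # project the prediction onto unit quaternions
    rot_err = np.linalg.norm(np.asarray(quat_true, dtype=float) - q)
    return pos_err + beta * rot_err

# Position is off by a 3-4-5 triangle; rotation matches after normalization.
loss = weighted_pose_loss([0, 0, 0], [3, 4, 0], [2, 0, 0, 0], [1, 0, 0, 0], beta=10.0)
```

Because the two error terms are summed independently, this baseline cannot express the coupling between position and rotation that the abstract argues for.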


SWALP: Stochastic Weight Averaging in Low-Precision Training

arXiv.org Artificial Intelligence

Low precision operations can provide scalability, memory savings, portability, and energy efficiency. This paper proposes SWALP, an approach to low precision training that averages low-precision SGD iterates with a modified learning rate schedule. SWALP is easy to implement and can match the performance of full-precision SGD even with all numbers quantized down to 8 bits, including the gradient accumulators. Additionally, we show that SWALP converges arbitrarily close to the optimal solution for quadratic objectives, and to a noise ball asymptotically smaller than low precision SGD in strongly convex settings.
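The idea of averaging low-precision SGD iterates can be illustrated on a one-dimensional quadratic. This is a toy sketch under stated assumptions, not the SWALP procedure itself: it uses a fixed-point grid with stochastic rounding, a constant learning rate instead of the modified schedule, a full-precision running average, and hypothetical parameter names throughout.

```python
import math
import random

def stochastic_round(x, delta, rng):
    """Round x to a multiple of delta, up or down, so the result is unbiased."""
    scaled = x / delta
    low = math.floor(scaled)
    up = rng.random() < (scaled - low)
    return delta * (low + (1 if up else 0))

def averaged_low_precision_sgd(w0, w_star, lr, delta, steps, warmup, seed=0):
    """Minimize 0.5 * (w - w_star)**2 with quantized SGD and iterate averaging."""
    rng = random.Random(seed)
    w = stochastic_round(w0, delta, rng)
    total, count = 0.0, 0
    for t in range(steps):
        grad = w - w_star
        w = stochastic_round(w - lr * grad, delta, rng)  # iterate stays on the grid
        if t >= warmup:  # average only after the warm-up phase
            total += w
            count += 1
    return w, total / count

w_last, w_avg = averaged_low_precision_sgd(
    w0=1.0, w_star=0.3, lr=0.1, delta=0.125, steps=6000, warmup=1000)
```

The optimum 0.3 is deliberately placed off the grid, so every individual iterate is at least 0.05 away from it, while the average of the iterates can land much closer.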