 Inductive Learning


Measuring Compositionality in Representation Learning

arXiv.org Machine Learning

Many machine learning algorithms represent input data with vector embeddings or discrete codes. When inputs exhibit compositional structure (e.g. objects built from parts or procedures from subroutines), it is natural to ask whether this compositional structure is reflected in the inputs' learned representations. While the assessment of compositionality in languages has received significant attention in linguistics and adjacent fields, the machine learning literature lacks general-purpose tools for producing graded measurements of compositional structure in more general (e.g. vector-valued) representation spaces. We describe a procedure for evaluating compositionality by measuring how well the true representation-producing model can be approximated by a model that explicitly composes a collection of inferred representational primitives. We use the procedure to provide formal and empirical characterizations of compositional structure in a variety of settings, exploring the relationship between compositionality and learning dynamics, human judgments, representational similarity, and generalization.
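As a rough illustration of the idea of approximating learned representations with explicitly composed primitives, the sketch below fits one vector per primitive and scores how well their sum reconstructs each representation. The additive composition rule, the squared-error score, and the function name `additive_reconstruction_error` are assumptions made for this example; the abstract does not specify them.

```python
import numpy as np

def additive_reconstruction_error(reps, parts, n_primitives):
    """Fit one vector per primitive so that summing an input's primitives
    approximates its representation; return (mean squared residual, vectors).

    Assumption: composition is plain vector addition and the fit is
    ordinary least squares. Both are choices made for this sketch only.
    """
    reps = np.asarray(reps, dtype=float)
    # A[i, p] counts how often primitive p occurs in input i.
    A = np.zeros((len(parts), n_primitives))
    for i, prims in enumerate(parts):
        for p in prims:
            A[i, p] += 1.0
    # Least-squares estimate of the primitive vectors.
    primitives, *_ = np.linalg.lstsq(A, reps, rcond=None)
    residual = reps - A @ primitives
    return float(np.mean(np.sum(residual ** 2, axis=1))), primitives

# Six inputs, each built from two of four primitives (hypothetical data).
parts = [(0, 1), (0, 2), (1, 3), (2, 3), (0, 3), (1, 2)]
rng = np.random.default_rng(0)
true_primitives = rng.normal(size=(4, 3))
compositional = np.array([true_primitives[a] + true_primitives[b] for a, b in parts])
unstructured = rng.normal(size=(6, 3))

err_comp, _ = additive_reconstruction_error(compositional, parts, 4)
err_rand, _ = additive_reconstruction_error(unstructured, parts, 4)
```

A low error suggests the representations are well explained by the assumed composition rule; a high error suggests they are not. The score is only meaningful relative to the chosen rule and error measure.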


On the consistency of supervised learning with missing values

arXiv.org Machine Learning

In many application settings, the data are plagued with missing features. These hinder data analysis. An abundant literature addresses missing values in an inferential framework, where the aim is to estimate parameters and their variance from incomplete tables. Here, we consider supervised-learning settings where the objective is to best predict a target when missing values appear in both training and test sets. We analyze which missing-values strategies lead to good prediction. We show the consistency of two approaches to estimating the prediction function. The most striking one shows that the widely used method of mean imputation prior to learning is consistent when missing values are not informative. This is in contrast with inferential settings, as mean imputation is known to have serious drawbacks in terms of deformation of the joint and marginal distribution of the data. That such a simple approach can be consistent has important consequences in practice. This result holds asymptotically when the learning algorithm is consistent in itself. We contribute additional analysis on decision trees as they can naturally tackle empirical risk minimization with missing values. This is due to their ability to handle the half-discrete nature of variables with missing values. After comparing theoretically and empirically different missing-values strategies in trees, we recommend using the "missing incorporated in attributes" method as it can handle both non-informative and informative missing values.
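Mean imputation prior to learning can be sketched in a few lines. The example below assumes missing entries are encoded as NaN and that the column means are estimated on the training set and reused on the test set; the helper names and the toy arrays are hypothetical.

```python
import numpy as np

def fit_column_means(X_train):
    """Per-feature mean over the observed (non-NaN) training entries."""
    return np.nanmean(X_train, axis=0)

def impute_with_means(X, means):
    """Replace each NaN with the training mean of its column."""
    X = np.array(X, dtype=float)  # copy so the caller's array is untouched
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = means[cols]
    return X

X_train = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
X_test = np.array([[np.nan, 5.0], [2.0, np.nan]])

means = fit_column_means(X_train)              # column means: 2.0 and 6.0
X_train_imp = impute_with_means(X_train, means)
X_test_imp = impute_with_means(X_test, means)  # rows become [2, 5] and [2, 6]
```

The imputed arrays can then be passed to any learner. The consistency result in the abstract concerns the asymptotic regime with non-informative missingness, so this sketch says nothing about finite-sample behavior.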


Detecting and Diagnosing Incipient Building Faults Using Uncertainty Information from Deep Neural Networks

arXiv.org Machine Learning

Early detection of incipient faults is of vital importance to reducing maintenance costs, saving energy, and enhancing occupant comfort in buildings. Popular supervised learning models such as deep neural networks are considered promising due to their ability to directly learn from labeled fault data; however, it is known that the performance of supervised learning approaches highly relies on the availability and quality of labeled training data. In Fault Detection and Diagnosis (FDD) applications, the lack of labeled incipient fault data has posed a major challenge to applying these supervised learning techniques to commercial buildings. To overcome this challenge, this paper proposes using Monte Carlo dropout (MCdropout) to enhance the supervised learning pipeline, so that the resulting neural network is able to detect and diagnose unseen incipient fault examples. We also examine the proposed MCdropout method on the RP-1043 dataset to demonstrate its effectiveness in indicating the most likely incipient fault types.

I. INTRODUCTION

Building faults whose impacts are less perceivable and/or hinder regular operations are called soft faults [21], [32]. These soft faults, especially in their incipient phase, are hard to detect as their signatures are not generally obvious (due to their magnitudes) and are lurking under measurement/system noise or feedback control actions [10], [27]. Nevertheless, they will impact energy consumption, system performance, and maintenance costs adversely in the long run if left undetected and unattended [14].
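The general Monte Carlo dropout idea mentioned in the abstract can be sketched as follows: keep dropout active at prediction time, run several stochastic forward passes, and summarize their spread. The two-layer network, the random weights, and the use of predictive entropy as the uncertainty score are assumptions for this example and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, W1, b1, W2, b2, p_drop=0.5, n_samples=100, rng=None):
    """Average class probabilities over stochastic passes with dropout on.

    Returns (mean_probs, entropy); the entropy of the averaged prediction is
    used here as the uncertainty score (an assumption for this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ W1 + b1)        # hidden layer with ReLU
        mask = rng.random(h.shape) >= p_drop    # fresh dropout mask per pass
        h = h * mask / (1.0 - p_drop)           # inverted-dropout rescaling
        probs.append(softmax(h @ W2 + b2))
    mean_probs = np.mean(probs, axis=0)
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=-1)
    return mean_probs, entropy

# Hypothetical untrained network: 4 sensor features, 16 hidden units, 3 classes.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)
x = rng.normal(size=(5, 4))
mean_probs, entropy = mc_dropout_predict(x, W1, b1, W2, b2, rng=rng)
```

In a real pipeline the weights would come from a trained network, and inputs with unusually high uncertainty could be flagged for inspection as possible unseen fault conditions.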


On Reinforcement Learning Using Monte Carlo Tree Search with Supervised Learning: Non-Asymptotic Analysis

arXiv.org Machine Learning

Inspired by the success of AlphaGo Zero (AGZ), which utilizes Monte Carlo Tree Search (MCTS) with Supervised Learning via Neural Network to learn the optimal policy and value function, in this work, we focus on establishing formally that such an approach indeed finds the optimal policy asymptotically, as well as establishing non-asymptotic guarantees in the process. We shall focus on the infinite-horizon discounted Markov Decision Process to establish the results. To start with, it requires establishing the MCTS's claimed property in the literature that for any given query state, MCTS provides an approximate value function for the state with enough simulation steps of the MDP. We provide non-asymptotic analysis establishing this property by analyzing a non-stationary multi-arm bandit setup. Our proof suggests that MCTS needs to be utilized with a polynomial rather than logarithmic "upper confidence bound" for establishing its desired performance -- interestingly enough, AGZ chooses such a polynomial bound. Using this as a building block, combined with nearest neighbor supervised learning, we argue that MCTS acts as a "policy improvement" operator; it has a natural "bootstrapping" property to iteratively improve value function approximation for all states, due to combining with supervised learning, despite evaluating at only finitely many states. In effect, we establish that to learn an $\varepsilon$-approximation of the value function in $\ell_\infty$ norm, MCTS combined with nearest-neighbors requires samples scaling as $\widetilde{O}\big(\varepsilon^{-(d+4)}\big)$, where $d$ is the dimension of the state space. This is nearly optimal due to a minimax lower bound of $\widetilde{\Omega}\big(\varepsilon^{-(d+2)}\big)$.
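To make the contrast between logarithmic and polynomial upper confidence bounds concrete, the sketch below compares a UCB1-style bonus with a generic polynomial bonus. The constant `c` and the exponents `alpha` and `beta` are illustrative placeholders, not the values analyzed in the paper or used by AGZ.

```python
import math

def log_bonus(t, n):
    """UCB1-style logarithmic exploration bonus: sqrt(2 ln t / n)."""
    return math.sqrt(2.0 * math.log(t) / n)

def poly_bonus(t, n, c=1.0, alpha=0.25, beta=0.5):
    """Polynomial exploration bonus: c * t**alpha / n**beta.

    c, alpha, and beta are placeholder values chosen for this sketch.
    """
    return c * (t ** alpha) / (n ** beta)

def select_arm(means, counts, t, bonus):
    """Pick the arm maximizing empirical mean plus exploration bonus."""
    scores = [m + bonus(t, n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

# With many total pulls t and a fixed visit count n, the polynomial bonus
# stays much larger than the logarithmic one, so rarely tried arms keep
# being revisited.
t, n = 10**6, 100
gap = poly_bonus(t, n) - log_bonus(t, n)
```

Here `t` is the total number of pulls so far and `n` the number of pulls of the arm being scored. In a tree search, the same rule would be applied at each node over its child actions.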


A General Theory for Structured Prediction with Smooth Convex Surrogates

arXiv.org Machine Learning

In this work we provide a theoretical framework for structured prediction that generalizes the existing theory of surrogate methods for binary and multiclass classification based on estimating conditional probabilities with smooth convex surrogates (e.g. logistic regression). The theory relies on a natural characterization of structural properties of the task loss and allows us to derive statistical guarantees for many widely used methods in the context of multilabeling, ranking, ordinal regression and graph matching. In particular, we characterize the smooth convex surrogates compatible with a given task loss in terms of a suitable Bregman divergence composed with a link function. This allows us to derive tight bounds for the calibration function and to obtain novel results on existing surrogate frameworks for structured prediction such as conditional random fields and quadratic surrogates.


Seeds of Machine Learning - SageORB

#artificialintelligence

Machine learning is one of the most powerful forces in technology. Its development is shaping the future of industries that rely on artificial intelligence. Machine learning refers to the automated process by which machines extract meaningful patterns from data. Without machine learning, artificial intelligence as we know it wouldn't be possible. In 1959, IBM researcher Arthur Samuel coined the term "machine learning" and described it as a "field of study that gives computers the ability to learn without being explicitly programmed."


Stable multi-instance learning via causal inference

arXiv.org Machine Learning

Multi-instance learning (MIL) deals with tasks where each example is represented by a bag of instances. Unlike traditional supervised learning, only the bag labels are observed whereas the label for each instance in the bags is not available. Previous MIL studies typically assume that training and the test data follow the same distribution, which is often violated in real-world applications. Existing methods address distribution changes by reweighting the training bags with the density ratio between the test and the training data. However, models are frequently trained without prior knowledge of the testing distribution which renders existing methods ineffective. In this paper, we propose a novel multi-instance learning algorithm which links MIL with causal inference to achieve stable prediction without knowing the distribution of the test dataset. Experimental results show that the performance of our approach is stable to the distribution changes.


An online supervised learning algorithm based on triple spikes for spiking neural networks

arXiv.org Artificial Intelligence

By using the precise timing of every spike, spiking supervised learning handles complex spatial-temporal patterns more effectively than supervised learning based only on neuronal firing rates. The purpose of spiking supervised learning after spatial-temporal encoding is to emit desired spike trains with precise timing. Existing spiking supervised learning algorithms perform well, but their mechanisms still have some problems, such as restrictions on neuron types and complex computation. Based on an online regulative mechanism of biological synapses, this paper proposes an online supervised learning algorithm for multiple spike trains in spiking neural networks. Using a spatial-temporal transformation, the proposed algorithm directly adjusts synaptic weights as soon as the firing time of an output spike is obtained. It is also not restricted to particular types of spiking neuron models. The relationship among desired output, actual output, and input spike trains is first analyzed and synthesized to select a pair-spike unit for direct regulation. A computational method is then constructed from simple triple spikes using this direct regulation. Experimental results show that the proposed algorithm achieves higher learning accuracy and efficiency than other learning algorithms.


Semi-Supervised and Task-Driven Data Augmentation

arXiv.org Machine Learning

Supervised deep learning methods for segmentation require large amounts of labelled training data, without which they are prone to overfitting and generalize poorly to unseen images. In practice, obtaining a large number of annotations from clinical experts is expensive and time-consuming. One way to address the scarcity of annotated examples is data augmentation using random spatial and intensity transformations. Recently, it has been proposed to use generative models to synthesize realistic training examples, complementing the random augmentation. So far, these methods have yielded limited gains over the random augmentation. However, there is potential to improve the approach by (i) explicitly modeling deformation fields (non-affine spatial transformation) and intensity transformations and (ii) leveraging unlabelled data during the generative process. With this motivation, we propose a novel task-driven data augmentation method where, to synthesize new training examples, a generative network explicitly models and applies deformation fields and additive intensity masks on existing labelled data, modeling shape and intensity variations, respectively. Crucially, the generative model is optimized to be conducive to the task, in this case segmentation, and constrained to match the distribution of images observed from labelled and unlabelled samples. Furthermore, explicit modeling of deformation fields allows synthesizing segmentation masks and images in exact correspondence by simply applying the generated transformation to an input image and the corresponding annotation. Our experiments on cardiac magnetic resonance images (MRI) showed that, for the task of segmentation in small training data scenarios, the proposed method substantially outperforms conventional augmentation techniques.


Confidence-based Graph Convolutional Networks for Semi-Supervised Learning

arXiv.org Machine Learning

Predicting properties of nodes in a graph is an important problem with applications in a variety of domains. Graph-based Semi-Supervised Learning (SSL) methods aim to address this problem by labeling a small subset of the nodes as seeds and then utilizing the graph structure to predict label scores for the rest of the nodes in the graph. Recently, Graph Convolutional Networks (GCNs) have achieved impressive performance on the graph-based SSL task. In addition to label scores, it is also desirable to have confidence scores associated with them. Unfortunately, confidence estimation in the context of GCNs has not been previously explored. We fill this important gap in this paper and propose ConfGCN, which jointly estimates label scores and their confidences in a GCN-based setting. ConfGCN uses these estimated confidences to determine the influence of one node on another during neighborhood aggregation, thereby acquiring anisotropic capabilities. Through extensive analysis and experiments on standard benchmarks, we find that ConfGCN is able to outperform state-of-the-art baselines. We have made ConfGCN's source code available to encourage reproducible research.