 Inductive Learning


Understanding Hinton's Capsule Networks. Part IV: CapsNet Architecture

#artificialintelligence

The encoder part of the network takes as input a 28 by 28 MNIST digit image and learns to encode it into a 16-dimensional vector of instantiation parameters (as explained in the previous posts of this series); this is where the capsules do their job. The output of the network during prediction is a 10-dimensional vector of the lengths of the DigitCaps' outputs. The encoder has 3 layers: two of them are convolutional and the last one is fully connected. The convolutional layer's job is to detect basic features in the 2D image. In the CapsNet, the convolutional layer has 256 kernels of size 9x9x1 with stride 1, followed by ReLU activation. If you don't know what this means, here are some awesome resources that will allow you to quickly pick up key ideas behind convolutions.
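As a rough check on the convolutional layer described above, the sketch below computes the size of its output and its number of trainable parameters from the figures given (28x28x1 input, 256 kernels of 9x9x1, stride 1). It assumes no padding and one bias per kernel, neither of which is stated in the excerpt; the function names are made up for this illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv_param_count(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus one bias per output kernel (bias is an assumption)."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels


# Figures from the text: 28x28 input, 9x9x1 kernels, stride 1, 256 kernels.
# Assumption: no padding ("valid" convolution).
side = conv_output_size(28, 9, stride=1, padding=0)
params = conv_param_count(9, 9, 1, 256)

print(f"output volume: {side} x {side} x 256")  # 20 x 20 x 256
print(f"trainable parameters: {params}")        # 20992
```

Under these assumptions each of the 256 kernels produces a 20x20 feature map, which is what the next capsule layer would consume.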


One-Class Kernel Spectral Regression for Outlier Detection

arXiv.org Machine Learning

The paper introduces a new efficient nonlinear one-class classifier formulated as a Rayleigh quotient criterion. The method, operating in a reproducing kernel Hilbert subspace, minimises the scatter of the target distribution along an optimal projection direction while at the same time keeping projections of positive observations as distant as possible from the mean of the negative class. We provide a graph embedding view of the problem which can then be solved efficiently using the spectral regression approach. In this sense, unlike previous similar methods which often require costly eigen-computations of dense matrices, the proposed approach casts the problem under consideration into a regression framework which avoids eigen-decomposition computations. In particular, it is shown that the dominant complexity of the proposed method is the complexity of computing the kernel matrix. Additional appealing characteristics of the proposed one-class classifier are: (1) the ability to be trained in an incremental fashion (allowing for application in streaming data scenarios while also reducing computational complexity in a non-streaming operation mode); (2) being unsupervised while also providing the functionality for refining the solution using negative training examples, when available; and (3) the deployment of the kernel trick, allowing for nonlinearly mapping the data into a high-dimensional feature space. Extensive experiments conducted on several datasets verify the merits of the proposed approach in comparison with some other alternatives.


A New Variational Model for Binary Classification in the Supervised Learning Context

arXiv.org Machine Learning

We examine the supervised learning problem in its continuous setting and give a general optimality condition through techniques of functional analysis and the calculus of variations. This enables us to solve the optimality condition for the desired function u numerically and make several comparisons with other widely utilized supervised learning models. We employ accuracy and the area under the receiver operating characteristic curve as performance metrics. Finally, three analyses are conducted based on these two metrics, in which we compare the models and draw conclusions on whether or not our method is competitive.
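The two metrics named in this abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' evaluation code: the toy labels and scores are invented, the 0.5 decision threshold is an assumption, and the AUC is computed through its pairwise-ranking interpretation (the probability that a random positive is scored above a random negative, with ties counted as one half).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(y_true, y_pred))  # 0.75
print(roc_auc(y_true, scores))   # 0.75
```

Accuracy depends on the chosen threshold, while the AUC depends only on how the scores rank the examples, which is one reason the two are often reported together.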


A Structured Prediction Approach for Label Ranking

arXiv.org Machine Learning

We propose to solve a label ranking problem as a structured output regression task. We adopt a least square surrogate loss approach that solves a supervised learning problem in two steps: the regression step in a well-chosen feature space and the pre-image step. We use specific feature maps/embeddings for ranking data, which convert any ranking/permutation into a vector representation. These embeddings are all well-tailored for our approach, either by resulting in consistent estimators, or by solving trivially the pre-image problem which is often the bottleneck in structured prediction. We also propose their natural extension to the case of partial rankings and prove their efficiency on real-world datasets.
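To make the idea of a ranking embedding with a trivial pre-image step concrete, here is a minimal sketch using the Lehmer code, a classical way to turn a permutation into a vector of integers. The abstract does not name its embeddings, so treating the Lehmer code as representative is an assumption, and the function names are invented for this illustration.

```python
def lehmer_encode(perm):
    """Embed a permutation of 0..n-1 as its Lehmer code.

    Entry i counts how many later elements are smaller than perm[i],
    so it always lies in the range 0..n-1-i.
    """
    n = len(perm)
    return [sum(1 for j in range(i + 1, n) if perm[j] < perm[i])
            for i in range(n)]


def lehmer_decode(code):
    """Invert the Lehmer code back to the permutation."""
    remaining = list(range(len(code)))
    return [remaining.pop(c) for c in code]


def pre_image(real_vector):
    """Map a real-valued regression output to a valid permutation.

    Each coordinate is rounded and clipped to its admissible range
    independently, so no combinatorial search is needed.
    """
    n = len(real_vector)
    code = []
    for i, value in enumerate(real_vector):
        upper = n - 1 - i
        code.append(min(max(int(round(value)), 0), upper))
    return lehmer_decode(code)


perm = [2, 0, 3, 1]
code = lehmer_encode(perm)
print(code)                             # [2, 0, 1, 0]
print(lehmer_decode(code))              # [2, 0, 3, 1]
print(pre_image([1.8, 0.2, 1.4, 0.3]))  # [2, 0, 3, 1]
```

Because every clipped integer vector decodes to some permutation, the pre-image step here is a coordinate-wise rounding rather than an optimisation over all n! rankings.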


Fully Scalable Gaussian Processes using Subspace Inducing Inputs

arXiv.org Machine Learning

We introduce fully scalable Gaussian processes, an implementation scheme that tackles the problem of treating a high number of training instances together with high dimensional input data. Our key idea is a representation trick over the inducing variables called subspace inducing inputs. This is combined with certain matrix-preconditioning based parametrizations of the variational distributions that lead to simplified and numerically stable variational lower bounds. Our illustrative applications are based on challenging extreme multi-label classification problems with the extra burden of a very large number of class labels. We demonstrate the usefulness of our approach by presenting predictive performances together with low computational times on datasets with an extremely large number of instances and input dimensions.


Vicarious

#artificialintelligence

The ability to generalize from a few training examples is one of the hallmarks of human intelligence. This ability is required for robots to work effectively in a variety of environments without arduous reprogramming. Our algorithms learn models of the world that are then applied flexibly in a wide variety of situations. Our research emphasizes representations that enable task generality. Underscoring our research strategy is the aim to discover the underlying properties of intelligence from neuroscience and cognitive science.


Case Set for Review After Man Dies 10 Months After Shooting

U.S. News

The Vanderburgh County Coroner's office says Austin Smith died Friday. He was shot on Aug. 31, 2017. Twenty-two-year-old Travis Phelps is accused of firing several shots into Smith's car, causing him to crash.


Active Learning with Unbalanced Classes and Example-Generation Queries

AAAI Conferences

Machine learning in real-world high-skew domains is difficult, because traditional strategies for crowdsourcing labeled training examples are ineffective at locating the scarce minority-class examples. For example, both random sampling and traditional active learning (which reduces to random sampling when just starting) will most likely recover very few minority-class examples. To bootstrap the machine learning process, researchers have proposed tasking the crowd with finding or generating minority-class examples, but such strategies have their weaknesses as well. They are unnecessarily expensive in well-balanced domains, and they often yield samples from a biased distribution that is unrepresentative of the one being learned. This paper extends the traditional active learning framework by investigating the problem of intelligently switching between various crowdsourcing strategies for obtaining labeled training examples in order to optimally train a classifier. We start by analyzing several such strategies (e.g., annotating an example or generating a minority-class example), and then develop a novel, skew-robust algorithm, called MB-CB, for the control problem. Experiments show that our method outperforms state-of-the-art GL-Hybrid by up to 14.3 points in F1 AUC, across various domains and class-frequency settings.


A New Benchmark and Progress Toward Improved Weakly Supervised Learning

arXiv.org Machine Learning

In our work, we completely solve the previous Knowledge Matters problem using a generic model, pose a more difficult and scalable problem, All-Pairs, and advance this new problem by introducing a new learned, spatially-varying histogram model called TypeNet which outperforms conventional models on the problem. We present results on All-Pairs where our model achieves 100% test accuracy while the best ResNet models achieve 79% accuracy. In addition, our model is more than an order of magnitude smaller than ResNet-34. The challenge of solving larger-scale All-Pairs problems with high accuracy is presented to the community for investigation.


XGBoost: Scalable GPU Accelerated Learning

arXiv.org Machine Learning

We describe the multi-GPU gradient boosting algorithm implemented in the XGBoost library (https://github.com/dmlc/xgboost). Our algorithm allows fast, scalable training on multi-GPU systems with all of the features of the XGBoost library. We employ data compression techniques to minimise the usage of scarce GPU memory while still allowing highly efficient implementation. Using our algorithm we show that it is possible to process 115 million training instances in under three minutes on a publicly available cloud computing instance. The algorithm is implemented using end-to-end GPU parallelism, with prediction, gradient calculation, feature quantisation, decision tree construction and evaluation phases all computed on device.
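The feature quantisation and data compression mentioned in this abstract can be pictured with a small CPU-only sketch: each feature value is replaced by the index of the quantile bin it falls into, so that a floating-point value can be stored in a single byte. This is a simplified illustration, not the XGBoost implementation; it uses exact quantiles from a full sort rather than any sketching scheme, and the function names and toy values are invented.

```python
from bisect import bisect_right


def quantile_cuts(values, max_bins):
    """Pick up to max_bins - 1 cut points at evenly spaced quantiles."""
    if not 2 <= max_bins <= 256:
        raise ValueError("max_bins must be in 2..256 to fit one byte")
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for k in range(1, max_bins):
        cut = ordered[(k * n) // max_bins]
        if not cuts or cut > cuts[-1]:  # drop duplicate cut points
            cuts.append(cut)
    return cuts


def quantise(values, cuts):
    """Replace each value by its bin index, stored as one byte."""
    return bytes(bisect_right(cuts, v) for v in values)


# Toy feature column, invented for illustration only.
feature = [0.5, 2.0, 1.0, 8.0, 4.0, 3.0, 7.0, 6.0]
cuts = quantile_cuts(feature, max_bins=4)
codes = quantise(feature, cuts)

print(cuts)         # [2.0, 4.0, 7.0]
print(list(codes))  # [0, 1, 0, 3, 2, 1, 3, 2]
```

Tree construction on quantised data only needs to consider the cut points as split candidates, and the one-byte codes take a fraction of the memory of the original floats, which is the kind of saving that matters when GPU memory is scarce.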