Inductive Learning
Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning
Miyato, Takeru, Maeda, Shin-ichi, Koyama, Masanori, Ishii, Shin
We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input. Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation. Unlike adversarial training, our method defines the adversarial direction without label information and is hence applicable to semi-supervised learning. Because the directions in which we smooth the model are only "virtually" adversarial, we call our method virtual adversarial training (VAT). The computational cost of VAT is relatively low. For neural networks, the approximated gradient of virtual adversarial loss can be computed with no more than two pairs of forward- and back-propagations. In our experiments, we applied VAT to supervised and semi-supervised learning tasks on multiple benchmark datasets. With a simple enhancement of the algorithm based on the entropy minimization principle, our VAT achieves state-of-the-art performance for semi-supervised learning tasks on SVHN and CIFAR-10.
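The virtual adversarial direction described above can be approximated with one step of finite-difference power iteration. The sketch below assumes a linear softmax classifier, so the gradient of the KL divergence has the closed form W^T(q - p) and no autodiff library is needed. The function names and the defaults eps=1.0 and xi=1e-6 are illustrative choices, not values taken from the paper.

```python
import math
import random

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, x):
    # Assumed model: linear logits z = W x, one row of W per class.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def grad_kl_wrt_r(W, p, x, r):
    # For q = softmax(W (x + r)), the gradient of KL(p || q) w.r.t. r is W^T (q - p).
    q = softmax(logits(W, [a + b for a, b in zip(x, r)]))
    diff = [qk - pk for qk, pk in zip(q, p)]
    return [sum(W[k][j] * diff[k] for k in range(len(W))) for j in range(len(x))]

def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def vat_perturbation(W, x, eps=1.0, xi=1e-6, n_power=1, rng=random):
    # p(y|x) is treated as a fixed target; no label is used anywhere.
    p = softmax(logits(W, x))
    d = normalize([rng.gauss(0.0, 1.0) for _ in x])
    for _ in range(n_power):
        # The gradient at the tiny step xi*d is proportional to the
        # Hessian-vector product H d, so this is one power-iteration step.
        d = normalize(grad_kl_wrt_r(W, p, x, [xi * a for a in d]))
    return [eps * a for a in d]

def lds(W, x, r_adv):
    # Virtual adversarial loss: KL between predictions at x and at x + r_adv.
    p = softmax(logits(W, x))
    q = softmax(logits(W, [a + b for a, b in zip(x, r_adv)]))
    return kl(p, q)
```

In training, lds(W, x, vat_perturbation(W, x)) would be added to the supervised loss for both labeled and unlabeled inputs, with r_adv held constant when differentiating with respect to the model parameters.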
Modular meta-learning
Alet, Ferran, Lozano-Pérez, Tomás, Kaelbling, Leslie P.
In many situations, such as robot learning, training experience is very expensive. One strategy for reducing the amount of training data needed for a new task is to learn some form of prior or bias using data from several related tasks. The objective of this process is to extract information that will substantially reduce the training-data requirements for a new task. This problem is a form of transfer learning, sometimes also called meta-learning or "learning to learn" [1, 2]. Previous approaches to meta-learning for robotics have focused on finding distributions over [3] or initial values of [4, 5] parameters, based on a set of "training tasks," that will enable a new "test task" to be learned with many fewer training examples. Our objective is similar, but rather than focusing on transferring information about parameter values, we focus on finding a reusable set of modules that can form components of a solution to a new task, possibly with a small amount of tuning. Modular approaches to learning have been very successful in structured tasks such as natural-language sentence interpretation [6], in which the input signal gives relatively direct information about a good structural decomposition of the problem. We wish to address problems that may benefit from a modular decomposition but do not provide any task-level input from which the structure of a solution may be derived. Nonetheless, we adopt a similar modular structure and parameter-adaptation method for learning our reusable modules, but use a general-purpose simulated-annealing search strategy to find an appropriate structural decomposition for each new task.
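The structure search described above can be illustrated with a toy simulated-annealing loop. Everything here is hypothetical: the four-module library, the encoding of a structure as a fixed-length composition of modules, the linear cooling schedule and the mean-squared-error objective are assumptions for illustration, and the modules are frozen rather than fine-tuned for the new task.

```python
import math
import random

# Hypothetical library of already-learned modules (here: fixed scalar functions).
MODULES = {
    "double": lambda v: 2.0 * v,
    "inc": lambda v: v + 1.0,
    "square": lambda v: v * v,
    "neg": lambda v: -v,
}

def run(structure, v):
    # A structure is an ordered tuple of module names applied left to right.
    for name in structure:
        v = MODULES[name](v)
    return v

def loss(structure, data):
    return sum((run(structure, x) - y) ** 2 for x, y in data) / len(data)

def anneal(data, depth=2, steps=500, t0=1.0, seed=0):
    rng = random.Random(seed)
    names = sorted(MODULES)
    cur = tuple(rng.choice(names) for _ in range(depth))
    cur_loss = loss(cur, data)
    best, best_loss = cur, cur_loss
    for k in range(steps):
        temp = t0 * (1.0 - k / steps) + 1e-9  # linear cooling schedule
        # Proposal: replace the module in one randomly chosen slot.
        slot = rng.randrange(depth)
        cand = cur[:slot] + (rng.choice(names),) + cur[slot + 1:]
        cand_loss = loss(cand, data)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if cand_loss <= cur_loss or rng.random() < math.exp((cur_loss - cand_loss) / temp):
            cur, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best, best_loss = cur, cur_loss
    return best, best_loss
```

For a new task given as (input, output) pairs, anneal(data) returns the best composition it visited and its loss; accepting occasional worse moves early on is what lets the search leave poor local structures.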
Manifold Structured Prediction
Rudi, Alessandro, Ciliberto, Carlo, Marconi, Gian Maria, Rosasco, Lorenzo
Regression and classification are probably the most classical machine learning problems and correspond to estimating a function with scalar and binary values, respectively. In practice, it is often interesting to estimate functions with more structured outputs. When the output space can be assumed to be a vector space, many ideas from regression can be extended; think, for example, of multivariate [14] or functional regression [23]. However, many practically interesting problems lack a natural vector structure, such as ranking [11], quantile estimation [19] or graph prediction [28]. In this latter case, the outputs are typically provided only with some distance or similarity function that can be used to design an appropriate loss function. Knowledge of the loss is sufficient to analyze an abstract empirical risk minimization approach within the framework of statistical learning theory, but deriving approaches that are at the same time statistically sound and computationally feasible is a key challenge. While ad-hoc solutions are available for many specific problems [7, 9, 18, 27], structured prediction [5] provides a unifying framework where a variety of problems can be tackled as special cases.
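One way to make "the outputs only come with a distance" concrete is a loss-weighted estimator: compute input-dependent weights alpha_i(x) by kernel ridge regression, then predict the output y that minimizes sum_i alpha_i(x) * d(y, y_i)^2. The sketch below is an assumed illustration, not the paper's code: outputs live on the unit circle with squared geodesic distance, the Gaussian kernel width sigma=0.1 and lam=1e-3 are arbitrary, and the inner minimization is done by brute force over a grid of candidate angles rather than by an optimizer on the manifold.

```python
import math

def gauss_kernel(a, b, sigma=0.1):
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))

def solve(A, b):
    # Gaussian elimination with partial pivoting for the n x n system A x = b.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def circle_dist2(a, b):
    # Squared geodesic distance between two angles on the unit circle.
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d) ** 2

def fit_predict(xs, ys, x_new, lam=1e-3, grid=720):
    n = len(xs)
    # Kernel ridge weights: alpha(x) = (K + n*lam*I)^{-1} k_x.
    K = [[gauss_kernel(a, b) + (n * lam if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, [gauss_kernel(a, x_new) for a in xs])
    # Inference: only the output distance is used, never vector averaging.
    candidates = [2 * math.pi * k / grid for k in range(grid)]
    return min(candidates,
               key=lambda y: sum(w * circle_dist2(y, yi) for w, yi in zip(alpha, ys)))
```

Because the prediction step touches the outputs only through circle_dist2, swapping in another distance (for example on a different manifold) changes nothing else in the recipe.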
Cycle Consistent Adversarial Denoising Network for Multiphase Coronary CT Angiography
Kang, Eunhee, Koo, Hyun Jung, Yang, Dong Hyun, Seo, Joon Bum, Ye, Jong Chul
In coronary CT angiography, a series of CT images is taken at different radiation dose levels during the examination. Although this reduces the total radiation dose, the image quality during the low-dose phases is significantly degraded. To address this problem, we propose a novel semi-supervised learning technique that removes noise from the CT images obtained in the low-dose phases by learning from the CT images in the routine-dose phases. Although a supervised learning approach is not possible because the underlying heart structure differs between the two phases, the images in the two phases are closely related, so we propose a cycle-consistent adversarial denoising network to learn the non-degenerate mapping between the low- and high-dose cardiac phases. Experimental results showed that the proposed method effectively reduces the noise in the low-dose CT images while preserving detailed texture and edge information. Moreover, thanks to the cycle-consistency and identity losses, the proposed network does not create artificial features that are not present in the input images. Visual grading and quality evaluation also confirm that the proposed method significantly improves diagnostic quality.
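The cycle-consistency and identity terms mentioned above are usually combined with adversarial losses in a CycleGAN-style objective. The block below writes out that generic form as an assumption: A is the low-dose phase, B the routine-dose phase, G_AB and G_BA are the two generators, D_A and D_B the discriminators, and lambda and gamma are weights. The exact losses, norms and weights used in the paper may differ.

```latex
\begin{aligned}
\mathcal{L} &= \mathcal{L}_{\mathrm{GAN}}(G_{AB}, D_B) + \mathcal{L}_{\mathrm{GAN}}(G_{BA}, D_A)
  + \lambda\,\mathcal{L}_{\mathrm{cyc}} + \gamma\,\mathcal{L}_{\mathrm{id}},\\
\mathcal{L}_{\mathrm{cyc}} &= \mathbb{E}_{x_A}\!\left[\lVert G_{BA}(G_{AB}(x_A)) - x_A\rVert_1\right]
  + \mathbb{E}_{x_B}\!\left[\lVert G_{AB}(G_{BA}(x_B)) - x_B\rVert_1\right],\\
\mathcal{L}_{\mathrm{id}} &= \mathbb{E}_{x_B}\!\left[\lVert G_{AB}(x_B) - x_B\rVert_1\right]
  + \mathbb{E}_{x_A}\!\left[\lVert G_{BA}(x_A) - x_A\rVert_1\right].
\end{aligned}
```

The identity term asks each generator to leave an image that is already in its target domain unchanged, which is one way to discourage the network from inventing structures absent from the input.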
Equalizing Financial Impact in Supervised Learning
Machine learning is revolutionizing the way we interact with the world. Popular websites use algorithms to analyze user data and recommend videos, customize social media feeds, and optimize advertisements. Unsurprisingly, machine learning is taking a large role in making decisions about human beings, ranging from credit to parole decisions, and is likely to be more and more widely used in the future. It is not hard to imagine that, even in cases where the final decisions are made by people, they will be doing so with advice from algorithms that make inferences from patterns in petabytes of data. Some proponents of machine learning have suggested that not only are these algorithms able to leverage the increasing amount of data we have access to, but also that they might be able to make these decisions more fairly, as they seem to not be subject to human biases. There is some truth to these claims.
Learning Instance Segmentation by Interaction
Pathak, Deepak, Shentu, Yide, Chen, Dian, Agrawal, Pulkit, Darrell, Trevor, Levine, Sergey, Malik, Jitendra
We present an approach for building an active agent that learns to segment its visual observations into individual objects by interacting with its environment in a completely self-supervised manner. The agent uses its current segmentation model to infer pixels that constitute objects and refines the segmentation model by interacting with these pixels. The model learned from over 50K interactions generalizes to novel objects and backgrounds. To deal with the noisy training signal for segmenting objects obtained by self-supervised interactions, we propose a robust set loss. A dataset of the robot's interactions along with a few human-labeled examples is provided as a benchmark for future research. We test the utility of the learned segmentation model by providing results on a downstream vision-based control task of rearranging multiple objects into target configurations from visual inputs alone. Videos, code, and robotic interaction dataset are available at https://pathak22.
Unsupervised Imitation Learning
Curi, Sebastian, Levy, Kfir Y., Krause, Andreas
We introduce a novel method to learn a policy from unsupervised demonstrations of a process. Given a model of the system and a set of sequences of outputs, we find a policy that has a comparable performance to the original policy, without requiring access to the inputs of these demonstrations. We do so by first estimating the inputs of the system from observed unsupervised demonstrations. Then, we learn a policy by applying vanilla supervised learning algorithms to the (estimated) input-output pairs. For the input estimation, we present a new adaptive linear estimator (AdaL-IE) that explicitly trades off variance and bias in the estimation. As we show empirically, AdaL-IE produces estimates with lower error than the state-of-the-art input estimation method (UMV-IE) [Gillijns and De Moor, 2007]. Using AdaL-IE in conjunction with imitation learning enables us to successfully learn control policies that consistently outperform those using UMV-IE.
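The two-step pipeline above (estimate the inputs, then fit a policy by supervised learning) can be sketched for the simplest possible case. This is not AdaL-IE: it assumes a known, noise-free linear system x_{t+1} = A x_t + B u_t with a scalar input and a fully observed 2-D state, recovers each input by least squares, and then fits a linear policy u = K x by ordinary least squares. All function names and example matrices are hypothetical.

```python
def simulate(A, B, K, x0, steps):
    # Demonstrator with a hidden linear policy u = K x; only states are kept.
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        u = K[0] * x[0] + K[1] * x[1]
        xs.append([A[0][0] * x[0] + A[0][1] * x[1] + B[0] * u,
                   A[1][0] * x[0] + A[1][1] * x[1] + B[1] * u])
    return xs

def estimate_inputs(A, B, xs):
    # Least-squares input estimate: u_t = B^T (x_{t+1} - A x_t) / (B^T B).
    bb = B[0] ** 2 + B[1] ** 2
    us = []
    for x, nxt in zip(xs, xs[1:]):
        res = [nxt[i] - (A[i][0] * x[0] + A[i][1] * x[1]) for i in range(2)]
        us.append((B[0] * res[0] + B[1] * res[1]) / bb)
    return us

def fit_linear_policy(xs, us):
    # Ordinary least squares for u = k0*x0 + k1*x1 via the 2x2 normal equations.
    s00 = sum(x[0] * x[0] for x in xs)
    s01 = sum(x[0] * x[1] for x in xs)
    s11 = sum(x[1] * x[1] for x in xs)
    t0 = sum(x[0] * u for x, u in zip(xs, us))
    t1 = sum(x[1] * u for x, u in zip(xs, us))
    det = s00 * s11 - s01 * s01
    return [(s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det]
```

Usage: call estimate_inputs on the observed state sequence, then fit_linear_policy(xs[:-1], us). With noisy observations this unbiased inversion can have high variance, which is where an estimator that trades bias against variance, as AdaL-IE is described to do, becomes relevant.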
Report on FBI Actions in Clinton Email Case Set for Release
FILE - In this April 6, 2017, file photo, former Secretary of State Hillary Clinton speaks in New York. The Justice Department's internal watchdog is expected to criticize the FBI's handling of the Clinton email investigation, stepping into a political minefield as it details how a determinedly non-partisan law enforcement agency came to be entangled in the 2016 presidential race. President Donald Trump will look to the inspector general report to provide a fresh line of attack against the FBI's two former top officials, Director James Comey and his deputy, Andrew McCabe, as he claims that a politically tainted bureau tried to undermine his campaign and, through the Russia investigation, his presidency.
Sparse Stochastic Zeroth-Order Optimization with an Application to Bandit Structured Prediction
Sokolov, Artem, Hitschler, Julian, Riezler, Stefan
Stochastic zeroth-order (SZO), or gradient-free, optimization allows one to optimize arbitrary functions by relying only on function evaluations under parameter perturbations; however, the iteration complexity of SZO methods suffers a factor proportional to the dimensionality of the perturbed function. We show that in scenarios with natural sparsity patterns, as in structured prediction applications, this factor can be reduced to the expected number of active features over input-output pairs. We give a general proof that applies sparse SZO optimization to Lipschitz-continuous, nonconvex, stochastic objectives, and present an experimental evaluation on linear bandit structured prediction tasks with sparse word-based feature representations that confirms our theoretical results.
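The central idea above, perturbing only the features that are active, can be sketched as a two-point zeroth-order gradient step. This is an illustrative sketch, not the paper's algorithm: it assumes Gaussian perturbations, a known list of active coordinate indices, and hypothetical parameters mu (smoothing radius) and lr (step size). Coordinates outside the active set are never perturbed or updated, so the second moment of the gradient estimate grows with the number of perturbed coordinates rather than with len(w).

```python
import random

def sparse_szo_step(f, w, active, mu=1e-4, lr=0.1, rng=random):
    # Draw a Gaussian perturbation on the active coordinates only.
    u = {j: rng.gauss(0.0, 1.0) for j in active}
    w_pert = list(w)
    for j, uj in u.items():
        w_pert[j] += mu * uj
    # Two-point finite difference along u: (f(w + mu*u) - f(w)) / mu.
    scale = (f(w_pert) - f(w)) / mu
    # Gradient estimate scale * u, applied to the active coordinates only.
    w_new = list(w)
    for j, uj in u.items():
        w_new[j] -= lr * scale * uj
    return w_new
```

Only two function evaluations are needed per step, and the per-step work on the parameter vector is proportional to the number of active features.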
A One-Sided Classification Toolkit with Applications in the Analysis of Spectroscopy Data
This dissertation investigates the use of one-sided classification algorithms in the application of separating hazardous chlorinated solvents from other materials, based on their Raman spectra. The experimentation is carried out using a new one-sided classification toolkit that was designed and developed from the ground up. In the one-sided classification paradigm, the objective is to separate elements of the target class from all outliers. These one-sided classifiers are generally chosen, in practice, when there is a deficiency of some sort in the training examples. Sometimes outlier examples can be rare, expensive to label, or even entirely absent. However, this author would like to note that they can be equally applicable when outlier examples are plentiful but nonetheless not statistically representative of the complete outlier concept. It is this scenario that is explicitly dealt with in this research work. In these circumstances, one-sided classifiers have been found to be more robust than conventional multi-class classifiers. The term "unexpected" outliers is introduced to represent outlier examples, encountered in the test set, that have been taken from a different distribution to the training set examples. These are examples that are a result of an inadequate representation of all possible outliers in the training set. It can often be impossible to fully characterise outlier examples given the fact that they can represent the immeasurable quantity of "everything else" that is not a target. The findings from this research have shown the potential drawbacks of using conventional multi-class classification algorithms when the test data come from a completely different distribution to that of the training samples.
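To make the one-sided setting concrete, here is about the simplest possible one-class classifier: it is fitted on target examples only, stores their mean, and sets an acceptance radius from the training distances. This is an assumed illustration, not one of the toolkit's classifiers; the quantile parameter and the Euclidean distance are arbitrary choices.

```python
import math

def fit_one_class(targets, quantile=0.95):
    # Model the target class only: its mean and a distance threshold.
    dim = len(targets[0])
    mean = [sum(t[i] for t in targets) / len(targets) for i in range(dim)]
    dists = sorted(math.dist(t, mean) for t in targets)
    # Threshold chosen so that roughly `quantile` of training targets are accepted.
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return mean, dists[idx]

def is_target(model, x):
    mean, threshold = model
    return math.dist(x, mean) <= threshold
```

Because no outlier examples are used in fitting, an "unexpected" outlier drawn from a distribution never seen in training is still rejected whenever it falls outside the acceptance radius, which a discriminative multi-class boundary does not guarantee.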