# inductive learning

### 10 Exciting Ideas of 2018 in NLP

This post gathers 10 ideas that I found exciting and impactful this year, and that we'll likely see more of in the future. For each idea, I will highlight 1-2 papers that execute it well. I tried to keep the list succinct, so apologies if I did not cover all relevant work. The list is necessarily subjective and covers ideas mainly related to transfer learning and generalization. Most of these (with some exceptions) are not trends (but I suspect that some might become more 'trendy' in 2019).

### Cloud TPU Pods break AI training records Google Cloud Blog

Google Cloud's AI-optimized infrastructure makes it possible for businesses to train state-of-the-art machine learning models faster, at greater scale, and at lower cost. These advantages enabled Google Cloud Platform (GCP) to set three new performance records in the latest round of the MLPerf benchmark competition, the industry-wide standard for measuring ML performance. All three record-setting results ran on Cloud TPU v3 Pods, the latest generation of supercomputers that Google has built specifically for machine learning. These results showcased the speed of Cloud TPU Pods, with each of the winning runs using less than two minutes of compute time. With these latest MLPerf benchmark results, Google Cloud is the first public cloud provider to outperform on-premise systems when running large-scale, industry-standard ML training workloads of Transformer, Single Shot Detector (SSD), and ResNet-50.

### This online game wants to teach the public about AI bias

Artificial intelligence might be coming for your next job, just not in the way you feared. The past few years have seen any number of articles that warn about a future where AI and automation drive humans into mass unemployment. To a considerable extent, those threats are overblown and distant. But a more imminent threat to jobs is that of algorithmic bias, the effect of machine learning models making decisions based on the wrong patterns in their training examples. An online game developed by computer science students at New York University aims to educate the public about the effects of AI bias in hiring.

### MixUp as Directional Adversarial Training

In this work, we explain the working mechanism of MixUp in terms of adversarial training. We introduce a new class of adversarial training schemes, which we refer to as directional adversarial training, or DAT. In a nutshell, a DAT scheme perturbs a training example in the direction of another example but keeps its original label as the training target. We prove that MixUp is equivalent to a special subclass of DAT, in that it has the same expected loss function and corresponds to the same optimization problem asymptotically. This understanding not only serves to explain the effectiveness of MixUp, but also reveals a more general family of MixUp schemes, which we call Untied MixUp. We prove that the family of Untied MixUp schemes is equivalent to the entire class of DAT schemes. We establish empirically the existence of Untied MixUp schemes that improve upon MixUp.
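The correspondence on the input side is easy to check in a few lines: interpolating `x_i` toward `x_j` with coefficient `lam` is exactly a perturbation of `x_i` in the direction of `x_j`; MixUp and DAT differ only in the label target. A toy NumPy sketch of this idea (function names and the `alpha` value are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x_i, y_i, x_j, y_j, alpha=0.2):
    """Standard MixUp: interpolate both the inputs and the (one-hot) labels."""
    lam = rng.beta(alpha, alpha)
    x = lam * x_i + (1 - lam) * x_j
    y = lam * y_i + (1 - lam) * y_j
    return x, y

def dat_perturb(x_i, y_i, x_j, lam):
    """DAT view: perturb x_i in the direction of x_j, but keep the
    original label y_i as the training target."""
    x = x_i + (1 - lam) * (x_j - x_i)  # identical point to lam*x_i + (1-lam)*x_j
    return x, y_i
```

The perturbed input in `dat_perturb` is algebraically the same point MixUp produces; only the target changes, which is the sense in which the paper relates the two schemes.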

### What you need is a more professional teacher

We propose a simple and efficient method that combines semi-supervised learning with weakly-supervised learning for deep neural networks. Designing deep neural networks for weakly-supervised learning always involves a trade-off between fine-level information and coarse-level classification accuracy. When using unlabeled data for semi-supervised learning, instead of seeking this trade-off, we design two very different models for different targets: one pursues finer information for the final target, while the other specializes in higher coarse-level classification accuracy, so that it can act as a more professional teacher that teaches the former model using unlabeled data. We present an end-to-end semi-supervised learning process, termed guided learning, for these two models, which improves training efficiency. Our approach improves the $1^{st}$-place result on Task 4 of the DCASE2018 challenge from $32.4\%$ to $38.3\%$, achieving state-of-the-art performance.
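The "professional teacher" idea can be sketched generically as confidence-filtered pseudo-labelling: the coarse-level teacher labels unlabeled clips, and only its confident predictions are handed to the fine-level student. This is a minimal sketch of that general pattern, not the paper's guided-learning procedure; `pseudo_label` and the 0.9 threshold are our own illustrative choices:

```python
import numpy as np

def pseudo_label(teacher_probs, threshold=0.9):
    """Keep only the unlabeled examples the teacher labels confidently.

    teacher_probs: (N, C) class probabilities from the coarse-level teacher.
    Returns a boolean mask of confident examples and their argmax labels.
    """
    conf = teacher_probs.max(axis=1)
    mask = conf >= threshold
    labels = teacher_probs.argmax(axis=1)
    return mask, labels

# Toy teacher predictions on 4 unlabeled clips (2 classes).
probs = np.array([[0.95, 0.05],
                  [0.55, 0.45],
                  [0.10, 0.90],
                  [0.50, 0.50]])
mask, labels = pseudo_label(probs)
# The student would then train on probs[mask] with labels[mask] as targets.
```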

### Self-organized inductive reasoning with NeMuS

Neural Multi-Space (NeMuS) is a weighted multi-space representation for a portion of first-order logic, designed for use with machine learning and neural network methods. It has been demonstrated that NeMuS can be used to perform reasoning based on regions forming patterns of refutation, and also in the process of inductive learning in an ILP-like style. In this direction, patterns of concepts can be used to justify (and explain) "shortcuts" that generate recursive hypotheses from very large sets of relations without the need to compute the entire path to justify them. This is critical when the background knowledge contains huge amounts of data; such data can be adequately handled as regions of concepts and categories, similar to the organization of human brain maps, which will allow symbolic deduction.

### Label-less supervised learning? Enter self-supervised learning.

High-capacity networks are solving many different machine learning tasks, ranging from large-scale image classification, segmentation, and image generation to natural speech understanding and realistic text-to-speech, arguably passing some formulations of a Turing test. A few general trends are easily identified in academia and industry: deeper networks show increasingly better results, as long as they are fed ever bigger amounts of data, labelled data in particular. Computational and economic costs increase linearly with the size of the dataset; for this reason, starting in 2015, a number of unsupervised approaches aiming to exploit unlabelled data have been growing in popularity. The intuition behind many of these techniques is to emulate the ability of human brains to self-determine the goal of a task and improve towards it. Advances in algorithms able to exploit labels inherently contained within an unlabelled dataset gave rise to what is now referred to as self-supervised learning.
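A concrete example of a label "inherently contained" in unlabelled data is the rotation-prediction pretext task: rotate each image by a multiple of 90 degrees, and ask the network to predict which rotation was applied. The rotation index is a free label derived from the data itself. A minimal sketch (the helper name is ours; rotation prediction is just one of several such pretext tasks):

```python
import numpy as np

def rotation_pretext(images):
    """Turn an unlabelled set of square images into a labelled dataset:
    each image yields 4 samples, rotated by 0/90/180/270 degrees, and the
    rotation index k in {0, 1, 2, 3} becomes the classification label."""
    xs, ys = [], []
    for img in images:
        for k in range(4):
            xs.append(np.rot90(img, k))
            ys.append(k)
    return np.stack(xs), np.array(ys)
```

A network trained to predict `k` from the rotated image must learn object orientation and shape, features that transfer to downstream supervised tasks.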

### Efficient predicate invention using shared "NeMuS"

Amao is a cognitive agent framework that tackles predicate invention with a different strategy from recent advances in Inductive Logic Programming (ILP), such as the Meta-Interpretive Learning (MIL) technique. It uses a Neural Multi-Space (NeMuS) graph structure to anti-unify atoms from the Herbrand base that pass the inductive momentum check. Inductive Clause Learning (ICL), as it is called, is extended here by using the weights of logical components, already present in NeMuS, to support inductive learning by expanding clause candidates with anti-unified atoms. An efficient invention mechanism is achieved, including the learning of recursive hypotheses, while the shape of the hypothesis is restricted by adding bias definitions or idiosyncrasies of the language.
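Anti-unification, the core operation above, computes the least general generalisation of two atoms: argument positions that agree are kept, and positions that differ are replaced by variables. A toy sketch of just that operation, ignoring NeMuS weights and the inductive momentum check (atoms are modelled as `(predicate, args)` tuples; all names are our own):

```python
def anti_unify(atom1, atom2):
    """Least general generalisation of two atoms with the same predicate.

    Positions where the atoms agree keep their constant; positions where
    they differ are replaced by a fresh variable, reusing the same
    variable for the same pair of mismatched terms.
    """
    pred1, args1 = atom1
    pred2, args2 = atom2
    if pred1 != pred2 or len(args1) != len(args2):
        return None  # different predicates cannot be anti-unified
    out, fresh = [], {}
    for a, b in zip(args1, args2):
        if a == b:
            out.append(a)
        else:
            key = (a, b)
            if key not in fresh:
                fresh[key] = f"X{len(fresh)}"
            out.append(fresh[key])
    return (pred1, tuple(out))
```

For example, anti-unifying `parent(tom, bob)` and `parent(ann, bob)` yields `parent(X0, bob)`, a clause-candidate fragment that covers both ground atoms.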

### Contrastive Bidirectional Transformer for Temporal Representation Learning

This paper aims at learning representations for long sequences of continuous signals. Recently, the BERT model has demonstrated the effectiveness of stacked transformers for representing sequences of discrete signals (i.e. word tokens). Inspired by its success, we adopt the stacked transformer architecture, but generalize its training objective to maximize the mutual information between the masked signals and the bidirectional context, via a contrastive loss. This enables the model to handle continuous signals, such as visual features. We further consider the case where there are multiple sequences that are semantically aligned at the sequence level but not at the element level (e.g. video and ASR), and propose to use a Transformer to estimate the mutual information between the two sequences, which is again maximized via a contrastive loss. We demonstrate the effectiveness of the learned representations on modeling long video sequences for action anticipation and video captioning. The results show that our method, referred to as the Contrastive Bidirectional Transformer (**CBT**), outperforms various baselines significantly. Furthermore, we improve over the state of the art.
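The contrastive objective described here is commonly instantiated as an InfoNCE-style loss: each anchor (e.g. a masked signal's representation) must score higher with its own context than with every mismatched candidate in the batch, and minimising the loss maximises a lower bound on the mutual information. A hedged NumPy sketch of that general recipe, not CBT's exact loss:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss over a batch of (anchor, positive)
    pairs. Row i of `positives` is the true match for row i of `anchors`;
    all other rows act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # matched pairs sit on the diagonal
```

The loss is small when each anchor's highest similarity is its own positive, and grows when anchors are paired with the wrong context.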