Inductive Learning
Bootstrapping Ternary Relation Extractors
Binary relation extraction methods have been widely studied in recent years. However, few methods have been developed for higher n-ary relation extraction. One limiting factor is the effort required to generate training data. For binary relations, one only has to provide a few dozen pairs of entities per relation, as training data. For ternary relations (n=3), each training instance is a triplet of entities, placing a greater cognitive load on people. For example, many people know that Google acquired Youtube but not the dollar amount or the date of the acquisition and many people know that Hillary Clinton is married to Bill Clinton by not the location or date of their wedding. This makes higher n-nary training data generation a time consuming exercise in searching the Web. We present a resource for training ternary relation extractors. This was generated using a minimally supervised yet effective approach. We present statistics on the size and the quality of the dataset.
10 Exciting Ideas of 2018 in NLP
This post gathers 10 ideas that I found exciting and impactful this year--and that we'll likely see more of in the future. For each idea, I will highlight 1-2 papers that execute them well. I tried to keep the list succinct, so apologies if I did not cover all relevant work. The list is necessarily subjective and covers ideas mainly related to transfer learning and generalization. Most of these (with some exceptions) are not trends (but I suspect that some might become more'trendy' in 2019).
Cloud TPU Pods break AI training records Google Cloud Blog
Google Cloud's AI-optimized infrastructure makes it possible for businesses to train state-of-the-art machine learning models faster, at greater scale, and at lower cost. These advantages enabled Google Cloud Platform (GCP) to set three new performance records in the latest round of the MLPerf benchmark competition, the industry-wide standard for measuring ML performance. All three record-setting results ran on Cloud TPU v3 Pods, the latest generation of supercomputers that Google has built specifically for machine learning. These results showcased the speed of Cloud TPU Pods-- with each of the winning runs using less than two minutes of compute time. With these latest MLPerf benchmark results, Google Cloud is the first public cloud provider to outperform on-premise systems when running large-scale, industry-standard ML training workloads of Transformer, Single Shot Detector (SSD), and ResNet-50.
Semi-Supervised Graph Embedding for Multi-Label Graph Node Classification
Gao, Kaisheng, Zhang, Jing, Zhou, Cangqi
The graph convolution network (GCN) is a widely-used facility to realize graph-based semi-supervised learning, which usually integrates node features and graph topologic information to build learning models. However, as for multi-label learning tasks, the supervision part of GCN simply minimizes the cross-entropy loss between the last layer outputs and the ground-truth label distribution, which tends to lose some useful information such as label correlations, so that prevents from obtaining high performance. In this paper, we pro-pose a novel GCN-based semi-supervised learning approach for multi-label classification, namely ML-GCN. ML-GCN first uses a GCN to embed the node features and graph topologic information. Then, it randomly generates a label matrix, where each row (i.e., label vector) represents a kind of labels. The dimension of the label vector is the same as that of the node vector before the last convolution operation of GCN. That is, all labels and nodes are embedded in a uniform vector space. Finally, during the ML-GCN model training, label vectors and node vectors are concatenated to serve as the inputs of the relaxed skip-gram model to detect the node-label correlation as well as the label-label correlation. Experimental results on several graph classification datasets show that the proposed ML-GCN outperforms four state-of-the-art methods.
Minimizers of the Empirical Risk and Risk Monotonicity
Loog, Marco, Viering, Tom, Mey, Alexander
Plotting a learner's average performance against the number of training samples results in a learning curve. Studying such curves on one or more data sets is a way to get to a better understanding of the generalization properties of this learner. The behavior of learning curves is, however, not very well understood and can display (for most researchers) quite unexpected behavior. Our work introduces the formal notion of \emph{risk monotonicity}, which asks the risk to not deteriorate with increasing training set sizes in expectation over the training samples. We then present the surprising result that various standard learners, specifically those that minimize the empirical risk, can act \emph{non}monotonically irrespective of the training sample size. We provide a theoretical underpinning for specific instantiations from classification, regression, and density estimation. Altogether, the proposed monotonicity notion opens up a whole new direction of research.
Amplifying R\'enyi Differential Privacy via Shuffling
Berthier, Eloรฏse, Karimireddy, Sai Praneeth
Differential privacy is a useful tool to build machine learning models which do not release too much information about the training data. We study the R\'enyi differential privacy of stochastic gradient descent when each training example is sampled without replacement (also known as cyclic SGD). Cyclic SGD is typically faster than traditional SGD and is the algorithm of choice in large-scale implementations. We recover privacy guarantees for cyclic SGD which are competitive with those known for sampling with replacement. Our proof techniques make no assumptions on the model or on the data and are hence widely applicable.
GraphSAINT: Graph Sampling Based Inductive Learning Method
Zeng, Hanqing, Zhou, Hongkuan, Srivastava, Ajitesh, Kannan, Rajgopal, Prasanna, Viktor
Graph Convolutional Networks (GCNs) are powerful models for learning representations of attributed graphs.To scale GCNs to large graphs, state-of-the-art methods use various layer sampling techniques to alleviate the "neighbor explosion" problem during minibatch training. Here we proposeGraphSAINT, a graph sampling based inductive learning method that improves training efficiency in a fundamentally different way. By a change of perspective, GraphSAINT constructs minibatches by sampling the training graph, rather than the nodes or edges across GCN layers. Each iteration, a complete GCN is built from the properly sampled subgraph. Thus, we ensure fixed number of well-connected nodes in all layers. We further propose normalization technique to eliminate bias, and sampling algorithms for variance reduction. Importantly, we can decouple the sampling process from the forward and backward propagation of training, and extend GraphSAINT with other graph samplers and GCN variants. Comparing with strong baselines using layer sampling, GraphSAINT demonstrates superior performance in both accuracy and training time on four large graphs.
Contextual One-Class Classification in Data Streams
Moulton, Richard Hugh, Viktor, Herna L., Japkowicz, Nathalie, Gama, Joรฃo
In machine learning, the one-class classification problem occurs when training instances are only available from one class. It has been observed that making use of this class's structure, or its different contexts, may improve one-class classifier performance. Although this observation has been demonstrated for static data, a rigorous application of the idea within the data stream environment is lacking. To address this gap, we propose the use of context to guide one-class classifier learning in data streams, paying particular attention to the challenges presented by the dynamic learning environment. We present three frameworks that learn contexts and conduct experiments with synthetic and benchmark data streams. We conclude that the paradigm of contexts in data streams can be used to improve the performance of streaming one-class classifiers.
This online game wants to teach the public about AI bias
Artificial intelligence might be coming for your next job, just not in the way you feared. The past few years have seen any number of articles that warn about a future where AI and automation drive humans into mass unemployment. To a considerable extent, those threats are overblown and distant. But a more imminent threat to jobs is that of algorithmic bias, the effect of machine learning models making decisions based on the wrong patterns in their training examples. A online game developed by computer science students at New York University aims to educate the public about the effects of AI bias in hiring.
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
Hendrycks, Dan, Mazeika, Mantas, Kadavath, Saurav, Song, Dawn
Self-supervision provides effective representations for downstream tasks without requiring labels. However, existing approaches lag behind fully supervised training and are often not thought beneficial beyond obviating the need for annotations. We find that self-supervision can benefit robustness in a variety of ways, including robustness to adversarial examples, label corruption, and common input corruptions. Additionally, self-supervision greatly benefits out-of-distribution detection on difficult, near-distribution outliers, so much so that it exceeds the performance of fully supervised methods. These results demonstrate the promise of self-supervision for improving robustness and uncertainty estimation and establish these tasks as new axes of evaluation for future self-supervised learning research.