 Inductive Learning


CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages

arXiv.org Artificial Intelligence

Neural dependency parsing has achieved remarkable performance for low-resource, morphologically rich languages. It is also well established that morphologically rich languages exhibit relatively free word order. This prompts a fundamental question: can dependency parsing performance be improved by exploiting the relatively free word order of morphologically rich languages to make the model robust to word order variations? In this work, we examine the robustness of graph-based parsing architectures on 7 relatively free word order languages. We scrutinize the essential modifications, such as data augmentation and the removal of position encoding, required to adapt these architectures accordingly. To this end, we propose a contrastive self-supervised learning method to make the model robust to word order variations. Our proposed modification yields a substantial average gain of 3.03/2.95 points in UAS/LAS across the 7 relatively free word order languages when compared to the best performing baseline.
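The contrastive idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy `encode` function, the `use_position` switch, the InfoNCE form of the loss, and the temperature value are all assumptions chosen for illustration. The sketch treats a reordered copy of a sentence as the positive view and other sentences as negatives, so an encoder whose output changes with word order incurs a higher loss than one that ignores order.

```python
import math
import random


def permute_words(tokens, rng):
    """Return a copy of the sentence with its word order shuffled."""
    permuted = list(tokens)
    rng.shuffle(permuted)
    return permuted


def encode(tokens, vocab, use_position):
    """Toy sentence encoder (hypothetical): L2-normalised weighted word counts.

    With use_position=True, earlier words get larger weights, so the
    embedding depends on word order; with False it is order-invariant.
    """
    vec = [0.0] * len(vocab)
    for i, tok in enumerate(tokens):
        vec[vocab[tok]] += 1.0 / (1 + i) if use_position else 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax of the positive's similarity."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    top = max(logits)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_norm - logits[0]


def order_contrastive_loss(sentence, view, others, vocab, use_position):
    """Pull a sentence towards its reordered view and away from other sentences."""
    anchor = encode(sentence, vocab, use_position)
    positive = encode(view, vocab, use_position)
    negatives = [encode(o, vocab, use_position) for o in others]
    return info_nce(anchor, positive, negatives)
```

In a real parser the encoder would be a trained network and the loss would be minimised by gradient descent; here the two encoder settings only show which behaviour the objective rewards.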


Reviews: Semi-Supervised Learning with Declaratively Specified Entropy Constraints

Neural Information Processing Systems

This paper proposes a method to combine (or ensemble) several SSL heuristics (regularizers) using a Bayesian optimization approach. The basic idea of the proposed method is borrowed from a previous method called D-Learner, as the paper itself states. The proposed method is therefore essentially a modification or extension of D-Learner, and does not seem to be entirely novel. From this perspective, the paper is incremental rather than innovative. The experimental results compare fairly well with the methods in previous studies, including the baseline D-Learner, on the text classification and relation extraction tasks examined in the paper.


A Retrieve-and-Edit Framework for Predicting Structured Outputs

Neural Information Processing Systems

For the task of generating complex outputs such as source code, editing existing outputs can be easier than generating complex outputs from scratch. With this motivation, we propose an approach that first retrieves a training example based on the input (e.g., natural language description) and then edits it to the desired output (e.g., code). Our contribution is a computationally efficient method for learning a retrieval model that embeds the input in a task-dependent way without relying on a hand-crafted metric or incurring the expense of jointly training the retriever with the editor. Our retrieve-and-edit framework can be applied on top of any base model. We show that on a new autocomplete task for GitHub Python code and the Hearthstone cards benchmark, retrieve-and-edit significantly boosts the performance of a vanilla sequence-to-sequence model on both tasks.
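The two-stage pipeline described above can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's method: the paper learns a task-dependent retrieval embedding and uses a neural editor, whereas this sketch substitutes a hand-crafted Jaccard word-overlap score for retrieval and a word-substitution rule for editing. The function names `retrieve` and `edit` are hypothetical.

```python
import re


def jaccard(a, b):
    """Word-overlap similarity between two descriptions (hand-crafted stand-in)."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve(query, training_pairs):
    """Return the (description, code) training pair closest to the query."""
    return max(training_pairs, key=lambda pair: jaccard(query, pair[0]))


def edit(query, retrieved_description, retrieved_code):
    """Toy editor: where the query and the retrieved description differ in one
    aligned word, apply the same substitution to the retrieved code."""
    edited = retrieved_code
    for new, old in zip(query.split(), retrieved_description.split()):
        if new != old:
            # Replace whole identifiers only, not substrings of longer names.
            edited = re.sub(r"\b" + re.escape(old) + r"\b", new, edited)
    return edited


def retrieve_and_edit(query, training_pairs):
    """Retrieve the nearest training example, then edit its output."""
    description, code = retrieve(query, training_pairs)
    return edit(query, description, code)
```

For example, with the training pairs `("return the sum of values", "return sum(values)")` and `("print every item in values", "for v in values: print(v)")`, the query `"return the sum of xs"` retrieves the first pair and edits its code to `return sum(xs)`.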


Training Deep Models Faster with Robust, Approximate Importance Sampling

Neural Information Processing Systems

In practice, the cost of computing importances greatly limits the impact of importance sampling. We propose a robust, approximate importance sampling procedure (RAIS) for stochastic gradient descent. By approximating the ideal sampling distribution using robust optimization, RAIS provides much of the benefit of exact importance sampling with drastically reduced overhead. Empirically, we find RAIS-SGD and standard SGD follow similar learning curves, but RAIS moves faster through these paths, achieving speed-ups of at least 20% and sometimes much more.
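For context, the generic importance-sampling step that RAIS builds on can be sketched as below. This is not RAIS itself: the robust-optimization approximation of the sampling distribution is not implemented, and the per-example `scores` are a hypothetical stand-in (for instance, gradient-norm estimates). The sketch shows only the standard reweighting by 1 / (N * p_i), which keeps the sampled gradient an unbiased estimate of the uniform mini-batch mean.

```python
import random


def sampling_distribution(scores):
    """Normalise positive per-example scores into sampling probabilities."""
    total = sum(scores)
    return [s / total for s in scores]


def sample_index(probs, rng):
    """Draw one training-example index according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def importance_weighted_gradient(grads, probs, index):
    """Scale example `index`'s gradient by 1 / (N * p_index) to stay unbiased."""
    n = len(grads)
    return grads[index] / (n * probs[index])


def expected_estimate(grads, probs):
    """Exact expectation of the reweighted gradient under probs."""
    return sum(
        p * importance_weighted_gradient(grads, probs, i)
        for i, p in enumerate(probs)
    )
```

Whatever positive scores are used, `expected_estimate` equals the plain average of the per-example gradients; the choice of scores changes only the variance of the estimate, which is where a good sampling distribution pays off.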


The Sample Complexity of Semi-Supervised Learning with Nonparametric Mixture Models

Neural Information Processing Systems

We study the sample complexity of semi-supervised learning (SSL) and introduce new assumptions based on the mismatch between a mixture model learned from unlabeled data and the true mixture model induced by the (unknown) class conditional distributions. Under these assumptions, we establish an \Omega(K \log K) labeled sample complexity bound without imposing parametric assumptions, where K is the number of classes. Our results suggest that even in nonparametric settings it is possible to learn a near-optimal classifier using only a few labeled samples. Unlike previous theoretical work which focuses on binary classification, we consider general multiclass classification (K > 2), which requires solving a difficult permutation learning problem. This permutation defines a classifier whose classification error is controlled by the Wasserstein distance between mixing measures, and we provide finite-sample results characterizing the behaviour of the excess risk of this classifier.


Reviews: Manifold Structured Prediction

Neural Information Processing Systems

Summary: This paper is an extension of the results presented in "A Consistent Regularization Approach for Structured Prediction" by Ciliberto et al. It focuses on the specific case where the output space is a Riemannian manifold, and describes and proves sufficient conditions for loss functions defined over manifolds to have the properties of what is called a "Structure Encoding Loss Function" (SELF). Ciliberto et al. present an estimator that, when used with a SELF, has provable universal consistency and learning rates; this paper extends that estimator and the prior theoretical results to the aforementioned class of loss functions defined over manifolds, with a specific focus on the squared geodesic distance. After describing how inference can be performed with the previously defined estimator for the output spaces considered here, the paper reports experiments on a synthetic dataset, where the goal is to learn the inverse function over the set of positive-definite matrices, and on a real dataset for fingerprint reconstruction. Comments: This work is well-written and well-organized, and it is easy to follow all of the concepts being presented.


Reviews: Semi-supervised Learning with GANs: Manifold Invariance with Improved Inference

Neural Information Processing Systems

The author(s) extend the idea of regularizing classifiers to be invariant along the tangent space of the learned data manifold to GAN-based architectures. This is a worthwhile idea to revisit, as significant advances have been made in generative modeling since the last major paper in the area, the CAE, was published. Crucial to the idea is the existence of an encoder that learns an inverse mapping of the standard generator of GAN training. This is still an area of active research in the GAN literature that as yet has no completely satisfactory approach. As current inference techniques for GANs are still quite poor, the authors propose two improvements to one technique, BiGAN, which are worthwhile contributions: 1) they adopt the feature matching loss proposed in "Improved Techniques for Training GANs", and 2) they augment the BiGAN objective with another term that evaluates how the generator maps the inferred latent code for a given real example.


Reviews: Learning latent variable structured prediction models with Gaussian perturbations

Neural Information Processing Systems

UPDATE: Upon reading the author response, I have decided to leave my review unchanged. The topic of this work is of interest to the community, and it can make a decent publication. I am still concerned about novelty. While the author response argued that similar increments have been done in past works, I feel that there are some significant differences between the current case and the mentioned works. Accepting the paper is still a reasonable decision in my humble opinion.


Reviews: Self-supervised Learning of Motion Capture

Neural Information Processing Systems

In other words, providing the performance of the models pre-trained on synthetic data but fine-tuned on real-world datasets with different losses is necessary.


Reviews: A Retrieve-and-Edit Framework for Predicting Structured Outputs

Neural Information Processing Systems

This paper addresses the task of highly structured output prediction in the context of source code generation. The authors propose a two-step process consisting of a retrieval model, which retrieves the training instance closest to the test instance, and an edit model, which edits the retrieved training instance. Quality: The authors approach the relatively recently proposed task of generating Python source code. Unlike others, they work in a setting with less supervision, where they assume no access to the Abstract Syntax Tree. I like this idea and the proposed model a lot and can see how it would be useful beyond the immediate scope of the paper.