Goto

Collaborating Authors

 Country


A Massive Collection of Cross-Lingual Web-Document Pairs

arXiv.org Machine Learning

Cross-lingual document alignment aims to identify pairs of documents in two distinct languages that are of comparable content or translations of each other. Small-scale efforts have been made to collect aligned document level data on a limited set of language-pairs such as English-German or on limited comparable collections such as Wikipedia. In this paper, we mine twelve snapshots of the Common Crawl corpus and identify web document pairs that are translations of each other. We release a new web dataset consisting of 54 million URL pairs from Common Crawl covering documents in 92 languages paired with English. We evaluate the quality of the dataset by measuring the quality of machine translations from models that have been trained on mined parallel sentence pairs from this aligned corpora and introduce a simple yet effective baseline for identifying these aligned documents. The objective of this dataset and paper is to foster new research in cross-lingual NLP across a variety of low, mid, and high-resource languages.


Conductor Galloping Prediction on Imbalanced Datasets: SVM with Smart Sampling

arXiv.org Machine Learning

Conductor galloping is the high-amplitude, low-frequency oscillation of overhead power lines due to wind. Such movements may lead to severe damages to transmission lines, and hence pose significant risks to the power system operation. In this paper, we target to design a prediction framework for conductor galloping. The difficulty comes from imbalanced dataset as galloping happens rarely. By examining the impacts of data balance and data volume on the prediction performance, we propose to employ proper sample adjustment methods to achieve better performance. Numerical study suggests that using only three features, together with over sampling, the SVM based prediction framework achieves an F_1-score of 98.9%.


Modelling Bahdanau Attention using Election methods aided by Q-Learning

arXiv.org Machine Learning

Neural Machine Translation has lately gained a lot of "attention" with the advent of more and more sophisticated but drastically improved models. Attention mechanism has proved to be a boon in this direction by providing weights to the input words, making it easy for the decoder to identify words representing the present context. But by and by, as newer attention models with more complexity came into development, they involved large computation, making inference slow. In this paper, we have modelled the attention network using techniques resonating with social choice theory. Along with that, the attention mechanism, being a Markov Decision Process, has been represented by reinforcement learning techniques. Thus, we propose to use an election method ( k -Borda), fine-tuned using Q-learning, as a replacement for attention networks. The inference time for this network is less than a standard Bahdanau translator, and the results of the translation are comparable. This not only experimentally verifies the claims stated above but also helped provide a faster inference.


In Vitro Fertilization (IVF) Cumulative Pregnancy Rate Prediction from Basic Patient Characteristics

arXiv.org Machine Learning

Tens of millions of women suffer from infertility worldwide each year. In vitro fertilization (IVF) is the best choice for many such patients. However, IVF is expensive, time-consuming, and both physically and emotionally demanding. The first question that a patient usually asks before the IVF is how likely she will conceive, given her basic medical examination information. This paper proposes three approaches to predict the cumulative pregnancy rate after multiple oocyte pickup cycles. Experiments on 11,190 patients showed that first clustering the patients into different groups and then building a support vector machine model for each group can achieve the best overall performance. Our model could be a quick and economic approach for reliably estimating the cumulative pregnancy rate for a patient, given only her basic medical examination information, well before starting the actual IVF procedure. The predictions can help the patient make optimal decisions on whether to use her own oocyte or donor oocyte, how many oocyte pickup cycles she may need, whether to use embryo frozen, etc. They will also reduce the patient's cost and time to pregnancy, and improve her quality of life.


Manifold Denoising by Nonlinear Robust Principal Component Analysis

arXiv.org Machine Learning

This paper extends robust principal component analysis (RPCA) to nonlinear manifolds. Suppose that the observed data matrix is the sum of a sparse component and a component drawn from some low dimensional manifold. Is it possible to separate them by using similar ideas as RPCA? Is there any benefit in treating the manifold as a whole as opposed to treating each local region independently? We answer these two questions affirmatively by proposing and analyzing an optimization framework that separates the sparse component from the manifold under noisy data. Theoretical error bounds are provided when the tangent spaces of the manifold satisfy certain incoherence conditions. We also provide a near optimal choice of the tuning parameters for the proposed optimization formulation with the help of a new curvature estimation method. The efficacy of our method is demonstrated on both synthetic and real datasets.


Online Optimization with Predictions and Non-convex Losses

arXiv.org Machine Learning

We study online optimization in a setting where an online learner seeks to optimize a per-round hitting cost, which may be non-convex, while incurring a movement cost when changing actions between rounds. We ask: \textit{under what general conditions is it possible for an online learner to leverage predictions of future cost functions in order to achieve near-optimal costs?} Prior work has provided near-optimal online algorithms for specific combinations of assumptions about hitting and switching costs, but no general results are known. In this work, we give two general sufficient conditions that specify a relationship between the hitting and movement costs which guarantees that a new algorithm, Synchronized Fixed Horizon Control (SFHC), provides a $1+O(1/w)$ competitive ratio, where $w$ is the number of predictions available to the learner. Our conditions do not require the cost functions to be convex, and we also derive competitive ratio results for non-convex hitting and movement costs. Our results provide the first constant, dimension-free competitive ratio for online non-convex optimization with movement costs. Further, we give an example of a natural instance, Convex Body Chasing (CBC), where the sufficient conditions are not satisfied and we can prove that no online algorithm can have a competitive ratio that converges to 1.


Meta Label Correction for Learning with Weak Supervision

arXiv.org Machine Learning

Leveraging weak or noisy supervision for building effective machine learning models has long been an important research problem. The growing need for large-scale datasets to train deep learning models has increased its importance. Weak or noisy supervision could originate from multiple sources including non-expert annotators or automatic labeling based on heuristics or user interaction signals. Previous work on modeling and correcting weak labels have been focused on various aspects, including loss correction, training instance re-weighting, etc. In this paper, we approach this problem from a novel perspective based on meta-learning. We view the label correction procedure as a meta-process and propose a new meta-learning based framework termed MLC for learning with weak supervision. Experiments with different label noise levels on multiple datasets show that MLC can achieve large improvement over previous methods incorporating weak labels for learning.


ISLET: Fast and Optimal Low-rank Tensor Regression via Importance Sketching

arXiv.org Machine Learning

In this paper, we develop a novel procedure for low-rank tensor regression, namely \underline{I}mportance \underline{S}ketching \underline{L}ow-rank \underline{E}stimation for \underline{T}ensors (ISLET). The central idea behind ISLET is \emph{importance sketching}, i.e., carefully designed sketches based on both the responses and low-dimensional structure of the parameter of interest. We show that the proposed method is sharply minimax optimal in terms of the mean-squared error under low-rank Tucker assumptions and under randomized Gaussian ensemble design. In addition, if a tensor is low-rank with group sparsity, our procedure also achieves minimax optimality. Further, we show through numerical studies that ISLET achieves comparable or better mean-squared error performance to existing state-of-the-art methods whilst having substantial storage and run-time advantages including capabilities for parallel and distributed computing. In particular, our procedure performs reliable estimation with tensors of dimension $p = O(10^8)$ and is $1$ or $2$ orders of magnitude faster than baseline methods.


XceptionTime: A Novel Deep Architecture based on Depthwise Separable Convolutions for Hand Gesture Classification

arXiv.org Machine Learning

Capitalizing on the need for addressing the existing challenges associated with gesture recognition via sparse multichannel surface Electromyography (sEMG) signals, the paper proposes a novel deep learning model, referred to as the XceptionTime architecture. The proposed innovative XceptionTime is designed by integration of depthwise separable convolutions, adaptive average pooling, and a novel non-linear normalization technique. At the heart of the proposed architecture is several XceptionTime modules concatenated in series fashion designed to capture both temporal and spatial information-bearing contents of the sparse multichannel sEMG signals without the need for data augmentation and/or manual design of feature extraction. In addition, through integration of adaptive average pooling, Conv1D, and the non-linear normalization approach, XceptionTime is less prone to overfitting, more robust to temporal translation of the input, and more importantly is independent from the input window size. Finally, by utilizing the depthwise separable convolutions, the XceptionTime network has far fewer parameters resulting in a less complex network. The performance of XceptionTime is tested on a sub Ninapro dataset, DB1, and the results showed a superior performance in comparison to any existing counterparts. In this regard, 5:71% accuracy improvement, on a window size 200ms, is reported in this paper, for the first time.


Adaptive versus Standard Descent Methods and Robustness Against Adversarial Examples

arXiv.org Machine Learning

Since this phenomenon was first observed, researchers have attempted to develop methods which produce models that are robust to adversarial perturbations under specific attack models (Wong and Kolter (2018); Sinha et al. (2018); Raghunathan et al. (2018); Mirman et al. (2018); Madry et al. (2018); Zhang et al. (2019)). As machine learning proliferates into society, including security-critical settings like health care (Esteva et al. (2017)) or autonomous vehicles (Codevilla et al. (2018)), it is crucial to develop methods that allow us to understand the vulnerability of our models and design appropriate countermeasures. Additionally there is a growing literature on the theory of adversarial examples. Many of these results attempt to understand adversarial examples by constructing examples of learning problems for which it is difficult to construct a classifier that is robust to adversarial perturbations. This difficultly may arise due to sample complexity (Schmidt et al. (2018)), computational constraints (Bubeck et al. (2019); Degwekar et al. (2019)), or the high-dimensional geometry of the initial feature space (Shafahi et al. (2019); Khoury and Hadfield-Menell (2018)).