Directed Networks
Evaluating and Characterizing Incremental Learning from Non-Stationary Data
Cervantes, Alejandro, Gagné, Christian, Isasi, Pedro, Parizeau, Marc
Incremental learning from non-stationary data poses special challenges to the field of machine learning. Although new algorithms have been developed for this, assessment of results and comparison of behaviors are still open problems, mainly because evaluation metrics, adapted from more traditional tasks, can be ineffective in this context. Overall, there is a lack of common testing practices. This paper thus presents a testbed for incremental non-stationary learning algorithms, based on specially designed synthetic datasets. Also, test results are reported for some well-known algorithms to show that the proposed methodology is effective at characterizing their strengths and weaknesses. It is expected that this methodology will provide a common basis for evaluating future contributions in the field.
Incremental Sparse Bayesian Ordinal Regression
Ordinal Regression (OR) aims to model the ordering information between different data categories, which is a crucial topic in multi-label learning. An important class of approaches to OR models the problem as a linear combination of basis functions that map features to a high-dimensional non-linear space. However, most basis function-based algorithms are time-consuming. We propose an incremental sparse Bayesian approach to OR tasks and introduce an algorithm to sequentially learn the relevant basis functions in the ordinal scenario. Our method, called Incremental Sparse Bayesian Ordinal Regression (ISBOR), automatically optimizes the hyper-parameters via the type-II maximum likelihood method. By exploiting fast marginal likelihood optimization, ISBOR can avoid large matrix inversions, which are the main bottleneck in applying basis function-based algorithms to OR tasks on large-scale datasets. We show that ISBOR can make accurate predictions with parsimonious basis functions while offering automatic estimates of the prediction uncertainty. Extensive experiments on synthetic and real-world datasets demonstrate the efficiency and effectiveness of ISBOR compared to other basis function-based OR approaches.
A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
Arora, Saurabh, Doshi, Prashant
Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an observed agent, given its policy or behavior. Researchers perceive IRL both as a problem and as a class of methods. By categorically surveying the current literature in IRL, this article serves as a reference for researchers and practitioners in machine learning to understand the challenges of IRL and select the approaches best suited for the problem at hand. The survey formally introduces the IRL problem along with its central challenges, which include accurate inference, generalizability, correctness of prior knowledge, and growth in solution complexity with problem size. The article elaborates on how the current methods mitigate these challenges. We further discuss the extensions of traditional IRL methods: (i) inaccurate and incomplete perception, (ii) incomplete model, (iii) multiple rewards, and (iv) non-linear reward functions. This discussion concludes with some broad advances in the research area and currently open research questions.
Unsupervised Word Segmentation from Speech with Attention
Godard, Pierre, Zanon-Boito, Marcely, Ondel, Lucas, Berard, Alexandre, Yvon, François, Villavicencio, Aline, Besacier, Laurent
We present a first attempt to perform attentional word segmentation directly from the speech signal, with the ultimate goal of automatically identifying lexical units in a low-resource, unwritten language (UL). Our methodology assumes a pairing of recordings in the UL with translations in a well-resourced language. It uses Acoustic Unit Discovery (AUD) to convert speech into a sequence of pseudo-phones that is segmented using neural soft-alignments produced by a neural machine translation model. Evaluation uses an actual Bantu UL, Mboshi; comparisons to monolingual and bilingual baselines illustrate the potential of attentional word segmentation for language documentation.
Predicting Switching Graph Labelings with Cluster Specialists
Herbster, Mark, Robinson, James
We address the problem of predicting the labeling of a graph in an online setting when the labeling is changing over time. We provide three mistake-bounded algorithms based on three paradigmatic methods for online algorithm design. The algorithm with the strongest guarantee is a quasi-Bayesian classifier which requires $\mathcal{O}(t \log n)$ time to predict at trial $t$ on an $n$-vertex graph. The fastest algorithm (with the weakest guarantee) is based on a specialist [10] approach and surprisingly only requires $\mathcal{O}(\log n)$ time on any trial $t$. We also give an algorithm based on a kernelized Perceptron with an intermediate per-trial time complexity of $\mathcal{O}(n)$ and a mistake bound which is not strictly comparable. Finally, we provide experiments on simulated data comparing these methods.
Binary Classification in Unstructured Space With Hypergraph Case-Based Reasoning
Binary classification is one of the most common problems in machine learning. It consists of predicting whether a given element belongs to a particular class. In this paper, a new algorithm for binary classification is proposed using a hypergraph representation. Each element to be classified is partitioned according to its interactions with the training set. For each class, the total support is calculated as a convex combination of the {\it evidence} strength of the elements of the partition. The evidence measure is pre-computed using the hypergraph induced by the training set and iteratively adjusted through a training phase. The method does not require structured information, each case being represented by a set of {\it agnostic information} atoms. Empirical validation demonstrates its high potential on a wide range of well-known datasets, and the results are compared to the state of the art. The time complexity is given and empirically validated. Its capacity to provide good performance without hyperparameter tuning, compared to standard classification methods, is studied. Finally, the limitations of the model space are discussed and some potential solutions are proposed.
Machine Learning Key Terms - myVertica
Posted on Monday, June 4th, 2018 at 3:03 pm. This blog post was authored by Soniya Shah. Machine learning seems to be everywhere these days: in the online recommendations you get on Netflix, the self-driving cars that are hyped in the media, and in serious cases, like fraud detection. Data is a huge part of machine learning, and so are the key terms. Unless you have a background in statistics or data science, it can be confusing to keep all the terminology straight. And even then, you might want to keep a list of terms handy.
How I Learned to Stop Worrying and Love Uncertainty
Since their early days, humans have had an important, often antagonistic relationship with uncertainty; we try to kill it everywhere we find it. Without an explanation for many natural phenomena, humans invented gods to explain them, and without certainty of the future, they consulted oracles. It was precisely the oracles' role to reduce uncertainty for their fellow humans, predicting their future and giving counsel according to their gods' will, and even though their accuracy left much to be desired, they were believed, for any measure of certainty is better than none. As society grew more sophisticated, oracles were (not completely) displaced by empirical thought, which proved much more successful at prediction and counsel. Empiricism itself evolved into the collection of techniques we call the scientific method, which has proven to be much more effective at reducing uncertainty, and is modern society's most trustworthy way of producing predictions.
Minibatch Gibbs Sampling on Large Graphical Models
De Sa, Christopher, Chen, Vincent, Wong, Wing
Gibbs sampling is a Markov chain Monte Carlo method that is one of the most widespread techniques used with graphical models [7]. Gibbs sampling is an iterative method that repeatedly resamples a variable in the model from its conditional distribution, a process that is guaranteed to converge asymptotically to the desired distribution. Since these updates are typically simple and fast to run, Gibbs sampling can be applied to a variety of problems, and has been used for inference on large-scale graphical models in many systems [11, 13, 14, 19, 20, 21]. Unfortunately, for large graphical models with many factors, the computational cost of running an iteration of Gibbs sampling can become prohibitive. Even though Gibbs sampling is a graph-local algorithm, in the sense that each update only needs to reference data associated with a local neighborhood of the factor graph, as graphs become large and highly connected, even these local neighborhoods can become huge.
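The resampling loop described above can be sketched on a toy model. The following is a minimal illustration of plain (not minibatch) Gibbs sampling, assuming a hypothetical chain of spins in {-1, +1} with pairwise coupling; the model, the function name `gibbs_ising_chain`, and all parameter values are assumptions chosen for brevity and are not taken from the paper.

```python
import math
import random


def gibbs_ising_chain(n=10, coupling=0.5, sweeps=200, seed=0):
    """Gibbs sampling on a chain of n spins x_i in {-1, +1}.

    Hypothetical target: p(x) proportional to exp(coupling * sum_i x_i * x_{i+1}).
    """
    rng = random.Random(seed)
    x = [rng.choice([-1, 1]) for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            # Graph-local update: only the neighbors of variable i
            # appear in its conditional distribution.
            field = 0
            if i > 0:
                field += x[i - 1]
            if i < n - 1:
                field += x[i + 1]
            # p(x_i = +1 | neighbors) = 1 / (1 + exp(-2 * coupling * field))
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * coupling * field))
            x[i] = 1 if rng.random() < p_plus else -1
    return x


if __name__ == "__main__":
    print(gibbs_ising_chain())
```

On a chain each update touches at most two neighbors, so a sweep is cheap; the cost concern raised in the abstract arises when a variable participates in many factors, so that computing `field` alone requires a large sum.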
On the Relationship between Data Efficiency and Error for Uncertainty Sampling
Mussmann, Stephen, Liang, Percy
While active learning offers potential cost savings, the actual data efficiency---the reduction in amount of labeled data needed to obtain the same error rate---observed in practice is mixed. This paper poses a basic question: when is active learning actually helpful? We provide an answer for logistic regression with the popular active learning algorithm, uncertainty sampling. Empirically, on 21 datasets from OpenML, we find a strong inverse correlation between data efficiency and the error rate of the final classifier. Theoretically, we show that for a variant of uncertainty sampling, the asymptotic data efficiency is within a constant factor of the inverse error rate of the limiting classifier.
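For readers unfamiliar with uncertainty sampling, here is a minimal pool-based sketch with logistic regression, using only the Python standard library. It is a generic textbook form, not necessarily the variant analyzed in the paper; the synthetic data, the helper names (`make_data`, `fit_logreg`, `uncertainty_sampling`), and all hyperparameters are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w holds one weight per feature followed by a bias term.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def fit_logreg(X, y, steps=200, lr=0.5):
    """Fit logistic regression by batch gradient descent on the log loss."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def uncertainty_sampling(pool_X, pool_y, n_seed=4, n_queries=10, seed=0):
    """Label a random seed set, then repeatedly query the most uncertain point."""
    rng = random.Random(seed)
    unlabeled = list(range(len(pool_X)))
    rng.shuffle(unlabeled)
    labeled = [unlabeled.pop() for _ in range(n_seed)]
    for _ in range(n_queries):
        w = fit_logreg([pool_X[i] for i in labeled], [pool_y[i] for i in labeled])
        # Most uncertain = predicted probability closest to 0.5.
        pick = min(unlabeled, key=lambda i: abs(predict_proba(w, pool_X[i]) - 0.5))
        unlabeled.remove(pick)
        labeled.append(pick)  # pool_y[pick] stands in for the labeling oracle
    w = fit_logreg([pool_X[i] for i in labeled], [pool_y[i] for i in labeled])
    return w, labeled


def make_data(n=200, seed=1):
    # Hypothetical linearly separable 2-D data, for illustration only.
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [1 if x[0] + x[1] > 0 else 0 for x in X]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    w, labeled = uncertainty_sampling(X, y)
    acc = sum((predict_proba(w, x) > 0.5) == bool(t) for x, t in zip(X, y)) / len(X)
    print(len(labeled), "labels used, pool accuracy:", acc)
```

Data efficiency in the sense of the abstract could then be estimated by comparing how many labels this loop needs against a random-sampling baseline to reach the same error rate.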