Edmonton
A Generalized Projected Bellman Error for Off-policy Value Estimation in Reinforcement Learning
Patterson, Andrew, White, Adam, Ghiassian, Sina, White, Martha
Many reinforcement learning algorithms rely on value estimation. However, the most widely used algorithms -- namely temporal difference algorithms -- can diverge under both off-policy sampling and nonlinear function approximation. Many algorithms have been developed for off-policy value estimation which are sound under linear function approximation, based on the linear mean-squared projected Bellman error (PBE). Extending these methods to the nonlinear case has been largely unsuccessful. Recently, several methods have been introduced that approximate a different objective, called the mean-squared Bellman error (BE), which naturally facilitates nonlinear approximation. In this work, we build on these insights and introduce a new generalized PBE that extends the linear PBE to the nonlinear setting. We show how this generalized objective unifies previous work, including previous theory, and obtain new bounds for the value error of the solutions of the generalized objective. We derive an easy-to-use, but sound, algorithm to minimize the generalized objective which is more stable across runs, is less sensitive to hyperparameters, and performs favorably across four control domains with neural network function approximation.
Phenotyping OSA: a time series analysis using fuzzy clustering and persistent homology
Loliencar, Prachi, Heo, Giseon
Sleep apnea is a disorder that has serious consequences for the pediatric population. There has been recent concern that traditional diagnosis of the disorder using the apnea-hypopnea index may be ineffective in capturing its multi-faceted outcomes. In this work, we take a first step in addressing this issue by phenotyping patients using a clustering analysis of airflow time series. This is approached in three ways: using feature-based fuzzy clustering in the time domain, using the same in the frequency domain, and using persistent homology to study the signal from a topological perspective. The fuzzy clusters are analyzed in a novel manner using a Dirichlet regression analysis, while the topological approach leverages Takens' embedding theorem to study the periodicity properties of the signals.
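The delay-coordinate construction behind Takens' embedding theorem can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `delay_embed`, the parameters `dim` and `tau`, and the synthetic sine trace are all assumptions made for the example. A periodic signal embedded this way traces out a closed loop, which is the kind of structure persistent homology can then detect.

```python
import numpy as np

def delay_embed(signal, dim, tau):
    """Return the delay-coordinate embedding of a 1-D signal.

    Row i is (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).
    `dim` and `tau` are illustrative parameters, not values from the paper.
    """
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("signal too short for this dim and tau")
    # Column k holds the signal shifted by k * tau samples.
    return np.stack([x[k * tau : k * tau + n_points] for k in range(dim)], axis=1)

# Synthetic periodic trace (illustrative data, not from the paper).
t = np.linspace(0.0, 4.0 * np.pi, 200)
# tau = 25 samples is roughly a quarter period here, so the 2-D point cloud
# lies close to the unit circle.
cloud = delay_embed(np.sin(t), dim=2, tau=25)
```

In practice `dim` and `tau` would have to be chosen for the real airflow signals; the values above are picked only so that the loop is easy to see.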
From Weakly Supervised Learning to Biquality Learning: an Introduction
Nodet, Pierre, Lemaire, Vincent, Bondu, Alexis, Cornuéjols, Antoine, Ouorou, Adam
The field of Weakly Supervised Learning (WSL) has recently seen a surge of popularity, with numerous papers addressing different types of "supervision deficiencies". In WSL use cases, a variety of situations exist in which the collected "information" is imperfect. The paradigm of WSL attempts to list and cover these problems with associated solutions. In this paper, we review the research progress on WSL with the aim of providing a brief introduction to the field. We present the three axes of the WSL cube and an overview of most of the elements of their facets. We propose three measurable quantities that act as coordinates in the previously defined cube, namely Quality, Adaptability, and Quantity of information. We then suggest that the Biquality Learning framework can be defined as a plane of the WSL cube, and propose to re-discover previously unrelated patches of the WSL literature as a unified Biquality Learning literature.
Planning with Expectation Models for Control
Kudashkina, Katya, Wan, Yi, Naik, Abhishek, Sutton, Richard S.
In model-based reinforcement learning (MBRL), Wan et al. (2019) showed conditions under which the environment model could produce the expectation of the next feature vector rather than the full distribution, or a sample thereof, with no loss in planning performance. Such expectation models are of interest when the environment is stochastic and non-stationary, and the model is approximate, such as when it is learned using function approximation. In these cases a full distribution model may be impractical and a sample model may be either more expensive computationally or of high variance. Wan et al. considered only planning for prediction to evaluate a fixed policy. In this paper, we treat the control case - planning to improve and find a good approximate policy. We prove that planning with an expectation model must update a state-value function, not an action-value function as previously suggested (e.g., Sorg & Singh, 2010). This opens the question of how planning influences action selections. We consider three strategies for this and present general MBRL algorithms for each. We identify the strengths and weaknesses of these algorithms in computational experiments. Our algorithms and experiments are the first to treat MBRL with expectation models in a general setting.
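To see why an expectation model can suffice, consider the case usually associated with Wan et al. (2019), where the value function is linear in the features (stated here as an assumption, since the abstract does not spell out the conditions). For a linear value function, the value of the expected next feature vector equals the expected value of the next feature vector. The sketch below checks this numerically; the weights, feature vectors, and probabilities are made-up illustrative data, and the squared value function is only a stand-in for a nonlinear one.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)                    # weights of a linear value function (hypothetical)
next_features = rng.normal(size=(3, 4))   # three possible next feature vectors
probs = np.array([0.2, 0.5, 0.3])         # their transition probabilities

# Distribution model: average the values of all possible next feature vectors.
expected_value = probs @ (next_features @ w)

# Expectation model: evaluate the value of the single expected feature vector.
value_of_expectation = (probs @ next_features) @ w

# With a nonlinear value function (here the square of the linear one),
# the two quantities no longer agree; the gap is the variance of w . x'.
nonlinear_gap = probs @ (next_features @ w) ** 2 - value_of_expectation ** 2
```

The first two quantities coincide by linearity of expectation, while `nonlinear_gap` is strictly positive whenever the next-state values differ.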
Survey on reinforcement learning for language processing
Uc-Cetina, Victor, Navarro-Guerrero, Nicolas, Martin-Gonzalez, Anabel, Weber, Cornelius, Wermter, Stefan
Machine learning algorithms, especially supervised and unsupervised methods, have been very successful at solving problems in the natural language processing (NLP) domain for many years. However, this is not the case with reinforcement learning (RL), which is somewhat surprising since in other domains reinforcement learning methods have experienced increasing success with some impressive results, for instance in board games with AlphaGo Zero [106]. Deep reinforcement learning for natural language processing is still in its infancy when compared to supervised learning [65]. The goal of this article is therefore to provide a review of applications of reinforcement learning to NLP. Intended as an aid to newcomers to the field, we present an analysis of the underlying structure of the problems that makes them viable to be treated entirely or partially as RL problems. We also analyze some existing research gaps and provide a list of promising research directions in which natural language systems might benefit from reinforcement learning algorithms.
ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms
Ha, Huong, Gupta, Sunil, Rana, Santu, Venkatesh, Svetha
This is clearly demonstrated by the performance of BALD. Specifically, the BNNs trained with BALD have accuracies ranging from 70% to 90%, but for the models-under-test M-FashionMNIST and M-MNIST-ES (average and bad models), the metric estimation accuracies range from 90% to 100%, much higher than the BNNs' accuracies. For our proposed method ALT-MAS, with the models-under-test M-FashionMNIST and M-MNIST-ES, the behaviours are similar to those of BALD: the metric estimation accuracies are always higher than the BNNs' accuracies, especially for per-class metrics. It is worth noting that, for the per-class metrics, even though the accuracies of the BNNs trained with ALT-MAS are much lower than those of the BNNs trained with BALD, the metric estimation accuracies of ALT-MAS are much higher than those of BALD. This supports the motivation of our sampling approach: the BNN only needs to accurately predict the data points that contribute to the metric estimation. On the other hand, with the good model-under-test M-MNIST, due to our data augmentation training strategy, the BNN accuracies of ALT-MAS are much higher than those of BALD, and thus the metric estimations by ALT-MAS are also more accurate than those by BALD.
Figure 2: The accuracy of the BNN for each combination of model-under-test (M-MNIST, M-FashionMNIST, and M-MNIST-ES) and metric set. Mean and standard error over 3 repetitions are plotted (best seen in color).
Deep Indexed Active Learning for Matching Heterogeneous Entity Representations
Jain, Arjit, Sarawagi, Sunita, Sen, Prithviraj
Given two large lists of records, the task in entity resolution (ER) is to find the pairs from the Cartesian product of the lists that correspond to the same real world entity. Typically, passive learning methods on tasks like ER require large amounts of labeled data to yield useful models. Active Learning is a promising approach for ER in low resource settings. However, the search space, to find informative samples for the user to label, grows quadratically for instance-pair tasks making active learning hard to scale. Previous works, in this setting, rely on hand-crafted predicates, pre-trained language model embeddings, or rule learning to prune away unlikely pairs from the Cartesian product. This blocking step can miss out on important regions in the product space leading to low recall. We propose DIAL, a scalable active learning approach that jointly learns embeddings to maximize recall for blocking and accuracy for matching blocked pairs. DIAL uses an Index-By-Committee framework, where each committee member learns representations based on powerful transformer models. We highlight surprising differences between the matcher and the blocker in the creation of the training data and the objective used to train their parameters. Experiments on five benchmark datasets and a multilingual record matching dataset show the effectiveness of our approach in terms of precision, recall and running time. Code is available at https://github.com/ArjitJ/DIAL
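The role of blocking can be illustrated with a small embedding-based sketch: instead of sending every pair in the Cartesian product to the matcher, keep only the `k` most similar right-hand records for each left-hand record. This is a generic illustration and not DIAL's implementation; the function name `block_top_k` and the toy embeddings are assumptions. It also computes the full similarity matrix for clarity, which is itself quadratic; a scalable system would replace that step with a nearest-neighbor index.

```python
import numpy as np

def block_top_k(emb_left, emb_right, k):
    """Return candidate pairs (i, j): the k most cosine-similar right records per left record."""
    left = emb_left / np.linalg.norm(emb_left, axis=1, keepdims=True)
    right = emb_right / np.linalg.norm(emb_right, axis=1, keepdims=True)
    sims = left @ right.T                      # brute-force cosine similarities
    top = np.argsort(-sims, axis=1)[:, :k]     # indices of the k best matches per row
    return [(i, int(j)) for i in range(len(left)) for j in top[i]]

# Toy embeddings (illustrative only): 2 left records, 3 right records.
emb_left = np.array([[1.0, 0.0], [0.0, 1.0]])
emb_right = np.array([[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]])
candidates = block_top_k(emb_left, emb_right, k=1)
```

Here only 2 of the 6 possible pairs survive blocking. Recall depends on whether true matches land among the top `k`, which is why the quality of the embeddings used for blocking matters.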
Extended Parallel Corpus for Amharic-English Machine Translation
Gezmu, Andargachew Mekonnen, Nürnberger, Andreas, Bati, Tesfaye Bayu
This paper describes the acquisition, preprocessing, segmentation, and alignment of an Amharic-English parallel corpus. It will be useful for machine translation of an under-resourced language, Amharic. The corpus is larger than previously compiled corpora; it is released for research purposes. We trained neural machine translation and phrase-based statistical machine translation models using the corpus. In the automatic evaluation, neural machine translation models outperform phrase-based statistical machine translation models.
Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning
Furuta, Hiroki, Matsushima, Tatsuya, Kozuno, Tadashi, Matsuo, Yutaka, Levine, Sergey, Nachum, Ofir, Gu, Shixiang Shane
However, analyzing the nature of those environments is often overlooked. In particular, we still do not have agreeable ways to measure the difficulty or solvability of a task, given that each has fundamentally different actions, observations, dynamics, rewards, and can be tackled with diverse RL algorithms. In this work, we propose policy information capacity (PIC) - the mutual information between policy parameters and episodic return - and policy-optimal information capacity (POIC) - between policy parameters and episodic optimality - as two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty. Evaluating our metrics across toy environments as well as continuous control benchmark tasks from OpenAI Gym and DeepMind Control Suite, we empirically demonstrate that these information-theoretic metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
While in the past much of the empirical RL research has focused on tabular or linear function approximation case (Dietterich, 1998; McGovern & Barto, 2001; Konidaris & Barto, 2009), the impressive successes of recent years (and anticipation of domains ripe for subsequent successes) has spurred the creation of non-tabular benchmarks - i.e., continuous control and/or continuous observation - in which neural network function approximators are effectively a prerequisite (Bellemare et al., 2013; Brockman et al., 2016; Tassa et al., 2018). Accordingly, empirical RL research is presently heavily focused on the use of neural network function approximators, spurring new algorithmic developments in both model-free (Mnih et al., 2015; Schulman et al., 2015; Lillicrap et al., 2016; Gu et al., 2016b; 2017; Haarnoja et al., 2018) and model-based (Chua et al., 2018; Janner et al., 2019; Hafner et al., 2020a) RL. Despite the impressive progress of RL algorithms, the analysis of the RL environments has been difficult and stagnant, precisely due to the complexity of modern benchmarks and [...]
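The mutual information between policy parameters and episodic return can be written with the standard identity I(Θ; R) = H(R) − H(R | Θ). A minimal plug-in estimate of this quantity is sketched below, under stated assumptions: returns are already discretized into integer bins, every sampled parameter vector is rolled out for the same number of episodes, and the sampled parameters are weighted uniformly. The function names are hypothetical and this is not presented as the authors' estimator.

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (in nats) of the empirical distribution given by counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def mutual_information(returns, n_bins):
    """Plug-in estimate of I(Theta; R).

    returns: integer array of shape [n_params, n_episodes]; entry (i, e) is the
    bin index in [0, n_bins) of the return of episode e under parameter sample i.
    """
    per_param = np.stack([np.bincount(r, minlength=n_bins) for r in returns])
    h_marginal = entropy(per_param.sum(axis=0))                 # H(R)
    h_conditional = np.mean([entropy(c) for c in per_param])    # H(R | Theta)
    return h_marginal - h_conditional

# Two sampled policies whose returns are fully determined by the parameters.
informative = np.array([[0, 0, 0, 0], [1, 1, 1, 1]])
# Two sampled policies with identical return distributions.
uninformative = np.array([[0, 1, 0, 1], [0, 1, 0, 1]])
```

In the first toy case the parameters fully determine the return, so the estimate equals log 2; in the second the return distribution does not depend on the parameters, so the estimate is zero.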
Dependency Graph-to-String Statistical Machine Translation
Li, Liangyou, Way, Andy, Liu, Qun
We present graph-based translation models which translate source graphs into target strings. Source graphs are constructed from dependency trees with extra links so that non-syntactic phrases are connected. Inspired by phrase-based models, we first introduce a translation model which segments a graph into a sequence of disjoint subgraphs and generates a translation by combining subgraph translations left-to-right using beam search. However, similar to phrase-based models, this model is weak at phrase reordering. Therefore, we further introduce a model based on a synchronous node replacement grammar which learns recursive translation rules. We provide two implementations of the model with different restrictions so that source graphs can be parsed efficiently. Experiments on Chinese--English and German--English show that our graph-based models are significantly better than corresponding sequence- and tree-based baselines.