Education
Metric-Based Auto-Instructor for Learning Mixed Data Representation
Jian, Songlei (National University of Defense Technology, University of Technology Sydney) | Hu, Liang (University of Technology Sydney) | Cao, Longbing (University of Technology Sydney) | Lu, Kai (National University of Defense Technology)
Mixed data with both categorical and continuous features are ubiquitous in real-world applications. Learning a good representation of mixed data is critical yet challenging for further learning tasks. Existing methods for representing mixed data often overlook the heterogeneous coupling relationships between categorical and continuous features as well as the discrimination between objects. To address these issues, we propose an auto-instructive representation learning scheme to enable margin-enhanced distance metric learning for a discrimination-enhanced representation. Accordingly, we design a metric-based auto-instructor (MAI) model which consists of two collaborative instructors. Each instructor captures the feature-level couplings in mixed data with fully connected networks, and guides the infinite-margin metric learning for the peer instructor with a contrastive order. By feeding the learned representation into both partition-based and density-based clustering methods, our experiments on eight UCI datasets show highly significant learning performance improvement and much more distinguishable visualization outcomes over the baseline methods.
Selective Experience Replay for Lifelong Learning
Isele, David (University of Pennsylvania, Honda Research Institute ) | Cosgun, Akansel (Honda Research Institute)
Deep reinforcement learning has emerged as a powerful tool for a variety of learning tasks, however deep nets typically exhibit forgetting when learning multiple tasks in sequence. To mitigate forgetting, we propose an experience replay process that augments the standard FIFO buffer and selectively stores experiences in a long-term memory. We explore four strategies for selecting which experiences will be stored: favoring surprise, favoring reward, matching the global training distribution, and maximizing coverage of the state space. We show that distribution matching successfully prevents catastrophic forgetting, and is consistently the best approach on all domains tested. While distribution matching has better and more consistent performance, we identify one case in which coverage maximization is beneficial---when tasks that receive less trained are more important. Overall, our results show that selective experience replay, when suitable selection algorithms are employed, can prevent catastrophic forgetting.
Deep Q-learning From Demonstrations
Hester, Todd (Google DeepMind) | Vecerik, Matej (Google DeepMind) | Pietquin, Olivier (Google DeepMind ) | Lanctot, Marc (Google DeepMind) | Schaul, Tom (Google DeepMind) | Piot, Bilal (Google DeepMind ) | Horgan, Dan (Google DeepMind) | Quan, John (Google DeepMind) | Sendonaris, Andrew (Google DeepMind ) | Osband, Ian (Google DeepMind) | Dulac-Arnold, Gabriel (Google DeepMind) | Agapiou, John (Google DeepMind) | Leibo, Joel Z. (Google DeepMind) | Gruslys, Audrunas (Google DeepMind )
Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism. DQfD works by combining temporal difference updates with supervised classification of the demonstratorโs actions. We show that DQfD has better initial performance than Prioritized Dueling Double Deep Q-Networks (PDD DQN) as it starts with better scores on the first million steps on 41 of 42 games and on average it takes PDD DQN 83 million steps to catch up to DQfDโs performance. DQfD learns to out-perform the best demonstration given in 14 of 42 games. In addition, DQfD leverages human demonstrations to achieve state-of-the-art results for 11 games. Finally, we show that DQfD performs better than three related algorithms for incorporating demonstration data into DQN.
Human Guided Linear Regression With Feature-Level Constraints
Gress, Aubrey (University of California, Davis) | Davidson, Ian (University of California, Davis)
Linear regression methods are commonly used by both researchers and data scientists due to their interpretability and their reduced likelihood of overfitting. However, these methods can still perform poorly if little labeled training data is available. Typical methods used to overcome a lack of labeled training data somehow involve exploiting an outside source of labeled data or large amounts of unlabeled data. This includes areas such as active learning, semi-supervised learning and transfer learning, but in many domains these approaches are not always applicable because they require either a mechanism to label data, large amounts of unlabeled data or additional sources of sufficiently related data. In this paper we explore an alternative, non-data centric approach. We allow the user to guide the learning system through three forms of feature-level guidance which constrain the parameters of the regression function. Such guidance is unlikely to be perfectly accurate, so we derive methods which are robust to some amounts of noise, a property we formally prove for one of our methods.
Decomposition Strategies for Constructive Preference Elicitation
Dragone, Paolo (University of Trento) | Teso, Stefano (KU Leuven) | Kumar, Mohit (KU Leuven) | Passerini, Andrea (University of Trento)
We tackle the problem of constructive preference elicitation, that is the problem of learning user preferences over verylarge decision problems, involving a combinatorial space of possible outcomes. In this setting, the suggested configuration is synthesized on-the-fly by solving a constrained optimization problem, while the preferences are learned iteratively by interacting with the user. Previous work has shown that Coactive Learning is a suitable method for learning userpreferences in constructive scenarios. In Coactive Learning the user provides feedback to the algorithm in the form of an improvement to a suggested configuration. When the problem involves many decision variables and constraints, this type of interaction poses a significant cognitive burden on the user. We propose a decomposition technique for large preference-based decision problems relying exclusively on inference and feedback over partial configurations. This has the clear advantage of drastically reducing the user cognitive load. Additionally, part-wise inference can be (up to exponentially) less computationally demanding than inference over full configurations. We discuss the theoretical implications of working with parts and present promising empirical results on one synthetic and two realistic constructive problems.
DarkRank: Accelerating Deep Metric Learning via Cross Sample Similarities Transfer
Chen, Yuntao (Institute of Automation, Chinese Academy of Sciences, University of Chinese Academy of Sciences) | Wang, Naiyan (TuSimple) | Zhang, Zhaoxiang (Institute of Automation, Chinese Academy of Sciences, University of Chinese Academy of Sciences)
We have witnessed rapid evolution of deep neural network architecture design in the past years. These latest progresses greatly facilitate the developments in various areas such as computer vision and natural language processing. However, along with the extraordinary performance, these state-of-the-art models also bring in expensive computational cost. Directly deploying these models into applications with real-time requirement is still infeasible. Recently, Hinton et al. have shown that the dark knowledge within a powerful teacher model can significantly help the training of a smaller and faster student network. These knowledge are vastly beneficial to improve the generalization ability of the student model. Inspired by their work, we introduce a new type of knowledge---cross sample similarities for model compression and acceleration. This knowledge can be naturally derived from deep metric learning model. To transfer them, we bring the "learning to rank" technique into deep metric learning formulation. We test our proposed DarkRank method on various metric learning tasks including pedestrian re-identification, image retrieval and image clustering. The results are quite encouraging. Our method can improve over the baseline method by a large margin. Moreover, it is fully compatible with other existing methods. When combined, the performance can be further boosted.
Online Learning for Structured Loss Spaces
Barman, Siddharth (Indian Institute of Science (IISc), Bangalore ) | Gopalan, Aditya (Indian Institute of Science (IISc), Bangalore ) | Saha, Aadirupa (Indian Institute of Science (IISc), Bangalore)
We consider prediction with expert advice when the loss vectors are assumed to lie in a set described by the sum of atomic norm balls. We derive a regret bound for a general version of the online mirror descent (OMD) algorithm that uses a combination of regularizers, each adapted to the constituent atomic norms. The general result recovers standard OMD regret bounds, and yields regret bounds for new structured settings where the loss vectors are (i) noisy versions of vectors from a low-dimensional subspace, (ii) sparse vectors corrupted with noise, and (iii) sparse perturbations of low-rank vectors. For the problem of online learning with structured losses, we also show lower bounds on regret in terms of rank and sparsity of the loss vectors, which implies lower bounds for the above additive loss settings as well.
Exercise-Enhanced Sequential Modeling for Student Performance Prediction
Su, Yu (Anhui University) | Liu, Qingwen (iFLYTEKย CO.,LTD. ) | Liu, Qi (iFLYTEK CO.,LTD.) | Huang, Zhenya (University of Science and Technology of China ) | Yin, Yu ( University of Science and Technology of China ) | Chen, Enhong ( University of Science and Technology of China ) | Ding, Chris ( University of Science and Technology of China ) | Wei, Si ( University of Science and Technology of China ) | Hu, Guoping (University of Texas at Arlington)
In online education systems, for offering proactive services to students (e.g., personalized exercise recommendation), a crucial demand is to predict student performance (e.g., scores) on future exercising activities. Existing prediction methods mainly exploit the historical exercising records of students, where each exercise is usually represented as the manually labeled knowledge concepts, and the richer information contained in the text description of exercises is still underexplored. In this paper, we propose a novel Exercise-Enhanced Recurrent Neural Network (EERNN) framework for student performance prediction by taking full advantage of both student exercising records and the text of each exercise. Specifically, for modeling the student exercising process, we first design a bidirectional LSTM to learn each exercise representation from its text description without any expertise and information loss. Then, we propose a new LSTM architecture to trace student states (i.e., knowledge states) in their sequential exercising process with the combination of exercise representations. For making final predictions, we design two strategies under EERNN, i.e., EERNNM with Markov property and EERNNA with Attention mechanism. Extensive experiments on large-scale real-world data clearly demonstrate the effectiveness of EERNN framework. Moreover, by incorporating the exercise correlations, EERNN can well deal with the cold start problems from both student and exercise perspectives.
Compatibility Family Learning for Item Recommendation and Generation
Shih, Yong-Siang (Appier Inc.) | Chang, Kai-Yueh (Appier Inc.) | Lin, Hsuan-Tien (Appier Inc.) | Sun, Min ( National Tsing Hua University )
Compatibility between items, such as clothes and shoes, is a major factor among customer's purchasing decisions. However, learning "compatibility" is challenging due to (1) broader notions of compatibility than those of similarity, (2) the asymmetric nature of compatibility, and (3) only a small set of compatible and incompatible items are observed. We propose an end-to-end trainable system to embed each item into a latent vector and project a query item into K compatible prototypes in the same space. These prototypes reflect the broad notions of compatibility. We refer to both the embedding and prototypes as "Compatibility Family." In our learned space, we introduce a novel Projected Compatibility Distance (PCD) function which is differentiable and ensures diversity by aiming for at least one prototype to be close to a compatible item, whereas none of the prototypes are close to an incompatible item. We evaluate our system on a toy dataset, two Amazon product datasets, and Polyvore outfit dataset. Our method consistently achieves state-of-the-art performance. Finally, we show that we can visualize the candidate compatible prototypes using a Metric-regularized Conditional Generative Adversarial Network (MrCGAN), where the input is a projected prototype and the output is a generated image of a compatible item. We ask human evaluators to judge the relative compatibility between our generated images and images generated by CGANs conditioned directly on query items. Our generated images are significantly preferred, with roughly twice the number of votes as others.
Video-Based Sign Language Recognition Without Temporal Segmentation
Huang, Jie (University of Science and Technology of China) | Zhou, Wengang ( University of Science and Technology of China ) | Zhang, Qilin (HERE Technologies, Chicago, Illinois) | Li, Houqiang ( University of Science and Technology of China ) | Li, Weiping ( University of Science and Technology of China )
Millions of hearing impaired people around the world routinely use some variants of sign languages to communicate, thus the automatic translation of a sign language is meaningful and important. Currently, there are two sub-problems in Sign Language Recognition (SLR), i.e., isolated SLR that recognizes word by word and continuous SLR that translates entire sentences. Existing continuous SLR methods typically utilize isolated SLRs as building blocks, with an extra layer of preprocessing (temporal segmentation) and another layer of post-processing (sentence synthesis). Unfortunately, temporal segmentation itself is non-trivial and inevitably propagates errors into subsequent steps. Worse still, isolated SLR methods typically require strenuous labeling of each word separately in a sentence, severely limiting the amount of attainable training data. To address these challenges, we propose a novel continuous sign recognition framework, the Hierarchical Attention Network with Latent Space (LS-HAN), which eliminates the preprocessing of temporal segmentation. The proposed LS-HAN consists of three components: a two-stream Convolutional Neural Network (CNN) for video feature representation generation, a Latent Space (LS) for semantic gap bridging, and a Hierarchical Attention Network (HAN) for latent space based recognition. Experiments are carried out on two large scale datasets. Experimental results demonstrate the effectiveness of the proposed framework.