Supervised Learning
Structural query-by-committee
Tosh, Christopher, Dasgupta, Sanjoy
We introduce interactive structure learning, an abstract problem that encompasses many interactive learning tasks that have traditionally been studied in isolation, including active learning of binary classifiers, interactive clustering, interactive embedding, and active learning of structured output predictors. These problems include variants of both supervised and unsupervised tasks, and allow many different types of feedback, from binary labels to must-link/cannot-link constraints to similarity assessments to structured outputs. Despite these surface differences, they conform to a common template that allows them to be fruitfully unified. In interactive structure learning, there is a space of items X (for instance, an input space on which a classifier is to be learned, or points to cluster, or points to embed in a metric space), and the goal is to learn a structure on X, chosen from a family G. This set G could consist, for example, of all linear classifiers on X, or all hierarchical clusterings of X, or all knowledge graphs on X.
adambielski/siamese-triplet
Siamese and triplet networks are useful for learning mappings from images to a compact Euclidean space where distances correspond to a measure of similarity [2]. Embeddings trained in this way can be used as feature vectors for classification or few-shot learning tasks. Experiments were run in a Jupyter notebook. We'll go through learning supervised feature embeddings using different loss functions on the MNIST dataset. This is just for visualization purposes, so we'll be using 2-dimensional embeddings, which aren't the best choice in practice.
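To make the idea of distance-based losses concrete, here is a minimal sketch of two formulations commonly used with siamese and triplet networks. It is an illustration only, not the repository's code: the function names, the use of plain (unsquared) Euclidean distance in the triplet loss, and the default margin of 1.0 are all assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a, b, same_class, margin=1.0):
    """Pairwise (siamese) loss: pull same-class pairs together and push
    different-class pairs apart until they are at least `margin` away."""
    d = euclidean(a, b)
    if same_class:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss: the anchor should be closer to the positive than to
    the negative by at least `margin`; otherwise a penalty is incurred."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-dimensional embeddings, as in the visualization setting.
anchor, near, far = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
satisfied = triplet_loss(anchor, near, far)   # positive is already much closer
violated = triplet_loss(anchor, far, near)    # roles swapped, so a penalty applies
```

A loss of zero means the margin constraint already holds for that triplet, which is why triplet selection matters in practice: triplets that are already satisfied contribute no gradient.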
Adversarial Extreme Multi-label Classification
Babbar, Rohit, Schölkopf, Bernhard
The goal in extreme multi-label classification is to learn a classifier which can assign a small subset of relevant labels to an instance from an extremely large set of target labels. Datasets in extreme classification exhibit a long tail of labels which have a small number of positive training instances. In this work, we pose the learning task in extreme classification with a large number of tail labels as learning in the presence of adversarial perturbations. This view motivates a robust optimization framework and equivalence to a corresponding regularized objective. Under the proposed robustness framework, we demonstrate the efficacy of Hamming loss for tail-label detection in extreme classification. The equivalent regularized objective, in combination with proximal gradient based optimization, performs better than state-of-the-art methods on propensity-scored versions of precision@k and nDCG@k (up to 20% relative improvement over PFastreXML, a leading tree-based approach, and 60% relative improvement over SLEEC, a leading label-embedding approach). Furthermore, we also highlight the sub-optimality of a sparse solver in a widely used package for large-scale linear classification, which is interesting in its own right. We also investigate the spectral properties of label graphs to provide novel insights into the conditions governing the performance of the Hamming loss based one-vs-rest scheme vis-à-vis label embedding methods.
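The metrics named in this abstract can be sketched in a few lines. The following is a minimal illustration using the standard definitions of Hamming loss, precision@k, and an unnormalized propensity-scored precision@k; the function names and the toy propensity values are assumptions for illustration, not taken from the paper.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label positions where the prediction differs from the truth."""
    assert len(y_true) == len(y_pred)
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def top_k(scores, k):
    """Indices of the k highest-scoring labels."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def precision_at_k(y_true, scores, k):
    """Share of the top-k predicted labels that are actually relevant."""
    return sum(y_true[i] for i in top_k(scores, k)) / k


def ps_precision_at_k(y_true, scores, propensities, k):
    """Propensity-scored precision@k (unnormalized): each hit is divided by
    the label's propensity, so rare (tail) labels count for more."""
    return sum(y_true[i] / propensities[i] for i in top_k(scores, k)) / k


# Toy example with five labels; label 3 is a hypothetical tail label
# with a low propensity of being observed.
y_true = [1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.1, 0.7, 0.2]
propensities = [0.5, 0.5, 0.5, 0.25, 0.5]

p3 = precision_at_k(y_true, scores, 3)
psp3 = ps_precision_at_k(y_true, scores, propensities, 3)
```

Because hits are reweighted by inverse propensity, the unnormalized score can exceed 1; reported numbers are often normalized by the best achievable value for each instance.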
Revisiting the Vector Space Model: Sparse Weighted Nearest-Neighbor Method for Extreme Multi-Label Classification
Aoshima, Tatsuhiro, Kobayashi, Kei, Minami, Mihoko
Machine learning has played an important role in information retrieval (IR) in recent times. In search engines, for example, query keywords are accepted and documents are returned in order of relevance to the given query; this can be cast as a multi-label ranking problem in machine learning. Generally, the number of candidate documents is extremely large (from several thousand to several million); thus, the classifier must handle many labels. This problem is referred to as extreme multi-label classification (XMLC). In this paper, we propose a novel approach to XMLC termed the Sparse Weighted Nearest-Neighbor Method. This technique can be derived as a fast implementation of state-of-the-art (SOTA) one-versus-rest linear classifiers for very sparse datasets. In addition, we show that the classifier can be written as a sparse generalization of a representer theorem with a linear kernel. Furthermore, our method can be viewed as the vector space model used in IR. Finally, we show that the Sparse Weighted Nearest-Neighbor Method can process data points in real time on XMLC datasets with equivalent performance to SOTA models, with a single thread and smaller storage footprint. In particular, our method exhibits superior performance to the SOTA models on a dataset with 3 million labels.
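The vector space view mentioned above can be illustrated with a small similarity-weighted nearest-neighbor classifier over sparse vectors. This is a generic sketch, not the authors' Sparse Weighted Nearest-Neighbor Method: the dictionary representation, cosine similarity, and the names `sparse_dot`, `cosine`, and `predict_labels` are assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def sparse_dot(u, v):
    """Dot product of two sparse vectors stored as {feature: weight} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v[f] for f, w in u.items() if f in v)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sparse_dot(u, u))
    nv = math.sqrt(sparse_dot(v, v))
    return sparse_dot(u, v) / (nu * nv) if nu and nv else 0.0


def predict_labels(query, train, k=2, top=3):
    """Rank labels by the summed similarity of the k most similar training points."""
    neighbours = sorted(
        ((cosine(query, x), labels) for x, labels in train),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, labels in neighbours:
        for label in labels:
            votes[label] += sim
    # Sort by descending score, breaking ties alphabetically.
    return sorted(votes, key=lambda label: (-votes[label], label))[:top]


# Hypothetical training set: (sparse feature vector, set of labels).
train = [
    ({"cat": 1.0, "pet": 1.0}, {"animals"}),
    ({"car": 1.0, "road": 1.0}, {"vehicles"}),
    ({"cat": 1.0, "car": 1.0}, {"animals", "vehicles"}),
]
ranking = predict_labels({"cat": 1.0}, train, k=2)
```

Only features shared by the query and a training point contribute to the score, which is what makes this style of method cheap on very sparse data.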
Investigating Inner Properties of Multimodal Representation and Semantic Compositionality With Brain-Based Componential Semantics
Wang, Shaonan (Institute of Automation, Chinese Academy of Sciences) | Zhang, Jiajun (Institute of Automation, Chinese Academy of Sciences) | Lin, Nan (Institute of Psychology, Chinese Academy of Sciences) | Zong, Chengqing (Institute of Automation, Chinese Academy of Sciences)
Multimodal models have been shown to outperform text-based approaches at learning semantic representations. However, it remains unclear what properties are encoded in multimodal representations, in what aspects they outperform single-modality representations, and what happens during semantic composition in different input modalities. Considering that multimodal models were originally motivated by human concept representations, we assume that correlating multimodal representations with brain-based semantics would help interpret their inner properties and answer the above questions. To that end, we propose simple interpretation methods based on brain-based componential semantics. First, we investigate the inner properties of multimodal representations by correlating them with corresponding brain-based property vectors. Then we map the distributed vector space to the interpretable brain-based componential space to explore the inner properties of semantic compositionality. Ultimately, the present paper sheds light on fundamental questions of natural language understanding, such as how to represent the meaning of words and how to combine word meanings into larger units.
Diagnosing University Student Subject Proficiency and Predicting Degree Completion in Vector Space
Luo, Yuetian (UW-Madison) | Pardos, Zachary A. (UC Berkeley)
We investigate undergraduate on-time graduation with respect to subject proficiencies through the lens of representation learning, training student vector embeddings from a dataset of 8 years of course enrollments. We compare the per-semester student representations of a cohort of undergraduate Integrative Biology majors to those of graduated students in subject areas involved in their degree requirements. The result is an embedding rich in information about the relationships between majors and the pathways taken by students, which encodes enough information to improve prediction accuracy of on-time graduation to 95%, up from a baseline of 87.3%. Challenges in preparing the data for student vectorization and in sourcing validation sets for optimization are discussed.
Unsupervised Selection of Negative Examples for Grounded Language Learning
Pillai, Nisha (University of Maryland, Baltimore County) | Matuszek, Cynthia (University of Maryland, Baltimore County)
There has been substantial work in recent years on grounded language acquisition, in which language and sensor data are used to create a model relating linguistic constructs to the perceivable world. While powerful, this approach is frequently hindered by ambiguities, redundancies, and omissions found in natural language. We describe an unsupervised system that learns language by training visual classifiers, first selecting important terms from object descriptions, then automatically choosing negative examples from a paired corpus of perceptual and linguistic data. We evaluate the effectiveness of each stage as well as the system's performance on the overall learning task.
Generative Adversarial Network Based Heterogeneous Bibliographic Network Representation for Personalized Citation Recommendation
Cai, Xiaoyan (School of Automation, Northwestern Polytechnical University) | Han, Junwei (School of Automation, Northwestern Polytechnical University) | Yang, Libin (School of Automation, Northwestern Polytechnical University)
Network representation has recently been exploited for many applications, such as citation recommendation, multi-label classification and link prediction. It learns a low-dimensional vector representation for each vertex in a network. Existing network representation methods focus only on incomplete aspects of vertex information (i.e., vertex content, network structure or partial integration). Moreover, they are commonly designed for homogeneous information networks where all the vertices of a network are of the same type. In this paper, we propose a deep network representation model that integrates network structure and vertex content information into a unified framework by exploiting a generative adversarial network, and represents different types of vertices in the heterogeneous network in a continuous and common vector space. Based on the proposed model, we can obtain a heterogeneous bibliographic network representation for efficient citation recommendation. The proposed model also makes personalized citation recommendation possible, which is a new issue that few papers have addressed in the past. When evaluated on the AAN and DBLP datasets, the performance of the proposed heterogeneous bibliographic network based citation recommendation approach is comparable with that of the other network representation based citation recommendation approaches. The results also demonstrate that the personalized citation recommendation approach is more effective than the non-personalized citation recommendation approach.
Learning Graph-Structured Sum-Product Networks for Probabilistic Semantic Maps
Zheng, Kaiyu (University of Washington, Seattle) | Pronobis, Andrzej (University of Washington, Seattle) | Rao, Rajesh P. N. (University of Washington, Seattle)
We introduce Graph-Structured Sum-Product Networks (GraphSPNs), a probabilistic approach to structured prediction for problems where dependencies between latent variables are expressed in terms of arbitrary, dynamic graphs. While many approaches to structured prediction place strict constraints on the interactions between inferred variables, many real-world problems can only be characterized using complex graph structures of varying size, often contaminated with noise when obtained from real data. Here, we focus on one such problem in the domain of robotics. We demonstrate how GraphSPNs can be used to bolster inference about semantic, conceptual place descriptions using noisy topological relations discovered by a robot exploring large-scale office spaces. Through experiments, we show that GraphSPNs consistently outperform the traditional approach based on undirected graphical models, successfully disambiguating information in global semantic maps built from uncertain, noisy local evidence. We further exploit the probabilistic nature of the model to infer marginal distributions over semantic descriptions of as yet unexplored places and to detect spatial environment configurations that are novel and incongruent with the known evidence.
Feature-Induced Labeling Information Enrichment for Multi-Label Learning
Zhang, Qian-Wen (Tencent Smart Platform &) | Zhong, Yun (Products Department) | Zhang, Min-Ling (Southeast University)
In multi-label learning, each training example is represented by a single instance (feature vector) while being associated with multiple class labels simultaneously. The task is to learn a predictive model from the training examples which can assign a set of proper labels to an unseen instance. Most existing approaches make use of multi-label training examples by exploiting their labeling information in a crisp manner, i.e., each class label is either fully relevant or fully irrelevant to the instance. In this paper, a novel multi-label learning approach is proposed which aims to enrich the labeling information by leveraging the structural information in feature space. Firstly, the underlying structure of feature space is characterized by conducting sparse reconstruction among the training examples. Secondly, the reconstruction information is conveyed from feature space to label space so as to enrich the original categorical labels into numerical ones. Thirdly, the multi-label predictive model is induced by learning from training examples with enriched labeling information. Extensive experiments on fifteen benchmark data sets clearly validate the effectiveness of the proposed feature-induced strategy for enhancing the labeling information of multi-label examples.
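The first two steps described above (characterize feature-space structure, then carry it over to label space) can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the sparse reconstruction is replaced by a simpler stand-in, Gaussian-weighted k-nearest-neighbor weights, and the names `knn_weights`, `enrich_labels`, and the mixing parameter `alpha` are hypothetical.

```python
import math


def knn_weights(X, i, k=2):
    """Stand-in for sparse reconstruction: weights over the k nearest
    neighbours of X[i], from a Gaussian kernel, normalised to sum to 1."""
    nearest = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)[:k]
    raw = [(j, math.exp(-d * d)) for d, j in nearest]
    total = sum(w for _, w in raw)
    return {j: w / total for j, w in raw}


def enrich_labels(X, Y, k=2, alpha=0.5):
    """Turn 0/1 labels into numerical ones by mixing each instance's own
    labels with the weighted labels of its feature-space neighbours."""
    enriched = []
    for i in range(len(X)):
        weights = knn_weights(X, i, k)
        row = []
        for l in range(len(Y[i])):
            neighbour_part = sum(w * Y[j][l] for j, w in weights.items())
            row.append(alpha * Y[i][l] + (1 - alpha) * neighbour_part)
        enriched.append(row)
    return enriched


# Toy data: two tight clusters in feature space, two labels per instance.
X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
Y = [[1, 0], [1, 1], [0, 1], [0, 1]]
Y_enriched = enrich_labels(X, Y, k=1, alpha=0.5)
```

In this toy run the first instance, originally labeled [1, 0], receives a soft score of 0.5 for the second label because its nearest neighbor carries it. The third step would then fit any regression-style multi-label model to `Y_enriched` instead of `Y`.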