Inductive Learning
Graph Construction for Semi-Supervised Learning
Berton, Lilian (University of Sao Paulo) | Lopes, Alneu de Andrade (University of Sao Paulo)
Semi-Supervised Learning (SSL) techniques have become very relevant since they require a small set of labeled data. In this scenario, graph-based SSL algorithms provide a powerful framework for modeling manifold structures in high-dimensional spaces and are effective for the propagation of the few initial labels present in training data through the graph. An important step in graph-based SSL methods is the conversion of tabular data into a weighted graph. The graph construction has a key role in the quality of the classification in graph-based methods. Nevertheless, most of the SSL literature focuses on developing label inference algorithms without studying graph construction methods and its effect on the base algorithm performance. This PhD project aims to study this issue and proposes new methods for graph construction from flat data and improves the performance of the graph-based algorithms.
Solving the Partial Label Learning Problem: An Instance-Based Approach
Zhang, Min-Ling (Southeast University) | Yu, Fei (Southeast University)
In partial label learning, each training example is associated with a set of candidate labels, among which only one is valid. An intuitive strategy to learn from partial label examples is to treat all candidate labels equally and make prediction by averaging their modeling outputs. Nonetheless, this strategy may suffer from the problem that the modeling output from the valid label is overwhelmed by those from the false positive labels. In this paper, an instance-based approach named IPAL is proposed by directly disambiguating the candidate label set. Briefly, IPAL tries to identify the valid label of each partial label example via an iterative label propagation procedure, and then classifies the unseen instance based on minimum error reconstruction from its nearest neighbors. Extensive experiments show that IPAL compares favorably against the existing instance-based as well as other state-of-the-art partial label learning approaches.
Towards Class-Imbalance Aware Multi-Label Learning
Zhang, Min-Ling (Southeast University) | Li, Yu-Kun (Southeast University) | Liu, Xu-Ying (Southeast University)
In multi-label learning, each object is represented by a single instance while associated with a set of class labels. Due to the huge (exponential) number of possible label sets for prediction, existing approaches mainly focus on how to exploit label correlations to facilitate the learning process. Nevertheless, an intrinsic characteristic of learning from multi-label data, i.e. the widely-existing class-imbalance among labels, has not been well investigated. Generally, the number of positive training instances w.r.t. each class label is far less than its negative counterparts, which may lead to performance degradation for most multi-label learning techniques. In this paper, a new multi-label learning approach named Cross-Coupling Aggregation (COCOA) is proposed, which aims at leveraging the exploitation of label correlations as well as the exploration of class-imbalance. Briefly, to induce the predictive model on each class label, one binary-class imbalance learner corresponding to the current label and several multi-class imbalance learners coupling with other labels are aggregated for prediction. Extensive experiments clearly validate the effectiveness of the proposed approach, especially in terms of imbalance-specific evaluation metrics such as F-measure and area under the ROC curve.
Online Learning of k-CNF Boolean Functions
Veness, Joel (Google DeepMind) | Hutter, Marcus (Australian National University) | Orseau, Laurent (Google DeepMind) | Bellemare, Marc (Google DeepMind)
This paper revisits the problem of learning a k-CNF Boolean function from examples, for fixed k, in the context of online learning under the logarithmic loss. We give a Bayesian interpretation to one of Valiant’s classic PAC learning algorithms, which we then build upon to derive three efficient, online, probabilistic, supervised learning algorithms for predicting the output of an unknown k-CNF Boolean function. We analyze the loss of our methods, and show that the cumulative log-loss can be upper bounded by a polynomial function of the size of each example.
Extended Discriminative Random Walk: A Hypergraph Approach to Multi-View Multi-Relational Transductive Learning
Satchidanand, Sai Nageswar (Indian Institute of Technology Madras) | Ananthapadmanaban, Harini (Indian Institute of Technology Madras) | Ravindran, Balaraman (Indian Institute of Technology Madras)
Transductive inference on graphs has been garnering increasing attention due to the connected nature of many real-life data sources, such as online social media and biological data (protein-protein interaction network, gene networks, etc.). Typically relational information in the data is encoded as edges in a graph but often it is important to model multi-way interactions, such as in collaboration networks and reaction networks. In this work we model multi-way relations as hypergraphs and extend the discriminative random walk (DRW) framework, originally proposed for transductive inference on single graphs, to the case of multiple hypergraphs. We use the extended DRW framework for inference on multi-view, multi-relational data in a natural way, by representing attribute descriptions of the data also as hypergraphs. We further exploit the structure of hypergraphs to modify the random walk operator to take into account class imbalance in the data. This work is among very few approaches to explicitly address class imbalance in the in-network classification setting, using random walks. We compare our approach to methods proposed for inference on hypergraphs, and to methods proposed for multi-view data and show that empirically we achieve better performance. We also compare to methods specifically tailored for class-imbalanced data and show that our approach achieves comparable performance even on non-network data.
Active Imitation Learning of Hierarchical Policies
Hamidi, Mandana (Oregon State University) | Tadepalli, Prasad (Oregon State University) | Goetschalckx, Robby (Oregon State University) | Fern, Alan (Oregon State University)
However, by being autonomous, structure of the policy, which is often critical for understanding these approaches have the problem of discovering the demonstration, is unobserved. We unnatural hierarchies, which may be difficult to interpret and formulate this problem as active learning of Probabilistic communicate to people. State-Dependent Grammars (PSDGs) from In this paper, we study the problem of learning policies demonstrations. Given a set of expert demonstrations, with hierarchical structure from demonstrations of a teacher our approach learns a hierarchical policy by whose policy is structured hierarchically, with natural applications actively selecting demonstrations and using queries to problems such as tutoring arithmetic, cooking, and to explicate their intentional structure at selected furniture assembly. A key challenge in this problem is that the points. Our contributions include a new algorithm demonstrations do not reveal the hierarchical task structure of for imitation learning of hierarchical policies and the teacher. Rather, only ground states and teacher actions are principled heuristics for the selection of demonstrations directly observable. This can lead to significant ambiguity in and queries.
Discriminative Reordering Model Adaptation via Structural Learning
Zhang, Biao (Xiamen University) | Su, Jinsong (Xiamen University) | Xiong, Deyi (Soochow University) | Duan, Hong (Xiamen University) | Yao, Junfeng (Xiamen University)
Reordering model adaptation remains a big challenge in statistical machine translation because reordering patterns of translation units often vary dramatically from one domain to another. In this paper, we propose a novel adaptive discriminative reordering model (DRM) based on structural learning, which can capture correspondences among reordering features from two different domains. Exploiting both in-domain and out-of-domain monolingual corpora, our model learns a shared feature representation for cross-domain phrase reordering. Incorporating features of this representation, the DRM trained on out-of-domain corpus generalizes better to in-domain data. Experiment results on the NIST Chinese-English translation task show that our approach significantly outperforms a variety of baselines.
FlashNormalize: Programming by Examples for Text Normalization
Kini, Dileep (University of Illinois at Urbana-Champaign) | Gulwani, Sumit (Microsoft Research)
Several applications including text-to-speech require some normalized format of non-standard words in various domains such as numbers, dates, and currencies and in various human languages. The traditional approach of manually constructing a program for such a normalization task requires expertise in both programming and target (human) language and further does not scale to a large number of domain, format, and target language combinations. We propose to learn programs for such normalization tasks through examples. We present a domain-specific programming language that offers appropriate abstractions for succinctly describing such normalization tasks, and then present a novel search algorithm that can effectively learn programs in this language from input-output examples. We also briefly describe domain-specific heuristics for guiding users of our system to provide representative examples for normalization tasks related to that domain. Our experiments show that weare able to effectively learn desired programs for a variety of normalization tasks.
A Generalized Kernel Approach to Structured Output Learning
Kadri, Hachem, Ghavamzadeh, Mohammad, Preux, Philippe
We study the problem of structured output learning from a regression perspective. We first provide a general formulation of the kernel dependency estimation (KDE) approach to this problem using operator-valued kernels. Our formulation overcomes the two main limitations of the original KDE approach, namely the decoupling between outputs in the image space and the inability to use a joint feature space. We then propose a covariance-based operator-valued kernel that allows us to take into account the structure of the kernel feature space. This kernel operates on the output space and only encodes the interactions between the outputs without any reference to the input space. To address this issue, we introduce a variant of our KDE method based on the conditional covariance operator that in addition to the correlation between the outputs takes into account the effects of the input variables. Finally, we evaluate the performance of our KDE approach on three structured output problems, and compare it to the state-of-the-art kernelbased structured output regression methods.
Framework for Multi-task Multiple Kernel Learning and Applications in Genome Analysis
Widmer, Christian, Kloft, Marius, Sreedharan, Vipin T, Rätsch, Gunnar
We present a general regularization-based framework for Multi-task learning (MTL), in which the similarity between tasks can be learned or refined using $\ell_p$-norm Multiple Kernel learning (MKL). Based on this very general formulation (including a general loss function), we derive the corresponding dual formulation using Fenchel duality applied to Hermitian matrices. We show that numerous established MTL methods can be derived as special cases from both, the primal and dual of our formulation. Furthermore, we derive a modern dual-coordinate descend optimization strategy for the hinge-loss variant of our formulation and provide convergence bounds for our algorithm. As a special case, we implement in C++ a fast LibLinear-style solver for $\ell_p$-norm MKL. In the experimental section, we analyze various aspects of our algorithm such as predictive performance and ability to reconstruct task relationships on biologically inspired synthetic data, where we have full control over the underlying ground truth. We also experiment on a new dataset from the domain of computational biology that we collected for the purpose of this paper. It concerns the prediction of transcription start sites (TSS) over nine organisms, which is a crucial task in gene finding. Our solvers including all discussed special cases are made available as open-source software as part of the SHOGUN machine learning toolbox (available at \url{http://shogun.ml}).