Goto

Collaborating Authors

 Statistical Learning


Formalizing Hierarchical Clustering as Integer Linear Programming

AAAI Conferences

Hierarchical clustering is typically implemented as a greedy heuristic algorithm with no explicit objective function. In this work we formalize hierarchical clustering as an integer linear programming (ILP) problem with a natural objective function and the dendrogram properties enforced as linear constraints.  Though exact solvers exists for ILP we show that a simple randomized algorithm and a linear programming (LP) relaxation can be used to provide approximate solutions faster.  Formalizing hierarchical clustering also has the benefit that relaxing the constraints can produce novel problem variations such as overlapping clusterings.  Our experiments show that our formulation is capable of outperforming standard agglomerative clustering algorithms in a variety of settings, including traditional hierarchical clustering as well as learning overlapping clusterings.


On Power-Law Kernels, Corresponding Reproducing Kernel Hilbert Space and Applications

AAAI Conferences

The role of kernels is central to machine learning. Motivated by the importance of power-law distributions in statistical modeling, in this paper, we propose the notion of power-law kernels to investigate power-laws in learning problem. We propose two power-law kernels by generalizing Gaussian and Laplacian kernels. This generalization is based on distributions, arising out of maximization of a generalized information measure known as nonextensive entropy that is very well studied in statistical mechanics. We prove that the proposed kernels are positive definite, and provide some insights regarding the corresponding Reproducing Kernel Hilbert Space (RKHS). We also study practical significance of both kernels in classification and regression, and present some simulation results.


Automatic Identification of Conceptual Metaphors With Limited Knowledge

AAAI Conferences

Full natural language understanding requires identifying and analyzing the meanings of metaphors, which are ubiquitous in both text and speech. Over the last thirty years, linguistic metaphors have been shown to be based on more general conceptual metaphors, partial semantic mappings between disparate conceptual domains. Though some achievements have been made in identifying linguistic metaphors over the last decade or so, little work has been done to date on automatically identifying conceptual metaphors. This paper describes research on identifying conceptual metaphors based on corpus data. Our method uses as little background knowledge as possible, to ease transfer to new languages and to mini- mize any bias introduced by the knowledge base construction process. The method relies on general heuristics for identifying linguistic metaphors and statistical clustering (guided by Wordnet) to form conceptual metaphor candidates. Human experiments show the system effectively finds meaningful conceptual metaphors.


SMILe: Shuffled Multiple-Instance Learning

AAAI Conferences

Resampling techniques such as bagging are often used in supervised learning to produce more accurate classifiers. In this work, we show that multiple-instance learning admits a different form of resampling, which we call "shuffling." In shuffling, we resample instances in such a way that the resulting bags are likely to be correctly labeled. We show that resampling results in both a reduction of bag label noise and a propagation of additional informative constraints to a multiple-instance classifier. We empirically evaluate shuffling in the context of multiple-instance classification and multiple-instance active learning and show that the approach leads to significant improvements in accuracy.


A Maximum K-Min Approach for Classification

AAAI Conferences

In this paper, a general Maximum K-Min approach for classification is proposed. With the physical meaning of optimizing the classification confidence of the K worst instances, Maximum K-Min Gain/Minimum K-Max Loss (MKM) criterion is introduced. To make the original optimization problem with combinational constraints computationally tractable, the optimization techniques are adopted and a general compact representation lemma for MKM Criterion is summarized. Based on the lemma, a Nonlinear Maximum K-Min (NMKM) classifier and a Semi-supervised Maximum K-Min (SMKM) classifier are presented for traditional classification task and semi-supervised classification task respectively. Based on the experiment results of publicly available datasets, our Maximum K-Min methods have achieved competitive performance when comparing against Hinge Loss classifiers.


The Automated Acquisition of Suggestions from Tweets

AAAI Conferences

This paper targets at automatically detecting and classifying user's suggestions from tweets. The short and informal nature of tweets, along with the imbalanced characteristics of suggestion tweets, makes the task extremely challenging. To this end, we develop a classification framework on Factorization Machines, which is effective and efficient especially in classification tasks with feature sparsity settings. Moreover, we tackle the imbalance problem by introducing cost-sensitive learning techniques in Factorization Machines. Extensively experimental studies on a manually annotated real-life data set show that the proposed approach significantly improves the baseline approach, and yields the precision of 71.06% and recall of 67.86%. We also investigate the reason why Factorization Machines perform better. Finally, we introduce the first manually annotated dataset for suggestion classification.


Online Lazy Updates for Portfolio Selection with Transaction Costs

AAAI Conferences

A major challenge for stochastic optimization is the cost of updating model parameters especially when the number of parameters is large. Updating parameters frequently can prove to be computationally or monetarily expensive. In this paper, we introduce an efficient primal-dual based online algorithm that performs lazy updates to the parameter vector and show that its performance is competitive with reasonable strategies which have the benefit of hindsight. We demonstrate the effectiveness of our algorithm in the online portfolio selection domain where a trader has to pay proportional transaction costs every time his portfolio is updated. Our Online Lazy Updates (OLU) algorithm takes into account the transaction costs while computing an optimal portfolio which results in sparse updates to the portfolio vector. We successfully establish the robustness and scalability of our lazy portfolio selection algorithm with extensive theoretical and experimental results on two real-world datasets.


Uncorrelated Lasso

AAAI Conferences

In this paper, motivated by the previous sparse learning In many regression applications, there are too many unrelated based research, we propose to add variable correlation into predictors which may hide the relationship between the sparse-learning-based variable selection approach. We response and the most related predictors. A common way to note that in previous Lasso-type variable selection, variable resolve this problem is variable selection, that is to select a correlations are not taken into account, while in most subset of the most representative or discriminative predictors real-life data, predictors are often correlated. Strongly correlated from the input predictor set. The central requirement is that predictors share similar properties, and have some good predictor set contains predictors that are highly correlated overlapped information.


Bridging Information Criteria and Parameter Shrinkage for Model Selection

arXiv.org Machine Learning

Model selection based on classical information criteria, such as BIC, is generally computationally demanding, but its properties are well studied. On the other hand, model selection based on parameter shrinkage by $\ell_1$-type penalties is computationally efficient. In this paper we make an attempt to combine their strengths, and propose a simple approach that penalizes the likelihood with data-dependent $\ell_1$ penalties as in adaptive Lasso and exploits a fixed penalization parameter. Even for finite samples, its model selection results approximately coincide with those based on information criteria; in particular, we show that in some special cases, this approach and the corresponding information criterion produce exactly the same model. One can also consider this approach as a way to directly determine the penalization parameter in adaptive Lasso to achieve information criteria-like model selection. As extensions, we apply this idea to complex models including Gaussian mixture model and mixture of factor analyzers, whose model selection is traditionally difficult to do; by adopting suitable penalties, we provide continuous approximators to the corresponding information criteria, which are easy to optimize and enable efficient model selection.


Machine Learning in Proof General: Interfacing Interfaces

arXiv.org Artificial Intelligence

It allows users to gather proof statistics related to shapes of goals, sequences of applied tactics, and proof tree structures from the libraries of interactive higher-order proofs written in Coq and SSReflect. The gathered data is clustered using the state-of-the-art machine learning algorithms available in MATLAB and Weka. ML4PG provides automated interfacing between Proof General and MATLAB/Weka. The results of clustering are used by ML4PG to provide proof hints in the process of interactive proof development.