Country
Biclustering-Driven Ensemble of Bayesian Belief Network Classifiers for Underdetermined Problems
Pansombut, Tatdow (North Carolina State University, Oak Ridge National Laboratory) | Hendrix, William (North Carolina State University, Oak Ridge National Laboratory) | Gao, Zekai J. (Zhejiang University) | Harrison, Brent E. (North Carolina State University, Oak Ridge National Laboratory) | Samatova, Nagiza F. (North Carolina State University, Oak Ridge National Laboratory)
In this paper, we present BENCH (BiclusteringdrivenENsemble of Classifiers), an algorithm toconstruct an ensemble of classifiers through concurrentfeature and data point selection guided byunsupervised knowledge obtained from biclustering.BENCH is designed for underdeterminedproblems. In our experiments, we use Bayesian BeliefNetwork (BBN) classifiers as base classifiers inthe ensemble; however, BENCH can be applied toother classification models as well. We show thatBENCH is able to increase prediction accuracy ofa single classifier and traditional ensemble of classifiersby up to 15% on three microarray datasetsusing various weighting schemes for combining individualpredictions in the ensemble.
Robust Principal Component Analysis with Non-Greedy ℓ1-Norm Maximization
Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington) | Ding, Chris (University of Texas at Arlington) | Luo, Dijun (University of Texas at Arlington) | Wang, Hua (University of Texas at Arlington)
Principal Component Analysis (PCA) is one of the most important methods to handle high-dimensional data. However, the high computa-tional complexity makes it hard to apply to the large scale data with high dimensionality, and the used ℓ2-norm makes it sensitive to outliers. A recent work proposed principal component analysis based on ℓ1-norm maximization, which is efficient and robust to outliers. In that work, a greedy strategy was applied due to the difficulty of directly solving the ℓ1-norm maximization problem, which is easy to get stuck in local solution. In this paper, we first propose an efficient optimization algorithm to solve a general ℓ1-norm maximization problem, and then propose a robust principal component analysis with non-greedy ℓ1-norm maximization. Experimental results on real world datasets show that the non-greedy method always obtains much better solution than that of the greedy method.
Distribution-Aware Online Classifiers
Nguyen, Tam T. (Nanyang Technological University) | Chang, Kuiyu (Nanyang Technological University) | Hui, Cheung Siu (Nanyang Technological University)
We propose a family of Passive-Aggressive Mahalanobis (PAM) algorithms, which are incremental (online) binary classifiers that consider the distribution of data. PAM is in fact a generalization of the Passive-Aggressive (PA) algorithms to handle data distributions that can be represented by a covariance matrix. The update equations for PAM are derived and theoretical error loss bounds computed. We benchmarked PAM against the original PA-I, PA-II, and Confidence Weighted (CW) learning. Although PAM somewhat resembles CW in its update equations, PA minimizes differences in the weights while CW minimizes differences in weight distributions. Results on 8 classification datasets, which include a real-life micro-blog sentiment classification task, show that PAM consistently outperformed its competitors, most notably CW. This shows that a simple approach like PAM is more practical in real-life classification tasks, compared to more elegant and sophisticated approaches like CW.
Positive Unlabeled Learning for Time Series Classification
Nguyen, Minh Nhut (Institute for Infocomm Research, Singapore) | Li, Xiao-Li (Institute for Infocomm Research, Singapore) | Ng, See-Kiong (Institute for Infocomm Research, Singapore)
In many real-world applications of the time series classification problem, not only could the negative training instances be missing, the number of positive instances available for learning may also be rather limited. This has motivated the development of new classification algorithms that can learn from a small set P of labeled seed positive instances augmented with a set U of unlabeled instances (i.e. PU learning algorithms). However, existing PU learning algorithms for time series classification have less than satisfactory performance as they are unable to identify the class boundary between positive and negative instances accurately. In this paper, we propose a novel PU learning algorithm LCLC (Learning from Common Local Clusters) for time series classification. LCLC is designed to effectively identify the ground truths’ positive and negative boundaries, resulting in more accurate classifiers than those constructed using existing methods. We have applied LCLC to classify time series data from different application domains; the experimental results demonstrate that LCLC outperforms existing methods significantly.
Imitation Learning in Relational Domains: A Functional-Gradient Boosting Approach
Natarajan, Sriraam (Wake Forest University School of Medicine) | Joshi, Saket (Oregon State University) | Tadepalli, Prasad (Oregon State University) | Kersting, Kristian (Fraunhofer IAIS) | Shavlik, Jude (University of Wisconsin-Madiso)
Imitation learning refers to the problem of learning how to behave by observinga teacher in action. We consider imitation learning in relational domains, in which there is a varying number of objects and relations among them. In prior work, simple relational policies are learned by viewing imitation learning as supervised learning of a function from states to actions. For propositional worlds, functional gradient methods have been proved to be beneficial. They are simpler to implement than most existing methods, more efficient, more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the form of the function. Building on recent generalizations of functional gradient boosting to relational representations, we implement a functional gradient boosting approach to imitation learning in relational domains. In particular, given a set of traces from the human teacher, our system learns a policy in the form of a set of relational regression trees that additively approximate the functional gradients. The use of multiple additive trees combined with relational representation allows for learning more expressive policies than what has been done before. We demonstrate the usefulness of our approach in several different domains.
Multi-Kernel Gaussian Processes
Melkumyan, Arman (The University of Sydney) | Ramos, Fabio (The University of Sydney)
Multi-task learning remains a difficult yet important problem in machine learning. In Gaussian processes the main challenge is the definition of valid kernels (covariance functions) able to capture the relationships between different tasks. This paper presents a novel methodology to construct valid multi-task covariance functions (Mercer kernels) for Gaussian processes allowing for a combination of kernels with different forms. The method is based on Fourier analysis and is general for arbitrary stationary covariance functions. Analytical solutions for cross covariance terms between popular forms are provided including Mat´ern, squared exponential and sparse covariance functions. Experiments are conducted with both artificial and real datasets demonstrating the benefits of the approach.
Agent-Oriented Incremental Team and Activity Recognition
Masato, Daniele (University of Aberdeen) | Norman, Timothy J. (University of Aberdeen) | Vasconcelos, Wamberto W. (University of Aberdeen) | Sycara, Katia (Carnegie Mellon University)
Monitoring team activity is beneficial when human teams cooperate in the enactment of a joint plan. Monitoring allows teams to maintain awareness of each other's progress within the plan and it enables anticipation of information needs. Humans find this difficult, particularly in time-stressed and uncertain environments. In this paper we introduce a probabilistic model, based on Conditional Random Fields, to automatically recognise the composition of teams and the team activities in relation to a plan. The team composition and activities are recognised incrementally by interpreting a stream of spatio-temporal observations.
Combining Supervised and Unsupervised Models Via Unconstrained Probabilistic Embedding
Ma, Xudong (Chinese Academy of Sciences) | Luo, Ping (Hewlett Packard Labs China) | Zhuang, Fuzhen (Chinese Academy of Sciences) | He, Qing (Chinese Academy of Sciences) | Shi, Zhongzhi (Chinese Academy of Sciences) | Shen, Zhiyong (Hewlett Packard Labs China)
Ensemble learning with output from multiple supervised and unsupervised models aims to improvethe classification accuracy of supervised model ensembleby jointly considering the grouping results from unsupervised models. In this paper we cast this ensemble task as an unconstrained probabilistic embedding problem. Specifically, we assume both objects and classes/clusters have latent coordinates without constraints in a D -dimensional Euclidean space, and consider the mapping from the embedded space into the space of results from supervised and unsupervised models as a probabilistic generative process. The prediction of an objectis then determined by the distances between the objectand the classes in the embedded space. A solution of this embedding can be obtained using the quasi-Newton method, resulting in the objects and classes/clusters with high co-occurrence weights being embedded close. We demonstrate the benefits of this unconstrained embedding method by three real applications.
Ball Ranking Machines for Content-Based Multimedia Retrieval
Luo, Dijun (The University of Texas at Arlington) | Huang, Heng (The University of Texas at Arlington)
In this paper, we propose the new Ball Ranking Machines (BRMs) to address the supervised ranking problems. In previous work, supervised ranking methods have been successfully applied in various information retrieval tasks. Among these methodologies, the Ranking Support Vector Machines (Rank SVMs) are well investigated. However, one major fact limiting their applications is that Ranking SVMs need optimize a margin-based objective function over all possible document pairs within all queries on the training set. In consequence, Ranking SVMs need select a large number of support vectors among a huge number of support vector candidates. This paper introduces a new model of of Ranking SVMs and develops an efficient approximation algorithm, which decreases the training time and generates much fewer support vectors. Empirical studies on synthetic data and content-based image/video retrieval data show that our method is comparable to Ranking SVMs in accuracy, but use much fewer ranking support vectors and significantly less training time.
Cluster Indicator Decomposition for Efficient Matrix Factorization
Luo, Dijun (The University of Texas at Arlington) | Ding, Chris (The University of Texas at Arlington) | Huang, Heng (The University of Texas at Arlington)
We propose a new clustering based low-rank matrix approximation method, Cluster Indicator Decomposition (CID), which yields more accurate low-rank approximations than previous commonly used singular value decomposition and other Nyström style decompositions. Our model utilizes the intrinsic structures of data and theoretically be more compact and accurate than the traditional low rank approximation approaches. The reconstruction in CID is extremely fast leading to a desirable advantage of our method in large-scale kernel machines (like Support Vector Machines) in which the reconstruction of the kernels needs to be frequently computed. Experimental results indicate that our approach compress images much more efficiently than other factorization based methods. We show that combining our method with Support Vector Machines obtains more accurate approximation and more accurate prediction while consuming much less computation resources.