Industry
Efficient Sparse Group Feature Selection via Nonconvex Optimization
Xiang, Shuo, Shen, Xiaotong, Ye, Jieping
Sparse feature selection has been demonstrated to be effective in handling high-dimensional data. While promising, most of the existing works use convex methods, which may be suboptimal in terms of the accuracy of feature selection and parameter estimation. In this paper, we expand a nonconvex paradigm to sparse group feature selection, which is motivated by applications that require identifying the underlying group structure and performing feature selection simultaneously. The main contributions of this article are twofold: (1) statistically, we introduce a nonconvex sparse group feature selection model which can reconstruct the oracle estimator. Therefore, consistent feature selection and parameter estimation can be achieved; (2) computationally, we propose an efficient algorithm that is applicable to large-scale problems. Numerical results suggest that the proposed nonconvex method compares favorably against its competitors on synthetic data and real-world applications, thus achieving desired goal of delivering high performance.
Financial Portfolio Optimization: Computationally guided agents to investigate, analyse and invest!?
Financial portfolio optimization is a widely studied problem in mathematics, statistics, financial and computational literature. It adheres to determining an optimal combination of weights associated with financial assets held in a portfolio. In practice, it faces challenges by virtue of varying math. formulations, parameters, business constraints and complex financial instruments. Empirical nature of data is no longer one-sided; thereby reflecting upside and downside trends with repeated yet unidentifiable cyclic behaviours potentially caused due to high frequency volatile movements in asset trades. Portfolio optimization under such circumstances is theoretically and computationally challenging. This work presents a novel mechanism to reach an optimal solution by encoding a variety of optimal solutions in a solution bank to guide the search process for the global investment objective formulation. It conceptualizes the role of individual solver agents that contribute optimal solutions to a bank of solutions, a super-agent solver that learns from the solution bank, and, thus reflects a knowledge-based computationally guided agents approach to investigate, analyse and reach to optimal solution for informed investment decisions. Conceptual understanding of classes of solver agents that represent varying problem formulations and, mathematically oriented deterministic solvers along with stochastic-search driven evolutionary and swarm-intelligence based techniques for optimal weights are discussed. Algorithmic implementation is presented by an enhanced neighbourhood generation mechanism in Simulated Annealing algorithm. A framework for inclusion of heuristic knowledge and human expertise from financial literature related to investment decision making process is reflected via introduction of controlled perturbation strategies using a decision matrix for neighbourhood generation.
A Spectral Algorithm for Latent Dirichlet Allocation
Anandkumar, Animashree, Foster, Dean P., Hsu, Daniel, Kakade, Sham M., Liu, Yi-Kai
The problem of topic modeling can be seen as a generalization of the clustering problem, in that it posits that observations are generated due to multiple latent factors (e.g., the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic probability vectors (the distributions over words for each topic), when only the words are observed and the corresponding topics are hidden. We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable since the SVD operations are carried out on $k\times k$ matrices, where $k$ is the number of latent factors (e.g. the number of topics), rather than in the $d$-dimensional observed space (typically $d \gg k$).
Non-parametric Bayesian modelling of digital gene expression data
Vavoulis, Dimitrios V., Gough, Julian
Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or "reads", which constitute a fundamentally discrete measure of the level of gene expression. A common limitation in experiments using these technologies is the low number or even absence of biological replicates, which complicates the statistical analysis of digital gene expression data. Analysis of this type of data has often been based on modified tests originally devised for analysing microarrays; both these and even de novo methods for the analysis of RNA-seq data are plagued by the common problem of low replication. We propose a novel, non-parametric Bayesian approach for the analysis of digital gene expression data. We begin with a hierarchical model for modelling over-dispersed count data and a blocked Gibbs sampling algorithm for inferring the posterior distribution of model parameters conditional on these counts. The algorithm compensates for the problem of low numbers of biological replicates by clustering together genes with tag counts that are likely sampled from a common distribution and using this augmented sample for estimating the parameters of this distribution. The number of clusters is not decided a priori, but it is inferred along with the remaining model parameters. We demonstrate the ability of this approach to model biological data with high fidelity by applying the algorithm on a public dataset obtained from cancerous and non-cancerous neural tissues.
A Study on Using Uncertain Time Series Matching Algorithms in MapReduce Applications
Rizvandi, Nikzad Babaii, Taheri, Javid, Zomaya, Albert Y., Moraveji, Reza
This paper has been originally published as "A study on using uncertain time series matching algorithms for MapReduce applications" in Journal of Concurrency and Computation: Practice and Experience - Special Issue in Cloud Computing Scalability, John Wiley Publisher. We realized that the original title is not appropriate and cannot be found by people working in this area. Therefore, this text is for changing the title but the original paper can be found at the rest of this text (starting from the next page). For citation, please cite the original title as: NB Rizvandi, J Taheri, R Moraveji, AY Zomaya, "A study on using uncertain time series matching algorithms for MapReduce applications", Journal of Concurrency and Computation: Practice and Experience - Special Issue in Cloud Computing Scalability, John Wiley Publisher (2012) A Study on Using Uncertain Time Series Matching Algorithms for MapReduce Applications Abstract--In this paper, we study CPU utilization time patterns of several MapReduce applications. After extracting running patterns of several applications, the patterns along with their statistical information are saved in a reference database to be later used to tweak system parameters to efficiently execute future unknown applications. To achieve this goal, CPU utilization patterns of new applications along with its statistical information are compared with the already known ones in the reference database to find/predict their most probable execution patterns. Because of different pattern lengths, the Dynamic Time Warping (DTW) is utilized for such comparison; a statistical analysis is then applied to DTWs' outcomes to select the most suitable candidates. Furthermore, under a hypothesis, we also proposed another algorithm to classify applications under similar CPU utilization patterns. Finally, dependency between minimum distance/maximum similarity of applications and their scalability (in both input size and number of virtual nodes) are studied.
Change-Point Detection in Time-Series Data by Relative Density-Ratio Estimation
Liu, Song, Yamada, Makoto, Collier, Nigel, Sugiyama, Masashi
The objective of change-point detection is to discover abrupt property changes lying behind time-series data. In this paper, we present a novel statistical change-point detection algorithm based on non-parametric divergence estimation between time-series samples from two retrospective segments. Our method uses the relative Pearson divergence as a divergence measure, and it is accurately and efficiently estimated by a method of direct density-ratio estimation. Through experiments on artificial and real-world datasets including human-activity sensing, speech, and Twitter messages, we demonstrate the usefulness of the proposed method.
Game Networks
We introduce Game networks (G nets), a novel representation for multi-agent decision problems. Compared to other game-theoretic representations, such as strategic or extensive forms, G nets are more structured and more compact; more fundamentally, G nets constitute a computationally advantageous framework for strategic inference, as both probability and utility independencies are captured in the structure of the network and can be exploited in order to simplify the inference process. An important aspect of multi-agent reasoning is the identification of some or all of the strategic equilibria in a game; we present original convergence methods for strategic equilibrium which can take advantage of strategic separabilities in the G net structure in order to simplify the computations. Specifically, we describe a method which identifies a unique equilibrium as a function of the game payoffs, and one which identifies all equilibria.
Gaussian Process Networks
Friedman, Nir, Nachman, Iftach
In this paper we address the problem of learning the structure of a Bayesian network in domains with continuous variables. This task requires a procedure for comparing different candidate structures. In the Bayesian framework, this is done by evaluating the {em marginal likelihood/} of the data given a candidate structure. This term can be computed in closed-form for standard parametric families (e.g., Gaussians), and can be approximated, at some computational cost, for some semi-parametric families (e.g., mixtures of Gaussians). We present a new family of continuous variable probabilistic networks that are based on {em Gaussian Process/} priors. These priors are semi-parametric in nature and can learn almost arbitrary noisy functional relations. Using these priors, we can directly compute marginal likelihoods for structure learning. The resulting method can discover a wide range of functional dependencies in multivariate data. We develop the Bayesian score of Gaussian Process Networks and describe how to learn them from data. We present empirical results on artificial data as well as on real-life domains with non-linear dependencies.
Dynamic Bayesian Multinets
In this work, dynamic Bayesian multinets are introduced where a Markov chain state at time t determines conditional independence patterns between random variables lying within a local time window surrounding t. It is shown how information-theoretic criterion functions can be used to induce sparse, discriminative, and class-conditional network structures that yield an optimal approximation to the class posterior probability, and therefore are useful for the classification task. Using a new structure learning heuristic, the resulting models are tested on a medium-vocabulary isolated-word speech recognition task. It is demonstrated that these discriminatively structured dynamic Bayesian multinets, when trained in a maximum likelihood setting using EM, can outperform both HMMs and other dynamic Bayesian networks with a similar number of parameters.
Feature Selection and Dualities in Maximum Entropy Discrimination
Jebara, Tony S., Jaakkola, Tommi S.
Incorporating feature selection into a classification or regression method often carries a number of advantages. In this paper we formalize feature selection specifically from a discriminative perspective of improving classification/regression accuracy. The feature selection method is developed as an extension to the recently proposed maximum entropy discrimination (MED) framework. We describe MED as a flexible (Bayesian) regularization approach that subsumes, e.g., support vector classification, regression and exponential family models. For brevity, we restrict ourselves primarily to feature selection in the context of linear classification/regression methods and demonstrate that the proposed approach indeed carries substantial improvements in practice. Moreover, we discuss and develop various extensions of feature selection, including the problem of dealing with example specific but unobserved degrees of freedom -- alignments or invariants.