AITopics

We propose a semiparametric approach, named nonparanormal skeptic, for estimating high dimensional undirected graphical models. In terms of modeling, we consider the nonparanormal family proposed by Liu et al (2009). In terms of estimation, we exploit nonparametric rank-based correlation coefficient estimators including the Spearman's rho and Kendall's tau. In high dimensional settings, we prove that the nonparanormal skeptic achieves the optimal parametric rate of convergence in both graph and parameter estimation. This result suggests that the nonparanormal graphical models are a safe replacement of the Gaussian graphical models, even when the data are Gaussian.

artificial intelligence, estimator, machine learning, (11 more...)

1206.6488

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.34)

Industry: Banking & Finance > Trading (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)

Passos, Alexandre, Rai, Piyush, Wainer, Jacques, Daume, Hal III

Flexible Modeling of Latent Task Structures in Multitask Learning

Multitask learning algorithms are typically designed assuming some fixed, a priori known latent structure shared by all the tasks. However, it is usually unclear what type of latent task structure is the most appropriate for a given multitask learning problem. Ideally, the "right" latent task structure should be learned in a data-driven manner. We present a flexible, nonparametric Bayesian model that posits a mixture of factor analyzers structure on the tasks. The nonparametric aspect makes the model expressive enough to subsume many existing models of latent task structures (e.g, mean-regularized tasks, clustered tasks, low-rank or linear/non-linear subspace assumption on tasks, etc.). Moreover, it can also learn more general task structures, addressing the shortcomings of such models. We present a variational inference algorithm for our model. Experimental results on synthetic and real-world datasets, on both regression and classification problems, demonstrate the effectiveness of the proposed method.

artificial intelligence, factor analyzer, machine learning, (13 more...)

1206.6486

Country:

North America > United States > Massachusetts (0.28)
North America > United States > Maryland (0.28)

Genre: Research Report (0.65)

Industry:

Education (0.67)
Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Geist, Matthieu, Scherrer, Bruno, Lazaric, Alessandro, Ghavamzadeh, Mohammad

A Dantzig Selector Approach to Temporal Difference Learning

LSTD is a popular algorithm for value function approximation. Whenever the number of features is larger than the number of samples, it must be paired with some form of regularization. In particular, L1-regularization methods tend to perform feature selection by promoting sparsity, and thus, are well-suited for high-dimensional problems. However, since LSTD is not a simple regression algorithm, but it solves a fixed--point problem, its integration with L1-regularization is not straightforward and might come with some drawbacks (e.g., the P-matrix assumption for LASSO-TD). In this paper, we introduce a novel algorithm obtained by integrating LSTD with the Dantzig Selector. We investigate the performance of the proposed algorithm and its relationship with the existing regularized approaches, and show how it addresses some of their drawbacks.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

1206.648

Country: Europe > France (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Balasubramanian, Krishnakumar, Lebanon, Guy

The Landmark Selection Method for Multiple Output Prediction

Conditional modeling x \to y is a central problem in machine learning. A substantial research effort is devoted to such modeling when x is high dimensional. We consider, instead, the case of a high dimensional y, where x is either low dimensional or high dimensional. Our approach is based on selecting a small subset y_L of the dimensions of y, and proceed by modeling (i) x \to y_L and (ii) y_L \to y. Composing these two models, we obtain a conditional model x \to y that possesses convenient statistical properties. Multi-label classification and multivariate regression experiments on several datasets show that this model outperforms the one vs. all approach as well as several sophisticated multiple output prediction methods.

artificial intelligence, machine learning, regression, (15 more...)

1206.6479

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Zhang, Yi, Schneider, Jeff

Maximum Margin Output Coding

In this paper we study output coding for multi-label prediction. For a multi-label output coding to be discriminative, it is important that codewords for different label vectors are significantly different from each other. In the meantime, unlike in traditional coding theory, codewords in output coding are to be predicted from the input, so it is also critical to have a predictable label encoding. To find output codes that are both discriminative and predictable, we first propose a max-margin formulation that naturally captures these two properties. We then convert it to a metric learning formulation, but with an exponentially large number of constraints as commonly encountered in structured prediction problems. Without a label structure for tractable inference, we use overgenerating (i.e., relaxation) techniques combined with the cutting plane method for optimization. In our empirical study, the proposed output coding scheme outperforms a variety of existing multi-label prediction methods for image, text and music classification.

artificial intelligence, codeword, machine learning, (14 more...)

1206.6478

Genre: Research Report (0.82)

Industry: Education (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Discovering Support and Affiliated Features from Very High Dimensions

Zhai, Yiteng, Tan, Mingkui, Tsang, Ivor, Ong, Yew Soon

In this paper, a novel learning paradigm is presented to automatically identify groups of informative and correlated features from very high dimensions. Specifically, we explicitly incorporate correlation measures as constraints and then propose an efficient embedded feature selection method using recently developed cutting plane strategy. The benefits of the proposed algorithm are twofolds. First, it can identify the optimal discriminative and uncorrelated feature subset to the output labels, denoted here as Support Features, which brings about significant improvements in prediction performance over other state of the art feature selection methods considered in the paper. Second, during the learning process, the underlying group structures of correlated features associated with each support feature, denoted as Affiliated Features, can also be discovered without any additional cost. These affiliated features serve to improve the interpretations on the learning tasks. Extensive empirical studies on both synthetic and very high dimensional real-world datasets verify the validity and efficiency of the proposed method. To address this issue, a plethora of feature selection methods have been developed in the recent decades.

artificial intelligence, machine learning, support feature, (14 more...)

1206.6477

Country: North America > United States (0.15)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Bellet, Aurelien, Habrard, Amaury, Sebban, Marc

Similarity Learning for Provably Accurate Sparse Linear Classification

In recent years, the crucial importance of metrics in machine learning algorithms has led to an increasing interest for optimizing distance and similarity functions. Most of the state of the art focus on learning Mahalanobis distances (requiring to fulfill a constraint of positive semi-definiteness) for use in a local k-NN algorithm. However, no theoretical link is established between the learned metrics and their performance in classification. In this paper, we make use of the formal framework of good similarities introduced by Balcan et al. to design an algorithm for learning a non PSD linear similarity optimized in a nonlinear feature space, which is then used to build a global linear classifier. We show that our approach has uniform stability and derive a generalization bound on the classification error. Experiments performed on various datasets confirm the effectiveness of our approach compared to state-of-the-art methods and provide evidence that (i) it is fast, (ii) robust to overfitting and (iii) produces very sparse classifiers.

artificial intelligence, machine learning, similarity function, (15 more...)

1206.6476

Country: Europe (0.46)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Richard, Emile, Savalle, Pierre-Andre, Vayatis, Nicolas

Estimation of Simultaneously Sparse and Low Rank Matrices

The paper introduces a penalized matrix estimation procedure aiming at solutions which are sparse and low-rank at the same time. Such structures arise in the context of social networks or protein interactions where underlying graphs have adjacency matrices which are block-diagonal in the appropriate basis. We introduce a convex mixed penalty which involves $\ell_1$-norm and trace norm simultaneously. We obtain an oracle inequality which indicates how the two effects interact according to the nature of the target matrix. We bound generalization error in the link prediction problem. We also develop proximal descent strategies to solve the optimization problem efficiently and evaluate performance on synthetic and real data sets.

artificial intelligence, machine learning, optimization problem, (17 more...)

1206.6474

Country: Europe > France (0.28)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.68)
Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Merchante, Luis Francisco Sanchez, Grandvalet, Yves, Govaert, Gerrad

An Efficient Approach to Sparse Linear Discriminant Analysis

We present a novel approach to the formulation and the resolution of sparse Linear Discriminant Analysis (LDA). Our proposal, is based on penalized Optimal Scoring. It has an exact equivalence with penalized LDA, contrary to the multi-class approaches based on the regression of class indicator that have been proposed so far. Sparsity is obtained thanks to a group-Lasso penalty that selects the same features in all discriminant directions. Our experiments demonstrate that this approach generates extremely parsimonious models without compromising prediction performances. Besides prediction, the resulting sparse discriminant directions are also amenable to low-dimensional representations of data. Our algorithm is highly efficient for medium to large number of variables, and is thus particularly well suited to the analysis of gene expression data.

artificial intelligence, machine learning, penalty, (17 more...)

1206.6472

Country:

North America > United States > California (0.28)
Europe (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Discriminant Analysis (0.61)

On Causal and Anticausal Learning

Schoelkopf, Bernhard, Janzing, Dominik, Peters, Jonas, Sgouritsa, Eleni, Zhang, Kun, Mooij, Joris

We consider the problem of function estimation in the case where an underlying causal model can be inferred. This has implications for popular scenarios such as covariate shift, concept drift, transfer learning and semi-supervised learning. We argue that causal knowledge may facilitate some approaches for a given problem, and rule out others. In particular, we formulate a hypothesis for when semi-supervised learning can help, and corroborate it with empirical results.

artificial intelligence, inductive learning, machine learning, (16 more...)

1206.6471

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)