AITopics | Statistical Learning

Statistical Learning

Sparse Trace Norm Regularization

arXiv.org Machine Learning | Jun-1-2012

We study the problem of estimating multiple predictive functions from a dictionary of basis functions in the nonparametric regression setting. Our estimation scheme assumes that each predictive function can be estimated in the form of a linear combination of the basis functions. By assuming that the coefficient matrix admits a sparse low-rank structure, we formulate the function estimation problem as a convex program regularized by the trace norm and the $\ell_1$-norm simultaneously. We propose to solve the convex program using the accelerated gradient (AG) method and the alternating direction method of multipliers (ADMM) respectively; we also develop efficient algorithms to solve the key components in both AG and ADMM. In addition, we conduct theoretical analysis on the proposed function estimation scheme: we derive a key property of the optimal solution to the convex program; based on an assumption on the basis functions, we establish a performance bound of the proposed function estimation scheme (via the composite regularization). Simulation studies demonstrate the effectiveness and efficiency of the proposed algorithms.
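For illustration only: proximal-type methods for objectives that combine the trace norm and the $\ell_1$-norm commonly build on two standard operators, entrywise soft-thresholding and singular value thresholding. The sketch below is a minimal NumPy version of these two textbook operators. The function names are hypothetical, and this is not the paper's algorithm for the composite regularizer.

```python
import numpy as np


def soft_threshold(X, tau):
    """Proximal operator of tau * ||X||_1: shrink every entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Proximal operator of tau * ||X||_* (trace norm): shrink the singular values by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the columns of U by the thresholded singular values, then recombine.
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```

Soft-thresholding promotes entrywise sparsity, while singular value thresholding promotes low rank, which matches the sparse low-rank structure the abstract assumes for the coefficient matrix.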

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

1206.0333

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Finding Important Genes from High-Dimensional Data: An Appraisal of Statistical Tests and Machine-Learning Approaches

Wang, Chamont, Gevertz, Jana, Chen, Chaur-Chin, Auslender, Leonardo

arXiv.org Machine Learning | May-29-2012

Over the past decades, statisticians and machine-learning researchers have developed literally thousands of new tools for the reduction of high-dimensional data in order to identify the variables most responsible for a particular trait. These tools have applications in a plethora of settings, including data analysis in the fields of business, education, forensics, and biology (such as microarray, proteomics, brain imaging), to name a few. In the present work, we focus our investigation on the limitations and potential misuses of certain tools in the analysis of the benchmark colon cancer data (2,000 variables; Alon et al., 1999) and the prostate cancer data (6,033 variables; Efron, 2010, 2008). Our analysis demonstrates that models that produce 100% accuracy measures often select different sets of genes and cannot stand the scrutiny of parameter estimates and model stability. Furthermore, we created a host of simulation datasets and "artificial diseases" to evaluate the reliability of commonly used statistical and data mining tools. We found that certain widely used models can classify the data with 100% accuracy without using any of the variables responsible for the disease. With moderate sample size and suitable pre-screening, stochastic gradient boosting will be shown to be a superior model for gene selection and variable screening from high-dimensional datasets.

artificial intelligence, causal inference, machine learning, (17 more...)

arXiv.org Machine Learning

1205.6523

Country: North America > United States (0.93)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.67)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Sparse Approximation via Penalty Decomposition Methods

Lu, Zhaosong, Zhang, Yong

arXiv.org Machine Learning | May-29-2012

In this paper we consider sparse approximation problems, that is, general $l_0$ minimization problems with the $l_0$-"norm" of a vector being a part of constraints or objective function. In particular, we first study the first-order optimality conditions for these problems. We then propose penalty decomposition (PD) methods for solving them in which a sequence of penalty subproblems are solved by a block coordinate descent (BCD) method. Under some suitable assumptions, we establish that any accumulation point of the sequence generated by the PD methods satisfies the first-order optimality conditions of the problems. Furthermore, for the problems in which the $l_0$ part is the only nonconvex part, we show that such an accumulation point is a local minimizer of the problems. In addition, we show that any accumulation point of the sequence generated by the BCD method is a saddle point of the penalty subproblem. Moreover, for the problems in which the $l_0$ part is the only nonconvex part, we establish that such an accumulation point is a local minimizer of the penalty subproblem. Finally, we test the performance of our PD methods by applying them to sparse logistic regression, sparse inverse covariance selection, and compressed sensing problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.
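As a hedged illustration of why $l_0$ terms can be handled blockwise: the Euclidean projection of a vector onto the set of vectors with at most k nonzero entries has a closed form, namely keeping the k entries of largest magnitude. The sketch below shows only that projection. The function name and the parameter `k` are hypothetical, and the code is not the authors' PD or BCD method.

```python
import numpy as np


def project_l0_ball(x, k):
    """Closest vector to x (in Euclidean norm) with at most k nonzero entries."""
    x = np.asarray(x, dtype=float)
    if k <= 0:
        return np.zeros_like(x)
    if k >= x.size:
        return x.copy()
    # Indices of the k entries with the largest absolute value.
    keep = np.argpartition(np.abs(x), -k)[-k:]
    out = np.zeros_like(x)
    out[keep] = x[keep]
    return out
```

When several entries tie in magnitude, the projection is not unique and this sketch returns one of the valid minimizers.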

artificial intelligence, machine learning, pd method, (18 more...)

arXiv.org Machine Learning

1205.2334

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Regularization for Cox's proportional hazards model with NP-dimensionality

Bradic, Jelena, Fan, Jianqing, Jiang, Jiancheng

arXiv.org Machine Learning | May-25-2012

High-throughput genetic sequencing arrays, with thousands of measurements per sample and a large amount of related censored clinical data, have increased the need for better measurement-specific model selection. In this paper we establish strong oracle properties of nonconcave penalized methods for nonpolynomial (NP) dimensional data with censoring in the framework of Cox's proportional hazards model. A class of folded-concave penalties is employed, and both LASSO and SCAD are discussed specifically. We address the question of under which dimensionality and correlation restrictions an oracle estimator can be constructed and grasped. It is demonstrated that nonconcave penalties lead to a significant reduction of the "irrepresentable condition" needed for LASSO model selection consistency. A large deviation result for martingales, of interest in its own right, is developed for characterizing the strong oracle property. Moreover, the nonconcave regularized estimator is shown to achieve asymptotically the information bound of the oracle estimator. A coordinate-wise algorithm is developed for finding the grid of solution paths for penalized hazard regression problems, and its performance is evaluated on simulated and gene association study examples.

artificial intelligence, machine learning, penalty, (17 more...)

arXiv.org Machine Learning

doi: 10.1214/11-AOS911

1010.5233

Country: North America > United States > California (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.34)

Industry:

Law > Civil Rights & Constitutional Law (0.55)
Health & Medicine > Therapeutic Area (0.46)
Education > Educational Setting > Higher Education (0.46)
Health & Medicine > Diagnostic Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)

Language-Constraint Reachability Learning in Probabilistic Graphs

Taranto, Claudio, Di Mauro, Nicola, Esposito, Floriana

arXiv.org Artificial Intelligence | May-24-2012

Probabilistic graphs model uncertainty by means of probabilistic edges whose value quantifies the likelihood of the edge's existence or the strength of the link it represents. One of the main issues in probabilistic graphs is how to compute the connectivity of the network. The network reliability problem [4] is a generalization of pairwise reachability, in which the goal is to determine the probability that all pairs of nodes are reachable from one another. Unlike a deterministic graph, in which the reachability function is a binary-valued function indicating whether or not there is a path connecting two nodes, in probabilistic graphs the function assumes probabilistic values. The concept of reachability in probabilistic graphs is used, along with its specialization, as a tool to compute how likely two nodes in the graph are to be connected. Reachability plays an important role in a wide range of applications, such as peer-to-peer networks [3, 18], probabilistic routing problems [2, 10], road networks [11], and trust analysis in social networks [22]. As adopted in these works, reachability is quite similar to the general concept of link prediction [9], whose task may be formalized as follows. Given a networked structure (V,E) made up of a set of data instances V and a set of observed links E among some nodes in V, the task is to predict how likely it is that an unobserved link exists between two nodes in the network. The extension to probabilistic graphs adds an important ingredient that should be adequately exploited.
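To make pairwise reachability concrete, the following is a minimal Monte Carlo sketch. It assumes, beyond what the abstract states, that edges are directed and exist independently with the given probabilities. It samples deterministic graphs and counts how often the target is reachable from the source. All names are hypothetical, and this is not the learning method proposed in the paper.

```python
import random
from collections import deque


def sample_reachable(edges, source, target, rng):
    """Sample one deterministic graph and test reachability with breadth-first search."""
    adj = {}
    for (u, v), p in edges.items():
        if rng.random() < p:  # keep edge (u, v) with probability p
            adj.setdefault(u, []).append(v)
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def estimate_reachability(edges, source, target, n_samples=10000, seed=0):
    """Fraction of sampled graphs in which target is reachable from source."""
    rng = random.Random(seed)
    hits = sum(sample_reachable(edges, source, target, rng) for _ in range(n_samples))
    return hits / n_samples
```

For a chain a -> b -> c with both edge probabilities equal to 0.5, the exact reachability from a to c under the independence assumption is 0.25, and the estimate approaches that value as `n_samples` grows.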

graph, probabilistic graph, probability, (13 more...)

arXiv.org Artificial Intelligence

1205.5367

Country:

Europe > Middle East > Malta > Port Region > Southern Harbour District > Floriana (0.05)
North America > United States > New York (0.04)
North America > United States > New Hampshire > Grafton County > Hanover (0.04)
(2 more...)

Genre: Research Report (0.84)

Industry:

Transportation (0.74)
Information Technology > Services (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.89)

Variance function estimation in high-dimensions

Kolar, Mladen, Sharpnack, James

arXiv.org Machine Learning | May-21-2012

We consider the high-dimensional heteroscedastic regression model, where the mean and the log variance are modeled as a linear combination of input variables. Existing literature on high-dimensional linear regression models has largely ignored non-constant error variances, even though they commonly occur in a variety of applications ranging from biostatistics to finance. In this paper we study a class of non-convex penalized pseudolikelihood estimators for both the mean and variance parameters. We show that the Heteroscedastic Iterative Penalized Pseudolikelihood Optimizer (HIPPO) achieves the oracle property, that is, we prove that the rates of convergence are the same as if the true model was known. We demonstrate numerical properties of the procedure on a simulation study and real world data.
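As a rough sketch of the model described above, assume Gaussian errors, mean X @ beta and log variance X @ theta. The symbols `beta` and `theta` and the function name are assumptions, not taken from the abstract. The unpenalized negative log-likelihood, with additive constants dropped, can then be written as follows. This is not the penalized pseudolikelihood or the HIPPO procedure itself.

```python
import numpy as np


def hetero_gaussian_nll(beta, theta, X, y):
    """Negative Gaussian log-likelihood (constants dropped) with
    mean X @ beta and log-variance X @ theta."""
    log_var = X @ theta
    resid = y - X @ beta
    # Each observation contributes 0.5 * (log sigma_i^2 + resid_i^2 / sigma_i^2).
    return 0.5 * np.sum(log_var + resid**2 * np.exp(-log_var))
```

Setting `theta` to zero recovers the usual constant-variance least-squares objective up to scaling.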

artificial intelligence, exp, machine learning, (17 more...)

arXiv.org Machine Learning

1205.477

Country: Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Hypothesis testing using pairwise distances and associated kernels (with Appendix)

Sejdinovic, Dino, Gretton, Arthur, Sriperumbudur, Bharath, Fukumizu, Kenji

arXiv.org Machine Learning | May-21-2012

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. The equivalence holds when energy distances are computed with semimetrics of negative type, in which case a kernel may be defined such that the RKHS distance between distributions corresponds exactly to the energy distance. We determine the class of probability distributions for which kernels induced by semimetrics are characteristic (that is, for which embeddings of the distributions to an RKHS are injective). Finally, we investigate the performance of this family of kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
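For readers unfamiliar with the energy distance, the sketch below computes its plug-in (V-statistic) estimate with the Euclidean distance, 2 E|X - Y| - E|X - X'| - E|Y - Y'|, from two samples given as arrays of shape (n, d) and (m, d). The function names are hypothetical. The code illustrates only the classical Euclidean case, not the kernel family studied in the paper.

```python
import numpy as np


def pairwise_dist(A, B):
    """Euclidean distances between all rows of A (n, d) and all rows of B (m, d)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


def energy_distance(X, Y):
    """Plug-in estimate of 2 E|X - Y| - E|X - X'| - E|Y - Y'|."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return (2.0 * pairwise_dist(X, Y).mean()
            - pairwise_dist(X, X).mean()
            - pairwise_dist(Y, Y).mean())
```

The estimate is zero when both samples are identical and grows as the two empirical distributions move apart.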

artificial intelligence, machine learning, scientific discovery, (17 more...)

arXiv.org Machine Learning

1205.0411

Country: Europe (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.41)

Towards a General Framework for Maximum Entropy Reasoning

Potyka, Nico (Fern University in Hagen)

AAAI Conferences | May-20-2012

A possible approach to extending classical logics to probabilistic logics is to consider a probability distribution over the classical interpretations that satisfies some constraints and maximizes entropy. Over the past years, miscellaneous languages and semantics have been considered, often based on similar ideas. In this paper a hierarchy of general probabilistic semantics is developed. It incorporates some interesting specific semantics and a family of standard semantics that can be used to extend arbitrary languages with finite interpretation sets to probabilistic languages. We use the hierarchy to generalize an approach reducing the complexity of the whole entailment process and sketch its importance for further theoretical and practical applications.

formula, interpretation, probability distribution, (15 more...)

AAAI Conferences

Twenty-Fifth International FLAIRS Conference

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.41)

Asymptotic Maximum Entropy Principle for Utility Elicitation under High Uncertainty and Partial Information

Hadfi, Rafik (Nagoya Institute of Technology) | Ito, Takayuki (Nagoya Institute of Technology)

AAAI Conferences | May-20-2012

The field of decision making has proposed multiple methods to help decision makers in their analysis, by suggesting ways to formalize preferences and to assess uncertainties. Although these techniques are established and proven to be mathematically sound, experience has shown that in certain situations we tend to avoid the formal approach by acting intuitively. This is especially so when the decision involves a large number of attributes and outcomes, and when we need to use pragmatic and heuristic simplifications such as considering only the most important attributes and omitting the others. In this paper, we provide a model for decision making in situations subject to a large predictive uncertainty with a small learning sample. The high predictive uncertainty is concretized by a countably infinite number of prospects, making preference assessment more difficult. Our main result is an extension of the Maximum Entropy utility (MEU) principle into an asymptotic maximum entropy utility principle for preference elicitation. This will allow us to overcome the limits of the existing MEU method to the extent that we focus on utility assessment when the set of available discrete prospects is countably infinite. Furthermore, our proposed model can be used to analyze situations of high cognitive load as well as to understand how humans handle these problems under the ceteris paribus assumption.

convergence, utility function, utility increment vector, (13 more...)

AAAI Conferences

Twenty-Fifth International FLAIRS Conference

Country:

Asia > Japan (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.81)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Towards Data Driven Model Improvement

Qiu, Yumeng (Worcester Polytechnic Institute) | Pardos, Zachary A. (Worcester Polytechnic Institute) | Heffernan, Neil T (Worcester Polytechnic Institute)

AAAI Conferences | May-20-2012

In the area of student knowledge assessment, knowledge tracing is a model that has been used for over a decade to predict student knowledge and performance. Many modifications to this model have been proposed and evaluated; however, the modifications are often based on a combination of intuition and experience in the domain. This method of model improvement can be difficult for researchers without a high level of domain experience and, furthermore, the best improvements to the model could be unintuitive ones. Therefore, we propose a completely data-driven approach to model improvement. This alternative allows researchers to evaluate which aspects of a model are most likely to result in model performance improvement. Our results suggest a variety of different improvements to knowledge tracing, many of which have not been explored.

dataset, prediction, student, (15 more...)

AAAI Conferences

Twenty-Fifth International FLAIRS Conference

Country: Europe > Spain > Andalusia > Córdoba Province > Córdoba (0.04)

Genre: Research Report > New Finding (0.69)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
