AITopics | Regression

Regression

A Dirty Model for Multiple Sparse Regression

Jalali, Ali, Ravikumar, Pradeep, Sanghavi, Sujay

arXiv.org Machine Learning, Jun-28-2011

Sparse linear regression, the recovery of an unknown vector from linear measurements, is now known to be possible with fewer samples than variables, via methods like the LASSO. We consider the multiple sparse linear regression problem, where several related vectors with partially shared support sets have to be recovered. A natural question in this setting is whether one can use the sharing to further decrease the overall number of samples required. A line of recent research has studied the use of $\ell_1/\ell_q$ norm block-regularizations with $q>1$ for such problems; however, depending on the level of sharing, these could actually perform worse in sample complexity than solving each problem separately and ignoring the sharing. We present a new method for multiple sparse linear regression that can leverage support and parameter overlap when it exists, but does not pay a penalty when it does not. The idea is very simple: we decompose the parameters into two components and regularize these differently. We show, both theoretically and empirically, that our method strictly and noticeably outperforms both $\ell_1$ and $\ell_1/\ell_q$ methods over the entire range of possible overlaps (except at boundary cases, where we match the best method). We also provide theoretical guarantees that the method performs well under high-dimensional scaling.
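The two-component idea in this abstract can be written down as a single optimization problem. The form below is a sketch under assumed notation that does not appear in the abstract: $r$ regression tasks with data $(X^{(k)}, y^{(k)})$, $n$ samples per task, a $p \times r$ parameter matrix split as $S + B$, an elementwise $\ell_1$ penalty on $S$, and a row-wise block penalty on $B$. Which block norm is used, and how the two weights $\lambda_s$ and $\lambda_b$ are chosen, are assumptions here.

```latex
\[
(\hat S, \hat B) \in \arg\min_{S,\,B \in \mathbb{R}^{p \times r}}
\frac{1}{2n} \sum_{k=1}^{r} \bigl\| y^{(k)} - X^{(k)} \bigl( S_k + B_k \bigr) \bigr\|_2^2
+ \lambda_s \|S\|_{1,1} + \lambda_b \|B\|_{1,q}, \qquad q > 1,
\]
\[
\|S\|_{1,1} = \sum_{j=1}^{p} \sum_{k=1}^{r} |S_{jk}|, \qquad
\|B\|_{1,q} = \sum_{j=1}^{p} \bigl\| (B_{j1}, \dots, B_{jr}) \bigr\|_q, \qquad
\hat\Theta = \hat S + \hat B .
\]
```

Here $S_k$ and $B_k$ denote the $k$-th columns. Rows of $B$ capture features shared across tasks, while $S$ absorbs task-specific entries, which is one way to read "leverage overlap when it exists, but not pay a penalty when it does not".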

artificial intelligence, machine learning, probability, (17 more...)

arXiv.org Machine Learning

1106.5826

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

High-dimensional covariance estimation based on Gaussian graphical models

Zhou, Shuheng, Rutimann, Philipp, Xu, Min, Buhlmann, Peter

arXiv.org Machine Learning, Jun-22-2011

Undirected graphs are often used to describe high-dimensional distributions. Under sparsity conditions, the graph can be estimated using $\ell_1$-penalization methods. We propose and study the following method, which combines a multiple regression approach with ideas of thresholding and refitting: first we infer a sparse undirected graphical model structure by thresholding each of many $\ell_1$-norm penalized regression functions; we then estimate the covariance matrix and its inverse using the maximum likelihood estimator. We show that under suitable conditions, this approach yields consistent estimation in terms of graphical structure and fast convergence rates with respect to the operator and Frobenius norm for the covariance matrix and its inverse. We also derive an explicit bound for the Kullback-Leibler divergence.
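The threshold-and-refit procedure in this abstract has two stages, which can be sketched as follows. The notation is assumed for illustration: $X_i$ is the $i$-th column of the $n \times p$ data matrix, $X_{-i}$ the remaining columns, $\hat\Sigma_n$ the sample covariance, and $\lambda$, $\tau$ the penalty and threshold levels. The "or" rule for combining the two regressions that touch an edge, and the $1/(2n)$ scaling, are also assumptions.

```latex
% Stage 1: nodewise l1-penalized regressions, followed by thresholding
\[
\hat\beta^{\,i} = \arg\min_{\beta \in \mathbb{R}^{p-1}}
\frac{1}{2n} \bigl\| X_i - X_{-i}\,\beta \bigr\|_2^2 + \lambda \|\beta\|_1,
\qquad
\hat E = \bigl\{ (i,j) : |\hat\beta^{\,i}_j| > \tau \ \text{or} \ |\hat\beta^{\,j}_i| > \tau \bigr\}.
\]
% Stage 2: Gaussian maximum likelihood refit restricted to the estimated graph
\[
\hat\Theta = \arg\min_{\substack{\Theta \succ 0 \\ \Theta_{ij} = 0 \ \text{for} \ i \neq j,\ (i,j) \notin \hat E}}
\operatorname{tr}\bigl( \hat\Sigma_n \Theta \bigr) - \log\det\Theta,
\qquad
\hat\Sigma = \hat\Theta^{-1}.
\]
```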

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1009.053

Country: North America > United States > Michigan (0.27)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Random forest models of the retention constants in the thin layer chromatography

Kursa, Miron B., Komsta, Łukasz, Rudnicki, Witold R.

arXiv.org Artificial Intelligence, Jun-16-2011

In the current study we examine the application of machine learning methods to modelling retention constants in thin layer chromatography (TLC). This problem can be described with hundreds or even thousands of descriptors relevant to various molecular properties, most of them redundant and not relevant for predicting the retention constant. Hence we employed feature selection to significantly reduce the number of attributes. Additionally, we tested the application of the bagging procedure to the feature selection. The random forest regression models were built using the selected variables. The resulting models correlate better with the experimental data than the reference models obtained with linear regression. Cross-validation confirms the robustness of the models.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

1106.3361

Country:

Europe > Poland > Masovia Province > Warsaw (0.05)
Europe > Poland > Lublin Province > Lublin (0.05)

Genre: Research Report > New Finding (0.48)

Industry: Materials > Chemicals > Commodity Chemicals (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)

Causal Network Inference via Group Sparse Regularization

Bolstad, Andrew, Van Veen, Barry, Nowak, Robert

arXiv.org Machine Learning, Jun-3-2011

This paper addresses the problem of inferring sparse causal networks modeled by multivariate auto-regressive (MAR) processes. Conditions are derived under which the Group Lasso (gLasso) procedure consistently estimates sparse network structure. The key condition involves a "false connection score." In particular, we show that consistent recovery is possible even when the number of observations of the network is far less than the number of parameters describing the network, provided that the false connection score is less than one. The false connection score is also demonstrated to be a useful metric of recovery in non-asymptotic regimes. The conditions suggest a modified gLasso procedure which tends to improve the false connection score and reduce the chances of reversing the direction of causal influence. Computational experiments and a real network based electrocorticogram (ECoG) simulation study demonstrate the effectiveness of the approach.
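For readers unfamiliar with the setup, a group-sparse estimator for a MAR process can be sketched as below. All notation is assumed rather than taken from the abstract: $N$ nodes, model order $p$, $T$ observations, coefficient matrices $A_1, \dots, A_p$, and one group per ordered node pair collecting its coefficients across all lags. The exact scaling and grouping used in the paper may differ.

```latex
\[
x(t) = \sum_{r=1}^{p} A_r \, x(t-r) + e(t), \qquad x(t) \in \mathbb{R}^{N},
\]
\[
\min_{A_1, \dots, A_p} \;
\frac{1}{2T} \sum_{t} \Bigl\| x(t) - \sum_{r=1}^{p} A_r \, x(t-r) \Bigr\|_2^2
+ \lambda \sum_{i \neq j} \bigl\| a_{ij} \bigr\|_2,
\qquad
a_{ij} = \bigl( [A_1]_{ij}, \dots, [A_p]_{ij} \bigr).
\]
```

Under this reading, node $j$ is declared to influence node $i$ exactly when the estimated group $\hat a_{ij}$ is nonzero, so the group penalty selects or discards a whole directed connection at once.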

artificial intelligence, machine learning, node, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2011.2129515

1106.0762

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Automated Assessment of Paragraph Quality: Introduction, Body, and Conclusion Paragraphs

Roscoe, Rod (University of Memphis) | Crossley, Scott (Georgia State University) | Weston, Jennifer (University of Memphis) | McNamara, Danielle (University of Memphis)

AAAI Conferences, May-18-2011

Natural language processing and statistical methods were used to identify linguistic features associated with the quality of student-generated paragraphs. Linguistic features were assessed using Coh-Metrix. The resulting computational models demonstrated small to medium effect sizes for predicting paragraph quality: introduction quality r² = .25, body quality r² = .10, and conclusion quality r² = .11. Although the variance explained was somewhat low, the linguistic features identified were consistent with the rhetorical goals of paragraph types. Avenues for bolstering this approach by taking individual writing styles and techniques into account are considered.

information, paragraph, paragraph quality, (17 more...)

AAAI Conferences

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > California (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > Mississippi (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Assessment & Standards > Student Performance (0.69)
Education > Educational Technology > Educational Software (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Simulating Human Ratings on Word Concreteness

Feng, Shi (University of Memphis) | Cai, Zhiqiang (University of Memphis) | Crossley, Scott (Georgia State University) | McNamara, Danielle S. (University of Memphis)

AAAI Conferences, May-18-2011

A single word in the human language has many complex dimensions such as semantics, parts of speech, lexical type, imagability, concreteness, familiarity, etc. It is important to know the dimensions of words in languages so that we can develop a better theoretical understanding of language and also to build tools that simulate human intelligence and [...] However, word concreteness is not an attribute that a computer can directly compute. One means of assessing the characteristics of words is by having humans rate them on the dimensions of interest. Humans are proficient in categorizing words into linguistic dimensions, but it is impractical to have humans rating tens of thousands of words that we would need for psycholinguistic research.

concreteness, mcnamara, word concreteness, (16 more...)

AAAI Conferences

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > New Jersey > Bergen County > Mahwah (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > New Jersey > Somerset County > Somerset (0.04)

Genre: Research Report > New Finding (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)
Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.40)

Regression Conformal Prediction with Nearest Neighbours

Papadopoulos, H., Vovk, V., Gammerman, A.

Journal of Artificial Intelligence Research, Apr-30-2011

In this paper we apply Conformal Prediction (CP) to the k-Nearest Neighbours Regression (k-NNR) algorithm and propose ways of extending the typical nonconformity measure used for regression so far. Unlike traditional regression methods, which produce point predictions, Conformal Predictors output predictive regions that satisfy a given confidence level. The regions produced by any Conformal Predictor are automatically valid; however, their tightness, and therefore their usefulness, depends on the nonconformity measure used by each CP. In effect, a nonconformity measure evaluates how strange a given example is compared to a set of other examples, based on some traditional machine learning algorithm. We define six novel nonconformity measures based on the k-Nearest Neighbours Regression algorithm and develop the corresponding CPs following both the original (transductive) and the inductive CP approaches. A comparison of the predictive regions produced by our measures with those of the typical regression measure suggests that a major improvement in terms of predictive region tightness is achieved by the new measures.
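To make the inductive variant concrete, here is a minimal sketch of an inductive conformal predictor wrapped around k-NN regression. It uses the plain absolute residual as the nonconformity score, which is assumed here to stand in for the typical regression measure; the six measures proposed in the paper are not reproduced. The function names, the one-dimensional features, and the data layout as `(x, y)` pairs are all hypothetical choices for illustration.

```python
import math


def knn_predict(train, x, k):
    """Mean label of the k training points nearest to x (1-D features)."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(label for _, label in neighbours) / len(neighbours)


def icp_interval(train, calib, x_new, k=3, confidence=0.9):
    """Inductive conformal interval around the k-NN prediction for x_new.

    train and calib are disjoint lists of (x, y) pairs; calib is held out
    and used only to calibrate the interval width.
    """
    # Nonconformity score (assumed): absolute residual on calibration data.
    scores = sorted(abs(y - knn_predict(train, x, k)) for x, y in calib)
    n = len(scores)
    # Rank of the calibration score that yields coverage >= confidence.
    rank = math.ceil((n + 1) * confidence)
    if rank > n:
        # Too few calibration points for this confidence level.
        return (-math.inf, math.inf)
    half_width = scores[rank - 1]
    centre = knn_predict(train, x_new, k)
    return (centre - half_width, centre + half_width)
```

With this score every interval has the same width regardless of `x_new`. Measures that normalize the residual by some estimate of local difficulty would instead give wider or narrower regions depending on the input, which is the kind of tightness gain the abstract refers to.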

icp, nonconformity measure, predictive region, (13 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3198

AI Access Foundation

10703


Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
South America > Paraguay > Asunción > Asunción (0.04)
(7 more...)

Genre: Research Report (0.47)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression

Kakade, Sham, Kalai, Adam Tauman, Kanade, Varun, Shamir, Ohad

arXiv.org Artificial Intelligence, Apr-11-2011

Generalized Linear Models (GLMs) and Single Index Models (SIMs) provide powerful generalizations of linear regression, where the target variable is assumed to be a (possibly unknown) 1-dimensional function of a linear predictor. In general, these problems entail non-convex estimation procedures, and, in practice, iterative local search heuristics are often used. Kalai and Sastry (2009) recently provided the first provably efficient method for learning SIMs and GLMs, under the assumptions that the data are in fact generated under a GLM and under certain monotonicity and Lipschitz constraints. However, to obtain provable performance, the method requires a fresh sample every iteration. In this paper, we provide algorithms for learning GLMs and SIMs, which are both computationally and statistically efficient. We also provide an empirical study, demonstrating their feasibility in practice.
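The abstract does not spell out the algorithms, so the sketch below only illustrates the isotonic regression step named in the title, using the standard Pool Adjacent Violators procedure. It is not the authors' learning algorithm. The function name is hypothetical, and the input is assumed to be a list of target values already ordered by the current linear predictor.

```python
def isotonic_fit(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge the last two blocks while the earlier mean exceeds the later
        # mean (compared by cross-multiplication to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        # Every point in a block receives the block mean.
        fitted.extend([total / count] * count)
    return fitted
```

In a single index model the link function is unknown but monotone, so a fit of this kind is a natural way to estimate it once the examples are sorted by their projection onto the current weight vector.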

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1104.2018

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

Robust Nonparametric Regression via Sparsity Control with Application to Load Curve Data Cleansing

Mateos, Gonzalo, Giannakis, Georgios B.

arXiv.org Machine Learning, Apr-3-2011

Nonparametric methods are widely applicable to statistical inference problems, since they rely on few modeling assumptions. In this context, the fresh look advocated here carries over benefits from variable selection and compressive sampling to robustify nonparametric regression against outliers, that is, data markedly deviating from the postulated models. A variational counterpart to least-trimmed squares regression is shown to be closely related to an L0-(pseudo)norm-regularized estimator that encourages sparsity in a vector explicitly modeling the outliers. This connection suggests efficient solvers based on convex relaxation, which lead naturally to a variational M-type estimator equivalent to the least-absolute shrinkage and selection operator (Lasso). Outliers are identified by judiciously tuning regularization parameters, which amounts to controlling the sparsity of the outlier vector along the whole robustification path of Lasso solutions. Reduced bias and enhanced generalization capability are attractive features of an improved estimator obtained after replacing the L0-(pseudo)norm with a nonconvex surrogate. The novel robust spline-based smoother is adopted to cleanse load curve data, a key task aiding operational decisions in the envisioned smart grid system. Computer simulations and tests on real load curve data corroborate the effectiveness of the novel sparsity-controlling robust estimators.
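The outlier-vector construction described in this abstract can be sketched as follows. The notation is assumed for illustration: observations $y_i$ at inputs $x_i$, an unknown function $f$ in a function space $\mathcal{H}$ with smoothness penalty $\|f\|_{\mathcal{H}}^2$, an outlier vector $o$, and weights $\mu$, $\lambda_0$, $\lambda_1$. The exact penalties and scalings in the paper may differ.

```latex
\[
y_i = f(x_i) + o_i + \varepsilon_i, \qquad i = 1, \dots, n,
\]
\[
\min_{f \in \mathcal{H},\; o \in \mathbb{R}^n}
\sum_{i=1}^{n} \bigl( y_i - f(x_i) - o_i \bigr)^2 + \mu \|f\|_{\mathcal{H}}^2 + \lambda_0 \|o\|_0
\quad \longrightarrow \quad
\min_{f \in \mathcal{H},\; o \in \mathbb{R}^n}
\sum_{i=1}^{n} \bigl( y_i - f(x_i) - o_i \bigr)^2 + \mu \|f\|_{\mathcal{H}}^2 + \lambda_1 \|o\|_1 .
\]
% For fixed f, with residual r_i = y_i - f(x_i), the relaxed problem separates
% over i and each o_i is a soft-thresholded residual:
\[
\hat o_i = \operatorname{sign}(r_i) \, \max\bigl( |r_i| - \lambda_1 / 2,\; 0 \bigr).
\]
```

Residuals smaller than $\lambda_1/2$ in magnitude give $\hat o_i = 0$ and are treated as inliers, so sweeping $\lambda_1$ controls how many points are flagged as outliers.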

data mining, data quality, machine learning, (20 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2011.2181837

1104.0455

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.81)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Energy > Power Industry (0.66)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Regularizers for Structured Sparsity

Micchelli, Charles A., Morales, Jean M., Pontil, Massimiliano

arXiv.org Machine Learning, Mar-30-2011

We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. This problem is relevant in machine learning, statistics and signal processing. It is well known that a linear regression can benefit from knowledge that the underlying regression vector is sparse. The combinatorial problem of selecting the nonzero components of this vector can be "relaxed" by regularizing the squared error with a convex penalty function like the $\ell_1$ norm. However, in many applications, additional conditions on the structure of the regression vector and its sparsity pattern are available. Incorporating this information into the learning method may lead to a significant decrease of the estimation error. In this paper, we present a family of convex penalty functions, which encode prior knowledge on the structure of the vector formed by the absolute values of the regression coefficients. This family subsumes the $\ell_1$ norm and is flexible enough to include different models of sparsity patterns, which are of practical and theoretical importance. We establish the basic properties of these penalty functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions. Numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso method and other related methods.
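As a point of reference for the $\ell_1$ relaxation mentioned in this abstract, the following is a minimal pure-Python sketch of coordinate descent for the plain Lasso objective $\tfrac{1}{2}\|y - Xb\|_2^2 + \lambda\|b\|_1$. It illustrates only the baseline the paper compares against, not the structured penalties or the optimization algorithm it proposes. The function names, the objective scaling, and the fixed number of sweeps are assumptions.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z towards zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, sweeps=100):
    """Coordinate descent for 0.5 * ||y - X b||^2 + lam * ||b||_1.

    X is a list of rows, y a list of targets; returns the coefficient list.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes it.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            col_norm = sum(X[i][j] ** 2 for i in range(n))
            b[j] = soft_threshold(rho, lam) / col_norm if col_norm > 0 else 0.0
    return b
```

With an orthonormal design the update reduces to soft-thresholding each least-squares coefficient, which shows how the penalty sets small coefficients exactly to zero.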

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1010.0556

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.74)
