AITopics | entire regularization path

Tree ensembles are powerful models that achieve excellent predictive performance, but they can grow to unwieldy sizes. These ensembles are often post-processed (pruned) to reduce their memory footprint and improve interpretability. We present ForestPrune, a novel optimization framework that post-processes tree ensembles by pruning depth layers from individual trees. Since the number of nodes in a decision tree can grow exponentially with tree depth, pruning deep trees drastically compacts ensembles. We develop a specialized optimization algorithm to efficiently obtain high-quality solutions to problems under ForestPrune. Our algorithm typically reaches good solutions in seconds for medium-size datasets and ensembles, with tens of thousands of rows and hundreds of trees, resulting in significant speedups over existing approaches. Our experiments demonstrate that ForestPrune produces parsimonious models that outperform models extracted by existing post-processing algorithms.
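To see why pruning depth layers shrinks an ensemble so much, the arithmetic below counts nodes under a simplifying assumption: every tree is a full binary tree, so a tree of depth d has 2^(d+1) - 1 nodes. Real trees are rarely full, so these counts are upper bounds. The helper names and the 100-tree, depth-10 ensemble are hypothetical, and this is only the node-count arithmetic, not the ForestPrune optimization itself.

```python
def full_tree_nodes(depth: int) -> int:
    """Node count of a full binary tree with the root at depth 0 and leaves at `depth`."""
    return 2 ** (depth + 1) - 1


def ensemble_nodes(depths: list[int]) -> int:
    """Total node count of an ensemble of full binary trees (an upper bound for real trees)."""
    return sum(full_tree_nodes(d) for d in depths)


# Hypothetical ensemble: 100 trees of depth 10, then every tree pruned to depth 6.
original = ensemble_nodes([10] * 100)
pruned = ensemble_nodes([6] * 100)
print(original, pruned, f"{pruned / original:.1%}")  # 204700 12700 6.2%
```

Removing the bottom four layers of every tree in this hypothetical ensemble leaves roughly 6% of the nodes, which illustrates the kind of compaction the abstract describes.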

artificial intelligence, ensemble, machine learning

arXiv.org Artificial Intelligence

2206.00128

Country: Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

The Well Tempered Lasso

Li, Yuanzhi, Singer, Yoram

arXiv.org Machine Learning, Jun-8-2018

We study the complexity of the entire regularization path for least squares regression with a 1-norm penalty, known as the Lasso. Every regression parameter in the Lasso changes piecewise linearly as a function of the regularization value. The number of these changes (kinks in the path) is regarded as the Lasso's complexity. Experimental results using exact path following exhibit polynomial complexity of the Lasso in the problem size. Alas, the path complexity of the Lasso on artificially designed regression problems is exponential. We use smoothed analysis as a mechanism for bridging the gap between worst-case settings and the de facto low complexity. Our analysis assumes that the observed data has a tiny amount of intrinsic noise. We then prove that the Lasso's complexity is polynomial in the problem size. While building upon the seminal work of Spielman and Teng on smoothed complexity, our analysis is fundamentally different, as it is divorced from specific path-following algorithms. We verify the validity of our analysis in experiments with both worst-case settings and real datasets. The empirical results we obtain closely match our analysis.
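The piecewise-linear structure is easiest to see in the special case of an orthonormal design (X^T X = I), which this sketch assumes for illustration; the paper studies general designs. Writing the Lasso as minimizing (1/2)||y - X beta||^2 + lam * ||beta||_1 and setting z = X^T y, each coefficient equals the soft-thresholded value sign(z_j) * max(|z_j| - lam, 0), so its path bends only once, at lam = |z_j|. The sketch counts the resulting linear segments; the function names and data are hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Orthonormal-design Lasso coefficient: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(z: list[float], lam: float) -> list[float]:
    """Lasso solution at penalty `lam`, assuming X^T X = I and z = X^T y."""
    return [soft_threshold(zj, lam) for zj in z]


def path_knots(z: list[float]) -> list[float]:
    """Penalty values where some coefficient's slope changes, largest first."""
    return sorted({abs(zj) for zj in z if zj != 0.0}, reverse=True)


def path_segments(z: list[float]) -> int:
    """Number of linear pieces of the whole path for lam in [0, infinity)."""
    return len(path_knots(z)) + 1


z = [3.0, -1.5, 0.5]  # hypothetical X^T y
print(path_knots(z))              # [3.0, 1.5, 0.5]
print(path_segments(z))           # 4
print(lasso_orthonormal(z, 1.0))  # [2.0, -0.5, 0.0]
```

With an orthonormal design the path has at most p + 1 segments, so the exponential worst cases mentioned in the abstract can only arise for non-orthogonal, correlated designs.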

artificial intelligence, linear segment, machine learning

arXiv.org Machine Learning

1806.0319

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Exploring the Entire Regularization Path for the Asymmetric Cost Linear Support Vector Machine

Wesierski, Daniel

arXiv.org Machine Learning, Oct-12-2016

We propose an algorithm for exploring the entire regularization path of asymmetric-cost linear support vector machines. Empirical evidence suggests that the predictive power of support vector machines depends on the regularization parameters of the training algorithms. Algorithms that explore the entire regularization path have been proposed for single-cost support vector machines, providing complete knowledge of the trained model's behavior over the hyperparameter space. Considering the problem in a two-dimensional hyperparameter space, however, enables our algorithm to maintain greater flexibility in dealing with special cases and sheds light on problems encountered by algorithms that build the paths in one-dimensional spaces. We demonstrate two-dimensional regularization paths for linear support vector machines trained on synthetic and real data.
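The abstract does not spell out the objective, so the sketch below assumes the usual asymmetric-cost formulation: the hinge losses of positive and negative examples are weighted by separate costs c_pos and c_neg, and each pair (c_pos, c_neg) is one point of the two-dimensional hyperparameter space. It only evaluates the primal objective for a fixed linear classifier (w, b) and checks that equal costs recover the single-cost SVM; it does not trace a path. All names and data are hypothetical.

```python
def hinge(margin: float) -> float:
    """Hinge loss max(0, 1 - margin)."""
    return max(0.0, 1.0 - margin)


def decision(w: list[float], b: float, x: list[float]) -> float:
    """Linear decision function w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def asymmetric_objective(w, b, X, y, c_pos, c_neg):
    """0.5*||w||^2 + c_pos * (hinge losses of y=+1) + c_neg * (hinge losses of y=-1)."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        cost = c_pos if yi > 0 else c_neg  # per-class cost (assumed formulation)
        loss += cost * hinge(yi * decision(w, b, xi))
    return reg + loss


def single_cost_objective(w, b, X, y, c):
    """Standard linear SVM primal: 0.5*||w||^2 + c * (sum of hinge losses)."""
    reg = 0.5 * sum(wj * wj for wj in w)
    return reg + c * sum(hinge(yi * decision(w, b, xi)) for xi, yi in zip(X, y))


# Hypothetical 1-D data embedded in 2-D, scored with a fixed classifier.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [-0.25, 0.0]]
y = [1, 1, -1, -1]
w, b = [1.0, 0.0], 0.0
print(asymmetric_objective(w, b, X, y, c_pos=1.0, c_neg=4.0))  # 4.0
print(asymmetric_objective(w, b, X, y, 1.0, 1.0), single_cost_objective(w, b, X, y, 1.0))  # 1.75 1.75
```

Raising c_neg alone makes the margin violation of the last negative example four times as expensive, which is the extra degree of freedom a two-dimensional path explores.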

algorithm, artificial intelligence, machine learning

arXiv.org Machine Learning

1610.03738

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

The Entire Regularization Path for the Support Vector Machine

Rosset, Saharon, Tibshirani, Robert, Zhu, Ji, Hastie, Trevor J.

Neural Information Processing Systems, Dec-31-2005

In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
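Path-following SVM algorithms of this kind typically track which training points lie exactly on the margin, where y_i f(x_i) = 1 (often called the elbow), versus inside it (y_i f(x_i) < 1) or beyond it (y_i f(x_i) > 1); the solution is piecewise linear in the regularization parameter, with breakpoints where points enter or leave the elbow. The sketch below only computes that three-way partition for a given linear fit f(x) = beta0 + beta.x with a numerical tolerance. It is not the full path algorithm, and the names and data are hypothetical.

```python
def partition_points(beta0, beta, X, y, tol=1e-9):
    """Split indices by y_i * f(x_i) versus 1, for f(x) = beta0 + beta.x.

    Returns (left, elbow, right): margin < 1, margin == 1 (within tol), margin > 1.
    """
    left, elbow, right = [], [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        margin = yi * (beta0 + sum(bj * xj for bj, xj in zip(beta, xi)))
        if abs(margin - 1.0) <= tol:
            elbow.append(i)
        elif margin < 1.0:
            left.append(i)
        else:
            right.append(i)
    return left, elbow, right


# Hypothetical 1-D data and a fixed linear fit f(x) = x.
X = [[2.0], [1.0], [0.5], [-1.0], [-3.0]]
y = [1, 1, 1, -1, -1]
print(partition_points(0.0, [1.0], X, y))  # ([2], [1, 3], [0, 4])
```

The tolerance matters in practice: without it, floating-point error would rarely place any point exactly on the elbow.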

algorithm, entire regularization path, piecewise linear

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Computing regularization paths for learning multiple kernels

Bach, Francis R., Thibaux, Romain, Jordan, Michael I.

Neural Information Processing Systems, Dec-31-2005

The problem of learning a sparse conic combination of kernel functions or kernel matrices for classification or regression can be addressed via regularization by a block 1-norm [1]. In this paper, we present an algorithm that computes the entire regularization path for these problems. The path is obtained using numerical continuation techniques, and its running time is a constant multiple of the time needed to solve the problem for a single value of the regularization parameter. Working in the setting of kernel linear regression and kernel logistic regression, we show empirically that the effect of block 1-norm regularization differs notably from that of the (non-block) 1-norm regularization commonly used for variable selection, and that the regularization path is of particular value in the block case.
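The abstract does not define the block 1-norm; the sketch assumes the common form sum_j d_j * ||w_j||_2, a weighted sum of Euclidean norms over blocks of coefficients (one block per kernel). To illustrate why it behaves differently from the plain 1-norm, it compares their proximal (shrinkage) operators: block soft-thresholding keeps or zeroes whole blocks, while coordinate-wise soft-thresholding zeroes individual entries. This is not the numerical continuation method used in the paper, and the names and data are hypothetical.

```python
import math


def block_l1_norm(w, blocks, weights=None):
    """Weighted block 1-norm: sum_j d_j * ||w restricted to block j||_2."""
    if weights is None:
        weights = [1.0] * len(blocks)
    return sum(d * math.sqrt(sum(w[i] ** 2 for i in g)) for d, g in zip(weights, blocks))


def block_soft_threshold(w, blocks, lam):
    """Prox of lam * (unweighted) block 1-norm: each block is shrunk or zeroed as a unit."""
    out = list(w)
    for g in blocks:
        norm = math.sqrt(sum(w[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for i in g:
            out[i] = scale * w[i]
    return out


def soft_threshold(w, lam):
    """Prox of lam * ||w||_1: each coordinate is shrunk or zeroed on its own."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in w]


# Hypothetical weights for two kernels, two coefficients each.
w = [0.375, 0.5, 3.0, 0.25]
blocks = [[0, 1], [2, 3]]
print(block_l1_norm(w, blocks))
print(block_soft_threshold(w, blocks, lam=1.0))  # first block zeroed, second kept whole
print(soft_threshold(w, lam=1.0))                # [0.0, 0.0, 2.0, 0.0]
```

Here the second block survives as a unit under the block penalty, while the plain 1-norm zeroes its small second coordinate; selecting whole blocks is what makes the block penalty suited to choosing kernels rather than individual variables.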

kernel, regression, regularization path

Neural Information Processing Systems

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Asia > Middle East > Jordan (0.05)

Genre:

Research Report > New Finding (0.35)
Research Report > Experimental Study (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)

The Entire Regularization Path for the Support Vector Machine

Rosset, Saharon, Tibshirani, Robert, Zhu, Ji, Hastie, Trevor J.

Neural Information Processing Systems, Dec-31-2005

We have a set of $n$ training pairs $x_i, y_i$, where $x_i \in \mathbb{R}^p$ is a $p$-vector of real-valued predictors

algorithm, artificial intelligence, machine learning

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County (0.15)
North America > United States > Michigan (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Computing regularization paths for learning multiple kernels

Bach, Francis R., Thibaux, Romain, Jordan, Michael I.

Neural Information Processing Systems, Dec-31-2005

The problem of learning a sparse conic combination of kernel functions or kernel matrices for classification or regression can be addressed via regularization by a block 1-norm [1]. In this paper, we present an algorithm that computes the entire regularization path for these problems. The path is obtained using numerical continuation techniques, and its running time is a constant multiple of the time needed to solve the problem for a single value of the regularization parameter. Working in the setting of kernel linear regression and kernel logistic regression, we show empirically that the effect of block 1-norm regularization differs notably from that of the (non-block) 1-norm regularization commonly used for variable selection, and that the regularization path is of particular value in the block case.

artificial intelligence, kernel, machine learning

Neural Information Processing Systems

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report > New Finding (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)