AITopics | Regression

Collaborating Authors

Regression

News Overviews Instructional Materials AI-Alerts Classics

TREX: Tree-Ensemble Representer-Point Explanations

arXiv.org Machine LearningSep-24-2020

How can we identify the training examples that contribute most to the prediction of a tree ensemble? In this paper, we introduce TREX, an explanation system that provides instance-attribution explanations for tree ensembles, such as random forests and gradient boosted trees. TREX builds on the representer point framework previously developed for explaining deep neural networks. Since tree ensembles are non-differentiable, we define a kernel that captures the structure of the specific tree ensemble. By using this kernel in kernel logistic regression or a support vector machine, TREX builds a surrogate model that approximates the original tree ensemble. The weights in the kernel expansion of the surrogate model are used to define the global or local importance of each training example. Our experiments show that TREX's surrogate model accurately approximates the tree ensemble; its global importance weights are more effective in dataset debugging than the previous state-of-the-art; its explanations identify the most influential samples better than alternative methods under the remove and retrain evaluation framework; it runs orders of magnitude faster than alternative methods; and its local explanations can identify and explain errors due to domain mismatch.

artificial intelligence, machine learning, prediction, (19 more...)

arXiv.org Machine Learning

2009.0553

Country: North America > United States > Oregon (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Using Orange to Build a Machine Learning Model

#artificialintelligenceSep-23-2020, 11:10:34 GMT

Orange is an open-source, GUI based platform that is popularly used for rule mining and easy data analysis. The reason behind the popularity of this platform is it is completely code-free. Researchers, students, non-developers and business analysts use platforms like Orange to get a good understanding of the data at hand and also quickly build machine learning models to understand the relationship between the data points better. Orange is a platform built on Python that lets you do everything required to build machine learning models without code. Orange includes a wide range of data visualisation, exploration, preprocessing and modelling techniques. Not only does it become handy in machine learning, but it is also very useful for associative rule mining of numbers, text and even network analysis.

artificial intelligence, dataset, machine learning, (16 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.33)

Add feedback

Rank-Based Multi-task Learning for Fair Regression

Zhao, Chen, Chen, Feng

arXiv.org Machine LearningSep-23-2020

In this work, we develop a novel fairness learning approach for multi-task regression models based on a biased training dataset, using a popular rank-based non-parametric independence test, i.e., Mann Whitney U statistic, for measuring the dependency between target variable and protected variables. To solve this learning problem efficiently, we first reformulate the problem as a new non-convex optimization problem, in which a non-convex constraint is defined based on group-wise ranking functions of individual objects. We then develop an efficient model-training algorithm based on the framework of non-convex alternating direction method of multipliers (NC-ADMM), in which one of the main challenges is to implement an efficient projection oracle to the preceding non-convex set defined based on ranking functions. Through the extensive experiments on both synthetic and real-world datasets, we validated the out-performance of our new approach against several state-of-the-art competitive methods on several popular metrics relevant to fairness learning.

artificial intelligence, constraint, machine learning, (17 more...)

arXiv.org Machine Learning

2009.11405

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (1.00)

Industry: Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.35)

Add feedback

Learning Mixtures of Low-Rank Models

Chen, Yanxi, Ma, Cong, Poor, H. Vincent, Chen, Yuxin

arXiv.org Machine LearningSep-23-2020

We study the problem of learning mixtures of low-rank models, i.e. reconstructing multiple low-rank matrices from unlabelled linear measurements of each. This problem enriches two widely studied settings -- low-rank matrix sensing and mixed linear regression -- by bringing latent variables (i.e. unknown labels) and structural priors (i.e. low-rank structures) into consideration. To cope with the non-convexity issues arising from unlabelled heterogeneous data and low-complexity structure, we develop a three-stage meta-algorithm that is guaranteed to recover the unknown matrices with near-optimal sample and computational complexities under Gaussian designs. In addition, the proposed algorithm is provably stable against random noise. We complement the theoretical studies with empirical evidence that confirms the efficacy of our algorithm.

artificial intelligence, machine learning, matrix, (15 more...)

arXiv.org Machine Learning

2009.11282

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(2 more...)

Genre: Research Report (0.81)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

Add feedback

An $l_1$-oracle inequality for the Lasso in mixture-of-experts regression models

Nguyen, TrungTin, Nguyen, Hien D, Chamroukhi, Faicel, McLachlan, Geoffrey J

arXiv.org Artificial IntelligenceSep-22-2020

Mixture-of-experts (MoE) models are a popular framework for modeling heterogeneity in data, for both regression and classification problems in statistics and machine learning, due to their flexibility and the abundance of statistical estimation and model choice tools. Such flexibility comes from allowing the mixture weights (or gating functions) in the MoE model to depend on the explanatory variables, along with the experts (or component densities). This permits the modeling of data arising from more complex data generating processes, compared to the classical finite mixtures and finite mixtures of regression models, whose mixing parameters are independent of the covariates. The use of MoE models in a high-dimensional setting, when the number of explanatory variables can be much larger than the sample size (i.e., $p\gg n$), is challenging from a computational point of view, and in particular from a theoretical point of view, where the literature is still lacking results in dealing with the curse of dimensionality, in both the statistical estimation and feature selection. We consider the finite mixture-of-experts model with soft-max gating functions and Gaussian experts for high-dimensional regression on heterogeneous data, and its $l_1$-regularized estimation via the Lasso. We focus on the Lasso estimation properties rather than its feature selection properties. We provide a lower bound on the regularization parameter of the Lasso function that ensures an $l_1$-oracle inequality satisfied by the Lasso estimator according to the Kullback-Leibler loss.

artificial intelligence, machine learning, regression model, (17 more...)

arXiv.org Artificial Intelligence

2009.10622

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

An Exponential Factorization Machine with Percentage Error Minimization to Retail Sales Forecasting

Li, Chongshou, Cheang, Brenda, Luo, Zhixing, Lim, Andrew

arXiv.org Machine LearningSep-22-2020

This paper proposes a new approach to sales forecasting for new products with long lead time but short product life cycle. These SKUs are usually sold for one season only, without any replenishments. An exponential factorization machine (EFM) sales forecast model is developed to solve this problem which not only considers SKU attributes, but also pairwise interactions. The EFM model is significantly different from the original Factorization Machines (FM) from two-fold: (1) the attribute-level formulation for explanatory variables and (2) exponential formulation for the positive response variable. The attribute-level formation excludes infeasible intra-attribute interactions and results in more efficient feature engineering comparing with the conventional one-hot encoding, while the exponential formulation is demonstrated more effective than the log-transformation for the positive but not skewed distributed responses. In order to estimate the parameters, percentage error squares (PES) and error squares (ES) are minimized by a proposed adaptive batch gradient descent method over the training set. Real-world data provided by a footwear retailer in Singapore is used for testing the proposed approach. The forecasting performance in terms of both mean absolute percentage error (MAPE) and mean absolute error (MAE) compares favourably with not only off-the-shelf models but also results reported by extant sales and demand forecasting studies. The effectiveness of the proposed approach is also demonstrated by two external public datasets. Moreover, we prove the theoretical relationships between PES and ES minimization, and present an important property of the PES minimization for regression models; that it trains models to underestimate data. This property fits the situation of sales forecasting where unit-holding cost is much greater than the unit-shortage cost.

interaction, loss function, pes, (9 more...)

arXiv.org Machine Learning

2009.10619

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > Singapore > Central Region > Singapore (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry: Retail (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

ABM: an automatic supervised feature engineering method for loss based models based on group and fused lasso

Luo, Weijian, Long, Yongxian

arXiv.org Machine LearningSep-22-2020

A vital problem in solving classification or regression problem is to apply feature engineering and variable selection on data before fed into models.One of a most popular feature engineering method is to discretisize continous variable with some cutting points,which is refered to as bining processing.Good cutting points are important for improving model's ability, because wonderful bining may ignore some noisy variance in continous variable range and keep useful leveled information with good ordered encodings.However, to our best knowledge a majority of cutting point selection is done via researchers domain knownledge or some naive methods like equal-width cutting or equal-frequency cutting.In this paper we propose an end-to-end supervised cutting point selection method based on group and fused lasso along with the automatically variable selection effect.We name our method \textbf{ABM}(automatic bining machine). We firstly cut each variable range into fine grid bins and train model with our group and group fused lasso regularization on each successive bins.It is a method that integrates feature engineering,variable selection and model training simultanously.And one more inspiring thing is that the method is flexible such that it can be taken into a bunch of loss function based model including deep neural networks.We have also implemented the method in R and open the source code to other researchers.A Python version will also meet the community in days.

artificial intelligence, machine learning, regression model, (17 more...)

arXiv.org Machine Learning

2009.10498

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.39)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Demand Prediction Using Machine Learning Methods and Stacked Generalization

Tugay, Resul, Oguducu, Sule Gunduz

arXiv.org Artificial IntelligenceSep-21-2020

Supply and demand are two fundamental concepts of sellers and customers. Predicting demand accurately is critical for organizations in order to be able to make plans. In this paper, we propose a new approach for demand prediction on an e-commerce web site. The proposed model differs from earlier models in several ways. The business model used in the e-commerce web site, for which the model is implemented, includes many sellers that sell the same product at the same time at different prices where the company operates a market place model. The demand prediction for such a model should consider the price of the same product sold by competing sellers along the features of these sellers. In this study we first applied different regression algorithms for specific set of products of one department of a company that is one of the most popular online e-commerce companies in Turkey. Then we used stacked generalization or also known as stacking ensemble learning to predict demand. Finally, all the approaches are evaluated on a real world data set obtained from the e-commerce company. The experimental results show that some of the machine learning methods do produce almost as good results as the stacked generalization method.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2009.09756

Country:

North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.05)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(2 more...)

Genre: Research Report > New Finding (0.69)

Industry: Information Technology > Services (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.46)

Add feedback

A Nearest Neighbor Characterization of Lebesgue Points in Metric Measure Spaces

Cesari, Tommaso, Colomboni, Roberto

arXiv.org Machine LearningSep-21-2020

The property of almost every point being a Lebesgue point has proven to be crucial for the consistency of several classification algorithms based on nearest neighbors. We characterize Lebesgue points in terms of a 1-Nearest Neighbor regression algorithm for pointwise estimation, fleshing out the role played by tie-breaking rules in the corresponding convergence problem. We then give an application of our results, proving the convergence of the risk of a large class of 1-Nearest Neighbor classification algorithms in general metric spaces where almost every point is a Lebesgue point.

artificial intelligence, lebesgue point, machine learning, (17 more...)

arXiv.org Machine Learning

2007.03937

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Indiana (0.04)
(2 more...)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback

Early Stopping in Deep Networks: Double Descent and How to Eliminate it

Heckel, Reinhard, Yilmaz, Fatih Furkan

arXiv.org Machine LearningSep-19-2020

This intriguing double descent behavior also occurs as a function of training epochs, and has been conjectured to arise because training epochs control the model complexity. In this paper, we show that such epoch-wise double descent arises for a different reason: It is caused by a superposition of two or more bias-variance tradeoffs that arise because different parts of the network are learned at different epochs, and eliminating this by proper scaling of stepsizes can significantly improve the early stopping performance. We show this analytically for i) linear regression, where differently scaled features give rise to a superposition of bias-variance tradeoffs, and for ii) a two-layer neural network, where the first and second layer each govern a bias-variance tradeoff. Inspired by this theory, we study two standard convolutional networks empirically, and show that eliminating epoch-wise double descent through adjusting stepsizes of different layers improves the early stopping performance significantly.

artificial intelligence, descent, machine learning, (17 more...)

arXiv.org Machine Learning

2007.10099

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Add feedback