 multivariate adaptive regression spline


SMART: A Flexible Approach to Regression using Spline-Based Multivariate Adaptive Regression Trees

arXiv.org Machine Learning

Decision trees are powerful for predictive modeling but often suffer from high variance when modeling continuous relationships. While algorithms like Multivariate Adaptive Regression Splines (MARS) excel at capturing such continuous relationships, they perform poorly when modeling discontinuities. To address the limitations of both approaches, we introduce Spline-based Multivariate Adaptive Regression Trees (SMART), which uses a decision tree to identify subsets of data with distinct continuous relationships and then leverages MARS to fit these relationships independently. Unlike other methods that rely on the tree structure to model interaction and higher-order terms, SMART leverages MARS's native ability to handle these terms, allowing the tree to focus solely on identifying discontinuities in the relationship. We test SMART on various datasets, demonstrating its improvement over state-of-the-art methods in such cases. Additionally, we provide an open-source implementation of our method to be used by practitioners.
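The split-then-fit idea in this abstract can be illustrated with a deliberately simplified sketch. It is not the SMART implementation. It assumes a single input variable and a depth-one tree, and it uses an ordinary least-squares line on each side as a stand-in for the MARS fit. The helper names `fit_line` and `best_split` are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line; returns coefficients and the sum of squared errors."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = float(np.sum((design @ coef - y) ** 2))
    return coef, sse

def best_split(x, y, min_leaf=5):
    """Try midpoints between sorted x values; keep the one with the lowest total SSE."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = None
    for i in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # no threshold can separate identical x values
        _, sse_left = fit_line(xs[:i], ys[:i])
        _, sse_right = fit_line(xs[i:], ys[i:])
        total = sse_left + sse_right
        if best is None or total < best[1]:
            best = (0.5 * (xs[i - 1] + xs[i]), total)
    return best

# Toy data (assumed for illustration): two different lines with a jump at x = 5.
x = np.linspace(0.0, 10.0, 100)
y = np.where(x < 5.0, 2.0 * x, 30.0 - x)
threshold, total_sse = best_split(x, y)
print(threshold, total_sse)
```

On this toy data the chosen threshold sits at the jump, because any other split forces one of the two lines to straddle the discontinuity. A fuller version would replace `fit_line` with a spline fit and allow deeper trees.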


Beyond Beats: A Recipe to Song Popularity? A machine learning approach

arXiv.org Artificial Intelligence

Music popularity prediction has garnered significant attention in both industry and academia, fuelled by the rise of data-driven algorithms and streaming platforms like Spotify. This study aims to explore the predictive power of various machine learning models in forecasting song popularity using a dataset comprising 30,000 songs spanning different genres from 1957 to 2020. Methods: We employ Ordinary Least Squares (OLS), Multivariate Adaptive Regression Splines (MARS), Random Forest, and XGBoost algorithms to analyse song characteristics and their impact on popularity. Results: Ordinary Least Squares (OLS) regression analysis reveals genre as the primary influencer of popularity, with notable trends over time. MARS modelling highlights the complex relationship between variables, particularly with features like instrumentalness and duration. Random Forest and XGBoost models underscore the importance of genre, especially EDM, in predicting popularity. Despite variations in performance, Random Forest emerges as the most effective model, improving prediction accuracy by 7.1% compared to average scores. Despite the importance of genre, predicting song popularity remains challenging, as observed variations in music-related features suggest complex interactions between genre and other factors. Consequently, while certain characteristics like loudness and song duration may impact popularity scores, accurately predicting song success remains elusive.


Intuition of Multivariate Adaptive Regression Splines (MARS)

#artificialintelligence

Multivariate Adaptive Regression Splines, commonly known as MARS, is an algorithm best suited to high-dimensional datasets with complex non-linear relationships. It can be seen as a generalised form of stepwise regression (stepwise regression first runs a forward selection that adds variables to the model, then a pruning or backward selection that removes the variables that don't reduce the error significantly). MARS is built on the hinge function, which is similar to the rectified linear function of a neural network: its output is either a shifted copy of the input or 0. MARS places these functions at knots, and the resulting piecewise-linear fit is the spline. The functions are always generated as a pair, a left function and a right function. MARS generates many such functions, called basis functions, across all the input variables and then runs a linear regression on the basis functions' outputs.
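As a rough illustration of the left/right function pair and the final regression step, here is a minimal NumPy sketch. It assumes the knot is already known, whereas MARS searches for knots itself, and the names `hinge_pair` and `fit_hinge_model` are made up for this example.

```python
import numpy as np

def hinge_pair(x, knot):
    """Return the (left, right) hinge features for one knot."""
    left = np.maximum(0.0, knot - x)   # non-zero only where x < knot
    right = np.maximum(0.0, x - knot)  # non-zero only where x > knot
    return left, right

def fit_hinge_model(x, y, knots):
    """Least-squares fit of y on an intercept plus one hinge pair per knot."""
    columns = [np.ones_like(x)]
    for knot in knots:
        columns.extend(hinge_pair(x, knot))
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef, design

# Toy data (assumed for illustration): a V-shaped target with a kink at x = 4.
x = np.linspace(0.0, 10.0, 101)
y = np.abs(x - 4.0)
coef, design = fit_hinge_model(x, y, knots=[4.0])
pred = design @ coef
print(coef)
```

Because |x - 4| equals the sum of the left and right hinges at the knot 4, the regression recovers an intercept near 0 and a weight near 1 on each hinge.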


MARS: Multivariate Adaptive Regression Splines -- How to Improve on Linear Regression?

#artificialintelligence

Machine Learning is making huge leaps forward, with an increasing number of algorithms enabling us to solve complex real-world problems. This story is part of a deep dive series explaining the mechanics of Machine Learning algorithms. In addition to giving you an understanding of how ML algorithms work, it also provides you with Python examples to build your own ML models. Before we dive into the specifics of MARS, I assume that you are already familiar with Linear Regression. Looking at the algorithm's full name -- Multivariate Adaptive Regression Splines -- you would be correct to guess that MARS belongs to the group of regression algorithms used to predict continuous (numerical) target variables.


Multivariate Adaptive Regression Splines (MARS) in Python

#artificialintelligence

Multivariate Adaptive Regression Splines, or MARS, is an algorithm for complex non-linear regression problems. The algorithm involves finding a set of simple linear functions that in aggregate result in the best predictive performance. In this way, MARS is a type of ensemble of simple linear functions and can achieve good performance on challenging regression problems with many input variables and complex non-linear relationships. In this tutorial, you will discover how to develop Multivariate Adaptive Regression Spline models in Python.


Distribution Assertive Regression

arXiv.org Machine Learning

In regression modelling, the main step is to fit the regression line as closely as possible to the target variable. In this process, most algorithms try to fit all of the data with a single line, and hence fit all parts of the target variable in one go. It was observed that the error between the predicted and target variable usually behaves differently across the quantiles of the dependent variable, so a single-point diagnostic like MAPE is limited in what it can say about the quality of fit across the distribution of Y (the dependent variable). To address this problem, the paper proposes a novel approach to regression fitting over the various quantiles of the target variable. Using this approach, we have significantly improved the eccentric behavior of the distance (error) between the predicted and actual values of the regression. Our proposed solution is based on understanding the segmented behavior of the data with respect to its internal segments, and on an approach for retrospectively fitting the data based on each quantile's behavior. We believe exploring and using this approach would help in achieving better and more explainable results in most real-world data modelling problems.
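The limitation of a single MAPE figure can be made concrete with a small sketch. It only illustrates the diagnostic problem the abstract describes, not the paper's fitting method. The function names are hypothetical, and the toy data assumes a strictly positive target with distinct values.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

def mape_by_quantile(y_true, y_pred, n_bins=4):
    """MAPE computed separately within each quantile bin of y_true."""
    edges = np.quantile(y_true, np.linspace(0.0, 1.0, n_bins + 1))
    # Map each observation to a bin index 0..n_bins-1 using the interior edges.
    bins = np.searchsorted(edges[1:-1], y_true, side="right")
    return [mape(y_true[bins == b], y_pred[bins == b]) for b in range(n_bins)]

# Toy data (assumed for illustration): a constant absolute error of 5.
y_true = np.arange(1.0, 101.0)
y_pred = y_true + 5.0
overall = mape(y_true, y_pred)
per_bin = mape_by_quantile(y_true, y_pred)
print(overall, per_bin)
```

Here the overall MAPE hides the fact that the percentage error is far larger in the lowest quartile of Y than in the highest, which is the kind of quantile-dependent behavior a per-segment view exposes.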