Regression
Shapley Chains: Extending Shapley Values to Classifier Chains
Ayad, Célia Wafa, Bonnier, Thomas, Bosch, Benjamin, Read, Jesse
In spite of increased attention on explainable machine learning models, explaining multi-output predictions has not yet been extensively addressed. Methods that use Shapley values to attribute feature contributions to the decision making are among the most popular approaches to explaining local individual and global predictions. By considering each output separately in multi-output tasks, these methods fail to provide complete feature explanations. We propose Shapley Chains to overcome this issue by including label interdependencies in the explanation design process. Shapley Chains assign Shapley values as feature importance scores in multi-output classification using classifier chains, by separating the direct and indirect influence of these feature scores. Compared to existing methods, this approach makes it possible to attribute a more complete feature contribution to the predictions of multi-output classification tasks. We provide a mechanism to distribute the hidden contributions of the outputs with respect to a given chaining order of these outputs. Moreover, we show how our approach can reveal indirect feature contributions missed by existing approaches. Shapley Chains help to emphasize the real learning factors in multi-output applications and allow a better understanding of the flow of information through output interdependencies in synthetic and real-world datasets.
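For readers unfamiliar with the underlying quantity, the following is a minimal sketch of exact Shapley values for a small cooperative game. It illustrates only the classical Shapley value, not the Shapley Chains method described in the abstract; the function name `shapley_values`, the toy game `toy_value`, and the feature names `x1`, `x2`, `x3` are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a cooperative game.

    players: list of hashable player (feature) identifiers.
    value:   function mapping a frozenset of players to a number.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical game: x1 and x2 contribute and interact, x3 is irrelevant."""
    v = 0.0
    if "x1" in coalition:
        v += 1.0
    if "x2" in coalition:
        v += 2.0
    if "x1" in coalition and "x2" in coalition:
        v += 1.0  # interaction term, split equally between x1 and x2
    return v


phi = shapley_values(["x1", "x2", "x3"], toy_value)
```

In this toy game the attributions sum to the value of the full coalition, which is the efficiency property that makes Shapley values attractive as feature importance scores.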
A Note On Nonlinear Regression Under L2 Loss
We investigate the nonlinear regression problem under L2 loss (square loss) functions. Traditional nonlinear regression models often result in non-convex optimization problems with respect to the parameter set. We show that a convex nonlinear regression model exists for the traditional least squares problem, which can be a promising step towards designing more complex systems with easier-to-train models.
A Machine Learning Approach to Forecasting Honey Production with Tree-Based Methods
Brini, Alessio, Giovannini, Elisa, Smaniotto, Elia
The beekeeping sector has undergone considerable production variations over the past years due to adverse weather conditions, which occur more frequently as climate change progresses. These phenomena can be high-impact and make the environment unfavorable to the bees' activity. We disentangle the honey production drivers with tree-based methods and predict honey production variations for hives in Italy, one of the largest honey producers in Europe. The database covers data from hundreds of beehives from 2019-2022, gathered with advanced precision beekeeping techniques. We train and interpret the machine learning models, making them prescriptive rather than just predictive. The superior predictive performance of tree-based methods compared to standard linear techniques allows for better protection of bees' activity and for assessing potential losses for beekeepers as part of risk management.
Student-centric Model of Learning Management System Activity and Academic Performance: from Correlation to Causation
Mandalapu, Varun, Chen, Lujie Karen, Shetty, Sushruta, Chen, Zhiyuan, Gong, Jiaqi
In recent years, there has been a lot of interest in modeling students' digital traces in Learning Management Systems (LMS) to understand students' learning behavior patterns, including aspects of meta-cognition and self-regulation, with the ultimate goal of turning those insights into actionable information that helps students improve their learning outcomes. In achieving this goal, however, there are two main issues that need to be addressed given the existing literature. Firstly, most of the current work is course-centered (i.e. models are built from data for a specific course) rather than student-centered; secondly, a vast majority of the models are correlational rather than causal. Those issues make it challenging to identify the most promising actionable factors for intervention at the student level, for which most of the campus-wide academic support is designed. In this paper, we explored a student-centric analytical framework for LMS activity data that can provide not only correlational but also causal insights mined from observational data. We demonstrated this approach using a dataset of 1651 computing major students at a public university in the US during one semester in the Fall of 2019. This dataset includes students' fine-grained LMS interaction logs and administrative data, e.g. demographics and academic performance. In addition, we expand the repository of LMS behavior indicators to include those that can characterize the time-of-the-day of login (e.g. chronotype). Our analysis showed that student login volume, compared with other login behavior indicators, is both strongly correlated with and causally linked to student academic performance, especially among students with low academic performance. We envision that those insights will provide convincing evidence for college student support groups to launch student-centered and targeted interventions that are effective and scalable.
Multinomial Logistic Regression Algorithms via Quadratic Gradient
Multinomial logistic regression, also known by other names such as multiclass logistic regression and softmax regression, is a fundamental classification method that generalizes binary logistic regression to multiclass problems. A recent work proposed a faster gradient called $\texttt{quadratic gradient}$ that can accelerate binary logistic regression training, and presented an enhanced Nesterov's accelerated gradient (NAG) method for binary logistic regression. In this paper, we extend this work to multiclass logistic regression and propose an enhanced Adaptive Gradient Algorithm (Adagrad) that can accelerate the original Adagrad method. We test the enhanced NAG method and the enhanced Adagrad method on some multiclass-problem datasets. Experimental results show that both enhanced methods converge faster than their respective original versions.
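As a point of reference for the baseline mentioned in the abstract, here is a minimal sketch of softmax regression trained with the original Adagrad update, using NumPy. It does not implement the quadratic gradient or the enhanced methods from the paper; the function names, the synthetic three-cluster data, and the hyperparameters (`lr`, `epochs`, `eps`) are assumptions chosen for illustration.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_softmax_adagrad(X, y, n_classes, lr=0.5, epochs=200, eps=1e-8):
    """Fit multinomial logistic regression with the plain Adagrad update."""
    n, d = X.shape
    W = np.zeros((d, n_classes))   # one weight column per class
    G = np.zeros_like(W)           # running sum of squared gradients
    Y = np.eye(n_classes)[y]       # one-hot encoded labels
    for _ in range(epochs):
        P = softmax(X @ W)
        grad = X.T @ (P - Y) / n   # gradient of the mean cross-entropy loss
        G += grad ** 2
        W -= lr * grad / (np.sqrt(G) + eps)  # per-coordinate step size
    return W


# Hypothetical toy data: three well-separated 2-D clusters, 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
y = np.repeat(np.arange(3), 30)
X = centers[y] + rng.normal(scale=0.5, size=(90, 2))
X = np.hstack([X, np.ones((90, 1))])  # append a constant column as the bias term

W = fit_softmax_adagrad(X, y, n_classes=3)
accuracy = (softmax(X @ W).argmax(axis=1) == y).mean()
```

Adagrad divides each coordinate of the gradient by the root of its accumulated squared history, so frequently updated weights take smaller steps over time.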
Working with Regression Functions part1(Machine Learning)
Abstract: Functional regression analysis is an established tool for many contemporary scientific applications. Regression problems involving large and complex data sets are ubiquitous, and feature selection is crucial for avoiding overfitting and achieving accurate predictions. We propose a new, flexible, and ultra-efficient approach to perform feature selection in a sparse high dimensional function-on-function regression problem, and we show how to extend it to the scalar-on-function framework. Our method combines functional data, optimization, and machine learning techniques to perform feature selection and parameter estimation simultaneously. We exploit the properties of Functional Principal Components, and the sparsity inherent to the Dual Augmented Lagrangian problem to significantly reduce computational cost, and we introduce an adaptive scheme to improve selection accuracy.
Back To Basics, Part Uno: Linear Regression and Cost Function
These concepts form the foundation of many machine learning algorithms. Initially, I decided against writing an article on these topics because they are so widely covered. However, I have changed my mind because understanding these concepts is essential for understanding more advanced topics like Neural Networks (which I plan on tackling in the near future). In addition, this series will be divided into two parts to make it more manageable and easier to follow. So make yourself comfortable, grab a cup of coffee, and get ready to embark on a magical journey of machine learning. As with any machine learning problem, we begin with a specific question we want to answer.
Working with Regression Functions part2(Machine Learning)
Abstract: The problem of domain generalization is to learn, given data from different source distributions, a model that can be expected to generalize well on new target distributions which are only seen through unlabeled samples. In this paper, we study domain generalization as a problem of functional regression. Our concept leads to a new algorithm for learning a linear operator from marginal distributions of inputs to the corresponding conditional distributions of outputs given inputs. Our algorithm allows a source distribution-dependent construction of reproducing kernel Hilbert spaces for prediction, and satisfies finite sample error bounds for the idealized risk.
Abstract: Feed-forward neural networks (NN) are a staple machine learning method widely used in many areas of science and technology.
5 Essential Books for Beginners in Data Science
The beauty of learning complex things lies in breaking them down into smaller, simpler things. Nobody was born an expert, just like the writer did not become a data geek until after campus, without even a Data Science background. Nevertheless, you should be in love with mathematics and coding to even appreciate the most difficult concepts in Data Science. To be a Data Science pro, you should be skilled in Statistics, Machine Learning, and Deep Learning, and know the right tools in those fields. There is more to data than collecting, preparing and cleaning it using tools like MS Excel, R, SQL and Tableau that you will find in any Data Analytics course. Data Analytics answers questions pertaining to descriptive, diagnostic and prescriptive analytics, while Data Science involves an additional field known as predictive analytics.
Introduction to PyTorch: from training loop to prediction
That said, let's see what the code for writing a logistic regression model looks like. Our class inherits from nn.Module. This class provides the methods behind the scenes that make the model work. The __init__ method of a class contains the logic that runs when instantiating a class in Python. Here we pass two arguments: the number of features and the number of classes to predict.
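The class described above might look roughly like the following sketch. The original listing is not reproduced here, so the class name `LogisticRegression`, the attribute name `linear`, the `forward` body, and the example sizes are assumptions rather than the author's exact code.

```python
import torch
from torch import nn


class LogisticRegression(nn.Module):
    """Multiclass logistic regression as a single linear layer."""

    def __init__(self, n_features, n_classes):
        # Initialise nn.Module so parameters are registered correctly.
        super().__init__()
        self.linear = nn.Linear(n_features, n_classes)

    def forward(self, x):
        # Return raw logits; nn.CrossEntropyLoss applies log-softmax itself.
        return self.linear(x)


# Hypothetical sizes: 4 input features, 3 classes, a batch of 8 samples.
model = LogisticRegression(n_features=4, n_classes=3)
logits = model(torch.randn(8, 4))
```

Returning logits rather than probabilities is a common design choice, because the loss function can then combine the softmax and the logarithm in a numerically stable way.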