Regression
A Non-linear Function-on-Function Model for Regression with Time Series Data
Wang, Qiyao, Wang, Haiyan, Gupta, Chetan, Rao, Aniruddha Rajendra, Khorasgani, Hamed
In the last few decades, building regression models for non-scalar variables, including time series, text, image, and video, has attracted increasing interest from researchers in the data analytics community. In this paper, we focus on a multivariate time series regression problem. Specifically, we aim to learn mathematical mappings from multiple chronologically measured numerical variables within a certain time interval S to multiple numerical variables of interest over time interval T. Prior art, including the multivariate regression model, the Seq2Seq model, and the functional linear models, suffers from several limitations. The first two types of models can only handle regularly observed time series. Besides, the conventional multivariate regression models tend to be biased and inefficient, as they are incapable of encoding the temporal dependencies among observations from the same time series. The sequential learning models explicitly use the same set of parameters along time, which has negative impacts on accuracy. The function-on-function linear model in functional data analysis (a branch of statistics) is insufficient to capture complex correlations among the considered time series and easily suffers from underfitting. In this paper, we propose a general functional mapping that embraces the function-on-function linear model as a special case. We then propose a non-linear function-on-function model using the fully connected neural network to learn the mapping from data, which addresses the aforementioned concerns with the existing approaches. For the proposed model, we describe in detail the corresponding numerical implementation procedures. The effectiveness of the proposed model is demonstrated through the application to two real-world problems.
Modern Multiple Imputation with Functional Data
Rao, Aniruddha Rajendra, Reimherr, Matthew
This work considers the problem of fitting functional models with sparsely and irregularly sampled functional data. It overcomes the limitations of the state-of-the-art methods, which face major challenges in the fitting of more complex non-linear models. Currently, many of these models cannot be consistently estimated unless the number of observed points per curve grows sufficiently quickly with the sample size, whereas we show numerically that a modified approach with more modern multiple imputation methods can produce better estimates in general. We also propose a new imputation approach that combines the ideas of MissForest with Local Linear Forest and compare their performance with PACE and several other multivariate multiple imputation methods. This work is motivated by a longitudinal study on smoking cessation, in which the Electronic Health Records (EHR) from Penn State PaTH to Health allow for the collection of a great deal of data, with highly variable sampling. To illustrate our approach, we explore the relation between relapse and diastolic blood pressure. We also consider a variety of simulation schemes with varying levels of sparsity to validate our methods.
Understanding Linear Regression
Let's say you're looking to buy a new PC from an online store (and you're most interested in how much RAM it has) and you see on their first page some PCs with 4 GB at $100, then some with 16 GB at $1000. So, you estimate in your head that given the prices you saw so far, a PC with 8 GB RAM should be around $400. This fits your budget, so you decide to buy one such PC with 8 GB RAM. This kind of estimation can happen almost automatically in your head without knowing it's called linear regression and without explicitly computing a regression equation (in our case: y = 75x - 200). So, what is linear regression? Linear regression is just the process of estimating an unknown quantity based on some known ones (this is the regression part) with the condition that the unknown quantity can be obtained from the known ones by using only 2 operations: scalar multiplication and addition (this is the linear part).
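The arithmetic behind that mental estimate can be checked with a small sketch. It fits the line through the two observed (RAM, price) points from the example and evaluates it at 8 GB. The function name line_through is a hypothetical helper introduced here for illustration, not something from the article.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)      # change in price per extra GB of RAM
    intercept = y1 - slope * x1        # price the line predicts at 0 GB
    return slope, intercept


# The two observations from the example: (RAM in GB, price in dollars).
slope, intercept = line_through((4, 100), (16, 1000))

# Estimate the price of an 8 GB machine with the fitted line.
price_8gb = slope * 8 + intercept
print(slope, intercept, price_8gb)  # prints: 75.0 -200.0 400.0
```

With only two points the line is determined exactly; with more points that do not sit on one line, a least-squares fit would be needed instead.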
Learning The TensorFlow Way of Linear Regression
We will loop through batches of data points and let TensorFlow update the slope and y-intercept. Instead of generated data, we will use the iris dataset that is built into scikit-learn. Specifically, we will find an optimal line through data points where the x-value is the petal width and the y-value is the sepal length. We choose these two because there appears to be a linear relationship between them, as we will see in the graphs at the end. We will also talk more about the effects of different loss functions in the next section, but for now we will use the L2 loss function.
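To make the batch loop concrete, here is a minimal sketch of the same idea in plain Python rather than TensorFlow: mini-batch gradient descent on a slope and an intercept under the L2 (mean squared error) loss. Everything in it is an assumption made for illustration: the function name fit_line_minibatch, the learning rate, the batch size, and the synthetic data, which stands in for the iris petal width and sepal length columns and is not the real dataset.

```python
import random


def fit_line_minibatch(xs, ys, lr=0.02, batch_size=8, steps=3000, seed=0):
    """Fit y ~ slope * x + intercept by mini-batch gradient descent on the L2 loss."""
    rng = random.Random(seed)
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Draw a random batch of indices (sampling with replacement).
        batch = [rng.randrange(n) for _ in range(batch_size)]
        grad_slope = 0.0
        grad_intercept = 0.0
        for i in batch:
            err = slope * xs[i] + intercept - ys[i]   # prediction error
            grad_slope += 2.0 * err * xs[i]           # d(err^2)/d(slope)
            grad_intercept += 2.0 * err               # d(err^2)/d(intercept)
        # Step against the batch-averaged gradient.
        slope -= lr * grad_slope / batch_size
        intercept -= lr * grad_intercept / batch_size
    return slope, intercept


# Synthetic stand-in data (NOT iris): y = 0.9 * x + 4.8 plus small Gaussian noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(0.1, 2.5) for _ in range(100)]
ys = [0.9 * x + 4.8 + data_rng.gauss(0.0, 0.1) for x in xs]

slope, intercept = fit_line_minibatch(xs, ys)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

In a TensorFlow version the two gradient lines would be replaced by automatic differentiation and an optimizer step, but the loop structure stays the same.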
Road Map for Choosing Between Statistical Modeling and Machine Learning
When we raise money it's AI, when we hire it's machine learning, and when we do the work it's logistic regression. Machine learning (ML) may be distinguished from statistical models (SM) using any of three considerations: Uncertainty: SMs explicitly take uncertainty into account by specifying a probabilistic model for the data. Structural: SMs typically start by assuming additivity of predictor effects when specifying the model. Empirical: ML is more empirical including allowance for high-order interactions that are not pre-specified, whereas SMs have identified parameters of special interest. There is a growing number of hybrid methods combining characteristics of traditional SMs and ML, especially in the Bayesian world.
Structure Learning in Inverse Ising Problems Using $\ell_2$-Regularized Linear Estimator
Meng, Xiangming, Obuchi, Tomoyuki, Kabashima, Yoshiyuki
The inference performance of the pseudolikelihood method is discussed in the framework of the inverse Ising problem when the $\ell_2$-regularized (ridge) linear regression is adopted. This setup is introduced for theoretically investigating the situation where the data generation model is different from the inference one, namely the model mismatch situation. In the teacher-student scenario under the assumption that the teacher couplings are sparse, the analysis is conducted using the replica and cavity methods, with a special focus on whether the presence/absence of teacher couplings is correctly inferred or not. The result indicates that despite the model mismatch, one can perfectly identify the network structure using naive linear regression without regularization when the number of spins $N$ is smaller than the dataset size $M$, in the thermodynamic limit $N\to \infty$. Further, to access the underdetermined region $M < N$, we examine the effect of the $\ell_2$ regularization, and find that biases appear in all the coupling estimates, preventing the perfect identification of the network structure. We, however, find that the biases are shown to decay exponentially fast as the distance from the center spin chosen in the pseudolikelihood method grows. Based on this finding, we propose a two-stage estimator: In the first stage, the ridge regression is used and the estimates are pruned by a relatively small threshold; in the second stage the naive linear regression is conducted only on the remaining couplings, and the resultant estimates are again pruned by another relatively large threshold. This estimator with the appropriate regularization coefficient and thresholds is shown to achieve the perfect identification of the network structure even in $0
Improving Students Performance in Small-Scale Online Courses -- A Machine Learning-Based Intervention
Azimi, Sepinoud, Popa, Carmen-Gabriela, Cucić, Tatjana
The birth of massive open online courses (MOOCs) has had an undeniable effect on how teaching is being delivered. It seems that traditional in-class teaching is becoming less popular with the young generation, the generation that wants to choose when, where and at what pace they are learning. As such, many universities are moving towards taking their courses, at least partially, online. However, online courses, although very appealing to the younger generation of learners, come at a cost. For example, the dropout rate of such courses is higher than that of more traditional ones, and the reduced in-person interaction with the teachers results in less timely guidance and intervention from the educators. Machine learning (ML) based approaches have shown phenomenal successes in other domains. The existing stigma that applying ML-based techniques requires a large amount of data seems to be a bottleneck when dealing with small-scale courses with limited amounts of produced data. In this study, we show not only that the data collected from an online learning management system could be well utilized in order to predict students' overall performance but also that it could be used to propose timely intervention strategies to boost the students' performance level. The results of this study indicate that effective intervention strategies could be suggested as early as the middle of the course to change the course of students' progress for the better. We also present an assistive pedagogical tool based on the outcome of this study, to assist in identifying challenging students and in suggesting early intervention strategies.
Conjecturing-Based Computational Discovery of Patterns in Data
Brooks, J. P., Edwards, D. J., Larson, C. E., Van Cleemput, N.
Modern machine learning methods are designed to exploit complex patterns in data regardless of their form, while not necessarily revealing them to the investigator. Here we demonstrate situations where modern machine learning methods are ill-equipped to reveal feature interaction effects and other nonlinear relationships. We propose the use of a conjecturing machine that generates feature relationships in the form of bounds for numerical features and boolean expressions for nominal features that are ignored by machine learning algorithms. The proposed framework is demonstrated for a classification problem with an interaction effect and a nonlinear regression problem. In both settings, true underlying relationships are revealed and generalization performance improves. The framework is then applied to patient-level data regarding COVID-19 outcomes to suggest possible risk factors.
Unfolding the Maths behind Ridge and Lasso Regression!
This article was published as a part of the Data Science Blogathon. Many times we have come across this statement – Lasso regression causes sparsity while Ridge regression doesn't! But I'm pretty sure that most of us might not have understood how exactly this works. Let's try to understand this using calculus. First, let's understand what sparsity is.
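As a rough illustration of the sparsity claim, consider a one-dimensional toy problem that is an assumption made here, not taken from the article: minimize 0.5 * (w - z)^2 plus a penalty, where z is the unpenalized estimate. Setting the derivative to zero gives w = z / (1 + lam) for the ridge penalty 0.5 * lam * w^2, and the soft-thresholding rule for the lasso penalty lam * |w|. The function names ridge_1d and lasso_1d are hypothetical.

```python
def ridge_1d(z, lam):
    """Minimizer of 0.5*(w - z)**2 + 0.5*lam*w**2: shrinks z but never reaches 0 unless z is 0."""
    return z / (1.0 + lam)


def lasso_1d(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam*abs(w): soft-thresholding of z at lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # any z with |z| <= lam is set exactly to zero


lam = 1.0
zs = [-3.0, -0.5, 0.2, 0.8, 2.5]           # unpenalized estimates
ridge = [ridge_1d(z, lam) for z in zs]     # all shrunk, none exactly zero
lasso = [lasso_1d(z, lam) for z in zs]     # small ones become exactly zero

print("ridge:", ridge)
print("lasso:", lasso)
```

In this toy setting the lasso zeroes every estimate whose magnitude is at most lam, while ridge only rescales them, which is the behaviour the quoted statement describes.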
All Machine Learning Algorithms You Should Know in 2021
Linear Regression is one of the most fundamental algorithms used to model relationships between a dependent variable and one or more independent variables. In simpler terms, it involves finding the 'line of best fit' that represents two or more variables. The line of best fit is found by minimizing the squared distances between the points and the line of best fit -- this is known as minimizing the sum of squared residuals. A residual is simply the actual value minus the predicted value; its sign does not matter once it is squared.
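The idea of minimizing the sum of squared residuals can be sketched for a single predictor using the standard closed-form least-squares formulas. The helper names and the four data points below are made up for illustration. Residuals are computed as actual minus predicted, though the sign convention has no effect on the squared sum.

```python
def least_squares_line(xs, ys):
    """Closed-form slope and intercept that minimize the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum over all points of (actual - predicted) squared."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up data that is roughly, but not exactly, linear.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]

slope, intercept = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, slope, intercept)

# Nudging either parameter away from the fitted value increases the sum.
worse_slope = sum_squared_residuals(xs, ys, slope + 0.1, intercept)
worse_intercept = sum_squared_residuals(xs, ys, slope, intercept - 0.2)
print(best < worse_slope, best < worse_intercept)
```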