Regression
Machine Learning Model using Multinomial Logistic Regression
Learn how to create a machine learning model from scratch that uses Multinomial Logistic Regression (MLR). We are going to look at Multinomial Logistic Regression, one of the classic supervised machine learning algorithms capable of multi-class classification, i.e., predicting an outcome for the target variable when there are more than two possible discrete classes. When it comes to real-world machine learning, around 70% of the problems are classification-based: on the basis of the available set of features, your model tries to predict which of a given set of categories (discrete possible outcomes) your target variable belongs to. This is a project-based guide, where we will see how to code an MLR model from scratch while understanding the mathematics that allows the model to make predictions. For the project, we will be working on the famous UCI Cleveland Heart Disease dataset.
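As a rough illustration of the prediction step in multinomial logistic regression, the sketch below computes one linear score per class and converts the scores to probabilities with the softmax function. It is not the guide's own code: the function names, the weights, and the feature values are hypothetical, and training (fitting the weights) is left out.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, features):
    """Class probabilities for a single example.

    weights: one list of coefficients per class (hypothetical values).
    biases:  one intercept per class.
    """
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def predict(weights, biases, features):
    """Index of the most probable class."""
    probs = predict_proba(weights, biases, features)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical 3-class model with 2 features.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
biases = [0.0, 0.0, 0.0]
print(predict(weights, biases, [2.0, 0.5]))
```

With more than two classes, the softmax plays the role that the sigmoid plays in binary logistic regression: every class gets a probability, and the prediction is the class with the largest one.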
AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity
Udrescu, Silviu-Marian, Tan, Andrew, Feng, Jiahai, Neto, Orisvaldo, Wu, Tailin, Tegmark, Max
We present an improved method for symbolic regression that seeks to fit data to formulas that are Pareto-optimal, in the sense of having the best accuracy for a given complexity. It improves on the previous state-of-the-art by typically being orders of magnitude more robust toward noise and bad data, and also by discovering many formulas that stumped previous methods. We develop a method for discovering generalized symmetries (arbitrary modularity in the computational graph of a formula) from gradient properties of a neural network fit. We use normalizing flows to generalize our symbolic regression method to probability distributions from which we only have samples, and employ statistical hypothesis testing to accelerate robust brute-force search.
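To make "Pareto-optimal" concrete, here is a minimal sketch of selecting the Pareto front from a set of candidate formulas scored by complexity and loss. It is not the paper's algorithm; the candidate names and numbers are invented for illustration, and a lower value is assumed to be better on both axes.

```python
def pareto_front(candidates):
    """Return the candidates that no other candidate dominates.

    Each candidate is a (name, complexity, loss) tuple. A candidate is
    dominated if another one is at least as good on both criteria and
    strictly better on at least one.
    """
    front = []
    for name, comp, loss in candidates:
        dominated = any(
            (c2 <= comp and l2 <= loss) and (c2 < comp or l2 < loss)
            for _, c2, l2 in candidates
        )
        if not dominated:
            front.append((name, comp, loss))
    # Sort by complexity so the front reads from simplest to most accurate.
    return sorted(front, key=lambda item: item[1])


# Hypothetical candidate formulas: (name, complexity, loss).
candidates = [
    ("f_a", 1, 0.9),
    ("f_b", 2, 0.5),
    ("f_c", 3, 0.6),  # dominated by f_b: more complex and less accurate
    ("f_d", 4, 0.1),
]
print(pareto_front(candidates))
```

Each formula left on the front has the best accuracy among candidates of its complexity or lower, which is the sense of optimality the abstract describes.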
Deep Multitask Learning for Pervasive BMI Estimation and Identity Recognition in Smart Beds
Davoodnia, Vandad, Slinowsky, Monet, Etemad, Ali
Smart devices in the Internet of Things (IoT) paradigm provide a variety of unobtrusive and pervasive means for continuous monitoring of bio-metrics and health information. Furthermore, automated personalization and authentication through such smart systems can enable better user experience and security. In this paper, simultaneous estimation and monitoring of body mass index (BMI) and user identity recognition through a unified machine learning framework using smart beds is explored. To this end, we utilize pressure data collected from textile-based sensor arrays integrated onto a mattress to estimate the BMI values of subjects and classify their identities in different positions by using a deep multitask neural network. First, we filter and extract 14 features from the data and subsequently employ deep neural networks for BMI estimation and subject identification on two different public datasets. Finally, we demonstrate that our proposed solution outperforms prior works and several machine learning benchmarks by a considerable margin, while also estimating users' BMI in a 10-fold cross-validation scheme.
Feature Space Saturation during Training
Shenk, Justin, Richter, Mats L., Byttner, Wolf, Arpteg, Anders, Huss, Mikael
We propose layer saturation - a simple, online-computable method for analyzing the information processing in neural networks. First, we show that a layer's output can be restricted to the eigenspace of its variance matrix without performance loss. We propose a computationally lightweight method for approximating the variance matrix during training. From the dimension of its lossless eigenspace we derive layer saturation - the ratio between the eigenspace dimension and layer width. We show that saturation seems to indicate which layers contribute to network performance. We demonstrate how to alter layer saturation in a neural network by changing network depth, filter sizes and input resolution. Furthermore, we show that well-chosen input resolution increases network performance by distributing the inference process more evenly across the network.
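The ratio described above can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes the eigenvalues of a layer's variance matrix have already been computed, and it approximates the "lossless" eigenspace as the smallest set of leading eigen-directions that explains a fraction `delta` of the total variance. The name `delta` and its default of 0.99 are assumptions.

```python
def layer_saturation(eigenvalues, layer_width, delta=0.99):
    """Ratio between an approximate eigenspace dimension and layer width.

    eigenvalues: eigenvalues of the layer's variance matrix (assumed given).
    layer_width: number of units (or filters) in the layer.
    delta:       assumed fraction of variance the eigenspace must explain.
    """
    if layer_width <= 0:
        raise ValueError("layer_width must be positive")
    total = sum(eigenvalues)
    if total <= 0:
        return 0.0  # no variance at all: nothing to represent
    explained = 0.0
    dimension = 0
    # Add eigen-directions from largest to smallest until delta is reached.
    for value in sorted(eigenvalues, reverse=True):
        explained += value
        dimension += 1
        if explained / total >= delta:
            break
    return dimension / layer_width


# Hypothetical layer of width 4 whose variance sits mostly in 2 directions.
print(layer_saturation([5.0, 4.0, 0.5, 0.5], layer_width=4, delta=0.8))
```

A value near 1 means the layer uses almost all of its available dimensions, while a value near 0 means most of its width carries little variance.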
Independent innovation analysis for nonlinear vector autoregressive process
Morioka, Hiroshi, Hyvärinen, Aapo
The nonlinear vector autoregressive (NVAR) model provides an appealing framework to analyze multivariate time series obtained from a nonlinear dynamical system. However, the innovation (or error), which plays a key role by driving the dynamics, is almost always assumed to be additive. Additivity greatly limits the generality of the model, hindering analysis of general NVAR processes which have nonlinear interactions between the innovations. Here, we propose a new general framework called independent innovation analysis (IIA), which estimates the innovations from completely general NVAR. We assume mutual independence of the innovations as well as their modulation by a fully observable auxiliary variable (which is often taken as the time index and simply interpreted as nonstationarity). We show that IIA guarantees the identifiability of the innovations with arbitrary nonlinearities, up to a permutation and component-wise invertible nonlinearities. We propose two practical estimation methods, both of which can be easily implemented by ordinary neural network training. We thus provide the first rigorous identifiability result for general NVAR, as well as very general tools for learning such models.
Robust Meta-learning for Mixed Linear Regression with Small Batches
Kong, Weihao, Somani, Raghav, Kakade, Sham, Oh, Sewoong
A common challenge faced in practical supervised learning, such as medical image processing and robotic interactions, is that there are plenty of tasks but each task cannot afford to collect enough labeled examples to be learned in isolation. However, by exploiting the similarities across those tasks, one can hope to overcome such data scarcity. Under a canonical scenario where each task is drawn from a mixture of k linear regressions, we study a fundamental question: can abundant small-data tasks compensate for the lack of big-data tasks? Existing second moment based approaches show that such a trade-off is efficiently achievable, with the help of medium-sized tasks with $\Omega(k^{1/2})$ examples each. However, this algorithm is brittle in two important scenarios. The predictions can be arbitrarily bad (i) even with only a few outliers in the dataset; or (ii) even if the medium-sized tasks are slightly smaller with $o(k^{1/2})$ examples each. We introduce a spectral approach that is simultaneously robust under both scenarios. To this end, we first design a novel outlier-robust principal component analysis algorithm that achieves an optimal accuracy. This is followed by a sum-of-squares algorithm to exploit the information from higher order moments. Together, this approach is robust against outliers and achieves a graceful statistical trade-off; the lack of $\Omega(k^{1/2})$-size tasks can be compensated for with smaller tasks, which can now be as small as $O(\log k)$.
Coresets for Near-Convex Functions
Tukan, Murad, Maalouf, Alaa, Feldman, Dan
A coreset is usually a small weighted subset of $n$ input points in $\mathbb{R}^d$ that provably approximates their loss function for a given set of queries (models, classifiers, etc.). Coresets are becoming increasingly common in machine learning since existing heuristics or inefficient algorithms may be improved by running them possibly many times on the small coreset that can be maintained for streaming distributed data. Coresets can be obtained by sensitivity (importance) sampling, where the coreset size is proportional to the total sum of sensitivities. Unfortunately, computing the sensitivity of each point is problem dependent and may be harder than the original optimization problem at hand. We suggest a generic framework for computing sensitivities (and thus coresets) for a wide family of loss functions which we call near-convex functions. This is by suggesting the $f$-SVD factorization that generalizes the SVD factorization of matrices to functions. Example applications include coresets that are either new or significantly improve previous results, such as SVM, Logistic regression, M-estimators, and $\ell_z$-regression. Experimental results and open source code are also provided.
Transfer Learning for High-dimensional Linear Regression: Prediction, Estimation, and Minimax Optimality
Li, Sai, Cai, T. Tony, Li, Hongzhe
This paper considers the estimation and prediction of a high-dimensional linear regression in the setting of transfer learning, using samples from the target model as well as auxiliary samples from different but possibly related regression models. When the set of "informative" auxiliary samples is known, an estimator and a predictor are proposed and their optimality is established. The optimal rates of convergence for prediction and estimation are faster than the corresponding rates without using the auxiliary samples. This implies that knowledge from the informative auxiliary samples can be transferred to improve the learning performance of the target problem. In the case that the set of informative auxiliary samples is unknown, we propose a data-driven procedure for transfer learning, called Trans-Lasso, and reveal its robustness to non-informative auxiliary samples and its efficiency in knowledge transfer. The proposed procedures are demonstrated in numerical studies and are applied to a dataset concerning the associations among gene expressions. It is shown that Trans-Lasso leads to improved performance in gene expression prediction in a target tissue by incorporating the data from multiple different tissues as auxiliary samples.
Learning Convex Optimization Models
Agrawal, Akshay, Barratt, Shane, Boyd, Stephen
A convex optimization model predicts an output from an input by solving a convex optimization problem. The class of convex optimization models is large, and includes as special cases many well-known models like linear and logistic regression. We propose a heuristic for learning the parameters in a convex optimization model given a dataset of input-output pairs, using recently developed methods for differentiating the solution of a convex optimization problem with respect to its parameters. We describe three general classes of convex optimization models, maximum a posteriori (MAP) models, utility maximization models, and agent models, and present a numerical experiment for each.
How Much Math do I need in Data Science?
Can I become a data scientist with little or no math background? What essential math skills are important in data science? There are so many good packages that can be used for building predictive models or for producing data visualizations. Thanks to these packages, anyone can build a model or produce a data visualization. However, a solid background in mathematics is essential for fine-tuning your models so that they are reliable and perform optimally.