Goto

Collaborating Authors

 Regression


Prediction of Reaction Time and Vigilance Variability from Spatiospectral Features of Resting-State EEG in a Long Sustained Attention Task

arXiv.org Machine Learning

Resting-state brain networks represent the intrinsic state of the brain during the majority of cognitive and sensorimotor tasks. However, no study has yet presented concise predictors of task-induced vigilance variability from spectrospatial features of the pre-task, resting-state electroencephalograms (EEG). We asked ten healthy volunteers (6 females, 4 males) to participate in 105-minute fixed-sequence-varying-duration sessions of sustained attention to response task (SART). A novel and adaptive vigilance scoring scheme was designed based on the performance and response time in consecutive trials, and demonstrated large inter-participant variability in terms of maintaining consistent tonic performance. Multiple linear regression using feature relevance analysis obtained significant predictors of the mean cumulative vigilance score (CVS), mean response time, and variabilities of these scores from the resting-state, band-power ratios of EEG signals, p<0.05. Single-layer neural networks trained with cross-validation also captured different associations for the beta sub-bands. Increase in the gamma (28-48 Hz) and upper beta ratios from the left central and temporal regions predicted slower reactions and more inconsistent vigilance as explained by the increased activation of default mode network (DMN) and differences between the high- and low-attention networks at temporal regions. Higher ratios of parietal alpha from the Brodmann's areas 18, 19, and 37 during the eyes-open states predicted slower responses but more consistent CVS and reactions associated with the superior ability in vigilance maintenance. The proposed framework and these findings on the most stable and significant attention predictors from the intrinsic EEG power ratios can be used to model attention variations during the calibration sessions of BCI applications and vigilance monitoring systems.


Generalized tensor regression with covariates on multiple modes

arXiv.org Machine Learning

We consider the problem of tensor-response regression given covariates on multiple modes. Such data problems arise frequently in applications such as neuroimaging, network analysis, and spatial-temporal modeling. We propose a new family of tensor response regression models that incorporate covariates, and establish the theoretical accuracy guarantees. Unlike earlier methods, our estimation allows high-dimensionality in both the tensor response and the covariate matrices on multiple modes. An efficient alternating updating algorithm is further developed. Our proposal handles a broad range of data types, including continuous, count, and binary observations. Through simulation and applications to two real datasets, we demonstrate the outperformance of our approach over the state-of-art.


Learning interaction kernels in heterogeneous systems of agents from multiple trajectories

arXiv.org Machine Learning

Systems of interacting particles or agents have wide applications in many disciplines such as Physics, Chemistry, Biology and Economics. These systems are governed by interaction laws, which are often unknown: estimating them from observation data is a fundamental task that can provide meaningful insights and accurate predictions of the behaviour of the agents. In this paper, we consider the inverse problem of learning interaction laws given data from multiple trajectories, in a nonparametric fashion, when the interaction kernels depend on pairwise distances. We establish a condition for learnability of interaction kernels, and construct estimators that are guaranteed to converge in a suitable $L^2$ space, at the optimal min-max rate for 1-dimensional nonparametric regression. We propose an efficient learning algorithm based on least squares, which can be implemented in parallel for multiple trajectories and is therefore well-suited for the high dimensional, big data regime. Numerical simulations on a variety examples, including opinion dynamics, predator-swarm dynamics and heterogeneous particle dynamics, suggest that the learnability condition is satisfied in models used in practice, and the rate of convergence of our estimator is consistent with the theory. These simulations also suggest that our estimators are robust to noise in the observations, and produce accurate predictions of dynamics in relative large time intervals, even when they are learned from data collected in short time intervals.


Making Bayesian Predictive Models Interpretable: A Decision Theoretic Approach

arXiv.org Artificial Intelligence

A salient approach to interpretable machine learning is to restrict modeling to simple and hence understandable models. In the Bayesian framework, this can be pursued by restricting the model structure and prior to favor interpretable models. Fundamentally, however, interpretability is about users' preferences, not the data generation mechanism: it is more natural to formulate interpretability as a utility function. In this work, we propose an interpretability utility, which explicates the trade-off between explanation fidelity and interpretability in the Bayesian framework. The method consists of two steps. First, a reference model, possibly a black-box Bayesian predictive model compromising no accuracy, is constructed and fitted to the training data. Second, a proxy model from an interpretable model family that best mimics the predictive behaviour of the reference model is found by optimizing the interpretability utility function. The approach is model agnostic - neither the interpretable model nor the reference model are restricted to be from a certain class of models - and the optimization problem can be solved using standard tools in the chosen model family. Through experiments on real-word data sets using decision trees as interpretable models and Bayesian additive regression models as reference models, we show that for the same level of interpretability, our approach generates more accurate models than the earlier alternative of restricting the prior. We also propose a systematic way to measure stabilities of interpretabile models constructed by different interpretability approaches and show that our proposed approach generates more stable models.


ML From Scratch, Part 1: Linear Regression - OranLooney.com

#artificialintelligence

To kick off this series, will start with something simple yet foundational: linear regression via ordinary least squares. While not exciting, linear regression finds widespread use both as a standalone learning algorithm and as a building block in more advanced learning algorithms. The output layer of a deep neural network trained for regression with MSE loss, simple AR time series models, and the "local regression" part of LOWESS smoothing are all examples of linear regression being used as an ingredient in a more sophisticated model. Linear regression is also the "simple harmonic oscillator" of machine learning; that is to say, a pedagogical example that allows us to present deep theoretical ideas about machine learning in a context that is not too mathematically taxing. There is also the small matter of it being the most widely used supervised learning algorithm in the world; although how much weight that carries I suppose depends on where you are on the "applied" to "theoretical" spectrum. However, since I can already feel your eyes glazing over from such an introductory topic, we can spice things up a little bit by doing something which isn't often done in introductory machine learning - we can present the algorithm that [your favorite statistical software here] actually uses to fit linear regression models: QR decomposition. It seems this is commonly glossed over because it involves more linear algebra than can be generally assumed, or perhaps because the exact solution we will derive doesn't generalize well to other machine learning algorithms, not even closely related variants such as regularized regression or robust regression.


Sparse (group) learning with Lipschitz loss functions: a unified analysis

arXiv.org Machine Learning

We study a family of sparse estimators defined as minimizers of some empirical Lipschitz loss function---which include hinge, logistic and quantile regression losses---with a convex, sparse or group-sparse regularization. In particular, we consider the L1-norm on the coefficients, its sorted Slope version, and the Group L1-L2 extension. First, we propose a theoretical framework which simultaneously derives new L2 estimation upper bounds for all three regularization schemes. For L1 and Slope regularizations, our bounds scale as $(k^*/n) \log(p/k^*)$---$n\times p$ is the size of the design matrix and $k^*$ the dimension of the theoretical loss minimizer $\beta^*$---matching the optimal minimax rate achieved for the least-squares case. For Group L1-L2 regularization, our bounds scale as $(s^*/n) \log\left( G / s^* \right) + m^* / n$---$G$ is the total number of groups and $m^*$ the number of coefficients in the $s^*$ groups which contain $\beta^*$---and improve over the least-squares case. We additionally show that Group L1-L2 is superior to L1 and Slope when the signal is strongly group-sparse. Our bounds are achieved both in probability and in expectation, under common assumptions in the literature. Second, we propose an accelerated proximal algorithm which computes the convex estimators studied when the number of variables is of the order of $100,000$. We compare the statistical performance of our estimators against standard baselines for settings where the signal is either sparse or group-sparse. Our experiments findings reveal (i) the good empirical performance of L1 and Slope for sparse binary classification problems, (ii) the superiority of Group L1-L2 regularization for group-sparse classification problems and (iii) the appealing properties of sparse quantile regression estimators for sparse regression problems with heteroscedastic noise.


Introduction to Coresets: Accurate Coresets

arXiv.org Machine Learning

A coreset (or core-set) of an input set is its small summation, such that solving a problem on the coreset as its input, provably yields the same result as solving the same problem on the original (full) set, for a given family of problems (models, classifiers, loss functions). Over the past decade, coreset construction algorithms have been suggested for many fundamental problems in e.g. machine/deep learning, computer vision, graphics, databases, and theoretical computer science. This introductory paper was written following requests from (usually non-expert, but also colleagues) regarding the many inconsistent coreset definitions, lack of available source code, the required deep theoretical background from different fields, and the dense papers that make it hard for beginners to apply coresets and develop new ones. The paper provides folklore, classic and simple results including step-by-step proofs and figures, for the simplest (accurate) coresets of very basic problems, such as: sum of vectors, minimum enclosing ball, SVD/ PCA and linear regression. Nevertheless, we did not find most of their constructions in the literature. Moreover, we expect that putting them together in a retrospective context would help the reader to grasp modern results that usually extend and generalize these fundamental observations. Experts might appreciate the unified notation and comparison table that links between existing results. Open source code with example scripts are provided for all the presented algorithms, to demonstrate their practical usage, and to support the readers who are more familiar with programming than math.


Machine Learning Ex2 Solutions

#artificialintelligence

It is estimated to top. Therefore the best way to understand machine learning is to look at some example problems. Machine Learning Ex2 - Linear Regression Implementing linear regression using gradient descent in Scala based on Andrew Ng's machine learning course. In this book, you'll do exactly that. Quality Assurance in Software Testing: Prevention is better than a cure, even where it concerns software solutions. HP Elite x2 Designed for IT, loved by users. With this mind, the Machine Learning & AI For Upstream Onshore Oil & Gas 2019 purely focuses on understanding the profitable applications of Machine Learning and AI, primarily for optimizing production for onshore E&Ps, and examine how to improve operational efficiencies in drilling and completions.


Coupling Oceanic Observation Systems to Study Mesoscale Ocean Dynamics

arXiv.org Machine Learning

Understanding local currents in the North Atlantic region of the ocean is a key part of modelling heat transfer and global climate patterns. Satellites provide a surface signature of the temperature of the ocean with a high horizontal resolution while in situ autonomous probes supply high vertical resolution, but horizontally sparse, knowledge of the ocean interior thermal structure. The objective of this paper is to develop a methodology to combine these complementary ocean observing systems measurements to obtain a three-dimensional time series of ocean temperatures with high horizontal and vertical resolution. Within an observation-driven framework, we investigate the extent to which mesoscale ocean dynamics in the North Atlantic region may be decomposed into a mixture of dynamical modes, characterized by different local regressions between Sea Surface Temperature (SST), Sea Level Anomalies (SLA) and Vertical Temperature fields. Ultimately we propose a Latent-class regression method to improve prediction of vertical ocean temperature.


Adversarial Regression. Generative Adversarial Networks for Non-Linear Regression: Theory and Assessment

arXiv.org Machine Learning

Adversarial Regression is a proposition to perform high dimensional non-linear regression with uncertainty estimation. We used Conditional Generative Adversarial Network to obtain an estimate of the full predictive distribution for a new observation. Generative Adversarial Networks (GAN) are implicit generative models which produce samples from a distribution approximating the distribution of the data. The conditional version of it (CGAN) takes the following expression: $\min\limits_G \max\limits_D V(D, G) = \mathbb{E}_{x\sim p_{r}(x)} [log(D(x, y))] + \mathbb{E}_{z\sim p_{z}(z)} [log (1-D(G(z, y)))]$. An approximate solution can be found by training simultaneously two neural networks to model D and G and feeding G with a random noise vector $z$. After training, we have that $G(z, y)\mathrel{\dot\sim} p_{data}(x, y)$. By fixing $y$, we have $G(z|y) \mathrel{\dot\sim} p{data}(x|y)$. By sampling $z$, we can therefore obtain samples following approximately $p(x|y)$, which is the predictive distribution of $x$ for a new $y$. We ran experiments to test various loss functions, data distributions, sample size, size of the noise vector, etc. Even if we observed differences, no experiment outperformed consistently the others. The quality of CGAN for regression relies on fine-tuning a range of hyperparameters. In a broader view, the results show that CGANs are very promising methods to perform uncertainty estimation for high dimensional non-linear regression.