 Regression


Logistic-Regression with peer-group effects via inference in higher order Ising models

arXiv.org Machine Learning

Spin glass models, such as the Sherrington-Kirkpatrick, Hopfield and Ising models, are all well-studied members of the exponential family of discrete distributions, and have been influential in a number of application domains where they are used to model correlation phenomena on networks. Conventionally these models have quadratic sufficient statistics and consequently capture correlations arising from pairwise interactions. In this work we study extensions of these to models with higher-order sufficient statistics, modeling behavior on a social network with peer-group effects. In particular, we model binary outcomes on a network as a higher-order spin glass, where the behavior of an individual depends on a linear function of their own vector of covariates and some polynomial function of the behavior of others, capturing peer-group effects. Using a single, high-dimensional sample from such a model, our goal is to recover the coefficients of the linear function as well as the strength of the peer-group effects. The heart of our result is a novel approach for showing strong concavity of the log pseudo-likelihood of the model, implying a statistical error rate of $\sqrt{d/n}$ for the Maximum Pseudo-Likelihood Estimator (MPLE), where $d$ is the dimensionality of the covariate vectors and $n$ is the size of the network (number of nodes). Our model generalizes vanilla logistic regression as well as the peer-effect models studied in recent works, and our results extend these results to accommodate higher-order interactions.


To deep, or not to deep, that is the question!

#artificialintelligence

As in other fields of artificial intelligence and prior to the emergence of Deep Learning, especially deep neural networks, artificial vision research was focused on a traditional Machine Learning approach. The traditional machine learning approach relies on developers massaging the data to extract the most salient or significant aspects from the data they are dealing with; that is, time sequences of frames, or videos. In this case, both scientific research and application development have been centered around identifying the most significant image elements that would allow, for example, facial and body recognition of the people who appear in the images, tracking them from one frame to another, or classifying the vehicles that move through a given area. After extracting this meaningful data, statistical methods are then employed to transform the representation into a so-called "understanding" of the real visual environment by using clustering, support-vector machines (SVMs), and filtering algorithms (linear, non-linear, regression), among others. This means that the merits of any given application lie in how well researchers and developers are able to source and generate data from the raw processed frames and transform it into useful structured data.


Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix

arXiv.org Machine Learning

The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been utilized in multivariate regression, factor analysis, biclustering, vector time series modeling, among others. The appeal of this factorization is owing to its power in discovering a highly-interpretable latent association network, either between samples and variables or between responses and predictors. However, many existing methods are either ad hoc without a general performance guarantee, or are computationally intensive, rendering them unsuitable for large-scale studies. We formulate the statistical problem as a sparse factor regression and tackle it with a divide-and-conquer approach. In the first stage of division, we consider both sequential and parallel approaches for simplifying the task into a set of co-sparse unit-rank estimation (CURE) problems, and establish the statistical underpinnings of these commonly-adopted and yet poorly understood deflation methods. In the second stage of division, we innovate a contended stagewise learning technique, consisting of a sequence of simple incremental updates, to efficiently trace out the whole solution paths of CURE. Our algorithm has a much lower computational complexity than alternating convex search, and the choice of the step size enables a flexible and principled tradeoff between statistical accuracy and computational efficiency. Our work is among the first to enable stagewise learning for non-convex problems, and the idea can be applicable in many multi-convex problems. Extensive simulation studies and an application in genetics demonstrate the effectiveness and scalability of our approach.


Adversarial Transferability in Wearable Sensor Systems

arXiv.org Machine Learning

Machine learning has increasingly become the most used approach for inference and decision making in wearable sensor systems. However, recent studies have found that machine learning systems are easily fooled by the addition of adversarial perturbation to their inputs. What is more interesting is that the adversarial examples generated for one machine learning system can also degrade the performance of another. This property of adversarial examples is called transferability. In this work, we take the first strides in studying adversarial transferability in wearable sensor systems, from the following perspectives: 1) Transferability between machine learning models, 2) Transferability across subjects, 3) Transferability across sensor locations, and 4) Transferability across datasets. With Human Activity Recognition (HAR) as an example sensor system, we found strong untargeted transferability in all cases of transferability. Specifically, gradient-based attacks were able to achieve higher misclassification rates compared to non-gradient attacks. The misclassification rate of untargeted adversarial examples ranged from 20% to 98%. For targeted transferability between machine learning models, the success rate of adversarial examples was 100% for iterative attack methods. However, the success rate for other types of targeted transferability ranged from 20% to 0%. Our findings strongly suggest that adversarial transferability has serious consequences not only in sensor systems but also across the broad spectrum of ubiquitous computing.
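The notion of transferability between models can be illustrated with a toy sketch. The abstract does not say which attacks or models were used, so everything below is an assumption made for illustration: two hypothetical logistic-regression classifiers (`model_a`, `model_b`) stand in for the HAR models, and the fast gradient sign method (FGSM), one well-known gradient-based attack, stands in for the attacks studied. The adversarial input is crafted against `model_a` only and then evaluated on `model_b`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(model, x):
    """Probability of class 1 under a logistic model given as (weights, bias)."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(model, x, y, eps):
    """Untargeted FGSM: step each feature by eps in the direction that
    increases the cross-entropy loss for the true label y (0 or 1)."""
    w, _ = model
    p = predict_proba(model, x)
    # For logistic regression, d(loss)/dx = (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical source and target models (not taken from the paper).
model_a = ([2.0, -1.0], 0.0)
model_b = ([1.5, -0.5], 0.1)

x, y = [0.3, 0.1], 1
x_adv = fgsm(model_a, x, y, eps=0.5)

print("model_b on clean input:", predict_proba(model_b, x) > 0.5)
print("model_b on adversarial input:", predict_proba(model_b, x_adv) > 0.5)
```

In this toy setup the perturbation computed from `model_a` also flips the decision of `model_b`, which is what untargeted transferability means. Real sensor pipelines and deep models need not behave this way.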


ParKCa: Causal Inference with Partially Known Causes

arXiv.org Machine Learning

Causal Inference methods based on observational data are an alternative for applications where collecting the counterfactual data or realizing a more standard experiment is not possible. In this work, our goal is to combine several observational causal inference methods to learn new causes in applications where some causes are well known. We validate the proposed method on The Cancer Genome Atlas (TCGA) dataset to identify genes that potentially cause metastasis.


Improving predictions by nonlinear regression models from outlying input data

arXiv.org Machine Learning

When applying machine learning/statistical methods to the environmental sciences, nonlinear regression (NLR) models often perform only slightly better and occasionally worse than linear regression (LR). The proposed reason for this conundrum is that NLR models can give predictions much worse than LR when given input data which lie outside the domain used in model training. Continuous unbounded variables are widely used in environmental sciences, so it is not uncommon for new input data to lie far outside the training domain. For six environmental datasets, inputs in the test data were classified as "outliers" and "non-outliers" based on the Mahalanobis distance from the training input data. The prediction scores (mean absolute error, Spearman correlation) showed NLR to outperform LR for the non-outliers, but often underperform LR for the outliers. An approach based on Occam's Razor (OR) was proposed, where linear extrapolation was used instead of nonlinear extrapolation for the outliers. The linear extrapolation to the outlier domain was based on the NLR model within the non-outlier domain. This NLR$_{\mathrm{OR}}$ approach reduced occurrences of very poor extrapolation by NLR, and it tended to outperform NLR and LR for the outliers. In conclusion, input test data should be screened for outliers. For outliers, the unreliable NLR predictions can be replaced by NLR$_{\mathrm{OR}}$ or LR predictions, or by issuing a "no reliable prediction" warning.
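The screening step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: inputs are restricted to two features so the covariance matrix can be inverted by hand, the threshold value is hypothetical, and outliers simply fall back to a linear model (the simpler of the replacements mentioned) rather than to the NLR$_{\mathrm{OR}}$ extrapolation. The function names are invented for this example.

```python
def mean_and_cov_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D training inputs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance sqrt(d^T C^{-1} d), with the 2x2 inverse written out."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5


def predict_screened(point, mean, cov, nlr_model, lr_model, threshold):
    """Return (prediction, is_outlier); outliers use the linear model."""
    if mahalanobis_2d(point, mean, cov) > threshold:
        return lr_model(point), True
    return nlr_model(point), False


# Toy training inputs and stand-in models (all hypothetical).
train = [(0, 0), (2, 0), (0, 2), (2, 2)]
mean, cov = mean_and_cov_2d(train)
nlr = lambda p: p[0] ** 3   # stand-in nonlinear model
lr = lambda p: 2 * p[0]     # stand-in linear model

print(predict_screened((1, 1), mean, cov, nlr, lr, threshold=3.0))
print(predict_screened((11, 1), mean, cov, nlr, lr, threshold=3.0))
```

A caller that prefers the "no reliable prediction" option could return a sentinel instead of the linear prediction whenever the outlier flag is set.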



Time series and machine learning to forecast the water quality from satellite data

arXiv.org Machine Learning

Managing the quality of water for present and future generations of coastal regions should be a central concern of both citizens and public officials. Remote sensing can contribute to the management and monitoring of coastal water and pollutants. Algal blooms are a coastal pollutant that is a cause of concern. Data from many satellite sensors, such as MODIS, have been used to generate water-quality products for detecting the blooms, such as chlorophyll a (Chl-a), a photosynthesis index called fluorescence line height (FLH), and sea surface temperature (SST). It is important to characterize the spatial and temporal variations of these water-quality products using mathematical models. However, for monitoring, pollution control boards will need nowcasts and forecasts of any pollution. Therefore, we aim to predict the future values of the MODIS Chl-a, FLH, and SST of the water. This will not be limited to one type of water but, rather, will cover different types of water varying in depth and turbidity. This is very significant because the temporal trend of Chl-a, FLH, and SST is dependent on the geospatial and water properties. For this purpose, we will decompose the time series of each pixel into several components: trend, intra-annual variations, seasonal cycle, and a stationary stochastic component. We explore three time series machine learning models that can characterize the non-stationary time series data and predict future values: the Seasonal ARIMA (Auto Regressive Integrated Moving Average) (SARIMA), regression, and neural network. The results indicate that all these methods are effective at modelling Chl-a, FLH, and SST time series and predicting the values reasonably well. However, regression and neural network are found to be the best at predicting Chl-a in all types of water (turbid and shallow). Meanwhile, the SARIMA model provides the best prediction of FLH and SST.


A Numerical Transform of Random Forest Regressors corrects Systematically-Biased Predictions

arXiv.org Machine Learning

Over the past decade, random forest models have become widely used as a robust method for high-dimensional data regression tasks. In part, the popularity of these models arises from the fact that they require little hyperparameter tuning and are not very susceptible to overfitting. Random forest regression models are comprised of an ensemble of decision trees that independently predict the value of a (continuous) dependent variable; predictions from each of the trees are ultimately averaged to yield an overall predicted value from the forest. Using a suite of representative real-world datasets, we find a systematic bias in predictions from random forest models. We find that this bias is recapitulated in simple synthetic datasets, regardless of whether or not they include irreducible error (noise) in the data, but that models employing boosting do not exhibit this bias. Here we demonstrate the basis for this problem, and we use the training data to define a numerical transformation that fully corrects it. Application of this transformation yields improved predictions in every one of the real-world and synthetic datasets evaluated in our study.


Intuitions on L1 and L2 Regularisation

#artificialintelligence

L1 and L2 regularisation owe their names to the L1 and L2 norms of a vector w, respectively. A linear regression model that uses the L1 norm for regularisation is called lasso regression, and one that uses the (squared) L2 norm for regularisation is called ridge regression. Note: Strictly speaking, the last equation (ridge regression) is a loss function with the squared L2 norm of the weights (notice the absence of the square root). The regularisation terms are 'constraints' that an optimisation algorithm must 'adhere to' when minimising the loss function, in addition to minimising the error between the true y and the predicted ŷ. Let's define a model to see how L1 and L2 work.
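As a rough sketch of the two losses described above, the snippet below adds an L1 penalty (lasso) or a squared L2 penalty (ridge) to a mean-squared-error term for a linear model. The function names, the choice of mean squared error as the data term, and the regularisation strength `lam` are assumptions made for illustration; the article's own model definition may differ.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear model y_hat = w . x + b."""
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += (y - y_hat) ** 2
    return total / len(xs)


def l1_penalty(w):
    """L1 norm of the weights: sum of absolute values."""
    return sum(abs(wi) for wi in w)


def l2_penalty_squared(w):
    """Squared L2 norm of the weights: sum of squares, no square root."""
    return sum(wi * wi for wi in w)


def lasso_loss(w, b, xs, ys, lam):
    return mse(w, b, xs, ys) + lam * l1_penalty(w)


def ridge_loss(w, b, xs, ys, lam):
    return mse(w, b, xs, ys) + lam * l2_penalty_squared(w)


# Toy data that the weights fit exactly, so only the penalty term remains.
xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [3.0, -4.0]
w, b = [3.0, -4.0], 0.0

print(lasso_loss(w, b, xs, ys, lam=0.1))  # data term 0 plus 0.1 * (3 + 4)
print(ridge_loss(w, b, xs, ys, lam=0.1))  # data term 0 plus 0.1 * (9 + 16)
```

Because the ridge penalty grows with the square of each weight, it punishes large weights much more heavily than the lasso penalty does, while the lasso penalty keeps a constant pull on small weights.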