Goto

Collaborating Authors

 Regression


Variational Hierarchical Mixtures for Probabilistic Learning of Inverse Dynamics

arXiv.org Artificial Intelligence

Well-calibrated probabilistic regression models are a crucial learning component in robotics applications as datasets grow rapidly and tasks become more complex. Unfortunately, classical regression models are usually either probabilistic kernel machines with a flexible structure that does not scale gracefully with data or deterministic and vastly scalable automata, albeit with a restrictive parametric form and poor regularization. In this paper, we consider a probabilistic hierarchical modeling paradigm that combines the benefits of both worlds to deliver computationally efficient representations with inherent complexity regularization. The presented approaches are probabilistic interpretations of local regression techniques that approximate nonlinear functions through a set of local linear or polynomial units. Importantly, we rely on principles from Bayesian nonparametrics to formulate flexible models that adapt their complexity to the data and can potentially encompass an infinite number of components. We derive two efficient variational inference techniques to learn these representations and highlight the advantages of hierarchical infinite local regression models, such as dealing with non-smooth functions, mitigating catastrophic forgetting, and enabling parameter sharing and fast predictions. Finally, we validate this approach on large inverse dynamics datasets and test the learned models in real-world control scenarios.


Adaptive Selection of the Optimal Strategy to Improve Precision and Power in Randomized Trials

arXiv.org Machine Learning

Benkeser et al. demonstrate how adjustment for baseline covariates in randomized trials can meaningfully improve precision for a variety of outcome types. Their findings build on a long history, starting in 1932 with R.A. Fisher and including more recent endorsements by the U.S. Food and Drug Administration and the European Medicines Agency. Here, we address an important practical consideration: *how* to select the adjustment approach -- which variables and in which form -- to maximize precision, while maintaining Type-I error control. Balzer et al. previously proposed *Adaptive Prespecification* within TMLE to flexibly and automatically select, from a prespecified set, the approach that maximizes empirical efficiency in small trials (N$<$40). To avoid overfitting with few randomized units, selection was previously limited to working generalized linear models, adjusting for a single covariate. Now, we tailor Adaptive Prespecification to trials with many randomized units. Using $V$-fold cross-validation and the estimated influence curve-squared as the loss function, we select from an expanded set of candidates, including modern machine learning methods adjusting for multiple covariates. As assessed in simulations exploring a variety of data generating processes, our approach maintains Type-I error control (under the null) and offers substantial gains in precision -- equivalent to 20-43\% reductions in sample size for the same statistical power. When applied to real data from ACTG Study 175, we also see meaningful efficiency improvements overall and within subgroups.


Generating drawdown-realistic financial price paths using path signatures

arXiv.org Machine Learning

A novel generative machine learning approach for the simulation of sequences of financial price data with drawdowns quantifiably close to empirical data is introduced. Applications such as pricing drawdown insurance options or developing portfolio drawdown control strategies call for a host of drawdown-realistic paths. Historical scenarios may be insufficient to effectively train and backtest the strategy, while standard parametric Monte Carlo does not adequately preserve drawdowns. We advocate a non-parametric Monte Carlo approach combining a variational autoencoder generative model with a drawdown reconstruction loss function. To overcome issues of numerical complexity and non-differentiability, we approximate drawdown as a linear function of the moments of the path, known in the literature as path signatures. We prove the required regularity of drawdown function and consistency of the approximation. Furthermore, we obtain close numerical approximations using linear regression for fractional Brownian and empirical data. We argue that linear combinations of the moments of a path yield a mathematically non-trivial smoothing of the drawdown function, which gives one leeway to simulate drawdown-realistic price paths by including drawdown evaluation metrics in the learning objective. We conclude with numerical experiments on mixed equity, bond, real estate and commodity portfolios and obtain a host of drawdown-realistic paths.


Towards Trustworthy Explanation: On Causal Rationalization

arXiv.org Machine Learning

With recent advances in natural language processing, rationalization becomes an essential self-explaining diagram to disentangle the black box by selecting a subset of input texts to account for the major variation in prediction. Yet, existing association-based approaches on rationalization cannot identify true rationales when two or more snippets are highly inter-correlated and thus provide a similar contribution to prediction accuracy, so-called spuriousness. To address this limitation, we novelly leverage two causal desiderata, non-spuriousness and efficiency, into rationalization from the causal inference perspective. We formally define a series of probabilities of causation based on a newly proposed structural causal model of rationalization, with its theoretical identification established as the main component of learning necessary and sufficient rationales. The superior performance of the proposed causal rationalization is demonstrated on real-world review and medical datasets with extensive experiments compared to state-of-the-art methods.


Machine Learning for Tangible Effects: Natural Language Processing for Uncovering the Illicit Massage Industry & Computer Vision for Tactile Sensing

arXiv.org Artificial Intelligence

I explore two questions in this thesis: how can computer science be used to fight human trafficking? And how can computer vision create a sense of touch? I use natural language processing (NLP) to monitor the United States illicit massage industry (IMI), a multi-billion dollar industry that offers not just therapeutic massages but also commercial sexual services. Employees of this industry are often immigrant women with few job opportunities, leaving them vulnerable to fraud, coercion, and other facets of human trafficking. Monitoring spatiotemporal trends helps prevent trafficking in the IMI. By creating datasets with three publicly-accessible websites: Google Places, Rubmaps, and AMPReviews, combined with NLP techniques such as bag-of-words and Word2Vec, I show how to derive insights into the labor pressures and language barriers that employees face, as well as the income, demographics, and societal pressures affecting sex buyers. I include a call-to-action to other researchers given these datasets. I also consider how to creating synthetic financial data, which can aid with counter-trafficking in the banking sector. I use an agent-based model to create both tabular and payee-recipient graph data. I then consider the role of computer vision in making tactile sensors. I report on a novel sensor, the Digger Finger, that adapts the Gelsight sensor to finding objects in granular media. Changes include using a wedge shape to facilitate digging, replacing the internal lighting LEDs with fluorescent paint, and adding a vibrator motor to counteract jamming. Finally, I also show how to use a webcam and a printed reference marker, or fiducial, to create a low-cost six-axis force-torque sensor. This sensor is up to a hundred times less expensive than commercial sensors, allowing for a wider range of applications. For this and earlier chapters I release design files and code as open source.


Polynomial Bounds for Learning Noisy Optical Physical Unclonable Functions and Connections to Learning With Errors

arXiv.org Artificial Intelligence

It is shown that a class of optical physical unclonable functions (PUFs) can be learned to arbitrary precision with arbitrarily high probability, even in the presence of noise, given access to polynomially many challenge-response pairs and polynomially bounded computational power, under mild assumptions about the distributions of the noise and challenge vectors. This extends the results of Rh\"uramir et al. (2013), who showed a subset of this class of PUFs to be learnable in polynomial time in the absence of noise, under the assumption that the optics of the PUF were either linear or had negligible nonlinear effects. We derive polynomial bounds for the required number of samples and the computational complexity of a linear regression algorithm, based on size parameters of the PUF, the distributions of the challenge and noise vectors, and the probability and accuracy of the regression algorithm, with a similar analysis to one done by Bootle et al. (2018), who demonstrated a learning attack on a poorly implemented version of the Learning With Errors problem.


Causal thinking for decision making on Electronic Health Records: why and how

arXiv.org Machine Learning

Accurate predictions, as with machine learning, may not suffice to provide optimal healthcare for every patient. Indeed, prediction can be driven by shortcuts in the data, such as racial biases. Causal thinking is needed for data-driven decisions. Here, we give an introduction to the key elements, focusing on routinely-collected data, electronic health records (EHRs) and claims data. Using such data to assess the value of an intervention requires care: temporal dependencies and existing practices easily confound the causal effect. We present a step-by-step framework to help build valid decision making from real-life patient records by emulating a randomized trial before individualizing decisions, eg with machine learning. Our framework highlights the most important pitfalls and considerations in analysing EHRs or claims data to draw causal conclusions. We illustrate the various choices in studying the effect of albumin on sepsis mortality in the Medical Information Mart for Intensive Care database (MIMIC-IV). We study the impact of various choices at every step, from feature extraction to causal-estimator selection. In a tutorial spirit, the code and the data are openly available.


Interpretation of High-Dimensional Linear Regression: Effects of Nullspace and Regularization Demonstrated on Battery Data

arXiv.org Machine Learning

High-dimensional linear regression is important in many scientific fields. This article considers discrete measured data of underlying smooth latent processes, as is often obtained from chemical or biological systems. Interpretation in high dimensions is challenging because the nullspace and its interplay with regularization shapes regression coefficients. The data's nullspace contains all coefficients that satisfy $\mathbf{Xw}=\mathbf{0}$, thus allowing very different coefficients to yield identical predictions. We developed an optimization formulation to compare regression coefficients and coefficients obtained by physical engineering knowledge to understand which part of the coefficient differences are close to the nullspace. This nullspace method is tested on a synthetic example and lithium-ion battery data. The case studies show that regularization and z-scoring are design choices that, if chosen corresponding to prior physical knowledge, lead to interpretable regression results. Otherwise, the combination of the nullspace and regularization hinders interpretability and can make it impossible to obtain regression coefficients close to the true coefficients when there is a true underlying linear model. Furthermore, we demonstrate that regression methods that do not produce coefficients orthogonal to the nullspace, such as fused lasso, can improve interpretability. In conclusion, the insights gained from the nullspace perspective help to make informed design choices for building regression models on high-dimensional data and reasoning about potential underlying linear models, which are important for system optimization and improving scientific understanding.


An Experimental Evaluation of Machine Learning Training on a Real Processing-in-Memory System

arXiv.org Artificial Intelligence

Training machine learning (ML) algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate ML training. To do so, we (1) implement several representative classic ML algorithms (namely, linear regression, logistic regression, decision tree, K-Means clustering) on a real-world general-purpose PIM architecture, (2) rigorously evaluate and characterize them in terms of accuracy, performance and scaling, and (3) compare to their counterpart implementations on CPU and GPU. Our evaluation on a real memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound ML workloads, when the necessary operations and datatypes are natively supported by PIM hardware. For example, our PIM implementation of decision tree is $27\times$ faster than a state-of-the-art CPU version on an 8-core Intel Xeon, and $1.34\times$ faster than a state-of-the-art GPU version on an NVIDIA A100. Our K-Means clustering on PIM is $2.8\times$ and $3.2\times$ than state-of-the-art CPU and GPU versions, respectively. To our knowledge, our work is the first one to evaluate ML training on a real-world PIM architecture. We conclude with key observations, takeaways, and recommendations that can inspire users of ML workloads, programmers of PIM architectures, and hardware designers & architects of future memory-centric computing systems.


Linear Regression using Heterogeneous Data Batches

arXiv.org Machine Learning

In many learning applications, data are collected from multiple sources, each providing a \emph{batch} of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations where the output is a noisy linear combination of the inputs, and there are $k$ subgroups, each with its own regression vector. Prior work~\cite{kong2020meta} showed that with abundant small-batches, the regression vectors can be learned with only few, $\tilde\Omega( k^{3/2})$, batches of medium-size with $\tilde\Omega(\sqrt k)$ samples each. However, the paper requires that the input distribution for all $k$ subgroups be isotropic Gaussian, and states that removing this assumption is an ``interesting and challenging problem". We propose a novel gradient-based algorithm that improves on the existing results in several ways. It extends the applicability of the algorithm by: (1) allowing the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) recovering all subgroups followed by a significant proportion of batches even for infinite $k$; (3) removing the separation requirement between the regression vectors; (4) reducing the number of batches and allowing smaller batch sizes.