Bayesian Learning
Probabilistic Regressor Chains with Monte Carlo Methods
A large number and diversity of techniques have been offered in the literature in recent years for solving multi-label classification tasks, including classifier chains, where predictions are cascaded to other models as additional features. The idea of extending this chaining methodology to multi-output regression has already been suggested and trialed: regressor chains. However, regressor chains have so far been limited to greedy inference, have provided relatively poor results compared to individual models, and have been of limited applicability. In this paper we identify and discuss the main limitations, including an analysis of different base models, loss functions, explainability, and other desiderata of real-world applications. To overcome the identified limitations we study and develop methods for regressor chains. In particular, we present a sequential Monte Carlo scheme in the framework of a probabilistic regressor chain, and we show it can be effective, flexible and useful on several types of data. We place regressor chains in context in general terms of multi-output learning with continuous outputs, and in doing so shed additional light on classifier chains.
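The abstract gives no implementation details. As a rough illustration of how a probabilistic chain propagates uncertainty, the Python sketch below assumes linear-Gaussian base regressors (one per output, each also fed the earlier outputs) and draws joint samples by plain forward (ancestral) Monte Carlo down the chain. This is a simpler stand-in, not the paper's sequential Monte Carlo scheme; `fit_chain` and `sample_chain` are hypothetical names.

```python
import numpy as np


def fit_chain(X, Y):
    """Fit one linear-Gaussian regressor per output; regressor j sees X plus outputs 0..j-1."""
    models = []
    for j in range(Y.shape[1]):
        # Inputs for link j: original features, earlier outputs, and a bias column
        Z = np.hstack([X, Y[:, :j], np.ones((X.shape[0], 1))])
        w, *_ = np.linalg.lstsq(Z, Y[:, j], rcond=None)
        sigma = (Y[:, j] - Z @ w).std()  # residual noise scale of this link
        models.append((w, sigma))
    return models


def sample_chain(models, x, n_samples=1000, rng=None):
    """Draw joint Monte Carlo samples of the output vector for a single input x."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.tile(x, (n_samples, 1))
    samples = np.zeros((n_samples, len(models)))
    for j, (w, sigma) in enumerate(models):
        # Each link conditions on the *sampled* upstream outputs, not a point estimate
        Z = np.hstack([X, samples[:, :j], np.ones((n_samples, 1))])
        samples[:, j] = Z @ w + sigma * rng.standard_normal(n_samples)
    return samples
```

Because each draw feeds sampled, rather than point-predicted, upstream outputs into later regressors, the samples retain the dependence between outputs, whereas greedy inference passes a single prediction along the chain.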
Audits as Evidence: Experiments, Ensembles, and Enforcement
Kline, Patrick, Walters, Christopher
We develop tools for utilizing correspondence experiments to detect illegal discrimination by individual employers. Employers violate US employment law if their propensity to contact applicants depends on protected characteristics such as race or sex. We establish identification of higher moments of the causal effects of protected characteristics on callback rates as a function of the number of fictitious applications sent to each job ad. These moments are used to bound the fraction of jobs that illegally discriminate. Applying our results to three experimental datasets, we find evidence of significant employer heterogeneity in discriminatory behavior, with the standard deviation of gaps in job-specific callback probabilities across protected groups averaging roughly twice the mean gap. In a recent experiment manipulating racially distinctive names, we estimate that at least 85% of jobs that contact both of two white applications and neither of two black applications are engaged in illegal discrimination. To assess the tradeoff between type I and II errors presented by these patterns, we consider the performance of a series of decision rules for investigating suspicious callback behavior under a simple two-type model that rationalizes the experimental data. Though, in our preferred specification, only 17% of employers are estimated to discriminate on the basis of race, we find that an experiment sending 10 applications to each job would enable accurate detection of 7-10% of discriminators while falsely accusing fewer than 0.2% of non-discriminators. A minimax decision rule acknowledging partial identification of the joint distribution of callback rates yields higher error rates but more investigations than our baseline two-type model. Our results suggest illegal labor market discrimination can be reliably monitored with relatively small modifications to existing audit designs.
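To make the type I / type II tradeoff concrete, the sketch below computes the exact error rates of one simple decision rule, "investigate a job if white callbacks minus black callbacks is at least a threshold", under a hypothetical two-type model. It assumes each job receives the same number of white and black applications and that callbacks are independent Bernoulli draws; all callback probabilities are made-up inputs, not the paper's estimates, and the paper's own decision rules may differ.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k callbacks out of n applications with callback rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def flag_probability(n_per_group, p_white, p_black, threshold):
    """P(white callbacks - black callbacks >= threshold) for independent binomial counts."""
    total = 0.0
    for w in range(n_per_group + 1):
        for b in range(n_per_group + 1):
            if w - b >= threshold:
                total += binom_pmf(w, n_per_group, p_white) * binom_pmf(b, n_per_group, p_black)
    return total


def error_rates(n_per_group, p_same, p_white_disc, p_black_disc, threshold):
    """Return (false-accusation rate, detection rate) for the rule 'flag if gap >= threshold'.

    Hypothetical two-type model: non-discriminating jobs call back both groups at p_same;
    discriminating jobs call back white applications at p_white_disc, black at p_black_disc.
    """
    false_accusation = flag_probability(n_per_group, p_same, p_same, threshold)
    detection = flag_probability(n_per_group, p_white_disc, p_black_disc, threshold)
    return false_accusation, detection
```

For example, `error_rates(5, 0.2, 0.4, 0.1, 3)` evaluates the rule for 10 applications per job split evenly between groups; raising the threshold lowers false accusations at the cost of detecting fewer discriminators.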
Robust Nonlinear Component Estimation with Tikhonov Regularization
Feinman, Reuben, Parthasarathy, Nikhil
Learning reduced component representations of data using nonlinear transformations is a central problem in unsupervised learning with a rich history. Recently, a new family of algorithms based on maximum likelihood optimization with change of variables has demonstrated an impressive ability to model complex nonlinear data distributions. These algorithms learn to map from arbitrary random variables to independent components using invertible nonlinear function approximators. Despite the potential of this framework, the underlying optimization objective is ill-posed for a large class of variables, inhibiting accurate component estimates in many use cases. We present a new Tikhonov regularization technique for nonlinear independent component estimation that mitigates the instability of the algorithm and facilitates robust component estimates. In addition, we provide a theoretically grounded procedure for feature extraction that produces PCA-like representations of nonlinear distributions using the learned model. We apply our technique to a handful of nonlinear data manifolds and show that the resulting representations possess important consistencies that unregularized models lack.
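For reference, the change-of-variables log-likelihood that such invertible-map component estimators maximize is shown below, together with a generic Tikhonov (squared-norm) penalty. The abstract does not state exactly what the authors' regularizer penalizes, so the penalty $\lambda \lVert \theta \rVert_2^2$ on the parameters $\theta$ of the invertible map $f_\theta$, the base density $p_Z$, and the weight $\lambda \ge 0$ are illustrative assumptions.

```latex
\log p_X(\mathbf{x}) = \log p_Z\!\left(f_\theta(\mathbf{x})\right)
  + \log\left|\det \frac{\partial f_\theta(\mathbf{x})}{\partial \mathbf{x}}\right|

\hat{\theta} = \arg\max_{\theta}\; \frac{1}{N}\sum_{i=1}^{N} \log p_X(\mathbf{x}_i) \;-\; \lambda \lVert \theta \rVert_2^2
```

With $\lambda = 0$ this reduces to plain maximum likelihood; in the classical Tikhonov sense, a positive $\lambda$ trades some likelihood for a better-conditioned optimization problem.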
Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study
Xu, Guowei, Ding, Wenbiao, Tang, Jiliang, Yang, Songfan, Huang, Gale Yan, Liu, Zitao
Representation learning has proven helpful in numerous machine learning tasks. Most existing representation learning approaches require a large amount of consistent, noise-free labels. However, such labels are unavailable in many real-world scenarios, where labels are usually annotated by crowd workers. In practice, crowdsourced labels are often inconsistent across workers with diverse expertise, and the number of crowdsourced labels is very limited. Thus, directly feeding crowdsourced labels to existing representation learning algorithms is inappropriate and suboptimal. In this paper, we investigate this problem and propose a novel framework for Representation Learning with crowdsourced Labels (RLL), which learns data representations from crowdsourced labels by jointly and coherently addressing the challenges introduced by limited and inconsistent labels. The proposed representation learning framework is evaluated on two real-world education applications. The experimental results demonstrate the benefits of our approach for learning representations from limited crowdsourced labels and show that RLL outperforms state-of-the-art baselines. Moreover, detailed experiments are conducted on RLL to understand its key components and their contribution to performance.
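The abstract does not describe how RLL handles worker disagreement internally. To make "inconsistent crowdsourced labels" concrete, the hypothetical helpers below turn raw worker votes into soft labels and a per-item agreement score, the kind of signal a label-aware learner could weight examples by. This is generic preprocessing, not the RLL framework.

```python
from collections import Counter


def soft_labels(votes):
    """Map each item id to {label: fraction of workers who chose it}."""
    result = {}
    for item, labels in votes.items():
        counts = Counter(labels)
        n = len(labels)
        result[item] = {label: c / n for label, c in counts.items()}
    return result


def majority_with_agreement(votes):
    """Map each item id to (majority label, agreement fraction); ties keep the first-seen label."""
    out = {}
    for item, dist in soft_labels(votes).items():
        label = max(dist, key=dist.get)  # max returns the first maximal key on ties
        out[item] = (label, dist[label])
    return out
```

An item labelled `["easy", "hard", "hard", "hard"]` gets majority label `"hard"` with agreement 0.75, while a 2-to-1 split yields agreement of about 0.67, flagging it as less reliable.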
Subspace Inference for Bayesian Deep Learning
Izmailov, Pavel, Maddox, Wesley J., Kirichenko, Polina, Garipov, Timur, Vetrov, Dmitry, Wilson, Andrew Gordon
Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well calibrated uncertainty. However, scaling Bayesian inference techniques to deep neural networks is challenging due to the high dimensionality of the parameter space. In this paper, we construct low-dimensional subspaces of parameter space, such as the first principal components of the stochastic gradient descent (SGD) trajectory, which contain diverse sets of high performing models. In these subspaces, we are able to apply elliptical slice sampling and variational inference, which struggle in the full parameter space. We show that Bayesian model averaging over the induced posterior in these subspaces produces accurate predictions and well calibrated predictive uncertainty for both regression and image classification.
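The first subspace mentioned in the abstract, the principal components of the SGD trajectory, can be sketched with NumPy: stack flattened weight snapshots, center them, and keep the top right singular vectors of an SVD. The helper names and snapshot format are assumptions, and the elliptical slice sampling or variational inference over the low-dimensional coordinates `z` is not shown.

```python
import numpy as np


def build_subspace(weight_snapshots, rank):
    """Return (mean, P) such that w = mean + z @ P spans the top-`rank` PCA directions."""
    W = np.asarray(weight_snapshots, dtype=float)  # shape (T, D): T snapshots of D parameters
    mean = W.mean(axis=0)
    centered = W - mean
    # Right singular vectors of the centered trajectory are its principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    P = vt[:rank]  # shape (rank, D), rows have unit norm
    return mean, P


def to_full(mean, P, z):
    """Map low-dimensional coordinates z (shape (rank,)) back to a full weight vector."""
    return mean + z @ P
```

Any point `z` in the subspace maps back to a full parameter vector through `to_full`, so a sampler only has to explore `rank` dimensions instead of `D`.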
Information processing constraints in travel behaviour modelling: A generative learning approach
In recent years, the use of data-driven modelling and the integration of behavioural and psychological factors in discrete choice and travel behaviour analysis have become active areas of research [2, 3, 4]. In the context of data-driven models, behavioural variations describe the correlation between observed choice attributes and unobserved socioeconomic factors using a flexible and tractable model specification. These variations include decision protocols, choice sets, unobserved taste variations and unobserved attributes [5]. Under these considerations, recent studies on travel behaviour analysis have so far primarily focused on representing heterogeneity in the error correction function and incorporating it into utility-based multinomial logit (MNL) models [3]. Models such as the mixed multinomial logit (MMNL) or latent class (LC) model offer flexibility in representing heterogeneity and substitution patterns. In addition, recent conceptual frameworks such as the integrated choice and latent variable (ICLV) model use individuals' psychometric indicators to represent unobserved behavioural and perception heterogeneity [6]. It is also possible to apply generative machine learning to identify informative latent constructs in travel decision making without subjective behaviour indicators [7, 8]. However, the true underlying behavioural patterns are often unknown and are usually approximated by predetermined exogenous indicator variables, which often lead to model misspecification due to incomplete information or errors in data collection [9]. Furthermore, accurate specification of the underlying distribution assumes individuals have access to all available information regarding the travel activity (e.g.
Electroencephalography based Classification of Long-term Stress using Psychological Labeling
Saeed, Sanay Muhammad Umar, Anwar, Syed Muhammad, Khalid, Humaira, Majid, Muhammad, Bagci, Ulas
Stress research is a rapidly emerging area in the field of electroencephalography (EEG) based signal processing. The use of EEG as an objective measure for cost-effective and personalized stress management becomes important in particular situations, such as the non-availability of mental health facilities. In this study, long-term stress is classified using baseline EEG signal recordings. The labelling for the stress and control groups is performed using two methods: (i) the perceived stress scale score and (ii) expert evaluation. Frequency-domain features are extracted from five-channel EEG recordings, in addition to the frontal and temporal alpha and beta asymmetries. The alpha asymmetry is computed from four channels and used as a feature. Feature selection is also performed using a t-test to identify statistically significant features for both the stress and control groups. We found that a support vector machine is best suited to classifying long-term human stress when used with alpha asymmetry as a feature. It is observed that the expert-evaluation-based labelling method improved the classification accuracy, reaching up to 85.20%. Based on these results, it is concluded that alpha asymmetry may be used as a potential biomarker for stress classification when labels are assigned using expert evaluation.
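The abstract does not give the asymmetry formula or the channel pairs. A common convention, assumed here, defines alpha asymmetry for a homologous left/right electrode pair as ln(right alpha power) minus ln(left alpha power), with the alpha band taken as 8 to 13 Hz. The sketch estimates band power from a plain FFT periodogram; the function names are hypothetical.

```python
import numpy as np


def band_power(signal, fs, low=8.0, high=13.0):
    """Mean periodogram power of `signal` (sampled at fs Hz) within [low, high] Hz."""
    signal = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    # Remove the DC offset, then form a simple (unwindowed) periodogram
    psd = np.abs(np.fft.rfft(signal - signal.mean())) ** 2 / (fs * signal.size)
    mask = (freqs >= low) & (freqs <= high)
    return psd[mask].mean()


def alpha_asymmetry(left, right, fs):
    """ln(right-hemisphere alpha power) - ln(left-hemisphere alpha power)."""
    return np.log(band_power(right, fs)) - np.log(band_power(left, fs))
```

A positive value means relatively more alpha power on the right channel. In a pipeline like the one described, such values per channel pair would become features, screened with a t-test before the classifier.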
End-To-End Prediction of Emotion From Heartbeat Data Collected by a Consumer Fitness Tracker
Harper, Ross, Southern, Joshua
Automatic detection of emotion has the potential to revolutionize mental health and wellbeing. Recent work has been successful in predicting affect from unimodal electrocardiogram (ECG) data. However, to be immediately relevant for real-world applications, physiology-based emotion detection must make use of ubiquitous photoplethysmogram (PPG) data collected by affordable consumer fitness trackers. Additionally, applications of emotion detection in healthcare settings will require some measure of uncertainty over model predictions. We present here a Bayesian deep learning model for end-to-end classification of emotional valence, using only the unimodal heartbeat time series collected by a consumer fitness tracker (Garmin Vívosmart 3). We collected a new dataset for this task, and report a peak F1 score of 0.7. This demonstrates the practical relevance of physiology-based emotion detection 'in the wild' today.
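The abstract does not say which Bayesian approximation the model uses. Assuming S stochastic predictions can be drawn per input (for example from posterior weight samples or Monte Carlo dropout), the sketch below averages them into a predictive distribution and reports its entropy as an uncertainty score. A small binary F1 helper shows how the reported metric, the harmonic mean of precision and recall, is computed. Function names are hypothetical.

```python
import numpy as np


def predictive_summary(prob_samples):
    """prob_samples: (S, C) class probabilities from S stochastic forward passes."""
    p = np.asarray(prob_samples, dtype=float)
    mean = p.mean(axis=0)  # Monte Carlo estimate of the predictive distribution
    # Predictive entropy; clipping avoids log(0) for classes with zero probability
    entropy = -np.sum(mean * np.log(np.clip(mean, 1e-12, 1.0)))
    return mean, entropy


def f1_score(y_true, y_pred):
    """Binary F1 score with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Samples that disagree (one pass confident in each class) produce a flat mean and maximal entropy, signalling a prediction a clinician should treat with caution.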
Adversarial Security Attacks and Perturbations on Machine Learning and Deep Learning Methods
Cybersecurity also benefits from ML and DL methods in various types of applications. However, these methods are susceptible to security attacks. Adversaries can exploit the training and testing data of the learning models, or probe the workings of those models to launch advanced future attacks. Adversarial security attacks and perturbations within the ML and DL domains are a recent area of exploration that has attracted great interest from security researchers and practitioners. The literature covers different adversarial security attacks and perturbations on ML and DL methods, each with its own presentation style and merits. However, the research community currently needs a review that consolidates knowledge of this increasingly focused and growing research topic. In this review paper, we specifically target new researchers in the cybersecurity domain who seek to acquire basic knowledge of machine learning and deep learning models and algorithms, as well as of the relevant adversarial security attacks and perturbations.
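As one concrete instance of a test-time perturbation, the sketch below applies the fast gradient sign method (FGSM), a well-known attack that shifts each input feature by a step of size eps in the direction that increases the loss, to a logistic-regression classifier. The abstract does not name specific attacks, so FGSM is chosen purely for illustration; the weights, input, and eps in the test are made up.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm_logistic(x, y, w, b, eps):
    """FGSM against logistic regression with label y in {0, 1}.

    For cross-entropy loss, the gradient with respect to the input is (sigmoid(w.x + b) - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    # One signed step of size eps per feature (an L-infinity bounded perturbation)
    return x + eps * np.sign(grad_x)
```

Because the step only uses the sign of the gradient, every feature moves by exactly eps, which keeps the perturbation within a fixed L-infinity budget while still pushing the score across the decision boundary when eps is large enough.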