 Regression


Provable Detection of Propagating Sampling Bias in Prediction Models

arXiv.org Artificial Intelligence

With an increased focus on incorporating fairness in machine learning models, it becomes imperative not only to assess and mitigate bias at each stage of the machine learning pipeline but also to understand the downstream impacts of bias across stages. Here we consider a general, but realistic, scenario in which a predictive model is learned from (potentially biased) training data, and model predictions are assessed post-hoc for fairness by some auditing method. We provide a theoretical analysis of how a specific form of data bias, differential sampling bias, propagates from the data stage to the prediction stage. Unlike prior work, we evaluate the downstream impacts of data biases quantitatively rather than qualitatively and prove theoretical guarantees for detection. Under reasonable assumptions, we quantify how the amount of bias in the model predictions varies as a function of the amount of differential sampling bias in the data, and at what point this bias becomes provably detectable by the auditor. Through experiments on two criminal justice datasets (the well-known COMPAS dataset and historical data from the NYPD's stop-and-frisk policy), we demonstrate that the theoretical results hold in practice even when our assumptions are relaxed.
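To make "propagation" concrete, here is a minimal simulation under one simple, illustrative notion of differential sampling bias: both groups share the same true base rate, but in group B each negative-outcome record is dropped with probability `beta`. A deliberately simple "model" that predicts each group's observed positive rate then inherits an inflated rate for group B, and a two-proportion z-test (`audit_z`) stands in for the auditor. The retention scheme, the frequency-based model, the z-test auditor, and all function names are hypothetical assumptions for illustration, not the paper's definitions or auditing method.

```python
import math
import random


def biased_sample(n_per_group, base_rate, beta, seed=0):
    """Draw (group, y) records for groups 'A' and 'B' with equal true base rates.

    Hypothetical bias model: in group 'B', each negative (y=0) record is
    dropped with probability beta, so positives become over-represented.
    """
    rng = random.Random(seed)
    data = []
    for group in ("A", "B"):
        for _ in range(n_per_group):
            y = 1 if rng.random() < base_rate else 0
            if group == "B" and y == 0 and rng.random() < beta:
                continue  # differential sampling: this negative is never observed
            data.append((group, y))
    return data


def fit_group_rates(data):
    """Toy 'model': predict each group's observed positive rate."""
    counts = {}
    for group, y in data:
        pos, tot = counts.get(group, (0, 0))
        counts[group] = (pos + y, tot + 1)
    return {g: pos / tot for g, (pos, tot) in counts.items()}


def expected_rate_under_bias(base_rate, beta):
    """Large-sample limit of group B's learned rate under the bias model above."""
    return base_rate / (base_rate + (1 - base_rate) * (1 - beta))


def audit_z(p1, n1, p2, n2):
    """Two-proportion z statistic for the gap between two group rates."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se


if __name__ == "__main__":
    for beta in (0.0, 0.2, 0.5):
        data = biased_sample(2000, base_rate=0.3, beta=beta)
        rates = fit_group_rates(data)
        n_b = sum(1 for g, _ in data if g == "B")
        z = audit_z(rates["A"], 2000, rates["B"], n_b)
        print(beta, round(expected_rate_under_bias(0.3, beta), 3), round(z, 2))
```

As `beta` grows, the learned gap grows and the z statistic crosses any fixed detection threshold, which mirrors, in a toy setting, the question of when propagated bias becomes detectable.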


Masked Multi-Step Probabilistic Forecasting for Short-to-Mid-Term Electricity Demand

arXiv.org Artificial Intelligence

Predicting the demand for electricity with uncertainty helps in planning and operating the grid to provide a reliable supply of power to consumers. Machine learning (ML)-based demand forecasting approaches can be categorized into (1) sample-based approaches, where each forecast is made independently, and (2) time series regression approaches, where some historical load and other feature information is used. When making a short-to-mid-term electricity demand forecast, some future information is available, such as the weather forecast and calendar variables. However, existing forecasting models do not fully incorporate this future information. To overcome this limitation, we propose Masked Multi-Step Multivariate Probabilistic Forecasting (MMMPF), a novel and general framework for training any neural network model capable of generating a sequence of outputs; it combines temporal information from the past with known information about the future to make probabilistic predictions. Experiments on a real-world dataset for short-to-mid-term electricity demand forecasting across multiple regions compare MMMPF with various ML methods. They show that the proposed MMMPF framework outperforms not only sample-based methods but also existing time-series forecasting models using exactly the same base models. Models trained with MMMPF can also generate desired quantiles to capture uncertainty and enable probabilistic planning for the grid of the future.
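The idea of combining past observations with known future covariates can be pictured as a masked input sequence: over the forecast horizon the load value is hidden and flagged, while calendar and weather-forecast features stay visible. The sketch below builds such an input and scores quantile outputs with the pinball (quantile) loss, a standard objective for probabilistic forecasts. The zero-fill-plus-flag layout, the function names, and the use of pinball loss are assumptions; the abstract does not specify MMMPF's exact masking scheme or training loss.

```python
from typing import List, Sequence


def build_masked_input(
    past_load: Sequence[float],
    covariates: Sequence[Sequence[float]],
    horizon: int,
) -> List[List[float]]:
    """Return one row per time step: [load_or_0, mask_flag, *covariates].

    `covariates` must cover past and future steps (len(past_load) + horizon rows),
    e.g. calendar variables and weather forecasts known in advance.
    """
    total = len(past_load) + horizon
    if len(covariates) != total:
        raise ValueError("covariates must span history plus horizon")
    rows = []
    for t in range(total):
        if t < len(past_load):
            rows.append([past_load[t], 0.0, *covariates[t]])  # observed load
        else:
            rows.append([0.0, 1.0, *covariates[t]])  # masked future load
    return rows


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Mean quantile loss; its expectation is minimized by the q-th quantile."""
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        diff = y - y_hat
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)
```

For example, with two past loads, covariates `[hour, temperature]`, and a one-step horizon, `build_masked_input([10.0, 12.0], [[0, 20.1], [1, 19.5], [2, 18.0]], horizon=1)` yields a last row of `[0.0, 1.0, 2, 18.0]`: the load is masked but the forecast temperature remains visible. Training one output head per quantile with `pinball_loss` is one common way to obtain the quantile forecasts the abstract mentions.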


Low-dimensional Data-based Surrogate Model of a Continuum-mechanical Musculoskeletal System Based on Non-intrusive Model Order Reduction

arXiv.org Artificial Intelligence

In recent decades, the main focus of computer modeling has been on supporting the design and development of engineering prototypes, but it is now ubiquitous in non-traditional areas such as medical rehabilitation. Conventional modeling approaches like the finite element (FE) method are computationally costly for complex models, which limits their use for purposes like real-time simulation or deployment on low-end hardware when the model at hand cannot be usefully simplified. Consequently, non-traditional approaches such as surrogate modeling using data-driven model order reduction are used to make complex high-fidelity models more widely usable despite these limitations. They often involve a dimensionality reduction step, in which the high-dimensional system state is transformed onto a low-dimensional subspace or manifold, and a regression approach to capture the reduced system behavior. While most publications focus on a single dimensionality reduction method, such as principal component analysis (PCA) (linear) or autoencoders (nonlinear), we consider and compare PCA, kernel PCA, autoencoders, and variational autoencoders for approximating a structural dynamical system. Specifically, we demonstrate the benefits of the surrogate modeling approach on a complex FE model of a human upper arm. We consider both the model's deformation and its internal stress, the two main quantities of interest in an FE context. In doing so, we create a computationally inexpensive surrogate model that captures the system behavior with high approximation quality and fast evaluations.
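The reduce-then-regress pattern can be sketched with PCA computed from an SVD of a snapshot matrix, followed by an affine least-squares map from input parameters to reduced coordinates. The synthetic snapshots, the linear regressor, and the function names are illustrative assumptions; the paper compares PCA, kernel PCA, autoencoders, and variational autoencoders on an FE upper-arm model, and its regression step is not specified here.

```python
import numpy as np


def fit_pca(snapshots: np.ndarray, rank: int):
    """PCA via thin SVD. snapshots: (n_samples, n_dofs)."""
    mean = snapshots.mean(axis=0)
    _, _, vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    basis = vt[:rank].T  # (n_dofs, rank), orthonormal columns
    return mean, basis


def fit_surrogate(params: np.ndarray, snapshots: np.ndarray, rank: int):
    """Reduce-then-regress surrogate: params -> reduced coords -> full state."""
    mean, basis = fit_pca(snapshots, rank)
    reduced = (snapshots - mean) @ basis  # (n_samples, rank)
    design = np.hstack([params, np.ones((params.shape[0], 1))])  # affine map
    coef, *_ = np.linalg.lstsq(design, reduced, rcond=None)
    return mean, basis, coef


def predict(surrogate, params: np.ndarray) -> np.ndarray:
    """Evaluate the cheap surrogate and lift back to the full state dimension."""
    mean, basis, coef = surrogate
    design = np.hstack([params, np.ones((params.shape[0], 1))])
    return (design @ coef) @ basis.T + mean


# Synthetic stand-in for high-fidelity data: 3 parameters drive a 500-DOF state.
rng = np.random.default_rng(0)
modes = rng.normal(size=(3, 500))
train_p = rng.uniform(-1, 1, size=(40, 3))
train_x = train_p @ modes + 0.5
surrogate = fit_surrogate(train_p, train_x, rank=3)
test_p = rng.uniform(-1, 1, size=(5, 3))
err = np.abs(predict(surrogate, test_p) - (test_p @ modes + 0.5)).max()
```

Because the synthetic state depends linearly on the parameters, three PCA modes and a linear regressor reproduce it almost exactly; for a nonlinear FE model, nonlinear reductions (kernel PCA, autoencoders) or a nonlinear regressor are where the comparison in the paper becomes relevant.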


Generalization Ability of Wide Neural Networks on $\mathbb{R}$

arXiv.org Artificial Intelligence

Deep neural networks have been successfully applied in various fields such as image analysis, natural language processing, and protein structure prediction [40, 22, 35]. Since the number of parameters in deep neural networks is often ten or a hundred times larger than the sample size, the success of neural network methods has challenged the traditional bias-variance trade-off principle, one of the primary doctrines of classical statistical learning theory [61]. For example, many influential experiments [9, 67, 8, 48, 7] suggested that if one trains a neural network until it overfits the data, the resulting network can still generalize well. This observation, often referred to as the "benign overfitting phenomenon" [4, 53, 26, 45], has reshaped research on neural networks. For example, researchers have built giant neural networks that easily achieve nearly zero training error while delivering state-of-the-art performance [31, 50, 21]. Inspired by these experiments and observations, researchers have proposed various new theories to explain why overfitted neural networks do generalize well on certain data [9, 43, 26, 47]. Several groups of statisticians have tried to explain the generalization ability of neural networks from the perspective of statistical decision theory, using carefully designed nonparametric regression frameworks. For example, assuming that the regression function belongs to a carefully designed subclass of the Hölder continuous functions, [5] proved that there exists a neural network with a sigmoid activation function that achieves the corresponding minimax rate; [54] further established similar results for ReLU neural networks based on the approximation theory of [66]; and [59] then extended these results to regression functions in Besov spaces and their variants.


Deep Neural Networks for Nonparametric Interaction Models with Diverging Dimension

arXiv.org Machine Learning

Deep neural networks have achieved tremendous success due to their representation power and adaptation to low-dimensional structures. Their potential for estimating structured regression functions has recently been established in the literature. However, most of these studies require the input dimension to be fixed; consequently, they ignore the effect of dimension on the rate of convergence, which hampers their application to modern big data with high dimensionality. In this paper, we bridge this gap by analyzing a $k^{th}$ order nonparametric interaction model both in the growing-dimension scenario ($d$ grows with $n$ but at a slower rate) and in the high-dimensional scenario ($d \gtrsim n$). In the latter case, sparsity assumptions and associated regularization are required to obtain optimal rates of convergence. A new challenge in the diverging-dimension setting arises in calculating the mean-square error: the covariance terms among estimated additive components are an order of magnitude larger than the variance terms, and without proper care they can deteriorate the statistical properties. We introduce a critical debiasing technique to remedy this problem. We show that, under certain standard assumptions, debiased deep neural networks achieve a minimax optimal rate in terms of both $n$ and $d$. Our proof techniques rely crucially on this novel debiasing technique, which makes the covariances of additive components negligible in the mean-square error calculation. In addition, we establish matching lower bounds.
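For reference, a $k^{th}$ order nonparametric interaction model is usually written as a sum of component functions, each depending on at most $k$ coordinates of $x \in \mathbb{R}^d$. The display below uses that standard form; the paper's exact parametrization and identifiability constraints may differ. Counting the components shows why the dimension cannot be ignored: for fixed $k$, their number grows polynomially in $d$.

```latex
f(x) = \mu + \sum_{j=1}^{d} f_j(x_j) + \sum_{1 \le j < l \le d} f_{jl}(x_j, x_l) + \cdots
       + \sum_{1 \le j_1 < \cdots < j_k \le d} f_{j_1 \cdots j_k}(x_{j_1}, \ldots, x_{j_k}),
\qquad
\#\{\text{components}\} = \sum_{i=1}^{k} \binom{d}{i} = O(d^k) \quad (k \text{ fixed}).
```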


Hybrid Feature- and Similarity-Based Models for Joint Prediction and Interpretation

arXiv.org Artificial Intelligence

Electronic health records (EHRs) include simple features like patient age together with more complex data like care history that are informative but not easily represented as individual features. To better harness such data, we developed an interpretable hybrid feature- and similarity-based model for supervised learning that combines feature and kernel learning for prediction and for investigating causal relationships. We fit our hybrid models by convex optimization with a sparsity-inducing penalty on the kernel. Depending on the desired model interpretation, the feature and kernel coefficients can be learned sequentially or simultaneously. The hybrid models matched or exceeded the predictive performance of solely feature- or similarity-based approaches in a simulation study and in a case study predicting two-year risk of loneliness or social isolation with EHR data from a complex primary health care population. Using the case study, we also present new kernels for high-dimensional indicator-coded EHR data that are based on deviations from population-level expectations, and we identify considerations for causal interpretations.
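One way to read the sequential variant: estimate feature coefficients first, then fit kernel (similarity) coefficients to the residuals under an L1 penalty, a convex problem solvable by proximal gradient descent (ISTA). The sketch below does this with NumPy. The L1 penalty on per-sample kernel coefficients, the RBF similarity, the ISTA solver, and all function names are assumptions standing in for the paper's unspecified "sparsity-inducing penalty on the kernel" and optimizer; the simultaneous variant is not shown.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Similarity matrix K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)


def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_hybrid_sequential(X, y, K, lam=0.1, n_iter=500):
    """Stage 1: least-squares feature coefficients beta.
    Stage 2: sparse kernel coefficients alpha on the residual r, minimizing
        (1/2n) * ||r - K @ alpha||^2 + lam * ||alpha||_1   via ISTA.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    step = n / np.linalg.norm(K, 2) ** 2  # 1 / Lipschitz constant of the gradient
    alpha = np.zeros(n)
    for _ in range(n_iter):
        grad = K.T @ (K @ alpha - r) / n
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return beta, alpha


def predict_hybrid(X_new, K_new, beta, alpha):
    """K_new[i, j] = similarity between new point i and training point j."""
    return X_new @ beta + K_new @ alpha
```

Fitting sequentially keeps `beta` interpretable as a pure feature effect, with the sparse `alpha` capturing only what the similarity structure explains beyond the features; nonzero entries of `alpha` point to the training patients whose similarity drives predictions.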


Machine Learning: Concepts and Applications

#artificialintelligence

This course gives you a comprehensive introduction to both the theory and practice of machine learning. You will learn to use Python along with industry-standard libraries and tools, including pandas, scikit-learn, and TensorFlow, to ingest, explore, and prepare data for modeling and then train and evaluate models using a wide variety of techniques. Those techniques include linear regression with ordinary least squares, logistic regression, support vector machines, decision trees and ensembles, clustering, principal component analysis, hidden Markov models, and deep learning. A key feature of this course is that you not only learn how to apply these techniques but also learn the conceptual basis underlying them, so that you understand how they work, why you are doing what you are doing, and what your results mean. The course also features real-world datasets, drawn primarily from the realm of public policy.


Explainable Artificial Intelligence: Precepts, Methods, and Opportunities for Research in Construction

arXiv.org Artificial Intelligence

Explainable artificial intelligence (XAI) has received limited attention in construction despite its growing importance in various other industrial sectors. In this paper, we provide a narrative review of XAI to raise awareness of its potential in construction. Our review develops a taxonomy of the XAI literature comprising its precepts and approaches. We identify and discuss opportunities for future XAI research focusing on stakeholder desiderata and on data and information fusion. We hope the opportunities we suggest stimulate new lines of inquiry to help alleviate the scepticism and hesitancy toward AI adoption and integration in construction.


Improving the Generalizability of Collaborative Dialogue Analysis with Multi-Feature Embeddings

arXiv.org Artificial Intelligence

Conflict prediction in communication is integral to the design of virtual agents that support successful teamwork by providing timely assistance. The aim of our research is to analyze discourse to predict collaboration success. Unfortunately, resource scarcity is a problem that teamwork researchers commonly face, since it is hard to gather a large number of training examples. To alleviate this problem, this paper introduces a multi-feature embedding (MFeEmb) that improves the generalizability of conflict prediction models trained on dialogue sequences. MFeEmb leverages textual, structural, and semantic information from the dialogues by incorporating lexical, dialogue-act, and sentiment features. The use of dialogue-act and sentiment features reduces performance loss from natural distribution shifts caused mainly by changes in vocabulary. This paper demonstrates the performance of MFeEmb on domain adaptation problems in which the model is trained on discourse from one task domain and applied to predict team performance in a different domain. The generalizability of MFeEmb is quantified using the similarity measure proposed by Bontonou et al. (2021). Our results show that MFeEmb serves as an excellent domain-agnostic representation for meta-pretraining a few-shot model on collaborative multiparty dialogues.
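As a rough picture of a multi-feature representation, the sketch below concatenates, per utterance, a bag-of-words vector (lexical), a one-hot dialogue-act tag (structural), and a lexicon-based sentiment score (semantic). The vocabulary, tag set, and sentiment lexicon are hypothetical toy values, and plain concatenation is an assumption; the abstract does not describe how MFeEmb combines these signals or which encoders it uses.

```python
from typing import Dict, List, Tuple

# Hypothetical toy resources; a real system would use learned or curated ones.
VOCAB = ["plan", "agree", "wrong", "wait", "go"]
DIALOGUE_ACTS = ["statement", "question", "agreement", "disagreement"]
SENTIMENT_LEXICON: Dict[str, float] = {"agree": 1.0, "go": 0.5, "wrong": -1.0}


def embed_utterance(text: str, dialogue_act: str) -> List[float]:
    """Concatenate lexical, dialogue-act, and sentiment features for one turn."""
    tokens = text.lower().split()
    lexical = [float(tokens.count(word)) for word in VOCAB]  # bag of words
    act = [1.0 if dialogue_act == a else 0.0 for a in DIALOGUE_ACTS]  # one-hot
    scores = [SENTIMENT_LEXICON[t] for t in tokens if t in SENTIMENT_LEXICON]
    sentiment = [sum(scores) / len(scores) if scores else 0.0]  # mean polarity
    return lexical + act + sentiment


def embed_dialogue(turns: List[Tuple[str, str]]) -> List[List[float]]:
    """turns: [(utterance, dialogue_act), ...] -> sequence of feature vectors."""
    return [embed_utterance(text, act) for text, act in turns]
```

The dialogue-act and sentiment slots do not depend on the exact words used, which is the intuition behind the abstract's claim that these features soften the effect of vocabulary shifts between task domains.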


Instrumental Variable Regression via Kernel Maximum Moment Loss

arXiv.org Artificial Intelligence

Instrumental variables (IVs) have become standard tools for economists, epidemiologists, and social scientists to uncover causal relationships from observational data [3, 46]. Randomization of treatments or policies has been perceived as the gold standard for such tasks, but it is often infeasible in real-world scenarios due to time constraints or ethical concerns. When treatment assignment is not randomized, it is generally impossible to distinguish the causal effect of treatments from spurious correlations induced by unobserved factors. IVs instead enable investigators to exploit natural variation through a variable that is associated with the treatments but not with the outcome variable, other than through its effect on the treatments. In economics, for instance, season of birth was used as an IV to study the returns to schooling, i.e., the causal effect of education on labor-market earnings [16]. In genetic epidemiology, the idea of using genetic variants as IVs, known as Mendelian randomization, has also gained increasing popularity [13, 14].
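The title refers to a kernel maximum moment loss. One common form of such an objective, assumed here, targets the IV conditional moment restriction $E[Y - f(X) \mid Z] = 0$ by scoring a candidate $f$ with $\frac{1}{n^2}\sum_{i,j}(y_i - f(x_i))\,k(z_i, z_j)\,(y_j - f(x_j))$ for a positive-definite kernel $k$ on the instruments; for a linear $f$ the minimizer has a closed form. The Gaussian kernel, its bandwidth, the linear model class, the simulated data, and the function names are illustrative assumptions, not the paper's full estimator.

```python
import numpy as np


def gaussian_gram(z: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Gram matrix K[i, j] = exp(-(z_i - z_j)^2 / (2 * bandwidth^2))."""
    sq = (z[:, None] - z[None, :]) ** 2
    return np.exp(-sq / (2 * bandwidth**2))


def mmr_loss(residuals: np.ndarray, gram: np.ndarray) -> float:
    """V-statistic (1/n^2) * r^T K r of the kernel maximum moment loss."""
    n = len(residuals)
    return float(residuals @ gram @ residuals) / n**2


def fit_linear_mmr(x: np.ndarray, y: np.ndarray, z: np.ndarray, bandwidth=1.0):
    """Minimize r^T K r over f(x) = a + b*x, where r = y - a - b*x.

    Setting the gradient to zero gives (A^T K A) theta = A^T K y.
    """
    design = np.column_stack([np.ones_like(x), x])
    gram = gaussian_gram(z, bandwidth)
    lhs = design.T @ gram @ design
    rhs = design.T @ gram @ y
    return np.linalg.solve(lhs, rhs)  # [intercept, slope]


# Simulated confounding: U drives both treatment X and outcome Y; Z is a valid IV.
rng = np.random.default_rng(0)
n = 1000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = z + u + 0.5 * rng.normal(size=n)
y = 2.0 * x + 3.0 * u + 0.5 * rng.normal(size=n)  # true causal slope is 2.0

ols_slope = np.polyfit(x, y, 1)[0]  # biased upward (about 3.33 in population)
iv_slope = fit_linear_mmr(x, y, z)[1]  # should land near 2.0
```

Ordinary least squares absorbs the confounder's effect because $X$ and $U$ are correlated, while the kernel moment loss only rewards residuals that are uncorrelated with every smooth function of $Z$, which is exactly the exclusion property that makes $Z$ an instrument.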