Regression


Conformal Off-policy Prediction

arXiv.org Artificial Intelligence

Off-policy evaluation is critical in many applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging, and provide only a point estimator. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy's return starting from any initial state. Our proposal accounts for the variability of the return around its expectation, focuses on the individual effect, and offers valid uncertainty quantification. Our main idea is to design a pseudo policy that generates subsamples as if they were sampled from the target policy, so that existing conformal prediction algorithms can be applied to construct prediction intervals. Our methods are justified by theory and by experiments on synthetic data and on real data from short-video platforms.
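The abstract does not spell out the conformal step itself. As a minimal sketch, assuming exchangeable calibration data (which is what the paper's pseudo policy is designed to supply) and a fixed point predictor, standard split conformal prediction turns absolute residuals into a prediction interval. The function name `split_conformal_interval`, the toy data and the predictor `2 * x` are illustrative assumptions; the pseudo-policy subsampling is not shown.

```python
import numpy as np

def split_conformal_interval(x_cal, y_cal, predict, x_new, alpha=0.1):
    """Split conformal interval using absolute residuals as conformity scores."""
    scores = np.abs(y_cal - predict(x_cal))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else np.sort(scores)[k - 1]
    pred = predict(x_new)
    return pred - q, pred + q

# Hypothetical toy data: y = 2x + Gaussian noise, predictor assumed already fitted.
rng = np.random.default_rng(0)
x_cal = rng.uniform(0.0, 1.0, 1000)
y_cal = 2 * x_cal + rng.normal(0.0, 0.5, 1000)
x_test = rng.uniform(0.0, 1.0, 5000)
y_test = 2 * x_test + rng.normal(0.0, 0.5, 5000)

lower, upper = split_conformal_interval(x_cal, y_cal, lambda x: 2 * x, x_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
```

The `(n + 1)` correction in the quantile rank is what yields marginal coverage of at least `1 - alpha` under exchangeability.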


On Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making

arXiv.org Artificial Intelligence

Proponents of explainable AI have often argued that it constitutes an essential path towards algorithmic fairness. Prior works examining these claims have primarily evaluated explanations based on their effects on humans' perceptions, but there is scant research on the relationship between explanations and the distributive fairness of AI-assisted decisions. In this paper, we conduct an empirical study to examine the relationship between feature-based explanations and distributive fairness, mediated by human perceptions and reliance on AI recommendations. Our findings show that explanations influence fairness perceptions, which, in turn, relate to humans' tendency to adhere to AI recommendations. However, our findings suggest that such explanations do not enable humans to distinguish correct from incorrect AI recommendations. Instead, we show that they may affect reliance irrespective of whether the AI recommendations are correct. Depending on which features an explanation highlights, this can foster or hinder distributive fairness: when explanations highlight features that are task-irrelevant and evidently associated with the sensitive attribute, they prompt overrides that counter stereotype-aligned AI recommendations. Meanwhile, if explanations appear task-relevant, they induce reliance behavior that reinforces stereotype-aligned errors. These results show that feature-based explanations are not a reliable mechanism for improving distributive fairness, as their ability to do so relies on a human-in-the-loop operationalization of the flawed notion of "fairness through unawareness". Finally, our study design provides a blueprint for evaluating whether other explanations are suitable pathways towards improved distributive fairness of AI-assisted decisions.


Fast Linear Model Trees by PILOT

arXiv.org Artificial Intelligence

Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees while enabling them to better capture linear relationships, which standard decision trees struggle to do. However, most existing methods for fitting linear model trees are time-consuming and therefore do not scale to large data sets. They are also more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an $L^2$ boosting approach and a model selection rule for fitting linear models in the nodes. The abbreviation PILOT stands for PIecewise Linear Organic Tree, where 'organic' refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without its pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data are generated by a linear model, the convergence rate is polynomial.
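PILOT's own node fitting ($L^2$ boosting plus a model selection rule) is not reproduced here. As a simpler illustration of what a linear model tree does at a single node, the sketch below searches one feature exhaustively for the threshold whose two children, each fitted with its own least-squares line, have the smallest total squared error. The helper names `linear_sse` and `best_linear_split` and the piecewise-linear toy data are hypothetical.

```python
import numpy as np

def linear_sse(x, y):
    """Sum of squared errors of an ordinary least-squares line y ~ a + b*x."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def best_linear_split(x, y, min_leaf=5):
    """Brute-force split search on one feature; each child gets its own line."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.inf, None)  # (total SSE, threshold)
    for i in range(min_leaf, len(xs) - min_leaf + 1):
        if xs[i - 1] == xs[i]:
            continue  # cannot split between tied feature values
        sse = linear_sse(xs[:i], ys[:i]) + linear_sse(xs[i:], ys[i:])
        if sse < best[0]:
            best = (sse, (xs[i - 1] + xs[i]) / 2)
    return best

# Hypothetical noiseless data with a break at x = 4.
x = np.linspace(0.0, 10.0, 200)
y = np.where(x < 4.0, 1.0 + 2.0 * x, 20.0 - x)
sse, threshold = best_linear_split(x, y)
```

Refitting both child regressions for every candidate threshold makes this search roughly quadratic in the node size, far costlier than a CART split search; that is the kind of expense the abstract says PILOT is designed to avoid.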


A distribution-free mixed-integer optimization approach to hierarchical modelling of clustered and longitudinal data

arXiv.org Machine Learning

We develop a mixed-integer optimization (MIO) approach for cluster-aware regression, i.e. linear regression that takes into account the inherent clustered structure of the data. We compare it with the linear mixed-effects model (LMEM), the most widely used current method, and design simulation experiments showing superior performance to LMEM in terms of both predictive and inferential metrics in silico. Furthermore, we show that our method is formulated in a highly interpretable way. LMEM cannot generalize and make cluster-informed predictions when the cluster of new data points is unknown; we solve this problem by training an interpretable classification tree that helps decide cluster effects for new data points, and we demonstrate the power of this generalizability on a real protein expression dataset.


Uncertainty estimation for time series forecasting via Gaussian process regression surrogates

arXiv.org Artificial Intelligence

Machine learning models are widely used to solve real-world problems in science and industry. To build robust models, we should quantify the uncertainty of the model's predictions on new data. This study proposes a new method for uncertainty estimation based on a surrogate Gaussian process model. Our method can equip any base model with an accurate uncertainty estimate produced by a separate surrogate. Compared to other approaches, the estimate remains computationally efficient, requiring only one additional model to be trained, and does not rely on data-specific assumptions. The only requirement is that the base model be available as a black box, which is typical. Experiments on challenging time-series forecasting data show that surrogate-model-based methods provide more accurate confidence intervals than bootstrap-based methods in both medium- and small-data regimes and across different families of base models, including linear regression, ARIMA, and gradient boosting.
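As a minimal sketch of the surrogate idea, assuming the surrogate is a Gaussian process with an RBF kernel fitted to the black-box base model's outputs (the paper's exact training targets, kernel and hyperparameters may differ), the GP posterior standard deviation can serve as the uncertainty estimate: it is small near the training inputs and grows toward the prior level far from them. `base_model`, `rbf_kernel` and `gp_posterior` are hypothetical names; everything is plain NumPy.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(x_train, y_train, x_query, length_scale=1.0, noise=1e-2):
    """GP posterior mean and std at x_query; `noise` is the observation noise variance."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query, length_scale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_query, x_query, length_scale)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def base_model(x):
    # Stand-in for any black-box forecaster (assumption for illustration).
    return np.sin(x)

x_train = np.linspace(0.0, 5.0, 30)
targets = base_model(x_train)          # surrogate is fitted to the base model's outputs
x_query = np.array([2.5, 10.0])        # one point inside, one far outside the data
mean, std = gp_posterior(x_train, targets, x_query)
```

Only one extra model (the GP) is trained, and the base model is only ever called, never inspected, which matches the black-box requirement stated in the abstract.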


Unsupervised Learning, Recommenders, Reinforcement Learning

The Machine Learning Specialization is a foundational online program created in collaboration between DeepLearning.AI and Stanford Online. In this beginner-friendly program, you will learn the fundamentals of machine learning and how to use these techniques to build real-world AI applications. This Specialization is taught by Andrew Ng, an AI visionary who has led critical research at Stanford University and groundbreaking work at Google Brain, Baidu, and Landing.AI to advance the AI field. This 3-course Specialization is an updated and expanded version of Andrew's pioneering Machine Learning course, rated 4.9 out of 5 and taken by over 4.8 million learners since it launched in 2012. It provides a broad introduction to modern machine learning, including supervised learning (multiple linear regression, logistic regression, neural networks, and decision trees), unsupervised learning (clustering, dimensionality reduction, recommender systems), and some of the best practices used in Silicon Valley for artificial intelligence and machine learning innovation (evaluating and tuning models, taking a data-centric approach to improving performance, and more).


Adapting to Continuous Covariate Shift via Online Density Ratio Estimation

arXiv.org Artificial Intelligence

Dealing with distribution shifts is one of the central challenges for modern machine learning. One fundamental situation is covariate shift, where the input distribution of the data changes from the training to the testing stage while the input-conditional output distribution remains unchanged. In this paper, we initiate the study of a more challenging scenario, continuous covariate shift, in which the test data appear sequentially and their distributions can shift continuously. Our goal is to adaptively train the predictor so that its prediction risk accumulated over time is minimized. Starting with importance-weighted learning, we show that the method works effectively if the time-varying density ratios of test and train inputs can be accurately estimated. However, existing density ratio estimation methods would fail due to data scarcity at each time step. To this end, we propose an online method that can appropriately reuse historical information. We prove that our density ratio estimator enjoys a dynamic regret bound, which in turn leads to an excess risk guarantee for the predictor. Empirical results also validate its effectiveness.


Dictionary-based Manifold Learning

arXiv.org Artificial Intelligence

We propose a paradigm for interpretable Manifold Learning for scientific data analysis, whereby we parametrize a manifold with $d$ smooth functions from a scientist-provided dictionary of meaningful, domain-related functions. When such a parametrization exists, we provide an algorithm for finding it based on sparse non-linear regression in the manifold tangent bundle, bypassing more standard manifold learning algorithms. We also discuss conditions for the existence of such parametrizations in function space and for successful recovery from finite samples. We demonstrate our method with experimental results from a real scientific domain.


Polynomial Regressions in R - Lituptech Digital

Hello and welcome to this tutorial. We have learnt how to create single and multiple linear regression models. Now, let’s learn how to create polynomial regression models in R and where we would apply them to solve real-life problems. According to Wikipedia, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an nth degree polynomial in x. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y. In this tutorial we are going to be […]
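The tutorial itself works in R; purely for illustration, and keeping to a single language for the examples on this page, here is the same idea in Python with NumPy. The key point is that polynomial regression is still linear least squares, just on the features 1, x, x^2, x^3, so `np.polyfit` and an explicit Vandermonde design matrix give the same coefficients. The noiseless cubic target is an assumption chosen so the recovered coefficients are easy to check.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 50)
y = 0.5 * x**3 - x + 2.0              # hypothetical cubic relationship

# Degree-3 fit; coefficients come back highest degree first.
coeffs = np.polyfit(x, y, deg=3)
y_hat = np.polyval(coeffs, x)

# Same model as ordinary least squares on columns [x^3, x^2, x, 1].
X = np.vander(x, 4)
coeffs_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the model is linear in its coefficients, all the usual linear-regression machinery (standard errors, F-tests, regularization) carries over unchanged; only the choice of degree needs care to avoid overfitting.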


From Robustness to Privacy and Back

arXiv.org Artificial Intelligence

We study the relationship between two desiderata of algorithms in statistical inference and machine learning: differential privacy and robustness to adversarial data corruptions. Their conceptual similarity was first observed by Dwork and Lei (STOC 2009), who noted that private algorithms satisfy robustness and gave a general method for converting robust algorithms to private ones. However, all previous general methods for transforming robust algorithms into private ones lead to suboptimal error rates. Our work gives the first black-box transformation that converts any adversarially robust algorithm into one that satisfies pure differential privacy. Moreover, we show that for any low-dimensional estimation task, applying our transformation to an optimal robust estimator results in an optimal private estimator. Thus, we conclude that for any low-dimensional task, the optimal error rate for $\varepsilon$-differentially private estimators is essentially the same as the optimal error rate for estimators that are robust to adversarially corrupting $1/\varepsilon$ training samples. We apply our transformation to obtain new optimal private estimators for several high-dimensional tasks, including Gaussian (sparse) linear regression and PCA. Finally, we present an extension of our transformation that leads to approximate differentially private algorithms whose error does not depend on the range of the output space, which is impossible under pure differential privacy.
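The paper's black-box transformation is not reproduced here. As a minimal sketch of pure $\varepsilon$-differential privacy, using a swapped-in textbook technique, the Laplace mechanism below releases the mean of data clipped to an assumed range [lower, upper]. Clipping bounds each record's influence on the output, which loosely mirrors the bounded-influence intuition behind robustness to corrupted samples. The function name `private_mean`, the bounds, epsilon and the data are illustrative assumptions.

```python
import numpy as np

def private_mean(x, lower, upper, epsilon, rng):
    """epsilon-DP mean of data clipped to [lower, upper] via the Laplace mechanism."""
    clipped = np.clip(x, lower, upper)
    n = len(clipped)
    # Replacing one record changes the clipped mean by at most this much.
    sensitivity = (upper - lower) / n
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)

rng = np.random.default_rng(0)
data = rng.normal(0.3, 0.1, 10_000)   # hypothetical data, mostly inside [0, 1]
estimate = private_mean(data, lower=0.0, upper=1.0, epsilon=1.0, rng=rng)
```

Note that the noise scale depends on the assumed output range `upper - lower`; the abstract's final extension, to approximate differential privacy, is precisely about removing this kind of range dependence, which pure differential privacy cannot avoid.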