 Regression


NYC Data Science Academy

#artificialintelligence

This 20-hour Machine Learning with Python course covers all the basic machine learning methods and the Python modules (especially Scikit-Learn) for implementing them. The five sessions cover: simple and multiple linear regression; classification methods including logistic regression, discriminant analysis, naive Bayes, support vector machines (SVMs) and tree-based methods; cross-validation and feature selection; regularization; and principal component analysis (PCA) and clustering algorithms. After successfully completing this course, you will be able to explain the principles of machine learning algorithms and implement these methods to analyze complex datasets and make predictions in Python.


Explainable Machine Learning Control -- robust control and stability analysis

arXiv.org Artificial Intelligence

Recently, the term explainable AI has come to denote an approach to producing artificial intelligence models that allow interpretation. Symbolic regression models that are perfectly explainable and mathematically tractable have long been in use: in this contribution we demonstrate how to use symbolic regression methods to infer the optimal control of a dynamical system given one or several optimization criteria, or cost functions. In previous publications, network control was achieved by automated machine learning control using genetic programming. Here, we focus on the subsequent analysis of the analytical expressions that result from the machine learning. In particular, we use AUTO to analyze the stability properties of the controlled oscillator system which served as our model. As a result, we show that explainable models have a considerable advantage over less accessible neural networks.


The Reciprocal Bayesian LASSO

arXiv.org Machine Learning

Throughout the course of the paper, we assume that $y$ and $X$ have been centered at 0 so there is no intercept in the model, where $y$ is the $n \times 1$ vector of centered responses, $X$ is the $n \times p$ matrix of standardized regressors, $\beta$ is the $p \times 1$ vector of coefficients to be estimated, and $\epsilon$ is the $n \times 1$ vector of independent and identically distributed normal errors with mean 0 and variance $\sigma^2$. Compared to traditional penalization functions that are usually symmetric about 0, continuous and nondecreasing in $(0, \infty)$, the rLASSO penalty functions are decreasing in $(0, \infty)$, discontinuous at 0, and converge to infinity when the coefficients approach zero. From a theoretical standpoint, rLASSO shares the same oracle property and same rate of estimation error with other LASSO-type penalty functions. An early reference to this class of models can be found in Song and Liang (2015), with more recent papers focusing on large sample asymptotics, along with computational strategies for frequentist estimation (Shin et al., 2018; Song, 2018). Our approach differs from this line of work in adopting a Bayesian perspective on rLASSO estimation. Ideally, a Bayesian solution can be obtained by placing appropriate priors on the regression coefficients that will mimic the effects of the rLASSO penalty. As apparent from (1), this arises in assuming a prior for $\beta$ that decomposes as a product of independent inverse Laplace (double exponential) densities: $\pi(\beta) = \prod_{j=1}^{p} \frac{\lambda}{2\beta_j^2} \exp\{-\lambda / |\beta_j|\} \, I\{\beta_j \neq 0\}$.
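
To make the shape of this prior concrete, here is a minimal Python sketch. It assumes the penalty takes the form lam / |beta_j| on nonzero coefficients, which is the form implied by the exponent of the inverse Laplace density above; the function names `rlasso_penalty` and `inverse_laplace_log_prior` are illustrative and do not come from the paper.

```python
import math

def rlasso_penalty(beta, lam):
    """Assumed reciprocal LASSO penalty: sum of lam / |b_j| over nonzero b_j.

    Zero coefficients contribute nothing, which is what makes the penalty
    discontinuous at 0 and unbounded as a coefficient approaches 0.
    """
    return sum(lam / abs(b) for b in beta if b != 0.0)

def inverse_laplace_log_prior(beta, lam):
    """Log of the product of independent inverse Laplace densities.

    Each factor is lam / (2 * b_j**2) * exp(-lam / |b_j|) for b_j != 0.
    The indicator I{b_j != 0} makes the density zero at b_j == 0, so the
    log prior is -inf there.
    """
    total = 0.0
    for b in beta:
        if b == 0.0:
            return -math.inf
        total += math.log(lam) - math.log(2.0) - 2.0 * math.log(abs(b)) - lam / abs(b)
    return total

if __name__ == "__main__":
    # The penalty grows as a nonzero coefficient shrinks toward zero.
    for b in (1.0, 0.1, 0.01):
        print(b, rlasso_penalty([b], lam=1.0))
```

Up to additive terms that do not involve lam / |beta_j|, the negative log prior reproduces the penalty, which is the sense in which the prior mimics rLASSO.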


Oracle Efficient Estimation of Structural Breaks in Cointegrating Regressions

arXiv.org Machine Learning

In this paper, we propose an adaptive group lasso procedure to efficiently estimate structural breaks in cointegrating regressions. It is well known that the group lasso estimator is not simultaneously estimation consistent and model selection consistent in structural break settings. Hence, we use a first-step group lasso estimation of a diverging number of breakpoint candidates to produce weights for a second adaptive group lasso estimation. We prove that parameter changes are estimated consistently by group lasso if it is tuned correctly and show that the number of estimated breaks is greater than the true number but still sufficiently close to it. Then, we use these results and prove that the adaptive group lasso has oracle properties if weights are obtained from our first-step estimation and the tuning parameter satisfies some further restrictions. Simulation results show that the proposed estimator delivers the expected results. An economic application to the long-run US money demand function demonstrates the practical importance of this methodology.


Bayesian Product Ranking at Wayfair

#artificialintelligence

Given sufficient data, we could just use the logistic regression model without further changes. Wayfair handled more than 9 million orders last quarter alone, which initially might sound like more than enough. However, those orders were spread out among millions of products, yielding just a few orders per product at most. Small integers like these can be extremely noisy, so we always have to worry that one product simply seems better than another because of random chance. For example, it is hard to tell if a product that happened to attract three orders is actually any better than one that happened to attract two, or if it just got lucky.
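
The three-versus-two example can be quantified with a small Python sketch. Everything about the model here is an assumption made for illustration and is not taken from the Wayfair article: order counts are treated as Poisson with equal exposure for both products, and each rate gets an independent Gamma prior with integer shape `prior_shape` and a common rate. Under those assumptions the posterior probability that product A has the higher rate has a closed form.

```python
from math import comb

def prob_a_beats_b(orders_a, orders_b, prior_shape=1):
    """Posterior P(rate_A > rate_B) under an assumed Gamma-Poisson model.

    With equal exposure and a common prior rate, the posteriors are
    Gamma(a, r) and Gamma(b, r) with a = orders_a + prior_shape and
    b = orders_b + prior_shape, so rate_A / (rate_A + rate_B) ~ Beta(a, b).
    For integer a and b, P(Beta(a, b) > 1/2) equals
    P(Binomial(a + b - 1, 1/2) <= a - 1).
    """
    a = orders_a + prior_shape
    b = orders_b + prior_shape
    n = a + b - 1
    favourable = sum(comb(n, j) for j in range(a))
    return favourable / 2 ** n

if __name__ == "__main__":
    print(prob_a_beats_b(3, 2))      # small counts: weak evidence
    print(prob_a_beats_b(300, 200))  # same ratio, large counts: strong evidence
```

With three orders against two, the sketch gives a probability only modestly above one half, while the same ratio at hundreds of orders is close to certainty, which matches the point that small counts are dominated by chance.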


Lasso for hierarchical polynomial models

arXiv.org Machine Learning

In a polynomial regression model, the divisibility conditions implicit in polynomial hierarchy give rise to a natural construction of constraints for the model parameters. We use this principle to derive versions of strong and weak hierarchy and to extend existing work in the literature, which at the moment is only concerned with models of degree two. We discuss how to estimate parameters in lasso using standard quadratic programming techniques and apply our proposal to both simulated data and examples from the literature. The proposed methodology compares favorably with existing techniques in terms of low validation error and model size.


R2DE: a NLP approach to estimating IRT parameters of newly generated questions

arXiv.org Machine Learning

The main objective of exams consists in performing an assessment of students' expertise on a specific subject. Such expertise, also referred to as skill or knowledge level, can then be leveraged in different ways (e.g., to assign a grade to the students, to understand whether a student might need some support, etc.). Similarly, the questions appearing in the exams have to be assessed in some way before being used to evaluate students. Standard approaches to questions' assessment are either subjective (e.g., assessment by human experts) or introduce a long delay in the process of question generation (e.g., pretesting with real students). In this work we introduce R2DE (which is a Regressor for Difficulty and Discrimination Estimation), a model capable of assessing newly generated multiple-choice questions by looking at the text of the question and the text of the possible choices. In particular, it can estimate the difficulty and the discrimination of each question, as they are defined in Item Response Theory. We also present the results of extensive experiments we carried out on a real world large scale dataset coming from an e-learning platform, showing that our model can be used to perform an initial assessment of newly created questions and ease some of the problems that arise in question generation.
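
For readers unfamiliar with the two quantities R2DE estimates, the following Python sketch shows how difficulty and discrimination enter the standard two-parameter logistic (2PL) item response function. The abstract does not say which IRT model is used, so the 2PL form is an assumption here, and the function name `prob_correct` is illustrative.

```python
import math

def prob_correct(theta, difficulty, discrimination):
    """2PL item response function (assumed model, not confirmed by the source).

    theta: the student's skill level.
    difficulty: the skill level at which the probability of a correct
        answer is exactly 0.5.
    discrimination: how sharply the probability rises around the difficulty.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

if __name__ == "__main__":
    # Same student, two questions that differ only in difficulty.
    print(prob_correct(theta=0.5, difficulty=0.0, discrimination=1.0))
    print(prob_correct(theta=0.5, difficulty=1.5, discrimination=1.0))
```

A highly discriminating question separates students just below and just above its difficulty more sharply than a weakly discriminating one.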


Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects

arXiv.org Machine Learning

Practitioners in diverse fields such as healthcare, economics and education are eager to apply machine learning to improve decision making. The cost and impracticality of performing experiments and a recent monumental increase in electronic record keeping have brought attention to the problem of evaluating decisions based on non-experimental observational data. This is the setting of this work. In particular, we study estimation of individual-level causal effects, such as a single patient's response to alternative medication, from recorded contexts, decisions and outcomes. We give generalization bounds on the error in estimated effects based on distance measures between groups receiving different treatments, allowing for sample re-weighting. We provide conditions under which our bound is tight and show how it relates to results for unsupervised domain adaptation. Led by our theoretical results, we devise representation learning algorithms that minimize our bound, by regularizing the representation's induced treatment group distance, and encourage sharing of information between treatment groups. We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances. Finally, an experimental evaluation on real and synthetic data shows the value of our proposed representation architecture and regularization scheme.


Estimating Latent Demand of Shared Mobility through Censored Gaussian Processes

arXiv.org Machine Learning

Transport demand is highly dependent on supply, especially for shared transport services where availability is often limited. As observed demand cannot be higher than available supply, historical transport data typically represents a biased, or censored, version of the true underlying demand pattern. Without explicitly accounting for this inherent distinction, predictive models of demand would necessarily represent a biased version of true demand, thus less effectively predicting the needs of service users. To counter this problem, we propose a general method for censorship-aware demand modeling, for which we devise a censored likelihood function. We apply this method to the task of shared mobility demand prediction by incorporating the censored likelihood within a Gaussian Process model, which can flexibly approximate arbitrary functional forms. Experiments on artificial and real-world datasets show how taking into account the limiting effect of supply on demand is essential in the process of obtaining an unbiased predictive model of user demand behavior.
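
The idea of a censored likelihood can be illustrated with a short Python sketch. The abstract does not give the authors' exact formulation, so this assumes a Tobit-style Gaussian likelihood: an uncensored observation contributes the usual normal density, while an observation capped by supply contributes the probability that true demand was at least the observed value. The function names are hypothetical.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def normal_logsf(y, mu, sigma):
    """Log of P(Y >= y) for Y ~ N(mu, sigma^2), via the complementary error function."""
    z = (y - mu) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0)))

def censored_gaussian_loglik(observed, censored, mu, sigma):
    """Assumed Tobit-style log-likelihood for right-censored observations.

    observed: recorded demand values (capped at supply when censored).
    censored: booleans, True where supply limited the recorded demand.
    mu: predicted latent demand for each observation.
    sigma: common noise standard deviation.
    """
    total = 0.0
    for y, is_censored, m in zip(observed, censored, mu):
        if is_censored:
            total += normal_logsf(y, m, sigma)   # true demand was at least y
        else:
            total += normal_logpdf(y, m, sigma)  # true demand was observed
    return total

if __name__ == "__main__":
    # A censored reading of 5 is better explained by latent demand 7 than by 3.
    print(censored_gaussian_loglik([5.0], [True], [7.0], 1.0))
    print(censored_gaussian_loglik([5.0], [True], [3.0], 1.0))
```

Unlike an ordinary Gaussian likelihood, this one does not penalize a model for predicting demand above a censored observation, which is how the supply limit is kept from biasing the fit downward.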


Intelligence, physics and information -- the tradeoff between accuracy and simplicity in machine learning

arXiv.org Machine Learning

How can we enable machines to make sense of the world, and become better at learning? To approach this goal, I believe viewing intelligence in terms of many integral aspects, and also a universal two-term tradeoff between task performance and complexity, provides two feasible perspectives. In this thesis, I address several key questions in some aspects of intelligence, and study the phase transitions in the two-term tradeoff, using strategies and tools from physics and information. Firstly, how can we make the learning models more flexible and efficient, so that agents can learn quickly with fewer examples? Inspired by how physicists model the world, we introduce a paradigm and an AI Physicist agent for simultaneously learning many small specialized models (theories) and the domains in which they are accurate, which can then be simplified, unified and stored, facilitating few-shot learning in a continual way. Secondly, for representation learning, when can we learn a good representation, and how does learning depend on the structure of the dataset? We approach this question by studying phase transitions when tuning the tradeoff hyperparameter. In the information bottleneck, we theoretically show that these phase transitions are predictable and reveal structure in the relationships between the data, the model, the learned representation and the loss landscape. Thirdly, how can agents discover causality from observations? We address part of this question by introducing an algorithm that combines prediction and minimizing information from the input, for exploratory causal discovery from observational time series. Fourthly, to make models more robust to label noise, we introduce Rank Pruning, a robust algorithm for classification with noisy labels. I believe that building on the work of my thesis we will be one step closer to enabling more intelligent machines that can make sense of the world.