

On Moving from Statistics to Machine Learning, the Final Stage of Grief


I've spent the last few months preparing for and applying for data science jobs. It's possible the data science world may reject me and my lack of both experience and a credential above a bachelor's degree, in which case I'll do something else. Regardless of what lies in store for my future, I think I've gotten a good grasp of the mindset underlying machine learning and how it differs from traditional statistics, so I thought I'd write about it for those with a similar background who are considering a similar move. This post is geared toward people who are excellent at statistics but don't really "get" machine learning and want to understand the gist of it in about 15 minutes of reading. If you have a traditional academic stats background (be it econometrics, biostatistics, psychometrics, etc.), there are two good reasons to learn more about data science: The world of data science is, in many ways, hiding in plain sight from the more academically-minded quantitative disciplines.

Data-Driven Algorithm Design

Communications of the ACM

The best algorithm for a computational problem generally depends on the "relevant inputs," a concept that depends on the application domain and often defies formal articulation. Although there is a large literature on empirical approaches to selecting the best algorithm for a given application domain, there has been surprisingly little theoretical analysis of the problem. Our framework captures several state-of-the-art empirical and theoretical approaches to the problem, and our results identify conditions under which these approaches are guaranteed to perform well. We interpret our results in the contexts of learning greedy heuristics, instance feature-based algorithm selection, and parameter tuning in machine learning. Rigorously comparing algorithms is hard. Two different algorithms for a computational problem generally have incomparable performance: one algorithm is better on some inputs but worse on the others. The simplest and most common solution in the theoretical analysis of algorithms is to summarize the performance of an algorithm using a single number, such as its worst-case performance or its average-case performance with respect to an input distribution. This approach effectively advocates using the algorithm with the best summarizing value (e.g., the smallest worst-case running time). Solving a problem "in practice" generally means identifying an algorithm that works well for most or all instances of interest. When the "instances of interest" are easy to specify formally in advance (say, planar graphs), the traditional analysis approaches often give accurate performance predictions and identify useful algorithms.
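As a rough illustration of selecting an algorithm from sample instances, the sketch below tunes one parameter of a greedy knapsack heuristic by picking the value with the best average result on a training set. The knapsack setting, the scoring rule `value / size**rho`, and the names `greedy_knapsack` and `select_parameter` are assumptions chosen for illustration; the abstract does not specify them.

```python
def greedy_knapsack(items, capacity, rho):
    """Pack items greedily in decreasing order of value / size**rho.

    items: list of (value, size) pairs with size > 0.
    Returns the total value packed (illustrative heuristic, not optimal).
    """
    order = sorted(items, key=lambda vs: vs[0] / vs[1] ** rho, reverse=True)
    total, remaining = 0, capacity
    for value, size in order:
        if size <= remaining:
            total += value
            remaining -= size
    return total


def select_parameter(instances, candidates):
    """Return the candidate rho with the highest average packed value.

    instances: list of (items, capacity) pairs sampled from the
    application domain (hypothetical training set).
    """
    def average_value(rho):
        values = [greedy_knapsack(items, cap, rho) for items, cap in instances]
        return sum(values) / len(values)

    return max(candidates, key=average_value)


# One sample instance where sorting by density (rho=1) beats
# sorting by raw value (rho=0).
sample = [([(10, 10), (6, 5), (6, 5)], 10)]
best_rho = select_parameter(sample, candidates=[0, 1])
```

With more sample instances, the same selection rule simply averages over them; how many samples are needed for the choice to generalize is the kind of question the abstract's theoretical framework addresses.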

Artificial Intelligence in Cardiology: Present and Future


For the purpose of this narrative review, we searched PubMed and MEDLINE databases with no date restriction using search terms related to AI and medicine and cardiology subspecialties. Articles were reviewed and selected for inclusion on the basis of relevance. This article highlights that the role of ML in cardiovascular medicine is rapidly emerging, and mounting evidence indicates it will power the new tools that drive the field. Among other uses, AI has been deployed to interpret echocardiograms, to automatically identify heart rhythms from an ECG, to uniquely identify an individual using the ECG as a biometric signal, and to detect the presence of heart disease such as left ventricular dysfunction from the surface ECG.

Variational Bayes In Private Settings (VIPS)

Journal of Artificial Intelligence Research

Many applications of Bayesian data analysis involve sensitive information such as personal documents or medical records, motivating methods which ensure that privacy is protected. We introduce a general privacy-preserving framework for Variational Bayes (VB), a widely used optimization-based Bayesian inference method. Our framework respects differential privacy, the gold-standard privacy criterion, and encompasses a large class of probabilistic models, called the Conjugate Exponential (CE) family. We observe that we can straightforwardly privatise VB's approximate posterior distributions for models in the CE family, by perturbing the expected sufficient statistics of the complete-data likelihood. For a broadly-used class of non-CE models, those with binomial likelihoods, we show how to bring such models into the CE family, such that inferences in the modified model resemble the private variational Bayes algorithm as closely as possible, using the Pólya-Gamma data augmentation scheme. The iterative nature of variational Bayes presents a further challenge since iterations increase the amount of noise needed. We overcome this by combining: (1) an improved composition method for differential privacy, called the moments accountant, which provides a tight bound on the privacy cost of multiple VB iterations and thus significantly decreases the amount of additive noise; and (2) the privacy amplification effect of subsampling mini-batches from large-scale data in stochastic learning. We empirically demonstrate the effectiveness of our method in CE and non-CE models including latent Dirichlet allocation, Bayesian logistic regression, and sigmoid belief networks, evaluated on real-world datasets.
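The idea of perturbing a sufficient statistic can be sketched in a much simpler setting than the one the abstract describes. The example below applies the standard Laplace mechanism to the mean of values bounded in [0, 1], which is a sufficient statistic for a Bernoulli-type model. This is an assumption made for illustration: it is not the paper's algorithm, and it omits the moments accountant and mini-batch subsampling. The function name `private_mean` is hypothetical.

```python
import random


def private_mean(data, epsilon, rng=random):
    """Release the mean of values in [0, 1] with epsilon-differential privacy.

    Replacing one record changes the mean by at most 1 / len(data),
    so Laplace noise with scale (1 / len(data)) / epsilon suffices.
    """
    if not all(0.0 <= x <= 1.0 for x in data):
        raise ValueError("values must lie in [0, 1]")
    sensitivity = 1.0 / len(data)
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with rate 1/scale
    # is Laplace-distributed with the given scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(data) / len(data) + noise


rng = random.Random(0)
noisy = private_mean([0.0, 1.0, 1.0, 0.0, 1.0], epsilon=0.5, rng=rng)
```

Each additional release of a perturbed statistic spends more privacy budget, which is why the iterative setting discussed in the abstract needs a tighter composition analysis.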

Vocabulary Alignment in Openly Specified Interactions

Journal of Artificial Intelligence Research

The problem of achieving common understanding between agents that use different vocabularies has been mainly addressed by techniques that assume the existence of shared external elements, such as a meta-language or a physical environment. In this article, we consider agents that use different vocabularies and only share knowledge of how to perform a task, given by the specification of an interaction protocol. We present a framework that lets agents learn a vocabulary alignment from the experience of interacting. Unlike previous work in this direction, we use open protocols that constrain possible actions instead of defining procedures, making our approach more general. We present two techniques that can be used either to learn an alignment from scratch or to repair an existent one, and we evaluate their performance experimentally.

A Comprehensive Survey on Traffic Prediction

Artificial Intelligence

Traffic prediction plays an essential role in intelligent transportation systems. Accurate traffic prediction can assist route planning, guide vehicle dispatching, and mitigate traffic congestion. This problem is challenging due to the complicated and dynamic spatio-temporal dependencies between different regions in the road network. Recently, significant research effort has been devoted to this area, greatly advancing traffic prediction abilities. The purpose of this paper is to provide a comprehensive survey of traffic prediction. Specifically, we first summarize the existing traffic prediction methods and give a taxonomy of them. Second, we list the common applications of traffic prediction and the state of the art in these applications. Third, we collect and organize widely used public datasets in the existing literature. Furthermore, we give an evaluation by conducting extensive experiments to compare the performance of methods related to traffic demand and speed prediction, respectively, on two datasets. Finally, we discuss potential future directions.

Machine learning for causal inference: on the use of cross-fit estimators

Machine Learning

Modern causal inference methods allow machine learning to be used to weaken parametric modeling assumptions. However, the use of machine learning may result in bias and incorrect inferences due to overfitting. Cross-fit estimators have been proposed to eliminate this bias and yield better statistical properties. We conducted a simulation study to assess the performance of several different estimators for the average causal effect (ACE). The data generating mechanisms for the simulated treatment and outcome included log-transforms, polynomial terms, and discontinuities. We compared singly-robust estimators (g-computation, inverse probability weighting) and doubly-robust estimators (augmented inverse probability weighting, targeted maximum likelihood estimation). Nuisance functions were estimated with parametric models and ensemble machine learning, separately. We further assessed cross-fit doubly-robust estimators. With correctly specified parametric models, all of the estimators were unbiased and confidence intervals achieved nominal coverage. When used with machine learning, the cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage. Due to the difficulty of properly specifying parametric models in high dimensional data, doubly-robust estimators with ensemble learning and cross-fitting may be the preferred approach for estimation of the ACE in most epidemiologic studies. However, these approaches may require larger sample sizes to avoid finite-sample issues.

High-dimensional macroeconomic forecasting using message passing algorithms

Machine Learning

As a response to the increasing linkages between the macroeconomy and the financial sector, as well as the expanding interconnectedness of the global economy, empirical macroeconomic models have increased both in complexity and size. For that reason, estimation of modern models that inform macroeconomic decisions, such as linear and nonlinear versions of dynamic stochastic general equilibrium (DSGE) and vector autoregressive (VAR) models, often relies on Bayesian inference via powerful Markov chain Monte Carlo (MCMC) methods. However, existing posterior simulation algorithms cannot scale up to very high dimensions due to the computational inefficiency and the larger numerical error associated with repeated sampling via Monte Carlo; see Angelino et al. (2016) for a thorough review of such computational issues from a machine learning and high-dimensional data perspective. In that respect, while Bayesian inference is a natural probabilistic framework for learning about parameters by utilizing all information in the data likelihood and prior, computational restrictions might make it less suitable for supporting real-time decision-making in very high dimensions. This paper introduces to the econometric literature the framework of factor graphs (Kschischang et al., 2001) for the purpose of designing computationally efficient, and easy to maintain, Bayesian estimation algorithms. The focus is not only on "faster" posterior inference broadly interpreted, but on designing algorithms that have such low complexity that they are future-proof and can be used in high-dimensional econometric problems with possibly thousands or millions of coefficients.

Chronnet: a network-based model for spatiotemporal data analysis

Machine Learning

The amount and size of spatiotemporal data sets from different domains have been rapidly increasing in recent years, which demands the development of robust and fast methods to analyze and extract information from them. In this paper, we propose a network-based model for spatiotemporal data analysis called chronnet. It consists of dividing a geometrical space into grid cells represented by nodes connected chronologically. The main goal of this model is to represent consecutive recurrent events between cells with strong links in the network. This representation permits the use of network science and graph mining tools to extract information from spatiotemporal data. The chronnet construction process is fast, which makes it suitable for large data sets. In this paper, we describe how to use our model considering artificial and real data. For this purpose, we propose an artificial spatiotemporal data set generator to show how chronnets capture not just simple statistics, but also frequent patterns, spatial changes, outliers, and spatiotemporal clusters. Additionally, we analyze a real-world data set composed of global fire detections, in which we describe the frequency of fire events, outlier fire detections, and the seasonal activity, using a single chronnet.
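One plausible reading of the construction described above can be sketched as follows: map each event to a grid cell, then add a directed, weighted link from every cell active at one observed time step to every cell active at the next. The event format `(t, x, y)`, the square grid, and the name `build_chronnet` are assumptions made for illustration; the paper's exact construction may differ.

```python
from collections import Counter
from itertools import product


def build_chronnet(events, cell_size):
    """Build a weighted directed edge list from spatiotemporal events.

    events: iterable of (t, x, y) tuples (assumed format).
    Returns a Counter mapping (source_cell, target_cell) to the number
    of consecutive observed time steps in which the pair co-occurred.
    """
    # Collect the set of active grid cells at each time step.
    cells_by_time = {}
    for t, x, y in events:
        cell = (int(x // cell_size), int(y // cell_size))
        cells_by_time.setdefault(t, set()).add(cell)

    # Link cells active at one observed time step to cells active at the next.
    edges = Counter()
    times = sorted(cells_by_time)
    for t_prev, t_next in zip(times, times[1:]):
        for src, dst in product(cells_by_time[t_prev], cells_by_time[t_next]):
            edges[(src, dst)] += 1
    return edges


# An event that alternates between two neighboring cells.
events = [(0, 0.5, 0.5), (1, 1.5, 0.5), (2, 0.5, 0.5), (3, 1.5, 0.5)]
net = build_chronnet(events, cell_size=1.0)
```

Recurrent transitions accumulate weight on the same edge, so heavy edges in `net` correspond to the "strong links" the abstract mentions.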

Bayesian nonparametric modeling for predicting dynamic dependencies in multiple object tracking

Machine Learning

Some challenging problems in tracking multiple objects include the time-dependent cardinality, unordered measurements, and object parameter labeling. In this paper, we employ Bayesian nonparametric methods to address these challenges. In particular, we propose modeling the multiple object parameter state prior using the dependent Dirichlet and Pitman-Yor processes. These nonparametric models have been shown to be more flexible and robust, when compared to existing methods, for estimating the time-varying number of objects, providing object labeling, and identifying measurement-to-object associations. Monte Carlo sampling methods are then proposed to efficiently learn the trajectory of objects from noisy measurements. Using simulations, we demonstrate the estimation performance advantage of the new methods when compared to existing algorithms such as the generalized labeled multi-Bernoulli filter.