IBM Watson Health has formed a medical imaging collaborative with more than 15 leading healthcare organizations. The goal: To take on some of the most deadly diseases. The collaborative, which includes health systems, academic medical centers, ambulatory radiology providers and imaging technology companies, aims to help doctors address breast, lung, and other cancers; diabetes; eye health; brain disease; and heart disease and related conditions, such as stroke. Watson will mine insights from what IBM calls previously invisible unstructured imaging data and combine it with a broad variety of data from other sources, such as data from electronic health records, radiology and pathology reports, lab results, doctors' progress notes, medical journals, clinical care guidelines and published outcomes studies. As the work of the collaborative evolves, Watson's rationale and insights will evolve, informed by the latest combined thinking of the participating organizations.
We propose a "soft greedy" learning algorithm for building small conjunctions of simple threshold functions, called rays, defined on single real-valued attributes. We also propose a PAC-Bayes risk bound which is minimized for classifiers achieving a nontrivial tradeoff between sparsity (the number of rays used) and the magnitude ofthe separating margin of each ray. Finally, we test the soft greedy algorithm on four DNA micro-array data sets.
One of the most fundamental problems in causal inference is the estimation of a causal effect when variables are confounded. This is difficult in an observational study, because one has no direct evidence that all confounders have been adjusted for. We introduce a novel approach for estimating causal effects that exploits observational conditional independencies to suggest "weak" paths in a unknown causal graph. The widely used faithfulness condition of Spirtes et al. is relaxed to allow for varying degrees of "path cancellations" that imply conditional independencies but do not rule out the existence of confounding causal paths. The outcome is a posterior distribution over bounds on the average causal effect via a linear programming approach and Bayesian inference. We claim this approach should be used in regular practice along with other default tools in observational studies.
In our previous study, we introduced stable specification search for cross-sectional data (S3C). It is an exploratory causal method that combines stability selection concept and multi-objective optimization to search for stable and parsimonious causal structures across the entire range of model complexities. In this study, we extended S3C to S3C-Latent, to model causal relations between latent variables. We evaluated S3C-Latent on simulated data and compared the results to those of PC-MIMBuild, an extension of the PC algorithm, the state-of-the-art causal discovery method. The comparison showed that S3C-Latent achieved better performance. We also applied S3C-Latent to real-world data of children with attention deficit/hyperactivity disorder and data about measuring mental abilities among pupils. The results are consistent with those of previous studies.
This article proposes a Bayesian nonparametric method for forecasting, imputation, and clustering in sparsely observed, multivariate time series. The method is appropriate for jointly modeling hundreds of time series with widely varying, non-stationary dynamics. Given a collection of $N$ time series, the Bayesian model first partitions them into independent clusters using a Chinese restaurant process prior. Within a cluster, all time series are modeled jointly using a novel "temporally-coupled" extension of the Chinese restaurant process mixture. Markov chain Monte Carlo techniques are used to obtain samples from the posterior distribution, which are then used to form predictive inferences. We apply the technique to challenging prediction and imputation tasks using seasonal flu data from the US Center for Disease Control and Prevention, demonstrating competitive imputation performance and improved forecasting accuracy as compared to several state-of-the art baselines. We also show that the model discovers interpretable clusters in datasets with hundreds of time series using macroeconomic data from the Gapminder Foundation.