EHRs Connect Research and Practice: Where Predictive Modeling, Artificial Intelligence, and Clinical Decision Support Intersect

arXiv.org Machine Learning

Objectives: Electronic health records (EHRs) are only a first step in capturing and utilizing health-related data - the challenge is turning that data into useful information. Furthermore, EHRs are increasingly likely to include data relating to patient outcomes, functionality such as clinical decision support, and genetic information, and as such can be seen as repositories of increasingly valuable information about patients' health conditions and responses to treatment over time. Methods: We describe a case study of 423 patients treated by Centerstone within Tennessee and Indiana in which we utilized electronic health record data to generate predictive algorithms of individual patient treatment response. Multiple models were constructed using predictor variables derived from clinical, financial and geographic data. Results: Of the 423 patients, 101 deteriorated, 223 improved, and 99 showed no change in clinical condition. Based on modeling of various clinical indicators at baseline, the highest accuracy in predicting individual patient response ranged from 70-72% within the models tested. In terms of individual predictors, the Centerstone Assessment of Recovery Level - Adult (CARLA) baseline score was most significant in predicting outcome over time (odds ratio 4.1 + 2.27). Other variables with consistently significant impact on outcome included payer, diagnostic category, location and provision of case management services. Conclusions: This approach represents a promising avenue toward reducing the current gap between research and practice across healthcare, developing data-driven clinical decision support based on real-world populations, and serving as a component of embedded clinical artificial intelligences that "learn" over time.


Structured sparsity through convex optimization

arXiv.org Machine Learning

Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the $\ell_1$-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to unsupervised learning, for structured sparse principal component analysis and hierarchical dictionary learning, and to supervised learning in the context of non-linear variable selection.
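As an illustration of the structured norms mentioned above, the following minimal sketch computes a group norm (a sum of Euclidean norms over groups) and its proximal operator for the disjoint-group case, which is block soft-thresholding. The function names, the example vector, and the value of `lam` are assumptions made for illustration; the overlapping-group case discussed in the abstract needs more machinery and is not covered here.

```python
import math

def group_norm(w, groups):
    """Sum of Euclidean norms of the sub-vectors indexed by each group."""
    return sum(math.sqrt(sum(w[i] ** 2 for i in g)) for g in groups)

def prox_group_norm(w, groups, lam):
    """Block soft-thresholding: proximal operator of lam * group_norm.

    Valid only when the groups are DISJOINT (assumption of this sketch).
    A group whose norm is at most lam is set to zero as a whole.
    """
    out = list(w)
    for g in groups:
        norm = math.sqrt(sum(w[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * w[i]
    return out

# Hypothetical example: two groups of two variables each.
w = [3.0, 4.0, 0.1, -0.2]
groups = [[0, 1], [2, 3]]
shrunk = prox_group_norm(w, groups, lam=1.0)
print(shrunk)
```

With these inputs the first group (norm 5) is shrunk by a factor of 0.8, while the second group (norm below `lam`) is removed entirely, which is the group-level sparsity the $\ell_1$-norm alone does not provide.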


The Discrete Infinite Logistic Normal Distribution

arXiv.org Machine Learning

We present the discrete infinite logistic normal distribution (DILN), a Bayesian nonparametric prior for mixed membership models. DILN is a generalization of the hierarchical Dirichlet process (HDP) that models correlation structure between the weights of the atoms at the group level. We derive a representation of DILN as a normalized collection of gamma-distributed random variables, and study its statistical properties. We consider applications to topic modeling and derive a variational inference algorithm for approximate posterior inference. We study the empirical performance of the DILN topic model on four corpora, comparing performance with the HDP and the correlated topic model (CTM). To deal with large-scale data sets, we also develop an online inference algorithm for DILN and compare it with online HDP and online LDA on a corpus from the magazine Nature, which contains approximately 350,000 articles.


The Complexity of Manipulating $k$-Approval Elections

arXiv.org Artificial Intelligence

An important problem in computational social choice theory is the complexity of undesirable behavior among agents, such as control, manipulation, and bribery in election systems. These kinds of voting strategies are often tempting at the individual level but disastrous for the agents as a whole. Creating election systems where the determination of such strategies is difficult is thus an important goal. An interesting set of elections is that of scoring protocols. Previous work in this area has demonstrated the complexity of misuse in cases involving a fixed number of candidates, and of specific election systems on an unbounded number of candidates, such as Borda. In contrast, we take the first step in generalizing the results on the computational complexity of election misuse to cases of infinitely many scoring protocols on an unbounded number of candidates. Interesting families of systems include $k$-approval and $k$-veto elections, in which voters distinguish $k$ candidates from the candidate set. Our main result is to partition the problems of these families based on their complexity. We do so by showing they are polynomial-time computable, NP-hard, or polynomial-time equivalent to another problem of interest. We also demonstrate a surprising connection between manipulation in election systems and some graph theory problems.
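To make the two families concrete, here is a minimal sketch of winner determination under $k$-approval and $k$-veto, assuming each vote is a full ranking from most to least preferred. The function names and the example votes are hypothetical, and the sketch only scores an election; it does not address the manipulation problems whose complexity the paper studies.

```python
from collections import Counter

def k_approval_scores(votes, k):
    """Each voter gives one point to each of their top-k candidates."""
    scores = Counter()
    for ranking in votes:
        for c in ranking:
            scores[c] += 0          # make sure every candidate has an entry
        for c in ranking[:k]:
            scores[c] += 1
    return dict(scores)

def k_veto_scores(votes, k):
    """Each voter vetoes their bottom-k candidates, i.e. approves the rest."""
    m = len(votes[0])
    return k_approval_scores(votes, m - k)

def winners(scores):
    """All candidates tied for the highest score, in sorted order."""
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best)

# Hypothetical election with four candidates and three voters.
votes = [
    ["a", "b", "c", "d"],
    ["b", "a", "d", "c"],
    ["c", "b", "a", "d"],
]
approval_winners = winners(k_approval_scores(votes, 2))
veto_winners = winners(k_veto_scores(votes, 1))
print(approval_winners, veto_winners)
```

Treating $k$-veto as $(m-k)$-approval over $m$ candidates is a design choice of this sketch that keeps a single scoring routine for both families.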


Avian Influenza (H5N1) Expert System using Dempster-Shafer Theory

arXiv.org Artificial Intelligence

Based on the Cumulative Number of Confirmed Human Cases of Avian Influenza (H5N1) Reported to the World Health Organization (WHO) in 2011 from 15 countries, Indonesia has the largest number of deaths from avian influenza, with 146 deaths. In this research, we built an Avian Influenza (H5N1) Expert System for identifying avian influenza disease and displaying the result of the identification process. In this paper, we describe five symptoms as major symptoms, which include depression, combs, wattle, bluish face region, swollen face region, narrowness of eyes, and balance disorders. We use chickens as the research object. The research location is Lampung Province, South Sumatera, chosen because it has a high poultry population. The inference engine of the expert system uses Dempster-Shafer theory to quantify degrees of belief: our approach combines beliefs under conditions of uncertainty and ignorance, and allows quantitative measurement of the belief and plausibility in our identification result. The results reveal that the Avian Influenza (H5N1) Expert System successfully identified the existence of avian influenza and displayed the result of the identification process.
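The combination of beliefs described above is usually carried out with Dempster's rule of combination. The sketch below shows that rule on two mass functions; the frame of discernment (`"AI"` for avian influenza, `"ND"` for a second, hypothetical disease) and all mass values are invented for illustration and are not taken from the paper.

```python
def combine(m1, m2):
    """Dempster's rule: combine two mass functions over the same frame.

    Each mass function maps frozenset hypotheses to masses summing to 1.
    Products whose intersection is empty count as conflict and are
    normalized away.
    """
    combined = {}
    conflict = 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + x * y
            else:
                conflict += x * y
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Hypothetical evidence from two symptoms.
theta = frozenset({"AI", "ND"})              # the whole frame (ignorance)
m_symptom1 = {frozenset({"AI"}): 0.6, theta: 0.4}
m_symptom2 = {frozenset({"ND"}): 0.5, theta: 0.5}

m = combine(m_symptom1, m_symptom2)
for hypothesis, mass in m.items():
    print(sorted(hypothesis), round(mass, 4))
```

Here the conflicting mass is 0.6 × 0.5 = 0.3, so the remaining products are divided by 0.7, giving 3/7 to avian influenza and 2/7 each to the other disease and to ignorance.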


Avian Influenza (H5N1) Warning System using Dempster-Shafer Theory and Web Mapping

arXiv.org Artificial Intelligence

Based on the Cumulative Number of Confirmed Human Cases of Avian Influenza (H5N1) Reported to the World Health Organization (WHO) in 2011 from 15 countries, Indonesia has the largest number of deaths from avian influenza, with 146 deaths. In this research, we combined Web Mapping and Dempster-Shafer theory into an early warning system for avian influenza. Early warning is the provision of timely and effective information, through identified institutions, that allows individuals exposed to a hazard to take action to avoid or reduce their risk and prepare for effective response. In this paper, as an example, we use five symptoms as major symptoms, which include depression, combs, wattle, bluish face region, swollen face region, narrowness of eyes, and balance disorders. The research location is Lampung Province, South Sumatera, chosen because it has a high poultry population. Geographically, Lampung Province is located at 103°40' to 105°50' East longitude and 6°45' to 3°45' South latitude, and is bordered by South Sumatera and Bengkulu on the north side, the Sunda Strait on the south side, the Java Sea on the east side, and the Indian Ocean on the west side. Our approach uses Dempster-Shafer theory to combine beliefs in certain hypotheses under conditions of uncertainty and ignorance, and allows quantitative measurement of the belief and plausibility in our identification result. Web Mapping is also used for displaying maps on a screen to visualize the result of the identification process. The results reveal that the avian influenza warning system successfully identified the existence of avian influenza and that the maps can be displayed as the visualization.
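The belief and plausibility measures mentioned above can be read directly off a mass function. The following sketch assumes hypotheses are encoded as frozensets; the labels (`"AI"` for avian influenza, `"ND"` for a second, hypothetical disease) and the mass values are illustrative assumptions, not results from the paper.

```python
def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)

def plausibility(m, hypothesis):
    """Total mass not contradicting the hypothesis (non-empty overlap)."""
    return sum(v for s, v in m.items() if s & hypothesis)

# Hypothetical mass function over the frame {"AI", "ND"}.
m = {
    frozenset({"AI"}): 0.60,
    frozenset({"ND"}): 0.10,
    frozenset({"AI", "ND"}): 0.30,   # mass left on the whole frame (ignorance)
}

ai = frozenset({"AI"})
bel_ai = belief(m, ai)
pl_ai = plausibility(m, ai)
print(bel_ai, pl_ai)
```

The interval between belief and plausibility (here roughly 0.6 to 0.9 for avian influenza) is what lets the approach represent ignorance separately from evidence against a hypothesis.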


The Artificial Regression Market

arXiv.org Machine Learning

The Artificial Prediction Market is a recent machine learning technique for multi-class classification, inspired by financial markets. It involves a number of trained market participants that bet on the possible outcomes and are rewarded if they predict correctly. This paper generalizes the scope of Artificial Prediction Markets to regression, where there are uncountably many possible outcomes and the error is usually the mean squared error (MSE). To this end, we introduce the reward kernel, which rewards each participant based on its prediction error, and we derive the price equations. Using two reward kernels we obtain two different learning rules, one of which is approximated using Hermite-Gauss quadrature. The market setting makes it easy to aggregate specialized regressors that only predict when an observation falls into their specialization domain. Experiments show that regression markets based on the two learning rules outperform Random Forest Regression on many UCI datasets and are rarely outperformed.


An existing, ecologically-successful genus of collectively intelligent artificial creatures

arXiv.org Artificial Intelligence

ABSTRACT

People sometimes worry about the Singularity (Vinge 1993, Kurzweil 2005), or about the world being taken over by artificially intelligent robots. I believe the risks of these are very small. However, few people recognize that we already share our world with artificial creatures that participate as intelligent agents in our society: corporations. Our planet is inhabited by two distinct kinds of intelligent beings -- individual humans and corporate entities -- whose natures and interests are intimately linked. To coexist well, we need to find ways to define the rights and responsibilities of both individual humans and corporate entities, and to find ways to ensure that corporate entities behave as responsible members of society.

CORPORATIONS ARE INTELLIGENT AGENTS

A corporation is an artificial legal entity, created by the state through a particular kind of legal agreement. A corporation can own property, can sign contracts, can sue and be sued in court, and can be prosecuted and punished for crimes. It can act as an economic agent on its own behalf in our society. A corporation can have goals, can make plans to achieve those goals, and can use its resources to act to carry out those plans. It solves problems and makes decisions about how best to achieve its goals, so it can be considered as an intelligent agent, as defined by a leading text in Artificial Intelligence (Russell & Norvig 2010, p. 34).

An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.... A human agent has eyes, ears, and other organs for sensors and hands, legs, vocal tract, and so on for actuators.


Solution Representations and Local Search for the bi-objective Inventory Routing Problem

arXiv.org Artificial Intelligence

The solution of the bi-objective Inventory Routing Problem (IRP) is rather challenging, even for metaheuristics. We still lack a profound understanding of appropriate solution representations and effective neighborhood structures. Clearly, both the delivery volumes and the routing aspects of the alternatives need to be reflected in an encoding, and must be modified when searching by means of local search. Our work contributes to a better understanding of such solution representations. On the basis of an experimental investigation, the advantages and drawbacks of two encodings are studied and compared.


Regularized Partial Least Squares with an Application to NMR Spectroscopy

arXiv.org Machine Learning

Department of Statistics, Rice University

Abstract

High-dimensional data common in genomics, proteomics, and chemometrics often contain complicated correlation structures. Recently, partial least squares (PLS) and Sparse PLS methods have gained attention in these areas as dimension reduction techniques in the context of supervised data analysis. We introduce a framework for Regularized PLS by solving a relaxation of the SIMPLS optimization problem with penalties on the PLS loadings vectors. Our approach enjoys many advantages including flexibility, general penalties, easy interpretation of results, and fast computation in high-dimensional settings. We also outline extensions of our methods leading to novel methods for Nonnegative PLS and Generalized PLS, an adaptation of PLS for structured data. We demonstrate the utility of our methods through simulations and a case study on proton Nuclear Magnetic Resonance (NMR) spectroscopy data.

To whom correspondence should be addressed; Department of Statistics, Rice University, MS 138, 6100 Main St., Houston, TX 77005 (email: gallen@rice.edu)

1 Introduction

Technologies to measure high-throughput biomedical data in proteomics, chemometrics, and genomics have led to a proliferation of high-dimensional data that pose many statistical challenges. As genes, proteins, and metabolites are biologically interconnected, the variables in these data sets are often highly correlated. In this context, several authors have recently advocated using partial least squares (PLS) for dimension reduction of supervised data, or data with a response or labels (Nguyen and Rocke, 2002b; Boulesteix and Strimmer, 2007; Rossouw et al., 2008; Chun and Keleş, 2010). PLS was first introduced by Wold (1966) as a regression method that uses least squares on a set of derived inputs accounting for multicollinearities; others have since proposed alternative methods for PLS with multiple responses (de Jong, 1993) and for classification (Marx, 1996; Barker and Rayens, 2003).
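To make the idea of a penalized PLS direction concrete, the sketch below computes the first PLS weight vector for a single response, which is proportional to $X^\top y$ on centered data, with optional soft-thresholding to induce sparsity. The soft-thresholding step is one common way to apply an $\ell_1$-type penalty and is an assumption of this sketch, not a statement of the authors' algorithm; the function name, the parameter `lam`, and the toy data are hypothetical.

```python
import math

def first_pls_direction(X, y, lam=0.0):
    """First PLS weight vector for a single response.

    Computes c = Xc^T yc on centered data, soft-thresholds each entry
    by lam (lam = 0 gives the ordinary first PLS direction), and scales
    the result to unit Euclidean norm.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    c = [
        sum((X[i][j] - x_mean[j]) * (y[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    c = [math.copysign(max(abs(v) - lam, 0.0), v) for v in c]
    norm = math.sqrt(sum(v * v for v in c))
    if norm == 0.0:
        raise ValueError("lam is too large: every weight was thresholded to zero")
    return [v / norm for v in c]

# Toy data: two strongly correlated predictors and one weakly related one.
X = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0], [4.0, 8.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]

w_dense = first_pls_direction(X, y)
w_sparse = first_pls_direction(X, y, lam=2.0)
print(w_dense, w_sparse)
```

On this toy data the unpenalized direction is proportional to (5, 10, 1), and thresholding with `lam=2.0` removes the weakly related third variable while keeping the two correlated ones.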