Life's most valuable asset is health. Continuously understanding the state of our health and modeling how it evolves is essential if we wish to improve it. Given the opportunity that people live with more data about their life today than any other time in history, the challenge rests in interweaving this data with the growing body of knowledge to compute and model the health state of an individual continually. This dissertation presents an approach to build a personal model and dynamically estimate the health state of an individual by fusing multi-modal data and domain knowledge. The system is stitched together from four essential abstraction elements: 1. the events in our life, 2. the layers of our biological systems (from molecular to an organism), 3. the functional utilities that arise from biological underpinnings, and 4. how we interact with these utilities in the reality of daily life. Connecting these four elements via graph network blocks forms the backbone by which we instantiate a digital twin of an individual. Edges and nodes in this graph structure are then regularly updated with learning techniques as data is continuously digested. Experiments demonstrate the use of dense and heterogeneous real-world data from a variety of personal and environmental sensors to monitor individual cardiovascular health state. State estimation and individual modeling is the fundamental basis to depart from disease-oriented approaches to a total health continuum paradigm. Precision in predicting health requires understanding state trajectory. By encasing this estimation within a navigational approach, a systematic guidance framework can plan actions to transition a current state towards a desired one. This work concludes by presenting this framework of combining the health state and personal graph model to perpetually plan and assist us in living life towards our goals.
A broad spectrum of data from different modalities are generated in the healthcare domain every day, including scalar data (e.g., clinical measures collected at hospitals), tensor data (e.g., neuroimages analyzed by research institutes), graph data (e.g., brain connectivity networks), and sequence data (e.g., digital footprints recorded on smart sensors). Capability for modeling information from these heterogeneous data sources is potentially transformative for investigating disease mechanisms and for informing therapeutic interventions. Our works in this thesis attempt to facilitate healthcare applications in the setting of broad learning which focuses on fusing heterogeneous data sources for a variety of synergistic knowledge discovery and machine learning tasks. We are generally interested in computer-aided diagnosis, precision medicine, and mobile health by creating accurate user profiles which include important biomarkers, brain connectivity patterns, and latent representations. In particular, our works involve four different data mining problems with application to the healthcare domain: multi-view feature selection, subgraph pattern mining, brain network embedding, and multi-view sequence prediction.
Such information includes: the database in modern hospital systems, usually known as Electronic Health Records (EHR), which store the patients' diagnosis, medication, laboratory test results, medical image data, etc.; information on various health behaviors tracked and stored by wearable devices, ubiquitous sensors and mobile applications, such as the smoking status, alcoholism history, exercise level, sleeping conditions, etc.; information collected by census or various surveys regarding sociodemographic factors of the target cohort; and information on people's mental health inferred from their social media activities or social networks such as Twitter, Facebook, etc. These health-related data come from heterogeneous sources, describe assorted aspects of the individual's health conditions. Such data is rich in structure and information which has great research potentials for revealing unknown medical knowledge about genomic epidemiology, disease developments and correlations, drug discoveries, medical diagnosis, mental illness prevention, health behavior adaption, etc. In real-world problems, the number of features relating to a certain health condition could grow exponentially with the development of new information techniques for collecting and measuring data. To reveal the causal influence between various factors and a certain disease or to discover the correlations among diseases from data at such a tremendous scale, requires the assistance of advanced information technology such as data mining, machine learning, text mining, etc. Machine learning technology not only provides a way for learning qualitative relationships among features and patients, but also the quantitative parameters regarding the strength of such correlations.
In this chapter, we provide a brief overview of applying machine learning techniques for clinical prediction tasks. We begin with a quick introduction to the concepts of machine learning and outline some of the most common machine learning algorithms. Next, we demonstrate how to apply the algorithms with appropriate toolkits to conduct machine learning experiments for clinical prediction tasks. The objectives of this chapter are to (1) understand the basics of machine learning techniques and the reasons behind why they are useful for solving clinical prediction problems, (2) understand the intuition behind some machine learning models, including regression, decision trees, and support vector machines, and (3) understand how to apply these models to clinical prediction problems using publicly available datasets via case studies.