This thesis focuses on the research and development of the Hemodynamic Tissue Signature (HTS) method: an unsupervised machine learning approach to describe the vascular heterogeneity of glioblastomas by means of perfusion MRI analysis. The HTS builds on the concept of habitats. An habitat is defined as a sub-region of the lesion with a particular MRI profile describing a specific physiological behavior. The HTS method delineates four habitats within the glioblastoma: the High Angiogenic Tumor (HAT) habitat, as the most perfused region of the enhancing tumor; the Low Angiogenic Tumor (LAT) habitat, as the region of the enhancing tumor with a lower angiogenic profile; the potentially Infiltrated Peripheral Edema (IPE) habitat, as the non-enhancing region adjacent to the tumor with elevated perfusion indexes; and the Vasogenic Peripheral Edema (VPE) habitat, as the remaining edema of the lesion with the lowest perfusion profile. The results of this thesis have been published in ten scientific contributions, including top-ranked journals and conferences in the areas of Medical Informatics, Statistics and Probability, Radiology & Nuclear Medicine, Machine Learning and Data Mining and Biomedical Engineering. An industrial patent registered in Spain (ES201431289A), Europe (EP3190542A1) and EEUU (US20170287133A1) was also issued, summarizing the efforts of the thesis to generate tangible assets besides the academic revenue obtained from research publications. Finally, the methods, technologies and original ideas conceived in this thesis led to the foundation of ONCOANALYTICS CDX, a company framed into the business model of companion diagnostics for pharmaceutical compounds, thought as a vehicle to facilitate the industrialization of the ONCOhabitats technology.
This graduate textbook on machine learning tells a story of how patterns in data support predictions and consequential actions. Starting with the foundations of decision making, we cover representation, optimization, and generalization as the constituents of supervised learning. A chapter on datasets as benchmarks examines their histories and scientific bases. Self-contained introductions to causality, the practice of causal inference, sequential decision making, and reinforcement learning equip the reader with concepts and tools to reason about actions and their consequences. Throughout, the text discusses historical context and societal impact. We invite readers from all backgrounds; some experience with probability, calculus, and linear algebra suffices.
This thesis develops a conceptual framework considering social data as representing the surface layer of a hierarchy of human social behaviours, needs and cognition which is employed to transform social data into representations that preserve social behaviours and their causalities. Based on this framework two platforms were built to capture insights from fast-paced and slow-paced social data. For fast-paced, a self-structuring and incremental learning technique was developed to automatically capture salient topics and corresponding dynamics over time. An event detection technique was developed to automatically monitor those identified topic pathways for significant fluctuations in social behaviours using multiple indicators such as volume and sentiment. This platform is demonstrated using two large datasets with over 1 million tweets. The separated topic pathways were representative of the key topics of each entity and coherent against topic coherence measures. Identified events were validated against contemporary events reported in news. Secondly for the slow-paced social data, a suite of new machine learning and natural language processing techniques were developed to automatically capture self-disclosed information of the individuals such as demographics, emotions and timeline of personal events. This platform was trialled on a large text corpus of over 4 million posts collected from online support groups. This was further extended to transform prostate cancer related online support group discussions into a multidimensional representation and investigated the self-disclosed quality of life of patients (and partners) against time, demographics and clinical factors. The capabilities of this extended platform have been demonstrated using a text corpus collected from 10 prostate cancer online support groups comprising of 609,960 prostate cancer discussions and 22,233 patients.
The emergence and continued reliance on the Internet and related technologies has resulted in the generation of large amounts of data that can be made available for analyses. However, humans do not possess the cognitive capabilities to understand such large amounts of data. Machine learning (ML) provides a mechanism for humans to process large amounts of data, gain insights about the behavior of the data, and make more informed decision based on the resulting analysis. ML has applications in various fields. This review focuses on some of the fields and applications such as education, healthcare, network security, banking and finance, and social media. Within these fields, there are multiple unique challenges that exist. However, ML can provide solutions to these challenges, as well as create further research opportunities. Accordingly, this work surveys some of the challenges facing the aforementioned fields and presents some of the previous literature works that tackled them. Moreover, it suggests several research opportunities that benefit from the use of ML to address these challenges.
What if I told a story here, how would that story start?" Thus, the summarization prompt: "My second grader asked me what this passage means: …" When a given prompt isn't working and GPT-3 keeps pivoting into other modes of completion, that may mean that one hasn't constrained it enough by imitating a correct output, and one needs to go further; writing the first few words or sentence of the target output may be necessary.