Industry
Improving Forecasting Accuracy Using Bayesian Network Decomposition in Prediction Markets
Berea, Anamaria (George Mason University) | Maxwell, Daniel (George Mason University) | Twardy, Charles (George Mason University)
We propose to improve the accuracy of prediction market forecasts by using Bayesian networks to constrain probabilities among related questions. Prediction markets are already known to increase forecast accuracy compared to single best estimates. Our own flat prediction market substantially beat a baseline linear opinion pool during the first year. One way to improve performance is by expressing relationships among the questions. Elsewhere we describe work on combinatorial markets. Here we show how to use Bayesian networks within a flat market. The general approach is to decompose a target question (hypothesis) into a set of related variables (causal factors and evidence), when the relationship among the variables is known with some confidence. Then the marginal probabilities for the variables in the Bayes net are updated using the market estimates, with the Bayes net enforcing coherence. This paper describes the overall concept, shows the results for a particular model of the potential Greek exit from the European Union, and describes the team's future research plan.
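As a rough illustration of the decomposition idea described above, the sketch below takes market estimates for two parent variables and propagates them through a conditional probability table to obtain a coherent marginal for the target question. The variable names, the assumption that the parents are independent, and all numbers are hypothetical; they are not taken from the paper's Greek-exit model.

```python
from itertools import product


def target_marginal(parent_probs, cpt):
    """Return P(target = True) by summing over all parent configurations.

    parent_probs: dict mapping parent name -> P(parent = True),
                  e.g. current market estimates.
    cpt: dict mapping a tuple of parent truth values (ordered by sorted
         parent name) -> P(target = True | parents).
    Assumes the parents are marginally independent (a simplification
    made for this illustration only).
    """
    names = sorted(parent_probs)
    total = 0.0
    for values in product([True, False], repeat=len(names)):
        # Joint probability of this parent configuration.
        weight = 1.0
        for name, value in zip(names, values):
            p = parent_probs[name]
            weight *= p if value else 1.0 - p
        total += weight * cpt[values]
    return total


# Hypothetical market estimates for two causal factors.
market = {"bailout_rejected": 0.3, "bank_run": 0.2}

# Hypothetical conditional probabilities of the target question,
# keyed by (bailout_rejected, bank_run).
cpt = {
    (True, True): 0.9,
    (True, False): 0.5,
    (False, True): 0.4,
    (False, False): 0.05,
}

p_exit = target_marginal(market, cpt)
```

When a market estimate for a parent changes, recomputing `target_marginal` gives a target probability that stays consistent with the stated conditional relationships, which is the sense of "enforcing coherence" used here.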
Automatic Identification of Key Concepts in Large PubMed Retrievals
Yeganova, Lana (National Library of Medicine, National Institutes of Health) | Grigoryan, Vahan (National Library of Medicine, National Institutes of Health) | Kim, Won (National Library of Medicine, National Institutes of Health) | Wilbur, W. John (National Library of Medicine, National Institutes of Health)
PubMed queries frequently retrieve thousands of documents, making it very challenging for a user to identify information of interest. In this paper we propose a method for automatically identifying central concepts in large PubMed retrievals. The centrality of a concept is modeled using the hypergeometric distribution. Retrieved documents are grouped by concept, which can help users navigate the retrieval. We test our method on five datasets, each representing a medical condition.
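One way to read the hypergeometric model: if a concept occurs in K of the N documents in the whole collection, how surprising is it to find it in k or more of the n retrieved documents? The sketch below computes that upper-tail probability using only the standard library. The function name and the counts are illustrative, and the abstract does not state the exact scoring the authors use, so this is an assumption about how such a score could be computed.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: documents in the collection
    K: documents in the collection that mention the concept
    n: documents in the retrieval
    k: retrieved documents that mention the concept
    """
    upper = min(n, K)
    numerator = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return numerator / comb(N, n)


# Toy counts: a concept found in 3 of 10 documents overall
# appears in at least 2 of 4 retrieved documents.
p_value = hypergeom_tail(10, 3, 4, 2)
```

A smaller tail probability means the concept is more over-represented in the retrieval than chance would suggest, so concepts could be ranked by ascending `p_value`.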
Multi-Tweet Summarization for Flu Outbreak Detection
Wenerstrom, Brent (University of Louisville) | Kantardzic, Mehmed (University of Louisville) | Arabmakki, Elaheh (University of Louisville) | Hindi, Musa (University of Louisville)
Twitter provides the freshest source of data about what is happening in the lives of people across the world. The publicly available streams of status updates on Twitter have been used to track earthquakes, forest fires and, most especially, flu outbreaks. Current techniques for tracking flu outbreaks rely on count data for a number of keywords. However, count data alone on the noisy Twitter streams is not reliable enough for health officials to make critical decisions. We propose a semi-automatic outbreak detection system. Rather than providing only alarms backed by count data, we propose a summarization system that will allow health officials to quickly verify outbreak alarms. This will lead to higher levels of trust in the system and allow the system to be used by health organizations around the world. We experimentally verify our summarization system and have found system users to have an accuracy of 0.86 when identifying multi-tweet summaries.
BioASQ: A Challenge on Large-Scale Biomedical Semantic Indexing and Question Answering
Tsatsaronis, George (Technische Universität Dresden) | Schroeder, Michael (Technische Universität Dresden) | Paliouras, Georgios (NCSR Demokritos, Athens) | Almirantis, Yannis (NCSR Demokritos, Athens) | Androutsopoulos, Ion (Athens University of Economics and Business) | Gaussier, Eric (Université Joseph Fourier) | Gallinari, Patrick (Université Pierre et Marie Curie LIP6) | Artieres, Thierry (Université Pierre et Marie Curie LIP6) | Alvers, Michael R. (Transinsight GmbH) | Zschunke, Matthias (Transinsight GmbH) | Ngomo, Axel-Cyrille Ngonga (University of Leipzig)
This article provides an overview of BioASQ, a new competition on biomedical semantic indexing and question answering (QA). BioASQ aims to push towards systems that will allow biomedical workers to express their information needs in natural language and that will return concise and user-understandable answers by combining information from multiple sources of different kinds, including biomedical articles, databases, and ontologies. BioASQ encourages participants to adopt semantic indexing as a means to combine multiple information sources and to facilitate the matching of questions to answers. It also adopts a broad semantic indexing and QA architecture that subsumes current relevant approaches, even though no current system instantiates all of its components. Hence, the architecture can also be seen as our view of how relevant work from fields such as information retrieval, hierarchical classification, question answering, ontologies, and linked data can be combined, extended, and applied to biomedical question answering. BioASQ will develop publicly available benchmarks and it will adopt and possibly refine existing evaluation measures. The evaluation infrastructure of the competition will remain publicly available beyond the end of BioASQ.
Efficient Classification of Clinical Reports Utilizing Natural Language Processing
Sarioglu, Efsun (The George Washington University) | Yadav, Kabir (The George Washington University) | Choi, Hyeong-Ah (The George Washington University)
The recent emphasis on health information technology has highlighted the importance of leveraging the large amount of electronic clinical data to help guide medical decision-making. Developing such clinical decision aids requires manual review of many past patient reports in order to generate a good predictive model. In this research, we investigate classification of clinical reports using natural language processing (NLP). The proposed system uses NLP to generate structured output from computed tomography (CT) reports and then machine learning techniques to code for the presence of clinically important injuries for traumatic orbital fracture victims. Our results show that NLP improves upon raw text classification results.
Towards Semantic Literature Based Discovery
Preiss, Judita (University of Sheffield) | Stevenson, Mark (University of Sheffield) | McClure, M. Heidi (University of Sheffield and Intelligent Software Solutions, Inc)
Previous systems for literature based discovery, an automatic method of identifying hidden knowledge, have largely been based on bag of words approaches which perform only limited semantic analysis and interpretation. We describe the shortcomings of these approaches and suggest possible solutions that make use of techniques from Natural Language Processing.
Investigating Twitter as a Source for Studying Behavioral Responses to Epidemics
Lamb, Alex (Johns Hopkins University) | Paul, Michael J. (Johns Hopkins University) | Dredze, Mark (Johns Hopkins University)
Recent studies have shown an ability to track influenza rates from Twitter since Twitter users tweet illnesses ("i am home sick with the flu"). However, users may also tweet concerned awareness of illness ("don't want to get sick, need a flu shot"). Identifying these messages can support computational epidemic response models. We present preliminary results for mining concerned awareness of influenza tweets. We describe our data set construction and experiments with binary classification of data into influenza versus general messages and classification into concerned awareness and existing infection.
Term Evolution: Use of Biomedical Terminologies
Grigonyte, Gintare (University of Zurich) | Rinaldi, Fabio (University of Zurich) | Volk, Martin (University of Zurich)
This extended abstract presents a work in progress of using terminological resources from the biomedical domain to systematically study the change of domain terminology over time. In particular we investigate term replacement. In order to study term replacement over time, semantic knowledge like conceptual granularity of a term is necessary. We analyze three popular biomedical terminology resources (UMLS, CTD, SNOMED CT) and show how information provided there can be used to extract lexically distinctive synonym sets that exclude variants. We use the entire PubMed dataset to chronologically study occurrences of extracted synonyms. Our experiments on the disease subsets of three terminologies reveal that the phenomenon of term replacement can be observed in around 60% of the extracted synonym sets.
Towards Effective Representation of Clinical Documents for Search and Retrieval
Davis, Anthony R. (3M Health Information Systems) | Nossal, Michael (3M Health Information Systems) | Ober, N. Stephen (3M Health Information Systems)
Recent studies have demonstrated the advantages of structured search of PubMed abstracts when compared with unstructured key word search. We explore whether search on clinical text is similarly enhanced by representing domain specific structures, information, and knowledge. Examples include representations of document structure and sections, local context such as negation, and appropriate modeling of scalar quantities. We examine tasks ranging from recruitment of suitable patients for studies, to chronic disease prevention and management, to longitudinal studies of individual patients or groups, as well as comparative experiments performed on an NLP enhanced clinical search tool that operates on large corpora of clinical text.
PROBE: Periodic Random Orbiter Algorithm for Machine Learning
Smith, Larry (National Institutes of Health) | Kim, Won (National Institutes of Health) | Wilbur, W. John
We present a new algorithm, which we call PROBE, to find the minimum of a convex function. Such a minimization is important in many machine learning methods, including Support Vector Machines (SVM). We show that PROBE is a viable alternative to published algorithms for SVM learning with several important advantages. PROBE is a simple and easily programmed algorithm, with a well-defined, parametrized stopping criterion; it is not limited to SVM, but can be applied to other convex loss functions, such as the Huber and Maximum Entropy models; and its time and memory requirements are consistently modest in handling very large training sets.
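For reference, two of the convex losses mentioned above can be written in a few lines. This sketch only defines the hinge loss used in SVM training and the standard Huber loss; it does not implement PROBE, whose update rule is not described in the abstract. The `delta` threshold is an assumed parameter, and the paper may use a different Huber variant.

```python
def hinge_loss(margin):
    """SVM hinge loss for a signed margin y * f(x)."""
    return max(0.0, 1.0 - margin)


def huber_loss(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear in the tails.

    delta is the (assumed) switch point between the two regimes.
    """
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)
```

Both functions are convex in their argument, which is the property a general convex minimizer would rely on.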