Improving Forecasting Accuracy Using Bayesian Network Decomposition in Prediction Markets

AAAI Conferences

We propose to improve the accuracy of prediction market forecasts by using Bayesian networks to constrain probabilities among related questions. Prediction markets are already known to increase forecast accuracy compared to single best estimates. Our own flat prediction market substantially beat a baseline linear opinion pool during the first year. One way to improve performance is by expressing relationships among the questions. Elsewhere we describe work on combinatorial markets. Here we show how to use Bayesian networks within a flat market. The general approach is to decompose a target question (hypothesis) into a set of related variables (causal factors and evidence), when the relationship among the variables is known with some confidence. Then the marginal probabilities for the variables in the Bayes net are updated using the market estimates, with the Bayes net enforcing coherence. This paper describes the overall concept, shows the results for a particular model of the potential Greek exit from the European Union, and describes the team's future research plan.
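To make the decomposition idea concrete, here is a minimal sketch of how a target question's probability could be derived from market estimates of its parent variables. It is not the authors' model: the variable names, the conditional probability table, and all numbers are hypothetical, and the parents are assumed to be independent binary variables for simplicity.

```python
from itertools import product


def target_marginal(parent_probs, cpt):
    """Marginal P(target = True) given independent binary parents.

    parent_probs: dict mapping parent name -> P(parent = True),
                  e.g. taken from market estimates (assumption).
    cpt: dict mapping a tuple of parent states (in the dict's key order)
         to P(target = True | those states).
    """
    names = list(parent_probs)
    total = 0.0
    for states in product([True, False], repeat=len(names)):
        # Probability of this joint parent configuration (independence assumed).
        weight = 1.0
        for name, state in zip(names, states):
            p = parent_probs[name]
            weight *= p if state else 1.0 - p
        total += weight * cpt[states]
    return total


# Hypothetical market estimates for two causal factors.
market = {"factor_a": 0.3, "factor_b": 0.5}

# Hypothetical conditional probabilities for the target question.
cpt = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.2,
    (False, False): 0.05,
}

p_target = target_marginal(market, cpt)
print(round(p_target, 4))
```

Because the target probability is computed from the parent estimates, it stays coherent with them by construction; a full Bayes net would additionally handle dependent parents and evidence propagation.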


Automatic Identification of Key Concepts in Large PubMed Retrievals

AAAI Conferences

PubMed queries frequently retrieve thousands of documents, making it very challenging for a user to identify information of interest. In this paper we propose a method for automatically identifying central concepts in large PubMed retrievals. The centrality of a concept is modeled using the hypergeometric distribution. Retrieved documents are grouped by concept, which can help users navigate the retrieval. We test our method on five datasets, each representing a medical condition.
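One common way to use the hypergeometric distribution for this kind of scoring is an over-representation test: how surprising is it that a concept appears in at least k of the n retrieved documents, given that it appears in K of the N documents in the whole collection? The sketch below illustrates that general technique only; the abstract does not specify the exact scoring, so the function name and parameters are assumptions.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: documents in the whole collection
    K: documents in the collection annotated with the concept
    n: documents in the retrieval
    k: retrieved documents annotated with the concept
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability mass over all counts at least as large as k.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / total


# Toy numbers (hypothetical): a concept annotates 3 of 10 documents,
# and all 3 show up in a retrieval of 4 documents.
p_value = hypergeom_sf(3, 10, 3, 4)
print(p_value)
```

A smaller tail probability suggests the concept is over-represented in the retrieval, so concepts could be ranked by it in ascending order.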


Multi-Tweet Summarization for Flu Outbreak Detection

AAAI Conferences

Twitter provides the freshest source of data about what is happening in the lives of people across the world. The publicly available streams of status updates on Twitter have been used to track earthquakes, forest fires, and most especially flu outbreaks. Current techniques for tracking flu outbreaks rely on count data for a number of keywords. However, count data alone on the noisy Twitter streams is not reliable enough for health officials to make critical decisions. We propose a semi-automatic outbreak detection system. Rather than providing only alarms backed by count data, we propose a summarization system that will allow health officials to quickly verify outbreak alarms. This will lead to higher levels of trust in the system and allow the system to be used by health organizations around the world. We experimentally verify our summarization system and have found system users to have an accuracy of 0.86 when identifying multi-tweet summaries.


BioASQ: A Challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

AAAI Conferences

This article provides an overview of BioASQ, a new competition on biomedical semantic indexing and question answering (QA). BioASQ aims to push towards systems that will allow biomedical workers to express their information needs in natural language and that will return concise and user-understandable answers by combining information from multiple sources of different kinds, including biomedical articles, databases, and ontologies. BioASQ encourages participants to adopt semantic indexing as a means to combine multiple information sources and to facilitate the matching of questions to answers. It also adopts a broad semantic indexing and QA architecture that subsumes current relevant approaches, even though no current system instantiates all of its components. Hence, the architecture can also be seen as our view of how relevant work from fields such as information retrieval, hierarchical classification, question answering, ontologies, and linked data can be combined, extended, and applied to biomedical question answering. BioASQ will develop publicly available benchmarks and it will adopt and possibly refine existing evaluation measures. The evaluation infrastructure of the competition will remain publicly available beyond the end of BioASQ.


Efficient Classification of Clinical Reports Utilizing Natural Language Processing

AAAI Conferences

The recent emphasis on health information technology has highlighted the importance of leveraging the large amount of electronic clinical data to help guide medical decision-making. Developing such clinical decision aids requires manual review of many past patient reports in order to generate a good predictive model. In this research, we investigate classification of clinical reports using natural language processing (NLP). The proposed system uses NLP to generate structured output from computed tomography (CT) reports and then machine learning techniques to code for the presence of clinically important injuries for traumatic orbital fracture victims. Our results show that NLP improves upon raw text classification results.


Towards Semantic Literature Based Discovery

AAAI Conferences

Previous systems for literature-based discovery, an automatic method of identifying hidden knowledge, have largely been based on bag-of-words approaches, which perform only limited semantic analysis and interpretation. We describe the shortcomings of these approaches and suggest possible solutions that make use of techniques from Natural Language Processing.


Investigating Twitter as a Source for Studying Behavioral Responses to Epidemics

AAAI Conferences

Recent studies have shown an ability to track influenza rates from Twitter since Twitter users tweet illnesses ("i am home sick with the flu"). However, users may also tweet concerned awareness of illness ("don't want to get sick, need a flu shot"). Identifying these messages can support computational epidemic response models. We present preliminary results for mining concerned awareness of influenza tweets. We describe our data set construction and experiments with binary classification of data into influenza versus general messages and classification into concerned awareness and existing infection.


Term Evolution: Use of Biomedical Terminologies

AAAI Conferences

This extended abstract presents work in progress on using terminological resources from the biomedical domain to systematically study the change of domain terminology over time. In particular, we investigate term replacement. In order to study term replacement over time, semantic knowledge such as the conceptual granularity of a term is necessary. We analyze three popular biomedical terminology resources (UMLS, CTD, SNOMED CT) and show how information provided there can be used to extract lexically distinctive synonym sets that exclude variants. We use the entire PubMed dataset to chronologically study occurrences of extracted synonyms. Our experiments on the disease subsets of three terminologies reveal that the phenomenon of term replacement can be observed in around 60% of the extracted synonym sets.


Towards Effective Representation of Clinical Documents for Search and Retrieval

AAAI Conferences

Recent studies have demonstrated the advantages of structured search of PubMed abstracts when compared with unstructured keyword search. We explore whether search on clinical text is similarly enhanced by representing domain-specific structures, information, and knowledge. Examples include representations of document structure and sections, local context such as negation, and appropriate modeling of scalar quantities. We examine tasks ranging from recruitment of suitable patients for studies, to chronic disease prevention and management, to longitudinal studies of individual patients or groups, as well as comparative experiments performed on an NLP-enhanced clinical search tool that operates on large corpora of clinical text.


PROBE: Periodic Random Orbiter Algorithm for Machine Learning

AAAI Conferences

We present a new algorithm, which we call PROBE, to find the minimum of a convex function. Such a minimization is important in many machine learning methods, including Support Vector Machines (SVM). We show that PROBE is a viable alternative to published algorithms for SVM learning with several important advantages. PROBE is a simple and easily programmed algorithm, with a well-defined, parametrized stopping criterion; it is not limited to SVM, but can be applied to other convex loss functions, such as the Huber and Maximum Entropy models; and its time and memory requirements are consistently modest in handling very large training sets.