Queen's University
OCR-Based Image Features for Biomedical Image and Article Classification: Identifying Documents Relevant to Genomic Cis-Regulatory Elements
Shatkay, Hagit (University of Delaware) | Narayanaswamy, Ramya (University of Delaware) | Nagaral, Santosh S. (University of Delaware) | Harrington, Na (Queen's University) | MV, Rohith (University of Delaware) | Somanath, Gowri (University of Delaware) | Tarpine, Ryan (Brown University) | Schutter, Kyle (Brown University) | Johnstone, Tim (Brown University) | Blostein, Dorothea (Queen's University) | Istrail, Sorin (Brown University) | Kambhamettu, Chandra (University of Delaware)
Images form a significant, yet under-utilized, information source in published biomedical articles. Much current work on biomedical image retrieval and classification uses simple, standard image representations employing features such as edge direction or gray scale histograms. In our earlier work we also used such features to classify images, and used the resulting image-class tags to represent and classify complete articles. Here we focus on a different literature classification task: identifying articles discussing cis-regulatory elements and modules, motivated by the need to understand complex gene networks. Curators attempting to identify such articles use as a major cue a certain type of image in which the conserved cis-regulatory region on the DNA is shown. Our experiments show that automatically identifying such images using common image features (such as gray scale) is highly error prone. However, using Optical Character Recognition (OCR) to extract alphabetic characters from images, calculating the character distribution, and using the distribution parameters as image features forms a novel image representation, which allows us to identify DNA content in images with high precision and recall (over 0.9). Utilizing the occurrence of DNA-rich images within articles, we train a classifier to identify articles pertaining to cis-regulatory elements with similarly high precision and recall. Using OCR-based image features has much potential beyond the current task, to identify other types of biomedical sequence-based images showing DNA, RNA and proteins. Moreover, automatically identifying such images is applicable beyond the current use case, in other important biomedical document classification tasks.
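The idea of turning OCR output into character-distribution features can be illustrated with a minimal sketch. It assumes the OCR step has already produced a text string for each image. The function names, the feature set (per-letter relative frequencies plus the combined share of A, C, G and T), and the fixed threshold rule are hypothetical stand-ins chosen for illustration; the abstract does not specify the exact distribution parameters, and the authors train a classifier rather than applying a threshold.

```python
from collections import Counter
import string

DNA_LETTERS = "ACGT"


def char_distribution_features(ocr_text):
    """Compute letter-frequency features from OCR output (illustrative only).

    Returns a dict with the relative frequency of each letter A-Z,
    the number of letters seen, and the combined share of A, C, G, T.
    """
    # Keep only alphabetic characters, ignoring case, digits and punctuation.
    letters = [c for c in ocr_text.upper() if c in string.ascii_uppercase]
    total = len(letters)
    counts = Counter(letters)
    features = {
        c: (counts[c] / total if total else 0.0)
        for c in string.ascii_uppercase
    }
    features["n_letters"] = total
    features["dna_fraction"] = sum(features[c] for c in DNA_LETTERS)
    return features


def looks_dna_rich(ocr_text, threshold=0.9, min_letters=20):
    """Assumed stand-in for a trained classifier: a simple threshold rule.

    An image is flagged when it contains enough letters and nearly all
    of them are A, C, G or T. Both cut-offs are arbitrary assumptions.
    """
    features = char_distribution_features(ocr_text)
    return (
        features["n_letters"] >= min_letters
        and features["dna_fraction"] >= threshold
    )


if __name__ == "__main__":
    print(looks_dna_rich("ACGTACGTACGTACGTACGTACGT"))
    print(looks_dna_rich("Figure 1: expression levels in the embryo"))
```

In practice the feature dictionary would be passed to a learned classifier along with other image features; the threshold rule only shows why a skewed letter distribution is a strong cue for sequence-bearing images.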
Computational Pool: A New Challenge for Game Theory Pragmatics
Archibald, Christopher (Stanford University) | Altman, Alon (Stanford University) | Greenspan, Michael (Queen's University) | Shoham, Yoav (Stanford University)
Computational pool is a relatively recent entrant into the group of games played by computer agents. It features a unique combination of properties that distinguish it from other such games, including continuous action and state spaces, uncertainty in execution, a unique turn-taking structure, and of course an adversarial nature. This article discusses some of the work done to date, focusing on the software side of the pool-playing problem. We discuss in some depth CueCard, the program that won the 2008 computational pool tournament. Research questions and ideas spawned by work on this problem are also discussed. We close by announcing the 2011 computational pool tournament, which will take place in conjunction with the Twenty-Fifth AAAI Conference.