Hossain, M. Shahriar
Connecting the Dots Using Contextual Information Hidden in Text and Images
Kader, Md Abdul (The University of Texas at El Paso) | Naim, Sheikh Motahar (The University of Texas at El Paso) | Boedihardjo, Arnold P. (U. S. Army Corps of Engineers, Alexandria, VA) | Hossain, M. Shahriar (The University of Texas at El Paso)
Creation of summaries of events of interest from multitude of unstructured data is a challenging task commonly faced by intelligence analysts while seeking increased situational awareness. This paper proposes a framework called Storyboarding that leverages unstructured text and images to explain events as sets of sub-events. The framework first generates a textual context for each human face detected from images and then builds a chain of coherent documents where two consecutive documents of the chain contain a common theme as well as a context. Storyboarding helps analysts quickly narrow down large number of possibilities to a few significant ones for further investigation. Empirical studies on Wikipedia documents, images and news articles show that Storyboarding is able to provide deeper insights on events of interests.
Encoding Lineage in Scholarly Articles
Naim, Sheikh Motahar (University of Texas at El Paso) | Kader, Md Abdul (University of Texas at El Paso) | Boedihardjo, Arnold P. (US Army Corps of Engineers) | Hossain, M. Shahriar (University of Texas at El Paso)
The development of new scientific concepts today is an outcome of the accumulated knowledge built over time. Every scientific domain requires understanding of the trends of the dependencies between its subdomains. Analyses of trends to capture such dependencies using conventional document modeling techniques is a challenging task due to two reasons: (1) conventional vector-space modeling based representation of documents does not realize the history of the content, and (2) neither feature-level nor document-level causality is provided with any digital library metadata or citation network. In this paper, we propose an intuitive temporal representation of a scientific article that encodes inherent historic characteristics of the content. This intuitive representation of each document is then leveraged to discover causal relationships between scientific articles. In addition, we provide a mechanism to explore the lineage of each document in terms of other previously published documents, which illustrates how the theme of the document under analysis evolved over time. Empirical studies reported in the paper show that the proposed technique identifies meaningful causal relationships and discovers meaningful lineage in the scientific literature that could not be discovered through the citation network of the articles.
Efficiently Discovering Hammock Paths from Induced Similarity Networks
Hossain, M. Shahriar, Narayan, Michael, Ramakrishnan, Naren
Similarity networks are important abstractions in many information management applications such as recommender systems, corpora analysis, and medical informatics. For instance, by inducing similarity networks between movies rated similarly by users, or between documents containing common terms, and or between clinical trials involving the same themes, we can aim to find the global structure of connectivities underlying the data, and use the network as a basis to make connections between seemingly disparate entities. In the above applications, composing similarities between objects of interest finds uses in serendipitous recommendation, in storytelling, and in clinical diagnosis, respectively. We present an algorithmic framework for traversing similarity paths using the notion of `hammock' paths which are generalization of traditional paths. Our framework is exploratory in nature so that, given starting and ending objects of interest, it explores candidate objects for path following, and heuristics to admissibly estimate the potential for paths to lead to a desired destination. We present three diverse applications: exploring movie similarities in the Netflix dataset, exploring abstract similarities across the PubMed corpus, and exploring description similarities in a database of clinical trials. Experimental results demonstrate the potential of our approach for unstructured knowledge discovery in similarity networks.