Goto

Collaborating Authors

 Karthik, Trishank


Contextual Information Portals

AAAI Conferences

There is a wealth of information on the Web about any number of topics. Many communities in developing regions are often interested in information relating to specific topics. For example, health workers are interested in specific medical information regarding epidemic diseases in their region while teachers and students are interested in educational information relating to their curriculum. This paper presents the design of Contextual Information Portals, searchable information portals that contain a vertical slice of the Web about arbitrary topics tailored to a specific context. Contextual portals are particularly useful for communities that lack Internet or Web access or in regions with very poor network connectivity. This paper outlines the design space for constructing contextual information portals and describes the key technical challenges involved. We have implemented a proof-of-concept of our ideas, and performed an initial evaluation on a variety of topics relating to epidemiology, agriculture, and education.


Document Classification for Focused Topics

AAAI Conferences

Feature extraction is one of the fundamental challenges in improving the accuracy of document classification. While there has been a large body of research literature on document classification, most existing approaches either do not have a high classification accuracy or require massive training sets. In this paper, we propose a simple feature extraction algorithm that can achieve high document classification accuracy in the context of development-centric topics. Our feature extraction algorithm exploits two distinct aspects in development-centric topics: most of these topics tend to be very focused (unlike semantically hard classification topics such as chemistry or banks) due to local language and cultural underpinnings in these topics, the authentic pages tend to use several region specific features. Our algorithm uses a combination of popularity and rarity as two separate metrics to extract features that describe a topic. Given a topic, our output feature set comprises of: (i) a list of popular keywords closely related to the topic; (ii) a list of rare keywords closely related to the topic. We show that a simple joint classifier based on these two feature sets can achieve high classification accuracy while each feature sub-set in itself is insufficient. We have tested our algorithm across a wide range of development-centric topics.