Education
Contextual Information Portals
Chen, Jay (New York University) | Karthik, Trishank (New York University) | Subramanian, Lakshminarayanan (New York University)
There is a wealth of information on the Web about any number of topics. Communities in developing regions are often interested in information on specific topics. For example, health workers are interested in medical information about epidemic diseases in their region, while teachers and students are interested in educational material related to their curriculum. This paper presents the design of Contextual Information Portals: searchable information portals that contain a vertical slice of the Web about arbitrary topics, tailored to a specific context. Contextual portals are particularly useful for communities that lack Internet or Web access, or that are in regions with very poor network connectivity. This paper outlines the design space for constructing contextual information portals and describes the key technical challenges involved. We have implemented a proof-of-concept of our ideas and performed an initial evaluation on a variety of topics relating to epidemiology, agriculture, and education.
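One way to picture building a "vertical slice of the Web" is a topic-focused crawl: expand links only from pages that look relevant and store only those pages. The abstract does not specify the construction method, so the sketch below is an assumption for illustration. The in-memory `WEB` graph, the keyword-overlap `relevance` score, and `build_portal` with its `threshold` are all hypothetical.

```python
import heapq

# Hypothetical toy "Web": page -> (text, outgoing links). Not real data.
WEB = {
    "a": ("malaria prevention and bed nets", ["b", "c"]),
    "b": ("football scores", ["d"]),
    "c": ("malaria treatment guidelines for health workers", ["d", "e"]),
    "d": ("celebrity news", []),
    "e": ("dengue and malaria epidemic reports", []),
}

def relevance(text, topic_terms):
    """Fraction of topic terms that appear in the page text (a crude relevance score)."""
    words = set(text.lower().split())
    return sum(term in words for term in topic_terms) / len(topic_terms)

def build_portal(seeds, topic_terms, threshold=0.5, max_pages=100):
    """Best-first crawl that stores only pages scoring at or above `threshold`."""
    frontier = [(-1.0, url) for url in seeds]  # negated scores turn heapq into a max-heap
    heapq.heapify(frontier)
    seen, portal = set(seeds), {}
    while frontier and len(portal) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = WEB[url]
        score = relevance(text, topic_terms)
        if score < threshold:
            continue  # off-topic: neither store the page nor follow its links
        portal[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))  # parent's score sets priority
    return portal
```

In this toy graph, `build_portal(["a"], ["malaria"])` keeps pages `a`, `c` and `e` and never stores the off-topic pages. A real portal would also have to handle fetching, deduplication and storage limits, which this sketch ignores.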
A Formal Approach to Modeling the Memory of a Living Organism
We consider a living organism as an observer of the evolution of its environment, recording sensory information about the state space X of the environment in real time. Sensory information is sampled and then processed on two levels. On the biological level, the organism evaluates the subjective relevance of the incoming data: the observer assigns excitation values to events in X that it can recognize using its sensory equipment. On the algorithmic level, sensory input is used to update a database (the memory of the observer) that serves as a geometric/combinatorial model of X, whose nodes are weighted by the excitation values produced by the evaluation mechanism. These values guide how the database should be transformed as observation data accumulates. We define a search problem for the proposed model and discuss the model's flexibility and computational efficiency, as well as the possibility of implementing it as a dynamic network of neuron-like units. We show how various easily observable properties of human memory and thought can be explained within the framework of this model, including reasoning (with efficiency bounds), errors, and temporary and permanent loss of information. We are also able to define general learning problems, such as the language acquisition problem, in terms of the new model.
Particle Filtering on the Audio Localization Manifold
We present a novel particle filtering algorithm for tracking a moving sound source using a microphone array. If there are N microphones in the array, we track all $\binom{N}{2}$ pairwise delays with a single particle filter over time. Because tracking in high dimensions is known to be difficult, we instead integrate into our particle filter a model of the low-dimensional manifold on which these delays lie. Our manifold model is based on work on modeling low-dimensional manifolds via random projection trees [1]. We also introduce a new weighting scheme for our particle filtering algorithm, based on recent advances in online learning. We show that our novel time-difference-of-arrival (TDOA) tracking algorithm, which integrates a manifold model, can greatly outperform standard particle filters on this audio tracking task.
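A small computation helps show why the $\binom{N}{2}$ delays lie on a low-dimensional manifold. Every delay is determined by the source position alone, and the delays are redundant: $\tau_{ij} = \tau_{ik} + \tau_{kj}$. The sketch below assumes free-field propagation at a constant speed of sound. The function `pairwise_delays` is hypothetical, and the code does not implement the paper's random-projection-tree manifold model.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C

def pairwise_delays(source, mics, c=SPEED_OF_SOUND):
    """Return all N-choose-2 time differences of arrival (seconds) for one source position.

    Assumes straight-line, free-field propagation. Pairs follow itertools.combinations
    order: (0, 1), (0, 2), ..., (N-2, N-1).
    """
    dist = [math.dist(source, m) for m in mics]
    return [(dist[i] - dist[j]) / c
            for i, j in itertools.combinations(range(len(mics)), 2)]

# Four microphones on a unit square give 6 delays, yet a 2-D source position
# traces out only a 2-D surface inside that 6-D delay space.
mics = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
delays = pairwise_delays((3.0, 2.0), mics)
```

The identity $\tau_{02} = \tau_{01} + \tau_{12}$ holds exactly for the returned vector. This kind of structure is what a manifold model can exploit instead of treating the six delays as independent dimensions.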
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
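As a minimal illustration of the term-document class of VSMs, the sketch below builds a term-by-document count matrix and compares documents by the cosine of their column vectors. It uses raw counts for simplicity; practical systems usually apply weighting such as tf-idf and smoothing, which this example omits. The helper names and the three toy documents are hypothetical.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Rows are terms, columns are documents; entries are raw term counts."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for c in counts] for t in vocab]

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = ["the cat sat on the mat", "the dog sat on the log", "stocks fell sharply"]
vocab, matrix = term_document_matrix(docs)
columns = list(zip(*matrix))  # one vector per document
```

The first two documents share terms and get a positive cosine, while the third shares none with the first and scores 0. Comparing rows instead of columns of a matrix like this gives word-to-word similarity, which is the intuition behind word-context matrices.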
Less Regret via Online Conditioning
Streeter, Matthew, McMahan, H. Brendan
In the past few years, online algorithms have emerged as state-of-the-art techniques for solving large-scale machine learning problems [2, 13, 16]. In addition to their simplicity and generality, online algorithms are natural choices for problems where new data is constantly arriving and rapid adaptation is important. Compared to the study of convex optimization in the batch (offline) setting, the study of online convex optimization is relatively new. In light of this, it is not surprising that performance-improving techniques that are well known and widely used in the batch setting do not yet have online analogues. In particular, convergence rates in the batch setting can often be dramatically improved through preconditioning. Yet the online convex optimization literature provides no comparable method for improving regret (the online analogue of convergence rates).
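To make "regret" and "preconditioning" concrete, the sketch below runs online gradient descent on badly scaled quadratic losses. It reports regret against the best fixed point in hindsight. The conditioned variant uses per-coordinate step sizes $\eta / \sqrt{\sum_s g_{s,i}^2}$, a diagonal preconditioner. This is assumed here as one common form of online conditioning, and the paper's exact update may differ. The loss family, `eta`, and the function name are hypothetical choices.

```python
import math

def online_gradient_descent(targets, scales, eta=0.5, per_coordinate=True):
    """OGD on losses f_t(x) = sum_i scales[i] * (x[i] - targets[t][i])**2.

    per_coordinate=True: coordinate i steps by eta / sqrt(sum of its squared gradients
    so far), a diagonal preconditioner. Otherwise all coordinates share eta / sqrt(t).
    Returns the regret against the best fixed point in hindsight.
    """
    d = len(scales)
    x = [0.0] * d
    grad_sq = [0.0] * d
    total_loss = 0.0
    for t, y in enumerate(targets, start=1):
        # Suffer the loss of the current point, then take a gradient step.
        total_loss += sum(s * (xi - yi) ** 2 for s, xi, yi in zip(scales, x, y))
        grad = [2 * s * (xi - yi) for s, xi, yi in zip(scales, x, y)]
        for i, g in enumerate(grad):
            grad_sq[i] += g * g
            if per_coordinate:
                step = eta / math.sqrt(grad_sq[i]) if grad_sq[i] > 0 else 0.0
            else:
                step = eta / math.sqrt(t)
            x[i] -= step * g
    # For these squared losses the best fixed point is the per-coordinate mean target.
    n = len(targets)
    best = [sum(y[i] for y in targets) / n for i in range(d)]
    best_loss = sum(s * (b - y[i]) ** 2
                    for y in targets
                    for i, (s, b) in enumerate(zip(scales, best)))
    return total_loss - best_loss

targets = [[1.0, 1.0]] * 20
scales = [1.0, 100.0]  # the second coordinate is 100x steeper
```

With these scales, the shared step-size schedule overshoots and diverges on the steep coordinate, giving enormous regret. The per-coordinate version adapts each step to its own gradient magnitudes and stays bounded.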
Text Relatedness Based on a Word Thesaurus
Tsatsaronis, G., Varlamis, I., Vazirgiannis, M.
Computing the relatedness between two fragments of text automatically requires taking into account a wide range of factors pertaining to the meaning the two fragments convey and to the pairwise relations between their words. A measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. A measure that captures both aspects well may help in many tasks, such as text retrieval, classification and clustering. In this paper we present a new approach for measuring the semantic relatedness between words based on their implicit semantic links. The approach uses only a word thesaurus to derive implicit semantic links between words. Based on this approach, we introduce Omiotis, a new measure of semantic relatedness between texts, which builds on the word-to-word semantic relatedness measure (SR) and extends it to measure the relatedness between texts. We validate our method in stages. We first evaluate the performance of the semantic relatedness measure between individual words, covering word-to-word similarity and relatedness, synonym identification and word analogy. We then evaluate the performance of our method in measuring text-to-text semantic relatedness in two tasks, namely sentence-to-sentence similarity and paraphrase recognition. Experimental evaluation shows that the proposed method outperforms every lexicon-based method of semantic relatedness on the selected tasks and data sets, and competes well against corpus-based and hybrid approaches.
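The sketch below illustrates the general pattern of deriving word relatedness from links in a thesaurus and then lifting it to texts. It is not Omiotis or the SR measure. It uses a plain shortest-path score $1/(1+d)$ and a symmetric best-match average over words, and both of these choices are assumptions. `THESAURUS` is a hypothetical toy graph, and all function names are illustrative.

```python
from collections import deque

# Hypothetical toy thesaurus: word -> directly linked words (synonyms, hypernyms, ...).
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "vehicle": {"car", "automobile", "bicycle"},
    "bicycle": {"vehicle"},
    "banana": {"fruit"},
    "fruit": {"banana"},
}

def path_length(a, b):
    """Breadth-first search for the fewest thesaurus links between a and b (None if unconnected)."""
    if a == b:
        return 0
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        word, dist = queue.popleft()
        for nxt in THESAURUS.get(word, ()):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def word_relatedness(a, b):
    """Shorter thesaurus paths mean higher relatedness, in [0, 1]."""
    d = path_length(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

def _best_match_average(src, dst):
    return sum(max(word_relatedness(a, b) for b in dst) for a in src) / len(src)

def text_relatedness(text1, text2):
    """Symmetric average of each word's best match in the other text."""
    w1, w2 = text1.lower().split(), text2.lower().split()
    return (_best_match_average(w1, w2) + _best_match_average(w2, w1)) / 2
```

Here "car" and "bicycle" are two links apart, so they score 1/3. "car" and "banana" are unconnected and score 0.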
A parameter-free hedging algorithm
Chaudhuri, Kamalika, Freund, Yoav, Hsu, Daniel
We study the problem of decision-theoretic online learning (DTOL). Motivated by practical applications, we focus on DTOL when the number of actions is very large. Previous algorithms for learning in this framework have a tunable learning rate parameter, and a barrier to using online learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large. In this paper, we offer a clean solution by proposing a novel and completely parameter-free algorithm for DTOL. We introduce a new notion of regret, which is more natural for applications with a large number of actions. We show that our algorithm achieves good performance with respect to this new notion of regret. Under previous notions of regret, it also achieves performance close to the best bounds of previous algorithms with optimally tuned parameters.
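The tunable parameter mentioned above is easiest to see in the classic exponential-weights (Hedge) algorithm for DTOL, sketched below. This is the kind of prior algorithm the abstract contrasts with, not the paper's parameter-free method. The loss sequence in the usage note is a hypothetical toy example.

```python
import math

def hedge(loss_rounds, eta):
    """Exponential-weights (Hedge) for DTOL with a fixed learning rate eta.

    loss_rounds: list of per-round loss vectors, one loss in [0, 1] per action.
    Returns (algorithm's cumulative expected loss, best action's cumulative loss).
    """
    n = len(loss_rounds[0])
    cum = [0.0] * n  # cumulative loss of each action so far
    alg_loss = 0.0
    for losses in loss_rounds:
        m = min(cum)  # shift for numerical stability; the distribution is unchanged
        w = [math.exp(-eta * (c - m)) for c in cum]
        total = sum(w)
        p = [wi / total for wi in w]  # play each action with probability p[i]
        alg_loss += sum(pi * li for pi, li in zip(p, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return alg_loss, min(cum)

rounds = [[0.0, 1.0]] * 100  # action 0 is always better
```

On this easy sequence, `eta=1.0` yields regret below 1, while `eta=0.01` spreads weight over the bad action for much longer and accumulates far more regret. On adversarial sequences, a large `eta` can instead be the wrong choice. That sensitivity is the tuning problem a parameter-free algorithm avoids.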
Tracking using explanation-based modeling
Chaudhuri, Kamalika, Freund, Yoav, Hsu, Daniel
We study the tracking problem, namely, estimating the hidden state of an object over time from unreliable and noisy measurements. The standard framework for the tracking problem is the generative framework, which underlies solutions such as the Bayesian algorithm and its approximation, the particle filter. However, these solutions are very sensitive to model mismatches. In this paper, motivated by online learning, we introduce a new framework for tracking: an explanatory framework. We provide an efficient tracking algorithm for this framework and present experimental results comparing it to the Bayesian algorithm on simulated data. Our experiments show that when there are slight model mismatches, our algorithm vastly outperforms the Bayesian algorithm.
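For reference, the sketch below is a minimal bootstrap particle filter, the standard generative baseline mentioned above, applied to a 1-D random walk observed in Gaussian noise. It is not the paper's explanatory algorithm. The motion model, the noise levels, and the function name are assumptions for illustration.

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=500, process_std=1.0,
                              obs_std=1.0, seed=0):
    """Track x_t = x_{t-1} + N(0, process_std^2) from y_t = x_t + N(0, obs_std^2).

    Returns the weighted-mean state estimate after each observation.
    """
    rng = random.Random(seed)
    particles = [0.0] * n_particles  # assumed initial state: 0
    estimates = []
    for y in observations:
        # Propagate each particle through the assumed motion model.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Weight each particle by the assumed Gaussian observation likelihood.
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:
            # An observation far outside the assumed noise model underflows every
            # weight; fall back to uniform weights so the filter can continue.
            weights, total = [1.0] * n_particles, float(n_particles)
        weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

The filter relies entirely on the assumed `process_std` and `obs_std`. If the observations do not match those assumptions, the weights can collapse onto a few particles or underflow entirely. This is one concrete face of the model-mismatch sensitivity described above.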
Report on the 22nd International FLAIRS Conference
Guesgen, Hans Werner (Massey University)
The 22nd International Florida Artificial Intelligence Research Society Conference (FLAIRS-22) was held 19th – 21st May 2009 at the Sundial Beach and Golf Resort on Sanibel Island, Florida, USA. It continued a long tradition of FLAIRS conferences, which attract researchers from around the world. The conference featured technical papers, special tracks, and invited speakers. This year’s conference was chaired by Susan Haller, from the State University of New York at Potsdam. Conference program co-chairs were Hans W. Guesgen, from Massey University in New Zealand, and H. Chad Lane, from the University of Southern California. The special tracks were coordinated by Philip McCarthy, from the University of Memphis.