Government
Mining Meaning from Wikipedia
Medelyan, Olena, Milne, David, Legg, Catherine, Witten, Ian H.
Wikipedia is a goldmine of information, not just for its many readers but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval; using it to facilitate information extraction; and using it as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced.
Inferring Shallow-Transfer Machine Translation Rules from Small Parallel Corpora
Sánchez-Martínez, F., Forcada, M. L.
This paper describes a method for automatically inferring, from small parallel corpora, the structural transfer rules used in a shallow-transfer machine translation (MT) system. The structural transfer rules are based on alignment templates, like those used in statistical MT. Alignment templates are extracted from sentence-aligned parallel corpora and extended with a set of restrictions which are derived from the bilingual dictionary of the MT system and control their application as transfer rules. The experiments conducted using three different language pairs in the free/open-source MT platform Apertium show that translation quality is improved compared with word-for-word translation (when no transfer rules are used), and that the resulting translation quality is close to that obtained using hand-coded transfer rules. The method we present is entirely unsupervised and benefits from information in the other modules of the MT system in which the inferred rules are applied.
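An alignment template pairs a sequence of source-language word classes with a sequence of target-language word classes and an alignment between their positions. The sketch below applies one such template (Spanish noun-adjective to English adjective-noun) only when a restriction checked against a bilingual dictionary holds, and otherwise falls back to word-for-word translation. The toy dictionary `BIDIX`, the category-only restriction, and the greedy longest-match strategy are illustrative assumptions; they are not the inference procedure or the rule format that Apertium actually uses.

```python
from dataclasses import dataclass

# Hypothetical toy Spanish->English bilingual dictionary:
# (source lemma, category) -> (target lemma, category)
BIDIX = {
    ("la", "det"): ("the", "det"),
    ("casa", "noun"): ("house", "noun"),
    ("blanca", "adj"): ("white", "adj"),
}

@dataclass(frozen=True)
class AlignmentTemplate:
    source_cats: tuple  # word classes the source segment must have
    target_cats: tuple  # word classes expected for the target words, in target order
    alignment: tuple    # alignment[i] = position of the source word that yields target word i

    def matches(self, segment):
        """The source pattern must match and every aligned word must pass the dictionary restriction."""
        if tuple(cat for _, cat in segment) != self.source_cats:
            return False
        for i, src in enumerate(self.alignment):
            entry = BIDIX.get(segment[src])
            # Simplified restriction: the dictionary translation must carry the expected class.
            if entry is None or entry[1] != self.target_cats[i]:
                return False
        return True

    def apply(self, segment):
        # Emit target words in the order given by the alignment.
        return [BIDIX[segment[src]][0] for src in self.alignment]

def word_for_word(segment):
    # Unknown words are passed through unchanged.
    return [BIDIX.get(tok, tok)[0] for tok in segment]

def translate(sentence, templates):
    """Greedy left to right: apply the longest matching template, else translate one word."""
    out, i = [], 0
    ordered = sorted(templates, key=lambda t: len(t.source_cats), reverse=True)
    while i < len(sentence):
        for t in ordered:
            segment = sentence[i:i + len(t.source_cats)]
            if len(segment) == len(t.source_cats) and t.matches(segment):
                out.extend(t.apply(segment))
                i += len(segment)
                break
        else:  # no template matched at position i
            out.extend(word_for_word(sentence[i:i + 1]))
            i += 1
    return out

# Noun-adjective (Spanish) -> adjective-noun (English)
NOUN_ADJ = AlignmentTemplate(("noun", "adj"), ("adj", "noun"), (1, 0))
SENTENCE = [("la", "det"), ("casa", "noun"), ("blanca", "adj")]

print(word_for_word(SENTENCE))          # ['the', 'house', 'white']
print(translate(SENTENCE, [NOUN_ADJ]))  # ['the', 'white', 'house']
```

The word-for-word baseline produces "the house white"; the template reorders the pair to "the white house", which mirrors the kind of improvement over word-for-word translation that the abstract reports.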
Wikipedia-based Semantic Interpretation for Natural Language Processing
Gabrilovich, E., Markovitch, S.
Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
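The core idea of ESA, representing a text as a weighted vector of Wikipedia concepts and comparing texts in that space, can be sketched in a few lines. Three made-up "articles" stand in for Wikipedia here; the TF-IDF-style weighting, the toy data, and the function names are assumptions for illustration, and the published system adds preprocessing and weighting refinements that are omitted.

```python
import math
from collections import Counter

def tokenize(text):
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]

def build_index(articles):
    """Inverted index: word -> {concept: weight}, with a TF-IDF-style weight per concept article."""
    n = len(articles)
    tfs = {title: Counter(tokenize(body)) for title, body in articles.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())
    index = {}
    for title, tf in tfs.items():
        for word, count in tf.items():
            idf = math.log(n / df[word])
            if idf > 0:  # words found in every article carry no information
                index.setdefault(word, {})[title] = (1 + math.log(count)) * idf
    return index

def interpret(text, index):
    """Represent a text as a weighted vector of concepts (sum of its words' concept vectors)."""
    vec = Counter()
    for word, count in Counter(tokenize(text)).items():
        for concept, weight in index.get(word, {}).items():
            vec[concept] += count * weight
    return vec

def cosine(u, v):
    """Semantic relatedness as the cosine of two concept vectors."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy stand-ins for Wikipedia articles (one concept per article).
ARTICLES = {
    "Jaguar (animal)": "The jaguar is a large cat native to the Americas. The cat hunts prey in the rainforest.",
    "Jaguar Cars": "Jaguar is a British car maker. The company builds luxury cars and engines.",
    "Rainforest": "A rainforest is a dense forest with high rainfall and many animals.",
}

index = build_index(ARTICLES)
for text in ["jaguar cat prey", "jaguar luxury car"]:
    vec = interpret(text, index)
    print(text, "->", max(vec, key=vec.get))
# jaguar cat prey -> Jaguar (animal)
# jaguar luxury car -> Jaguar Cars
```

Because each concept is a named article, the strongest dimensions of a vector can be read off directly, which illustrates why the abstract describes the model as easy to explain to human users.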
Identification of Pleonastic It Using the Web
Li, Y., Musilek, P., Reformat, M., Wyard-Scott, L.
In a significant minority of cases, certain pronouns, especially the pronoun it, can be used without referring to any specific entity. This phenomenon of pleonastic pronoun usage poses serious problems for systems aiming at even a shallow understanding of natural language texts. In this paper, a novel approach is proposed to identify such uses of it: the extrapositional cases are identified using a series of queries against the web, and the cleft cases are identified using a simple set of syntactic rules. The system is evaluated with four sets of news articles containing 679 extrapositional cases as well as 78 cleft constructs. The identification results are comparable to those obtained by human efforts.
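To illustrate what a surface rule for cleft constructions can look like, here is a deliberately crude heuristic: it looks for "it + is/was + focus + that/who/which" and accepts the match when the focus looks nominal or prepositional. The pattern and the small word lists are hypothetical and are not the syntactic rules used in the paper; a practical system would work from part-of-speech tags or a parse.

```python
import re

# Hypothetical word lists; a real system would rely on a part-of-speech tagger or parser.
PREPOSITIONS = {"in", "on", "at", "by", "for", "with", "from", "after", "before", "during"}
DETERMINERS = {"the", "a", "an", "this", "his", "her", "their", "my", "our", "your"}

# "it" + is/was/'s + a focus of one to four tokens + that/who/which
CLEFT_FRAME = re.compile(
    r"\b[Ii]t(?:\s+(?:is|was)|'s)\s+((?:\S+\s+){0,3}?\S+)\s+(that|who|which)\b"
)

def is_cleft(sentence):
    """Toy heuristic: flag 'it + be + FOCUS + that/who/which' when FOCUS looks nominal or prepositional."""
    m = CLEFT_FRAME.search(sentence)
    if m is None:
        return False
    if m.group(2) == "who":
        return True  # simplification: treat a 'who' clause as signalling a cleft
    first = m.group(1).split()[0]
    return (
        first.lower() in PREPOSITIONS    # "It was in 1990 that ..."
        or first.lower() in DETERMINERS  # "It was the butler that ..."
        or first[0].isupper()            # "It was Paris that ..."
        or first[0].isdigit()
    )

for s in ["It was John who broke the window.",
          "It was in 1990 that the company was founded.",
          "It is clear that the plan failed."]:
    print(is_cleft(s), s)
```

The last sentence is an extrapositional case ("It is clear that ..."), which this heuristic declines to call a cleft; in the paper such cases are handled separately, by querying the web.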
AAAI-08 and IAAI-08 Conferences Provide Focal Point for AI
Hedberg, Sara Reese (Emergent, Inc.)
This year's conferences were held in Chicago, Illinois. Perhaps one of the true litmus tests of any conference is the caliber of the invited speakers. … (research manager at Microsoft Research) gave his AAAI presidential address, "Artificial Intelligence in the Open World." The distinguished Robert S. Engelmore Memorial Award Lecture was delivered by Kenneth Ford (Florida Institute …). In his lecture, "Toward Cognitive Prostheses," Ford discussed human-centered computing to amplify human cognition and perception. … learning for network analysis in his talk, "Making Sense of Complex Networks." David Haussler (University of California, Santa Cruz) traced the … ("From Images to Scenes: Using Lots of Data to Infer Geometric, Photometric, and Semantic Scene Properties from a Single Image"), and Lillian … ("… Sensibility: Sentiment Analysis, Opinion Mining, and the Computational Treatment of Subjective Language"), while Seth C. Goldstein (Carnegie Mellon University) discussed revolutionary work in self-reconfiguring programmable matter composed of ensembles of submillimeter robots in his talk, "Realizing Claytronics: A Challenge …". Chris Urmson (Carnegie Mellon University), a leading member of the DARPA Urban Grand Challenge winning team, described the race and winning … . Instead of the popular competition, which has pushed the envelope of mobile robotics since its inception, this year was host to a Robot Workshop and Exhibition.
Unsupervised Methods for Determining Object and Relation Synonyms on the Web
The task of identifying synonymous relations and objects, or synonym resolution, is critical for high-quality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither hand-tagged training examples nor domain knowledge is available. The paper presents a scalable, fully implemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K. The system, called Resolver, introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them. On a set of two million assertions extracted from the Web, Resolver resolves objects with 78% precision and 68% recall, and resolves relations with 90% precision and 35% recall. Several variations of Resolver's probabilistic model are explored, and experiments demonstrate that under appropriate conditions these variations can improve F1 by 5%. An extension to the basic Resolver system allows it to handle polysemous names with 97% precision and 95% recall on a data set from the TREC corpus.
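Resolver decides whether two strings co-refer from the similarity of the assertions containing them. The sketch below illustrates only that underlying intuition, using a much cruder rule in place of the probabilistic model: two argument strings are merged when they share at least `min_shared` (relation, other argument) contexts. The threshold, the toy triples, and the union-find merging are assumptions for illustration; restricting comparisons to strings that share a context only loosely mirrors how Resolver avoids comparing all pairs.

```python
from collections import defaultdict
from itertools import combinations

def object_properties(triples):
    """Map each argument string to the set of (relation, other argument, slot) contexts it occurs in."""
    props = defaultdict(set)
    for arg1, rel, arg2 in triples:
        props[arg1].add((rel, arg2, "arg1"))
        props[arg2].add((rel, arg1, "arg2"))
    return props

def candidate_pairs(props):
    """Only pairs that share at least one property are compared (no all-pairs scan)."""
    by_prop = defaultdict(set)
    for s, ps in props.items():
        for p in ps:
            by_prop[p].add(s)
    pairs = set()
    for strings in by_prop.values():
        pairs.update(combinations(sorted(strings), 2))
    return pairs

def cluster_synonyms(triples, min_shared=2):
    """Merge strings sharing at least `min_shared` properties, using union-find."""
    props = object_properties(triples)
    parent = {s: s for s in props}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for x, y in candidate_pairs(props):
        if len(props[x] & props[y]) >= min_shared:
            parent[find(x)] = find(y)

    clusters = defaultdict(set)
    for s in props:
        clusters[find(s)].add(s)
    return [c for c in clusters.values() if len(c) > 1]

# Toy extractions as (arg1, relation, arg2) triples.
TRIPLES = [
    ("Mars", "is a", "planet"),
    ("Mars", "orbits", "the Sun"),
    ("the Red Planet", "is a", "planet"),
    ("the Red Planet", "orbits", "the Sun"),
    ("Venus", "is a", "planet"),
    ("Venus", "is hotter than", "Earth"),
]

print(cluster_synonyms(TRIPLES))
```

With the default threshold only "Mars" and "the Red Planet" are merged. Lowering `min_shared` to 1 also pulls in "Venus", a reminder that strings which merely share properties are not necessarily synonyms; a principled model of the evidence, such as the one the paper proposes, is needed to tell the two apart.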
The Seventeenth Annual AAAI Robot Exhibition and Manipulation and Mobility Workshop
Anderson, Monica (The University of Alabama) | Jenkins, Odest Chadwicke (Brown University) | Oh, Paul (Drexel University)
Moving toward true robot autonomy may require new paradigms, hardware, and ways of thinking. The goal of the AAAI 2008 Workshop on Mobility and Manipulation was not only to demonstrate current research successes to the AAAI community but also to road-map future mobility and manipulation challenges that create synergies between artificial intelligence and robotics. The half-day workshop included both a session on the exhibits and a panel discussion. The panel consisted of five prominent researchers who led a discussion of future directions for mobility and manipulation research. Andrew Ng of Stanford University (along with students Ashutosh Saxena and Ellen Klingbeil) focuses on opening arbitrary doors through learning a few visual keypoints, such as the location and type of door handle.
The 2008 Scheduling and Planning Applications Workshop (SPARK'08)
Castillo, Luis (University of Granada) | Cortellessa, Gabriella (ISTC-CNR) | Yorke-Smith, Neil (SRI International)
SPARK'08 was the first edition of a workshop series designed to provide a stable, long-term forum where researchers could discuss … . Like its immediate predecessor (the ICAPS'07 Workshop on Moving Planning and Scheduling Systems …), the 2008 SPARK workshop was held in Sydney, Australia, in September 2008, collocated with the International Conference on Automated Planning and Scheduling (ICAPS-08), a premier forum for research in AI planning and scheduling, and with the International Conference on Principles and Practice of Constraint Programming (CP). A handful of outstanding application-oriented papers are presented each year at the ICAPS conference. Time and again, in invited talks and in open-microphone discussion sessions such as ICAPS's Festivus (where conference participants air their grievances in an open and entertaining way), researchers have lamented the small number of applications papers accepted at conferences such as ICAPS, CP, and the AAAI Conference on Artificial Intelligence. The Scheduling and Planning Applications Workshop (SPARK) was established to help address this issue. Building on precursory events, SPARK'08 was the first workshop designed …
AAAI 2008 Workshop Reports
Anand, Sarabjot Singh (University of Warwick) | Bunescu, Razvan C. (Ohio University) | Carvalho, Vitor R. (Microsoft Live Labs) | Chomicki, Jan (University of Buffalo) | Conitzer, Vincent (Duke University) | Cox, Michael T. (BBN Technologies) | Dignum, Virginia (Utrecht University) | Dodds, Zachary (Harvey Mudd College) | Dredze, Mark (University of Pennsylvania) | Furcy, David (University of Wisconsin Oshkosh) | Gabrilovich, Evgeniy (Yahoo! Research) | Göker, Mehmet H. (PricewaterhouseCoopers) | Guesgen, Hans Werner (Massey University) | Hirsh, Haym (Rutgers University) | Jannach, Dietmar (Dortmund University of Technology) | Junker, Ulrich (ILOG) | Ketter, Wolfgang (Erasmus University) | Kobsa, Alfred (University of California, Irvine) | Koenig, Sven (University of Southern California) | Lau, Tessa (IBM Almaden Research Center) | Lewis, Lundy (Southern New Hampshire University) | Matson, Eric (Purdue University) | Metzler, Ted (Oklahoma City University) | Mihalcea, Rada (University of North Texas) | Mobasher, Bamshad (DePaul University) | Pineau, Joelle (McGill University) | Poupart, Pascal (University of Waterloo) | Raja, Anita (University of North Carolina at Charlotte) | Ruml, Wheeler (University of New Hampshire) | Sadeh, Norman M. (Carnegie Mellon University) | Shani, Guy (Microsoft Research) | Shapiro, Daniel (Applied Reactivity, Inc.) | Smith, Trey (Carnegie Mellon University West) | Taylor, Matthew E. (University of Southern California) | Wagstaff, Kiri (Jet Propulsion Laboratory) | Walsh, William (CombineNet) | Zhou, Ron (Palo Alto Research Center)
AAAI was pleased to present the AAAI-08 Workshop Program, held Sunday and Monday, July 13–14, in Chicago, Illinois, USA. The program included the following 15 workshops: Advancements in POMDP Solvers; AI Education Workshop Colloquium; Coordination, Organizations, Institutions, and Norms in Agent Systems; Enhanced Messaging; Human Implications of Human-Robot Interaction; Intelligent Techniques for Web Personalization and Recommender Systems; Metareasoning: Thinking about Thinking; Multidisciplinary Workshop on Advances in Preference Handling; Search in Artificial Intelligence and Robotics; Spatial and Temporal Reasoning; Trading Agent Design and Analysis; Transfer Learning for Complex Tasks; What Went Wrong and Why: Lessons from AI Research and Applications; and Wikipedia and Artificial Intelligence: An Evolving Synergy.