Mining Meaning from Wikipedia
Medelyan, Olena, Milne, David, Legg, Catherine, Witten, Ian H.
Wikipedia is a goldmine of information, not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval; using it to facilitate information extraction; and using it as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced.
Inferring Shallow-Transfer Machine Translation Rules from Small Parallel Corpora
Sánchez-Martínez, F., Forcada, M. L.
This paper describes a method for the automatic inference of structural transfer rules to be used in a shallow-transfer machine translation (MT) system from small parallel corpora. The structural transfer rules are based on alignment templates, like those used in statistical MT. Alignment templates are extracted from sentence-aligned parallel corpora and extended with a set of restrictions which are derived from the bilingual dictionary of the MT system and control their application as transfer rules. The experiments conducted using three different language pairs in the free/open-source MT platform Apertium show that translation quality is improved compared to word-for-word translation (when no transfer rules are used), and that the resulting translation quality is close to that obtained using hand-coded transfer rules. The method we present is entirely unsupervised and benefits from information in the other modules of the MT system in which the inferred rules are applied.
Exploiting Single-Cycle Symmetries in Continuous Constraint Problems
Ruiz de Angulo, V., Torras, C.
Symmetries in discrete constraint satisfaction problems have been explored and exploited in recent years, but symmetries in continuous constraint problems have not received the same attention. Here we focus on permutations of the variables consisting of one single cycle. We propose a procedure that takes advantage of these symmetries by interacting with a continuous constraint solver without interfering with it. A key concept in this procedure is the classes of symmetric boxes formed by bisecting an n-dimensional cube at the same point in all dimensions at the same time. We analyze these classes and quantify them as a function of the cube dimensionality. Moreover, we propose a simple algorithm to generate the representatives of all these classes for any number of variables at very high rates. A problem example from the chemical field and the cyclic n-roots problem are used to show the performance of the approach in practice.
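The classes of symmetric boxes mentioned above can be illustrated with a small sketch. It assumes, as a simplification not stated in the abstract, that each of the 2^n sub-boxes is labelled by a bit string (lower or upper half in each dimension) and that the single-cycle permutation acts on that label as a cyclic shift. Under this assumption the classes are the orbits of bit strings under rotation, and Burnside's lemma gives their number. The function names are hypothetical, and the brute-force enumeration is only for illustration; it is not the authors' fast generation algorithm.

```python
from math import gcd


def rotations(bits):
    """All cyclic shifts of a tuple of bits."""
    return [bits[i:] + bits[:i] for i in range(len(bits))]


def class_representatives(n):
    """One representative per class: the lexicographically smallest rotation.

    Brute force over all 2**n labels, so only practical for small n.
    """
    reps = []
    for code in range(2 ** n):
        # Most significant bit first, so representatives come out sorted.
        bits = tuple((code >> (n - 1 - k)) & 1 for k in range(n))
        if bits == min(rotations(bits)):
            reps.append(bits)
    return reps


def class_count(n):
    """Number of classes via Burnside's lemma.

    A rotation by k positions fixes exactly 2**gcd(n, k) bit strings,
    and the number of orbits is the average over all n rotations.
    """
    return sum(2 ** gcd(n, k) for k in range(n)) // n


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, class_count(n), len(class_representatives(n)))
```

For each n the two columns printed by the script should agree, which is a quick sanity check that the enumeration and the closed-form count describe the same classes.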
Wikipedia-based Semantic Interpretation for Natural Language Processing
Gabrilovich, E., Markovitch, S.
Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
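As a rough illustration of representing text in a space of Wikipedia-derived concepts, the sketch below maps each word to a weighted set of concepts, sums those weights over a text, and compares two texts by the cosine of their concept vectors. The tiny `CONCEPT_WEIGHTS` table, its weights, and the function names are made up for this example; a real system would derive the weights from Wikipedia articles, and the choice of cosine similarity here is an assumption rather than something the abstract specifies.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy index: word -> {concept: association weight}.
# In practice these weights would be computed from Wikipedia article text.
CONCEPT_WEIGHTS = {
    "bank": {"Bank": 0.9, "River": 0.3},
    "money": {"Bank": 0.8, "Currency": 0.9},
    "loan": {"Bank": 0.7, "Currency": 0.4},
    "river": {"River": 0.9},
    "water": {"River": 0.7},
}


def interpret(text):
    """Map a text to a weighted vector of concepts (a dict concept -> weight)."""
    vector = defaultdict(float)
    for word, count in Counter(text.lower().split()).items():
        for concept, weight in CONCEPT_WEIGHTS.get(word, {}).items():
            vector[concept] += count * weight
    return dict(vector)


def relatedness(text_a, text_b):
    """Cosine similarity between the concept vectors of two texts."""
    va, vb = interpret(text_a), interpret(text_b)
    dot = sum(w * vb.get(concept, 0.0) for concept, w in va.items())
    norm_a = sqrt(sum(w * w for w in va.values()))
    norm_b = sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a text with no known words has no interpretation
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(relatedness("money loan", "bank"))
    print(relatedness("money loan", "river water"))
```

Because every dimension is a named concept, the vector returned by `interpret` can be inspected directly, which reflects the explainability point made in the abstract.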
AAAI-08 and IAAI-08 Conferences Provide Focal Point for AI
Hedberg, Sara Reese (Emergent, Inc.)
This year's conferences were held in [...]. Perhaps one of the true litmus tests of any conference is the caliber of the invited speakers. [...] (research manager at Microsoft Research) gave his AAAI presidential address, "Artificial Intelligence in the Open World." The distinguished Robert S. Engelmore Memorial Award Lecture was delivered by Kenneth Ford (Florida Institute [...]). In his lecture, "Toward Cognitive Prostheses," Ford discussed human-centered computing to amplify human cognition and perception. Chris Urmson (Carnegie Mellon University), a leading member of the DARPA Urban Grand Challenge winning team, described the race and winning [...]. [...] learning for network analysis in his talk, "Making Sense of Complex Networks." David Haussler (University of California, Santa Cruz) traced the [...] ("From Images to Scenes: Using Lots of Data to Infer Geometric, Photometric, and Semantic Scene Properties from a Single Image"), and Lillian [...] ("[...] Sensibility: Sentiment Analysis, Opinion Mining, and the Computational Treatment of Subjective Language"), while Seth C. Goldstein (Carnegie Mellon University) discussed revolutionary work in self-reconfiguring programmable matter composed of ensembles of submillimeter robots in his talk, "Realizing Claytronics: A Challenge [...]." Instead of the popular competition, which has pushed the envelope of mobile robotics since its inception, this year was host to a Robot Workshop and Exhibition.
Efficiently Learning a Detection Cascade with Sparse Eigenvectors
Shen, Chunhua, Paisitkriangkrai, Sakrapee, Zhang, Jian
In this work, we first show that feature selection methods other than boosting can also be used for training an efficient object detector. In particular, we introduce Greedy Sparse Linear Discriminant Analysis (GSLDA) \cite{Moghaddam2007Fast} for its conceptual simplicity and computational efficiency; and slightly better detection performance is achieved compared with \cite{Viola2004Robust}. Moreover, we propose a new technique, termed Boosted Greedy Sparse Linear Discriminant Analysis (BGSLDA), to efficiently train a detection cascade. BGSLDA exploits the sample re-weighting property of boosting and the class-separability criterion of GSLDA.
Learning with Tree-Averaged Densities and Distributions
We utilize the ensemble of trees framework, a tractable mixture over a superexponential number of tree-structured distributions [1], to develop a new model for multivariate density estimation. The model is based on a construction of tree-structured copulas, i.e., multivariate distributions with uniform marginals on [0, 1].
Subspace-Based Face Recognition in Analog VLSI
Carvajal, Gonzalo, Valenzuela, Waldo, Figueroa, Miguel
We describe an analog-VLSI neural network for face recognition based on subspace methods. The system uses a dimensionality-reduction network whose coefficients can be either programmed or learned on-chip to perform PCA, or programmed to perform LDA. A second network with user-programmed coefficients performs classification with Manhattan distances. The system uses on-chip compensation techniques to reduce the effects of device mismatch. Using the ORL database with 12x12-pixel images, our circuit achieves up to 85% classification performance (98% of an equivalent software implementation).
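The two-stage pipeline described above (linear dimensionality reduction followed by Manhattan-distance classification) can be sketched in software as follows. This is a minimal illustration only, not a model of the analog circuit: the projection matrix, the per-class template vectors, and the function names are all hypothetical, and nearest-template matching is assumed as the classification rule.

```python
def project(coefficients, pixels):
    """Linear dimensionality reduction: one dot product per output component.

    `coefficients` stands in for a PCA or LDA matrix (one row per component).
    """
    return [sum(c * p for c, p in zip(row, pixels)) for row in coefficients]


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(coefficients, templates, pixels):
    """Return the label whose stored template is closest in L1 distance.

    `templates` maps each label to a vector in the reduced space.
    """
    features = project(coefficients, pixels)
    return min(templates, key=lambda label: manhattan(templates[label], features))


if __name__ == "__main__":
    # Hypothetical 3-pixel "image" reduced to 2 components.
    coefficients = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    templates = {"subject_a": [0.0, 0.0], "subject_b": [5.0, 5.0]}
    print(classify(coefficients, templates, [4.0, 6.0, 9.0]))
```

The Manhattan distance needs only subtractions, absolute values, and additions, with no multiplications, which is one plausible reason for preferring it in a hardware setting.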