Asia
Goal-Oriented Knowledge Collection
Kuo, Yen-Ling (National Taiwan University) | Hsu, Jane Yung-jen (National Taiwan University)
Games with A Purpose (GWAP) has been demonstrated to be efficient in collecting large amount of knowledge from online users, e.g. Verbosity and Virtual Pet game. However, its effectiveness in knowledge base (KB) construction has not been explored in previous research. This paper examines the knowledge collected in the Vir- tual Pet game and presents an approach to collect more knowledge driven by the existing relations in KB. In this paper, goal-oriented knowledge collection successfully draws 10572 answers for the "food” domain. The answers are verified by online voting to show that 92.07% of them are good sentences and 95.89% of them are new sentences. This result is a significant improvement over the original Virtual Pet game, with 80.58% good sentences and 67.56% weekly new information.
Coarse Word-Sense Disambiguation Using Common Sense
Havasi, Catherine (MIT Media Lab) | Speer, Robert (MIT Media Lab) | Pustejovsky, James (Brandeis University)
Coarse word sense disambiguation (WSD) is an NLP task that is both important and practical: it aims to distinguish senses of a word that have very different meanings, while avoiding the complexity that comes from trying to finely distinguish every possible word sense. Reasoning techniques that make use of common sense information can help to solve the WSD problem by taking word meaning and context into account. We have created a system for coarse word sense disambiguation using blending, a common sense reasoning technique, to combine information from SemCor, WordNet, ConceptNet and Extended WordNet. Within that space, a correct sense is suggested based on the similarity of the ambiguous word to each of its possible word senses. The general blending-based system performed well at the task, achieving an f-score of 80.8\% on the 2007 SemEval Coarse Word Sense Disambiguation task.
Cross-Domain Scruffy Inference
Arnold, Kenneth Charles (Massachusetts Institute of Technology) | Lieberman, Henry (Massachusetts Institute of Technology)
Reasoning about Commonsense knowledge poses many problems that traditional logical inference doesn't handle well. Among these is cross-domain inference: how to draw on multiple independently produced knowledge bases. Since knowledge bases may not have the same vocabulary, level of detail, or accuracy, that inference should be "scruffy." The AnalogySpace technique showed that a factored inference approach is useful for approximate reasoning over noisy knowledge bases like ConceptNet. A straightforward extension of factored inference to multiple datasets, called Blending, has seen productive use for commonsense reasoning. We show that Blending is a kind of Collective Matrix Factorization (CMF): the factorization spreads out the prediction loss between each dataset. We then show that blending additional data causes the singular vectors to rotate between the two domains, which enables cross-domain inference. We show, in a simplified example, that the maximum interaction occurs when the magnitudes (as defined by the largest singular values) of the two matrices are equal, confirming previous empirical conclusions. Finally, we describe and mathematically justify Bridge Blending, which facilitates inference between datasets by specifically adding knowledge that "bridges" between the two, in terms of CMF.
How to Support Meta-Cognitive Skills for Finding and Correcting Errors?
Melis, Erica (German Research Center for Artificial Intelligence (DFKI)) | Sander, Andreas (University of Saarlandes) | Tsovaltzi, Dimitra (German Research Center for Artificial Intelligence (DFKI))
Meta-cognitive skills to be developed in learning for the 21st century is the detection and correction of errors in solutions. These meta-cognitive skills can help to detect errors the learner has made her/himself as well as errors others have made. Our investigations in learning from errors have the ultimate goal to adapt the selection and presentation to the learner so that he/she can better learn from erroneous examples others have made. In our experiments we found that (1) erroneous examples with help provision can promote students skill of find errors, (2) the benefit from erroneous examples depends on the relation between the student's level and the example's difficulty, i.e. if the student is prepared for the problem, (3) for many students it is very difficult to correct errors.
Significance of Classification Techniques in Prediction of Learning Disabilities
Balakrishnan, Julie M. David And Kannan
The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in prediction of learning disabilities (LD) of school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in Data mining. Different rules extracted from the decision tree are used for prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in the LD affected child. In this paper, J48 algorithm is used for constructing the decision tree and K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified.
CUR from a Sparse Optimization Viewpoint
Bien, Jacob, Xu, Ya, Mahoney, Michael W.
The CUR decomposition provides an approximation of a matrix $X$ that has low reconstruction error and that is sparse in the sense that the resulting approximation lies in the span of only a few columns of $X$. In this regard, it appears to be similar to many sparse PCA methods. However, CUR takes a randomized algorithmic approach, whereas most sparse PCA methods are framed as convex optimization problems. In this paper, we try to understand CUR from a sparse optimization viewpoint. We show that CUR is implicitly optimizing a sparse regression objective and, furthermore, cannot be directly cast as a sparse PCA method. We also observe that the sparsity attained by CUR possesses an interesting structure, which leads us to formulate a sparse PCA method that achieves a CUR-like sparsity.
A Very Fast Algorithm for Matrix Factorization
Nikulin, Vladimir, Huang, Tian-Hsiang, Ng, Shu-Kay, Rathnayake, Suren I, McLachlan, Geoffrey J
We present a very fast algorithm for general matrix factorization of a data matrix for use in the statistical analysis of high-dimensional data via latent factors. Such data are prevalent across many application areas and generate an ever-increasing demand for methods of dimension reduction in order to undertake the statistical analysis of interest. Our algorithm uses a gradient-based approach which can be used with an arbitrary loss function provided the latter is differentiable. The speed and effectiveness of our algorithm for dimension reduction is demonstrated in the context of supervised classification of some real high-dimensional data sets from the bioinformatics literature.
Theta*: Any-Angle Path Planning on Grids
Daniel, K., Nash, A., Koenig, S., Felner, A.
Grids with blocked and unblocked cells are often used to represent terrain in robotics and video games. However, paths formed by grid edges can be longer than true shortest paths in the terrain since their headings are artificially constrained. We present two new correct and complete any-angle path-planning algorithms that avoid this shortcoming. Basic Theta* and Angle-Propagation Theta* are both variants of A* that propagate information along grid edges without constraining paths to grid edges. Basic Theta* is simple to understand and implement, fast and finds short paths. However, it is not guaranteed to find true shortest paths. Angle-Propagation Theta* achieves a better worst-case complexity per vertex expansion than Basic Theta* by propagating angle ranges when it expands vertices, but is more complex, not as fast and finds slightly longer paths. We refer to Basic Theta* and Angle-Propagation Theta* collectively as Theta*. Theta* has unique properties, which we analyze in detail. We show experimentally that it finds shorter paths than both A* with post-smoothed paths and Field D* (the only other version of A* we know of that propagates information along grid edges without constraining paths to grid edges) with a runtime comparable to that of A* on grids. Finally, we extend Theta* to grids that contain unblocked cells with non-uniform traversal costs and introduce variants of Theta* which provide different tradeoffs between path length and runtime.
Kalman Temporal Differences
Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest this last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample-efficiency, non-linear approximation, non-stationarity handling and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDP) which produces biased estimates in the case of stochastic transitions. Than the eXtended KTD framework (XKTD), solving stochastic MDP, is described. Convergence is analyzed for special cases for both deterministic and stochastic transitions. Related algorithms are experimented on classical benchmarks. They compare favorably to the state of the art while exhibiting the announced features.
On the Foundations of Adversarial Single-Class Classification
El-Yaniv, Ran, Nisenson, Mordechai
Motivated by authentication, intrusion and spam detection applications we consider single-class classification (SCC) as a two-person game between the learner and an adversary. In this game the learner has a sample from a target distribution and the goal is to construct a classifier capable of distinguishing observations from the target distribution from observations emitted from an unknown other distribution. The ideal SCC classifier must guarantee a given tolerance for the false-positive error (false alarm rate) while minimizing the false negative error (intruder pass rate). Viewing SCC as a two-person zero-sum game we identify both deterministic and randomized optimal classification strategies for different game variants. We demonstrate that randomized classification can provide a significant advantage. In the deterministic setting we show how to reduce SCC to two-class classification where in the two-class problem the other class is a synthetically generated distribution. We provide an efficient and practical algorithm for constructing and solving the two class problem. The algorithm distinguishes low density regions of the target distribution and is shown to be consistent.