Applying Part-of-Speech Enhanced LSA to Automatic Essay Grading
Kakkonen, Tuomo, Myller, Niko, Sutinen, Erkki
Latent Semantic Analysis (LSA) is a widely used Information Retrieval method based on the "bag-of-words" assumption. However, according to general conception, syntax plays a role in representing the meaning of sentences. Thus, enhancing LSA with part-of-speech (POS) information to capture the context of word occurrences appears to be a theoretically feasible extension. The approach is tested empirically on an automatic essay grading system that uses LSA for document similarity comparisons. A comparison of several POS-enhanced LSA models is reported. Our findings show that adding contextual information in the form of POS tags can raise the accuracy of the LSA-based scoring models up to 10.77 per cent.
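The effect of POS tags on the "bag-of-words" term space can be illustrated with a minimal sketch. It assumes the words are already tagged (the Penn-Treebank-style tags below are hand-assigned, no tagger is run) and that each word is turned into a combined `word_TAG` index term. To stay dependency-free, it compares raw term-frequency vectors with cosine similarity and omits the truncated SVD that full LSA would apply to the term-document matrix; it is not the authors' scoring model.

```python
import math
from collections import Counter

# A tagged sentence is a list of (word, POS tag) pairs; tags here are hand-assigned.
Tagged = list[tuple[str, str]]


def to_terms(tagged: Tagged, use_pos: bool = True) -> list[str]:
    """Map tagged words to index terms; with use_pos, 'book_VBP' and 'book_NN' differ."""
    if use_pos:
        return [f"{word.lower()}_{tag}" for word, tag in tagged]
    return [word.lower() for word, _ in tagged]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(doc1: Tagged, doc2: Tagged, use_pos: bool = True) -> float:
    """Compare two tagged texts with or without POS-augmented terms."""
    return cosine(Counter(to_terms(doc1, use_pos)), Counter(to_terms(doc2, use_pos)))


essay = [("I", "PRP"), ("book", "VBP"), ("a", "DT"), ("flight", "NN")]
reference = [("I", "PRP"), ("read", "VBP"), ("a", "DT"), ("book", "NN")]
print(similarity(essay, reference, use_pos=False))  # 0.75: 'book' counts as a match
print(similarity(essay, reference, use_pos=True))   # 0.5: verb 'book' != noun 'book'
```

With plain words the two sentences share three of four terms; once tags are attached, the verb and noun uses of "book" no longer match, which is the kind of context the POS-enhanced models are meant to capture.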
DepAnn - An Annotation Tool for Dependency Treebanks
DepAnn is an interactive annotation tool for dependency treebanks, providing both graphical and text-based annotation interfaces. The tool is aimed at the semi-automatic creation of treebanks. It aids the manual inspection and correction of automatically created parses, making the annotation process faster and less error-prone. A novel feature of the tool is that it enables the user to view outputs from several parsers as the basis for creating the final tree to be saved to the treebank. DepAnn uses TIGER-XML, an XML-based general encoding format, both for representing the parser outputs and for saving the annotated treebank. The tool includes an automatic consistency checker for sentence structures. In addition, the tool enables users to build structures manually, add comments on the annotations, modify the tagsets, and mark sentences for further revision.
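As an illustration of what a structural consistency check on a dependency parse can look like, the sketch below validates a sentence given as a head array (token `i` points to its head, `0` meaning the root). The checks chosen here (exactly one root, heads in range, no cycles) are assumptions for illustration; the abstract does not say which checks DepAnn performs, and the sketch does not read TIGER-XML.

```python
def check_dependency_tree(heads: list[int]) -> list[str]:
    """Return a list of problems; heads[i - 1] is the head of token i, 0 = root.

    Hypothetical checks: a single root, every head in range, and no cycles
    (a token attached to itself counts as a cycle).
    """
    problems = []
    n = len(heads)
    roots = [i for i, h in enumerate(heads, start=1) if h == 0]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {len(roots)}")
    for i, h in enumerate(heads, start=1):
        if not 0 <= h <= n:
            problems.append(f"token {i}: head {h} out of range")
    # Follow head links from every token; revisiting a token means a cycle.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                problems.append(f"token {start}: lies on or leads into a cycle")
                break
            seen.add(node)
            head = heads[node - 1]
            if not 0 <= head <= n:
                break  # already reported as out of range
            node = head
    return problems


print(check_dependency_tree([2, 0, 2]))  # well-formed: []
print(check_dependency_tree([2, 1, 0]))  # tokens 1 and 2 point at each other
```

Reporting a list of problems rather than raising on the first one suits an annotation tool, where the annotator wants to see every inconsistency in a sentence at once.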
Characterizing Solution Concepts in Games Using Knowledge-Based Programs
Halpern, Joseph Y., Moses, Yoram
We show how solution concepts in games such as Nash equilibrium, correlated equilibrium, rationalizability, and sequential equilibrium can be given a uniform definition in terms of knowledge-based programs. Intuitively, all solution concepts are implementations of two knowledge-based programs, one appropriate for games represented in normal form, the other for games represented in extensive form. These knowledge-based programs can be viewed as embodying rationality. The representation works even if (a) information sets do not capture an agent's knowledge, (b) uncertainty is not represented by probability, or (c) the underlying game is not common knowledge.
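The rationality condition behind the normal-form case can be made concrete for the simplest solution concept. The sketch below is the conventional best-response test for pure-strategy Nash equilibria in a two-player bimatrix game: each player's action must be a best response to the other's. It replaces the epistemic "the player knows" condition of a knowledge-based program with the actual opponent action, so it is a simplified stand-in and not Halpern and Moses's formulation.

```python
from itertools import product


def pure_nash_equilibria(payoff1: list[list[float]],
                         payoff2: list[list[float]]) -> list[tuple[int, int]]:
    """Return all (row, col) profiles where each action is a best response.

    payoff1[r][c] and payoff2[r][c] are the row and column players' payoffs.
    """
    rows, cols = len(payoff1), len(payoff1[0])
    equilibria = []
    for r, c in product(range(rows), range(cols)):
        # Row player cannot gain by deviating, given the column player's action c.
        row_best = all(payoff1[r][c] >= payoff1[r2][c] for r2 in range(rows))
        # Column player cannot gain by deviating, given the row player's action r.
        col_best = all(payoff2[r][c] >= payoff2[r][c2] for c2 in range(cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


# Prisoner's dilemma, actions 0 = cooperate, 1 = defect.
pd_row = [[3, 0], [5, 1]]
pd_col = [[3, 5], [0, 1]]
print(pure_nash_equilibria(pd_row, pd_col))  # [(1, 1)]: both defect
```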
On Geometric Algebra Representation of Binary Spatter Codes
Aerts, Diederik, Czachor, Marek, De Moor, Bart
Distributed representation is a way of representing information in a pattern of activation over a set of neurons, in which each concept is represented by activation over multiple neurons, and each neuron participates in the representation of multiple concepts [1]. Examples of distributed representations include Recursive Auto-Associative Memory (RAAM) [2], Tensor Product Representations [3], Holographic Reduced Representations (HRRs) [4, 5], and Binary Spatter Codes (BSC) [6, 7, 8]. BSC is a powerful and simple method of representing hierarchical structures in connectionist systems and may be regarded as a binary version of HRRs. Yet, BSC has some drawbacks associated with the representation of chunking. This is why different versions of BSC can be found in the literature.
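For readers unfamiliar with BSC, a minimal sketch of the commonly described operations follows: long random binary vectors, binding by bitwise XOR (which is its own inverse), superposition by bitwise majority vote, and clean-up by nearest Hamming distance. This illustrates the classical BSC building blocks only, not the geometric-algebra representation the paper develops; the dimension, seed and role/filler names are arbitrary choices for the example.

```python
import random

DIM = 10_000  # long random vectors make unrelated items nearly orthogonal


def random_vector(rng: random.Random, dim: int = DIM) -> int:
    """A dense random binary vector, stored as the bits of a Python int."""
    return rng.getrandbits(dim)


def bind(a: int, b: int) -> int:
    """Bind by bitwise XOR; since XOR is self-inverse, bind(bind(a, b), b) == a."""
    return a ^ b


def bundle(vectors: list[int], dim: int = DIM) -> int:
    """Superpose by bitwise majority vote (use an odd count; ties would become 0)."""
    out = 0
    for i in range(dim):
        ones = sum((v >> i) & 1 for v in vectors)
        if 2 * ones > len(vectors):
            out |= 1 << i
    return out


def hamming(a: int, b: int) -> int:
    """Number of differing bits."""
    return bin(a ^ b).count("1")


def cleanup(noisy: int, memory: dict[str, int]) -> str:
    """Name of the stored item closest to a noisy vector."""
    return min(memory, key=lambda name: hamming(noisy, memory[name]))


rng = random.Random(0)
roles = {r: random_vector(rng) for r in ("agent", "verb", "patient")}
fillers = {f: random_vector(rng) for f in ("mary", "loves", "john")}
sentence = bundle([bind(roles["agent"], fillers["mary"]),
                   bind(roles["verb"], fillers["loves"]),
                   bind(roles["patient"], fillers["john"])])
# Unbinding a role yields a noisy copy of its filler; clean-up recovers the name.
print(cleanup(bind(sentence, roles["patient"]), fillers))
```

With three bundled pairs, each bit of the unbound vector agrees with the correct filler with probability 3/4, so at this dimension the correct filler is about 2,500 bits away while unrelated items sit near 5,000.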
Comparing Typical Opening Move Choices Made by Humans and Chess Engines
The opening book is an important component of a chess engine, and thus computer chess programmers have been developing automated methods to improve the quality of their books. For chess, which has a very rich opening theory, large databases of high-quality games can be used as the basis of an opening book, from which statistics relating to move choices from given positions can be collected. To find out whether the opening books used by modern chess engines in machine-versus-machine competitions are "comparable" to those used by chess players in human-versus-human competitions, we analyzed 26 test positions using statistics from two opening books, one compiled from humans' games and the other from machines' games. Our analysis, using several nonparametric measures, shows that, overall, there is a strong association between humans' and machines' choices of opening moves when using a book to guide their choices.
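The abstract does not name its nonparametric measures. As one common example of such a measure, the sketch below computes Spearman's rank correlation (with average ranks for ties) between how often each candidate move was played in two books for the same position. The move counts are invented for illustration and are not data from the study.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one side has all-equal counts")
    return cov / math.sqrt(vx * vy)


# Hypothetical move counts from a human book and a machine book for one position.
moves = ["e4", "d4", "Nf3", "c4"]
human = [500, 300, 120, 40]
machine = [410, 350, 60, 90]
print(spearman_rho(human, machine))  # 0.8: the two books rank moves similarly
```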
Farthest-Point Heuristic based Initialization Methods for K-Modes Clustering
The k-modes algorithm [1] extends the k-means paradigm to cluster categorical data by using (1) a simple matching dissimilarity measure for categorical objects, (2) modes instead of means for clusters, and (3) a frequency-based method to update modes in the k-means fashion to minimize the clustering cost function. Because the k-modes algorithm uses the same clustering process as k-means, it preserves the efficiency of the k-means algorithm. Although the k-modes algorithm is very efficient, its clustering results are sensitive to the selection of the initial points. Hence, a better procedure for selecting initial points would improve the reliability and accuracy of clustering results. To that end, an iterative initial-points refinement algorithm for k-modes clustering was presented in [2]. As shown in [2], this initialization procedure greatly improves the reliability and accuracy of the final clustering results. Despite the success of Ref. [2], the following observations motivate us to pursue alternative initialization methods.
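The three ingredients listed above, together with a farthest-point start, fit in a short sketch. The matching dissimilarity and column-wise mode follow the description of k-modes; the initializer is the generic farthest-point heuristic (start from one object, then repeatedly add the object farthest from its nearest chosen center) and is an assumption, since the details of the paper's own initialization methods are not given here. The starting index `first` is a hypothetical parameter.

```python
from collections import Counter


def matching_dissimilarity(x: tuple, y: tuple) -> int:
    """Simple matching dissimilarity: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def mode(objects: list[tuple]) -> tuple:
    """Column-wise most frequent category (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*objects))


def farthest_point_init(data: list[tuple], k: int, first: int = 0) -> list[tuple]:
    """Generic farthest-point heuristic for choosing k initial modes."""
    centers = [data[first]]
    while len(centers) < k:
        nxt = max(data, key=lambda x: min(matching_dissimilarity(x, c) for c in centers))
        centers.append(nxt)
    return centers


def k_modes(data: list[tuple], k: int, max_iter: int = 100, first: int = 0):
    """Alternate assignment and frequency-based mode update until labels stop changing."""
    modes = farthest_point_init(data, k, first)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: matching_dissimilarity(x, modes[j]))
                      for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[j] = mode(members)
    return labels, modes


data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "r"), ("b", "y", "s")]
print(farthest_point_init(data, 2))  # [('a', 'x', 'p'), ('b', 'y', 'r')]
print(k_modes(data, 2)[0])           # [0, 0, 1, 1]
```

Because the farthest-point rule is deterministic once the first object is fixed, it removes the run-to-run variation of random initialization, although the choice of `first` still matters.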
The Application of Fuzzy Logic to the Construction of the Ranking Function of Information Retrieval Systems
The quality of the ranking function is an important factor that determines the quality of an Information Retrieval system. The ranking function assigns each document a score that indicates the likelihood that the document is relevant to a given query. In the vector space model, the ranking function is defined by a mathematical expression. We propose a fuzzy logic (FL) approach to defining the ranking function. FL provides a convenient way of converting knowledge expressed in natural language into fuzzy logic rules. The resulting ranking function can be easily viewed, extended, and verified, for example: if (tf is high) and (idf is high) → (relevance is high); if (overlap is high) → (relevance is high). Using these FL rules, we achieve performance approximately equal to that of the state-of-the-art search engine Apache Lucene (ΔP10 +0.92%; ΔMAP -0.1%). The fuzzy logic approach allows the logic-based model to be combined with the vector model. The resulting model possesses the simplicity and formalism of the logic-based model and the flexibility and performance of the vector model.
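The two rules above can be evaluated with standard fuzzy operators. In this minimal sketch, "high" is modelled by a linear ramp membership function, AND by `min`, and the two rules are combined by `max` (fuzzy OR); since both rules conclude "relevance is high", the aggregated degree is used directly as the score. The thresholds, operator choices and the absence of a defuzzification step are all assumptions, not the paper's settings.

```python
def ramp_high(x: float, low: float, high: float) -> float:
    """Membership in 'high': 0 at or below `low`, 1 at or above `high`, linear between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fuzzy_relevance(tf: float, idf: float, overlap: float) -> float:
    """Degree to which 'relevance is high' under the two rules."""
    # Hypothetical thresholds; a real system would tune them on training queries.
    tf_high = ramp_high(tf, 0.0, 5.0)
    idf_high = ramp_high(idf, 0.0, 3.0)
    overlap_high = ramp_high(overlap, 0.0, 1.0)
    rule1 = min(tf_high, idf_high)  # if (tf is high) and (idf is high) -> relevance is high
    rule2 = overlap_high            # if (overlap is high) -> relevance is high
    return max(rule1, rule2)        # combine the rules with fuzzy OR


def rank(docs: dict[str, tuple[float, float, float]]) -> list[str]:
    """Order document ids by descending fuzzy relevance."""
    return sorted(docs, key=lambda d: fuzzy_relevance(*docs[d]), reverse=True)


# Hypothetical (tf, idf, overlap) features for three documents and one query.
docs = {"d1": (1.0, 2.0, 0.1), "d2": (4.0, 2.5, 0.3), "d3": (0.5, 0.5, 0.9)}
print(rank(docs))  # ['d3', 'd2', 'd1']
```

Each rule stays readable in the code, which is the property the abstract emphasizes: adding a third rule means adding one membership function and one term to the `max`.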
Mining Generalized Graph Patterns based on User Examples
There has been a lot of recent interest in mining patterns from graphs. Often, the exact structure of the patterns of interest is not known. This happens, for example, when molecular structures are mined to discover fragments useful as features in chemical compound classification tasks, or when web sites are mined to discover sets of web pages representing logical documents. Such patterns are often generated from a few small subgraphs (cores), according to certain generalization rules (GRs). We call such patterns "generalized patterns" (GPs). While being structurally different, GPs often perform the same function in the network. Previously proposed approaches to mining GPs assumed either that the cores and the GRs are given, or that all interesting GPs are frequent. These are strong assumptions, which often do not hold in practical applications. In this paper, we propose an approach to mining GPs that is free from the above assumptions. Given a small number of GPs selected by the user, our algorithm discovers all GPs similar to the user examples. First, a machine-learning-style approach is used to find the cores. Second, generalizations of the cores in the graph are computed to identify GPs. Evaluation on synthetic data, generated using real cores and GRs from biological and web domains, demonstrates the effectiveness of our approach.
ECA-LP / ECA-RuleML: A Homogeneous Event-Condition-Action Logic Programming Language
Event-driven reactive functionalities are urgently needed in today's distributed service-oriented applications and (Semantic) Web-based environments. An important problem to be addressed is how to correctly and efficiently capture and process event-based behavioral, reactive logic represented as ECA rules in combination with other conditional decision logic represented as derivation rules. In this paper we elaborate on a homogeneous integration approach that combines derivation rules, reaction rules (ECA rules), and other rule types such as integrity constraints into the general framework of logic programming. The developed ECA-LP language provides expressive features such as ID-based updates with support for external and self-updates of the intensional and extensional knowledge, transactions including integrity testing, and an event algebra to define and process complex events and actions based on a novel interval-based Event Calculus variant.
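The interplay between a reaction rule and a derivation rule can be sketched with the generic Event-Condition-Action pattern: when an event arrives, every rule listening for it checks its condition (here a derived fact) and, if the condition holds, runs an action that may update the knowledge base. This is plain Python, not ECA-LP or ECA-RuleML syntax, and it has no event algebra, transactions or Event Calculus; the rule, event and fact names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Facts = dict[str, Any]


@dataclass
class ECARule:
    """on <event> if <condition> do <action>."""
    event: str
    condition: Callable[[Facts, dict], bool]
    action: Callable[[Facts, dict], None]


@dataclass
class ReactiveEngine:
    facts: Facts = field(default_factory=dict)
    rules: list[ECARule] = field(default_factory=list)

    def on_event(self, name: str, payload: dict) -> int:
        """Fire every rule whose event matches and whose condition holds; return the count."""
        fired = 0
        for rule in self.rules:
            if rule.event == name and rule.condition(self.facts, payload):
                rule.action(self.facts, payload)  # actions may update the facts (self-update)
                fired += 1
        return fired


def is_premium(facts: Facts, customer: str) -> bool:
    """Hypothetical derivation rule: a customer is premium if total spending >= 1000."""
    return facts.get("spent", {}).get(customer, 0) >= 1000


def record_discount(facts: Facts, event: dict) -> None:
    """Hypothetical action: remember that the customer received a discount."""
    facts.setdefault("discounts", []).append(event["customer"])


engine = ReactiveEngine(facts={"spent": {"alice": 1500, "bob": 200}})
engine.rules.append(ECARule(
    event="order_placed",
    condition=lambda facts, ev: is_premium(facts, ev["customer"]),
    action=record_discount,
))
engine.on_event("order_placed", {"customer": "alice"})  # condition holds, rule fires
engine.on_event("order_placed", {"customer": "bob"})    # condition fails, nothing happens
print(engine.facts["discounts"])  # ['alice']
```

Keeping the condition as a call to an ordinary derivation rule is the point of a homogeneous approach: the same derived knowledge can serve queries and reactive rules alike.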
Modular self-organization
This paper addresses the problem of building a long-living autonomous agent; by long-living, we mean that the agent has a large number of relatively complex and varying tasks to perform. Biology suggests some ideas about the way animals deal with a variety of tasks: brains are made of specialized and complementary areas/modules, and skills are spread over modules. On the one hand, distributing functions and representations has immediate advantages: parallel processing speeds up reaction, and relative independence between modules gives more robustness. Both properties might clearly increase the agent's efficiency. On the other hand, distributing a system raises a fundamental issue: how are the modules organized during the agent's lifetime? There has been much research on the design of modular intelligent architectures (see for instance [15] [5] [1] [7]). Nevertheless, it is very often the (human) designer who decides how modules are connected to each other and how they behave with respect to one another.