 Natural Language


Coarse Word-Sense Disambiguation Using Common Sense

AAAI Conferences

Coarse word sense disambiguation (WSD) is an NLP task that is both important and practical: it aims to distinguish senses of a word that have very different meanings, while avoiding the complexity that comes from trying to finely distinguish every possible word sense. Reasoning techniques that make use of common sense information can help to solve the WSD problem by taking word meaning and context into account. We have created a system for coarse word sense disambiguation using blending, a common sense reasoning technique, to combine information from SemCor, WordNet, ConceptNet and Extended WordNet. Within that space, a correct sense is suggested based on the similarity of the ambiguous word to each of its possible word senses. The general blending-based system performed well at the task, achieving an F-score of 80.8% on the 2007 SemEval Coarse Word Sense Disambiguation task.
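
The sense-selection step described above, choosing the sense most similar to the ambiguous word in a shared vector space, can be illustrated with a minimal sketch. It assumes cosine similarity as the measure and uses made-up three-dimensional vectors; the function names `cosine` and `pick_sense`, the sense labels, and the numbers are all hypothetical and are not taken from the paper, whose blended space is built from SemCor, WordNet, ConceptNet and Extended WordNet.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def pick_sense(context_vector, sense_vectors):
    """Return the sense label whose vector is most similar to the context."""
    return max(
        sense_vectors,
        key=lambda sense: cosine(context_vector, sense_vectors[sense]),
    )


# Toy data (assumption): two coarse senses of "bank" and one context vector.
senses = {
    "bank/finance": [0.9, 0.1, 0.0],
    "bank/river": [0.0, 0.2, 0.9],
}
context = [0.8, 0.3, 0.1]
best = pick_sense(context, senses)
print(best)
```

In a real system the vectors would come from the blended space rather than being written by hand; only the final arg-max over candidate senses is shown here.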


The Role of Embodiment and Perspective in Direction-Giving Systems

AAAI Conferences

In this paper, we describe an evaluation of the impact of embodiment, the effect of different kinds of embodiment, and the benefits of different aspects of embodiment, on direction-giving systems. We compared a robot, embodied conversational agent (ECA), and GPS giving directions, when these systems used speaker-perspective gestures, listener-perspective gestures and no gestures. Results demonstrated that, while there was no difference in direction-giving performance between the robot and the ECA, and little difference in participants' perceptions, there was a considerable effect of the type of gesture employed, and several interesting interactions between type of embodiment and aspects of embodiment.


Quantificational Sharpening of Commonsense Knowledge

AAAI Conferences

The KNEXT system produces a large volume of factoids from text, expressing possibilistic general claims such as that 'A PERSON MAY HAVE A HEAD' or 'PEOPLE MAY SAY SOMETHING'. We present a rule-based method to sharpen certain classes of factoids into stronger, quantified claims such as 'ALL OR MOST PERSONS HAVE A HEAD' or 'ALL OR MOST PERSONS AT LEAST OCCASIONALLY SAY SOMETHING' -- statements strong enough to be used for inference. The judgement of whether and how to sharpen a factoid depends on the semantic categories of the terms involved and the strength of the quantifier depends on how strongly the subject is associated with what is predicated of it. We provide an initial assessment of the quality of such automatic strengthening of knowledge and examples of reasoning with multiple sharpened premises.
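
As a rough illustration of the idea above, the sketch below turns a possibilistic factoid into a quantified claim using two inputs: a semantic category for the predicate and a subject-predicate association score. It is only a sketch under stated assumptions: the category names, the thresholds, the weaker quantifiers `MANY` and `SOME`, and the function `sharpen` are hypothetical and do not reproduce the rules of the KNEXT-based method.

```python
# Hypothetical predicate categories that the sketch allows to be sharpened.
SHARPENABLE_CATEGORIES = {"body_part", "episodic"}


def sharpen(subject_plural, predicate, category, association):
    """Return a quantified claim, or None if the factoid is left unsharpened.

    association is an assumed score in [0, 1] for how strongly the subject
    is associated with what is predicated of it.
    """
    if category not in SHARPENABLE_CATEGORIES:
        return None
    # Assumed thresholds: stronger association gives a stronger quantifier.
    if association >= 0.8:
        quantifier = "ALL OR MOST"
    elif association >= 0.5:
        quantifier = "MANY"
    else:
        quantifier = "SOME"
    # Episodic predicates get a frequency qualifier, as in the quoted example.
    frequency = " AT LEAST OCCASIONALLY" if category == "episodic" else ""
    return f"{quantifier} {subject_plural}{frequency} {predicate}"


print(sharpen("PERSONS", "HAVE A HEAD", "body_part", 0.95))
print(sharpen("PERSONS", "SAY SOMETHING", "episodic", 0.9))
```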


Introduction to the iDian

arXiv.org Artificial Intelligence

The iDian (previously named the Operation Agent System) is a framework designed to enable computer users to operate software in natural language. Distinct from current speech-recognition systems, our solution supports format-free combinations of orders, and is open to both developers and customers. We used a multi-layer structure to build the entire framework, took a rule-based approach to natural language processing, and implemented demos for Windows, text editing and a few other applications. This essay first gives an overview of the entire system, then examines its functions and structure, and finally discusses prospective development, especially online interaction functions.


Métodos para la Selección y el Ajuste de Características en el Problema de la Detección de Spam

arXiv.org Artificial Intelligence

Email is used daily by millions of people to communicate around the globe and it is a mission-critical application for many businesses. Over the last decade, unsolicited bulk email has become a major problem for email users. An overwhelming amount of spam is flowing into users' mailboxes daily. In 2004, an estimated 62% of all email was attributed to spam. Spam is not only frustrating for most email users, it strains the IT infrastructure of organizations and costs businesses billions of dollars in lost productivity. In recent years, spam has evolved from an annoyance into a serious security threat, and is now a prime medium for phishing of sensitive information, as well as the spread of malicious software. This work presents a first approach to attack the spam problem. We propose an algorithm that will improve a classifier's results by adjusting its training set data. It improves the document's vocabulary representation by detecting good topic descriptors and discriminators.


Online Multiple Kernel Learning for Structured Prediction

arXiv.org Machine Learning

Despite the recent progress towards efficient multiple kernel learning (MKL), the structured output case remains an open research front. Current approaches involve repeatedly solving a batch learning problem, which makes them inadequate for large-scale scenarios. We propose a new family of online proximal algorithms for MKL (as well as for group-lasso and variants thereof), which overcomes that drawback. We show regret, convergence, and generalization bounds for the proposed method. Experiments on handwriting recognition and dependency parsing demonstrate the success of the approach.
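
To make the notion of an online proximal step concrete, here is a minimal sketch for a group-lasso penalty: a gradient step on one example followed by the standard proximal operator of the group L2 norm (block soft-thresholding). This is a generic textbook-style update, not the algorithm family proposed in the paper; the function names, the plain-list representation of weight groups, and the toy numbers are assumptions.

```python
import math


def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_2 (block soft-thresholding)."""
    norm = math.sqrt(sum(x * x for x in group))
    if norm <= threshold:
        return [0.0 for _ in group]  # the whole group is zeroed out
    scale = 1.0 - threshold / norm
    return [scale * x for x in group]


def online_proximal_step(weights, gradients, learning_rate, reg_strength):
    """One online update: gradient step per group, then the proximal step."""
    updated = []
    for w_group, g_group in zip(weights, gradients):
        stepped = [w - learning_rate * g for w, g in zip(w_group, g_group)]
        updated.append(
            group_soft_threshold(stepped, learning_rate * reg_strength)
        )
    return updated


# Toy data (assumption): two weight groups and a zero gradient, so only the
# shrinkage effect of the proximal step is visible.
weights = [[3.0, 4.0], [0.1, 0.0]]
gradients = [[0.0, 0.0], [0.0, 0.0]]
new_weights = online_proximal_step(
    weights, gradients, learning_rate=1.0, reg_strength=1.0
)
print(new_weights)
```

The first group is shrunk toward zero while keeping its direction, and the second, whose norm is below the threshold, is set to zero entirely; this group-level sparsity is what makes the penalty useful for selecting among kernels or feature groups.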


Introduction to the Special Issue on Question Answering

AI Magazine

This special issue of AI Magazine presents six articles on some of the most interesting question answering systems in development today. Included are articles on Project, the Semantic Research, Watson, True Knowledge, and TextRunner (University of Washington's clever use of statistical NL techniques to answer questions across the open web).


True Knowledge: Open-Domain Question Answering Using Structured Knowledge and Inference

AI Magazine

This article gives a detailed description of True Knowledge: a commercial, open-domain question answering platform. The system combines a large and growing structured knowledge base of common sense, factual and lexical knowledge; a natural language translation system that turns user questions into internal language-independent queries; and an inference system that can answer those queries using both directly represented and inferred knowledge. The system is live and answers millions of questions per month asked by internet users.


Project Halo Update--Progress Toward Digital Aristotle

AI Magazine

In the Winter 2004 issue of AI Magazine, we reported Vulcan Inc.'s first step toward creating a question-answering system called "Digital Aristotle." The goal of that first step was to assess the state of the art in applied Knowledge Representation and Reasoning (KRR) by asking AI experts to represent 70 pages from the advanced placement (AP) chemistry syllabus and to deliver knowledge-based systems capable of answering questions from that syllabus. This paper reports the next step toward realizing a Digital Aristotle: we present the design and evaluation results for a system called AURA, which enables domain experts in physics, chemistry, and biology to author a knowledge base and then allows a different set of users to ask novel questions against that knowledge base. These results represent a substantial advance over what we reported in 2004, both in the breadth of covered subjects and in the provision of sophisticated technologies in knowledge representation and reasoning, natural language processing, and question answering to domain experts and novice users.


Adapting Open Information Extraction to Domain-Specific Relations

AI Magazine

Information extraction (IE) can identify a set of relations from free text to support question answering (QA). Until recently, IE systems were domain-specific and needed a combination of manual engineering and supervised learning to adapt to each target domain. A new paradigm, Open IE, operates on large text corpora without any manual tagging of relations, and indeed without any pre-specified relations. We explore the steps needed to adapt Open IE to a domain-specific ontology and demonstrate our approach of mapping domain-independent tuples to an ontology using domains from DARPA's Machine Reading Project.