AITopics

Dialogue-based intelligent tutoring systems use speech act classifiers to categorize student input into answers, questions, and other speech acts. Previous work has focused primarily on question classification. In this paper, we present a complementary speech act classifier, developed using machine learning techniques, that focuses primarily on non-questions. Our results show that an effective speech act classifier can be learned directly from labeled data using decision trees.
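The abstract does not list the classifier's features, so the following is only a minimal sketch of learning a speech act decision tree from labeled data. It assumes three hypothetical surface features (`first_word`, `ends_with_q`, `has_hedge`), an ID3-style information-gain split criterion, and an invented toy training set. The authors' features, tree learner, and label set may differ.

```python
import math
from collections import Counter

def features(utterance):
    """Hypothetical surface features for one student utterance."""
    words = utterance.lower().rstrip("?.!").split()
    return {
        "first_word": words[0] if words else "",
        "ends_with_q": utterance.strip().endswith("?"),
        "has_hedge": "think" in words or "maybe" in words,
    }

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def build_tree(rows, labels, attrs):
    """ID3-style learner: split on the attribute with the highest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: a speech act label

    def gain(attr):
        remainder = 0.0
        for value in {r[attr] for r in rows}:
            subset = [l for r, l in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)
    branches = {}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    # Internal node; the majority label is the fallback for unseen feature values.
    return (best, branches, majority)

def classify(tree, row):
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if row[attr] not in branches:
            return majority
        tree = branches[row[attr]]
    return tree

# Invented toy training data (not from the paper).
train = [
    ("What is a force?", "question"),
    ("How does gravity work?", "question"),
    ("The net force is zero.", "answer"),
    ("Acceleration equals force over mass.", "answer"),
    ("I think the velocity is constant.", "answer"),
    ("I don't know.", "meta"),
    ("I do not understand.", "meta"),
]
rows = [features(u) for u, _ in train]
labels = [label for _, label in train]
tree = build_tree(rows, labels, list(rows[0]))
print(classify(tree, features("What is mass?")))  # question
```

Storing the majority label at each internal node gives the tree a fallback for feature values it never saw in training, such as an utterance that starts with an unseen word.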

classification, classifier, dialogue act

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Given Bilingual Terminology in Statistical Machine Translation: MWE-Sensitive Word Alignment and Hierarchical Pitman-Yor Process-Based Translation Model Smoothing

Okita, Tsuyoshi (Dublin City University) | Way, Andy (Dublin City University)

This paper considers a scenario in Statistical Machine Translation (SMT) in which we are given almost perfect knowledge of the bilingual terminology that occurs in a test corpus. When the given terminology is part of the training corpus, one natural strategy in SMT is to use the trained translation model and ignore the given terminology. Two questions then arise. 1) Can a word aligner capture the given terminology? Even when the terminology appears in the training corpus, the resulting translation model often fails to include it. 2) Are the probabilities in the translation model correctly calculated? To answer these questions, we conducted experiments introducing a Multi-Word Expression-sensitive (MWE-sensitive) word aligner and hierarchical Pitman-Yor process-based translation model smoothing. Using the 200k JP-EN NTCIR corpus, our experimental results show that introducing the MWE-sensitive word aligner together with the new translation model smoothing yields an overall improvement of 1.35 BLEU points absolute (6.0% relative) over a baseline that uses neither.
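The abstract names hierarchical Pitman-Yor process-based smoothing without giving its equations. As a sketch of the general technique, the standard predictive probability of one Pitman-Yor "restaurant" discounts each observed count and passes the freed mass to a base distribution; in a hierarchical model that base is itself the predictive distribution of a parent context. The function name, the toy counts, and the one-table-per-type seating are assumptions for illustration. In practice, table counts are inferred (for example by sampling), and how the authors apply this to their translation model is not shown here.

```python
def pitman_yor_prob(word, counts, tables, discount, strength, base):
    """Predictive probability of `word` in one Pitman-Yor restaurant.

    counts[w]: observations (customers) of w in this context
    tables[w]: tables serving w (1 <= tables[w] <= counts[w] when counts[w] > 0)
    discount:  d, with 0 <= d < 1
    strength:  theta, with theta > -d
    base(w):   back-off probability; in a hierarchical model this is the
               predictive probability of the parent restaurant
    """
    c_total = sum(counts.values())
    t_total = sum(tables.values())
    c_w = counts.get(word, 0)
    t_w = tables.get(word, 0)
    own_mass = (c_w - discount * t_w) / (strength + c_total)
    backoff_weight = (strength + discount * t_total) / (strength + c_total)
    return own_mass + backoff_weight * base(word)

# Hypothetical counts for candidate translations of one source term.
vocab = ["house", "home", "building"]
counts = {"house": 3, "home": 1}
tables = {"house": 1, "home": 1}  # one table per type (interpolated Kneser-Ney-like)

def uniform(word):
    return 1 / len(vocab)

probs = {w: pitman_yor_prob(w, counts, tables, 0.5, 1.0, uniform) for w in vocab}
print(probs)
```

Because the discounted mass and the back-off weight sum to one whenever `base` is normalized, the smoothed probabilities over the vocabulary also sum to one, and an unseen translation such as `building` still receives non-zero probability.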

knowledge, terminology, translation model

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Ireland (0.04)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Dissimilarity Kernels for Paraphrase Identification

Lintean, Mihai (University of Memphis) | Rus, Vasile (University of Memphis)

In this paper we present a novel solution to the problem of paraphrase identification based on lexical dissimilarity kernels. Lexical kernels used with Support Vector Machines are preferred over other learning methods, such as decision trees, because they can handle a large number of features. Dissimilarity-based kernels emphasize the dissimilarities between text fragments and are therefore appropriate for text similarity tasks characterized by high lexical overlap. We conducted experiments with our kernels on the Microsoft Research (MSR) Paraphrase Corpus, a standard data set for assessing approaches to paraphrase identification. Our reported accuracy results are competitive and robust when compared to state-of-the-art single-model approaches. These results were obtained using 10-fold cross-validation over the entire corpus. We also report competitive results on the test portion of the MSR Paraphrase Corpus, which is the standard way to report results on this corpus.
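The abstract does not define the dissimilarity kernels, so the sketch below is a hypothetical illustration of the idea rather than the authors' kernel. Each sentence pair is represented by the words that occur in only one of its two sentences, and two pairs are compared by how many such words they share. Because this count is a dot product of binary indicator vectors, the resulting Gram matrix is positive semidefinite and could be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.

```python
def tokens(text):
    """Lower-cased word set with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {w for w in words if w}

def dissimilarity_set(pair):
    """Words that occur in exactly one sentence of the pair (symmetric difference)."""
    first, second = pair
    return tokens(first) ^ tokens(second)

def dissimilarity_kernel(pair1, pair2):
    """Number of 'dissimilar' words two sentence pairs share.

    Equivalent to a dot product of binary indicator vectors, so any Gram
    matrix built from it is positive semidefinite (a valid SVM kernel).
    """
    return len(dissimilarity_set(pair1) & dissimilarity_set(pair2))

def gram_matrix(pairs):
    return [[dissimilarity_kernel(p, q) for q in pairs] for p in pairs]

# Invented example pairs.
pairs = [
    ("The cat sat on the mat.", "The cat sat on a mat."),
    ("Stocks rose sharply today.", "Stocks fell sharply today."),
    ("He bought a car.", "He sold a car."),
    ("Prices rose today.", "Prices fell today."),
]
print(gram_matrix(pairs))
# [[1, 0, 0, 0], [0, 2, 0, 2], [0, 0, 2, 0], [0, 2, 0, 2]]
```

The second and fourth pairs differ in the same way (`rose` versus `fell`), so the kernel scores them as similar even though their overlapping words differ.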

Country:

Asia > Middle East > Iraq > Kirkuk Governorate > Kirkuk (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > New Jersey > Bergen County > Mahwah (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.89)

The Hierarchy of Detective Fiction: A Gramulator Analysis

Lamkin, Travis Alan (University of Memphis) | McCarthy, Philip (University of Memphis)

Closely related genres have complex interrelations. An antecedent genre can constrain a subsequent genre, but changing rhetorical situations can lead to distinctions between an antecedent and its descendant. In this study, we assess two genres of detective fiction to determine their hierarchical relation to one another. We use the Gramulator, a computational tool that identifies indicative lexical features, to explain the relationship between whodunit fiction and hardboiled fiction. Based on the indicative lexical features of the expositions in these texts, we conclude that the two are sibling genres.
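The Gramulator's exact selection criteria are not given here. As a simplified, hypothetical stand-in for finding indicative lexical features, the sketch below keeps bigrams that are more frequent than average in one corpus and never occur in a contrasting corpus. The two toy corpora are invented and are not the study's data.

```python
from collections import Counter

def bigrams(text):
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    return list(zip(words, words[1:]))

def differentials(corpus_a, corpus_b):
    """Bigrams more frequent than the mean bigram frequency in corpus_a
    that never occur in corpus_b (a simplified, hypothetical criterion)."""
    freq_a = Counter(bg for doc in corpus_a for bg in bigrams(doc))
    if not freq_a:
        return []
    seen_b = {bg for doc in corpus_b for bg in bigrams(doc)}
    mean = sum(freq_a.values()) / len(freq_a)
    return sorted(bg for bg, n in freq_a.items() if n > mean and bg not in seen_b)

# Invented toy corpora.
whodunit = [
    "The detective examined the body in the library.",
    "The detective gathered the suspects in the library.",
]
hardboiled = [
    "The rain fell on the city.",
    "I met her in the bar.",
]
print(differentials(whodunit, hardboiled))
# [('the', 'detective'), ('the', 'library')]
```

In this toy run, `("in", "the")` is frequent in the first corpus but is dropped because it also appears in the second; this contrastive step is what makes the remaining bigrams indicative of one corpus.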

corpora, corpus, differential

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.94)

Domain Independent Knowledge Base Population from Structured and Unstructured Data Sources

Gregory, Michelle (Pacific Northwest National Laboratory) | McGrath, Liam (Pacific Northwest National Laboratory) | Bell, Eric Belanga (Pacific Northwest National Laboratory) | O'Hara, Kelly (Pacific Northwest National Laboratory) | Domico, Kelly (Pacific Northwest National Laboratory)

In this paper we introduce a system that is designed to automatically populate a knowledge base from both structured and unstructured data, given an ontology. Our system is designed as a modular end-to-end system that takes structured or unstructured data as input, extracts information, maps relevant information to an ontology, and finally disambiguates entities in the knowledge base. The novelty of our approach is that it is domain independent and can easily be adapted to new ontologies and domains. Unlike most knowledge base population systems, ours includes entity detection. This feature allows one to employ very complex ontologies that include events and the entities involved in them.

information, knowledge base, ontology

Country:

Europe > United Kingdom (0.14)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Washington > King County > Redmond (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Simulating Human Ratings on Word Concreteness

Feng, Shi (University of Memphis) | Cai, Zhiqiang (University of Memphis) | Crossley, Scott (Georgia State University) | McNamara, Danielle S (University of Memphis)

A single word in the human language has many complex dimensions, such as semantics, parts of speech, lexical type, imageability, concreteness, and familiarity. It is important to know the dimensions of words in languages so that we can develop a better theoretical understanding of language and also build tools that simulate human intelligence and […]. However, word concreteness is not an attribute that a computer can directly compute. One means of assessing the characteristics of words is to have humans rate them on the dimensions of interest. Humans are proficient at categorizing words into linguistic dimensions, but it is impractical to have humans rate the tens of thousands of words needed for psycholinguistic research.

concreteness, mcnamara, word concreteness

Country:

North America > United States > New Jersey > Bergen County > Mahwah (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > New Jersey > Somerset County > Somerset (0.04)

Genre: Research Report > New Finding (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)
Information Technology > Artificial Intelligence > Cognitive Science > Simulation of Human Behavior (0.40)

Co-Occurrence-Based Error Correction Approach to Word Segmentation

Chaowicharat, Ekawat (Mahidol University) | Naruedomkul, Kanlaya (Mahidol University)

Many word segmentation approaches have been proposed over the years to address the problems of Thai word segmentation. We propose a novel Thai word segmentation approach called Co-occurrence-Based Error Correction (CBEC). CBEC generates all possible segmentation candidates using the classical maximal matching algorithm and then selects the most accurate segmentation based on co-occurrence and an error correction algorithm. CBEC was trained and evaluated on the BEST 2009 corpus.
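The candidate-generation step can be illustrated with a small sketch. It enumerates every way to split an unspaced string into dictionary words, then keeps the candidates with the fewest words, which is the maximal matching criterion. CBEC's co-occurrence scoring and error correction are not shown. The Latin-alphabet dictionary is hypothetical and stands in for a Thai lexicon. Exhaustive enumeration can grow exponentially with input length, so this is for illustration only.

```python
from functools import lru_cache

def all_segmentations(text, dictionary):
    """Every way to split `text` into a sequence of dictionary words."""
    @lru_cache(maxsize=None)
    def segment_from(start):
        if start == len(text):
            return ((),)  # exactly one way to segment the empty suffix
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in dictionary:
                for rest in segment_from(end):
                    results.append((word,) + rest)
        return tuple(results)  # tuples keep cached results immutable
    return [list(seg) for seg in segment_from(0)]

def maximal_matching(text, dictionary):
    """Candidates with the fewest words (the maximal matching criterion)."""
    candidates = all_segmentations(text, dictionary)
    if not candidates:
        return []
    fewest = min(len(c) for c in candidates)
    return [c for c in candidates if len(c) == fewest]

# Hypothetical toy dictionary.
dictionary = {"the", "them", "theme", "me", "men", "mend", "end"}
print(maximal_matching("theme", dictionary))    # [['theme']]
print(maximal_matching("themend", dictionary))  # [['the', 'mend'], ['them', 'end']]
```

For `themend`, both `the mend` and `them end` have two words, so maximal matching alone cannot choose between them. This is the kind of ambiguity a later selection step, such as CBEC's co-occurrence-based correction, must resolve.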

cbec, corpus, segmentation

Country: Asia > Thailand > Bangkok > Bangkok (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Data Science > Data Quality > Data Cleaning (0.83)

Automatic Natural Language Processing and the Detection of Reading Skills and Reading Comprehension

Boonthum-Denecke, Chutima (Hampton University) | McCarthy, Philip (University of Memphis) | Lamkin, Travis (University of Memphis) | Jackson, G. Tanner (University of Memphis) | Magliano, Joseph P. (Northern Illinois University) | McNamara, Danielle S. (University of Memphis)

The primary goal of this study is to assess two approaches for detecting comprehension processes in R-SAT (Reading Strategy Assessment Tool). One approach is based on Latent Semantic Analysis (LSA), while the other combines literal word matching and soundex. A secondary goal is to assess the potential for detecting specific reading comprehension strategies, either in isolation or in combination. Participants typed “think-aloud” protocols while reading texts presented on computers. Human judges rated these protocols for the presence of the various reading comprehension strategies. The LSA, word-matching, and combined algorithms were compared, and the results showed that combining both approaches yielded the best results. However, the performance of the combined algorithm varied with the type of process and the grain size of the human coding system. Lastly, the use of reading strategies (either in isolation or in combination) is positively related to students’ Gates–MacGinitie reading comprehension scores, which illustrates the merit of this approach for assessing comprehension skill.
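Soundex codes a word as its first letter followed by up to three digits for the consonant sounds that follow, so some misspellings map to the same code as the intended word. The sketch below implements classic American Soundex and a hypothetical `match_score` that counts a target word as matched when it appears in the protocol literally or by Soundex code. The scoring function is an assumption for illustration, not R-SAT's actual algorithm.

```python
# American Soundex digit classes; vowels, y, h and w have no digit.
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}

def soundex(word):
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":
            # Vowels reset prev, so a repeated digit after a vowel is kept;
            # h and w do not, so same-digit consonants around them merge.
            prev = code
    return result.ljust(4, "0")

def words(text):
    """Lower-cased alphabetic tokens (punctuation stripped, other tokens dropped)."""
    cleaned = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return [w for w in cleaned if w.isalpha()]

def match_score(protocol, target):
    """Hypothetical score: share of target words found in the protocol
    either literally or through a matching Soundex code."""
    proto = set(words(protocol))
    proto_codes = {soundex(w) for w in proto}
    target_words = words(target)
    if not target_words:
        return 0.0
    hits = sum(1 for w in target_words if w in proto or soundex(w) in proto_codes)
    return hits / len(target_words)

print(soundex("Robert"), soundex("Rupert"), soundex("Ashcraft"))  # R163 R163 A261
# "plants" matches "plant" and the misspelled "photosinthesis" matches by code.
print(match_score("the plant does photosinthesis", "Plants use photosynthesis."))
```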

algorithm, protocol, regression analysis

Country:

North America > United States > New Jersey > Bergen County > Mahwah (0.05)
North America > United States > Virginia > Hampton (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > Illinois > DeKalb County > DeKalb (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education > Assessment & Standards > Student Performance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.31)

Automatic Reduction of a Document-Derived Noun Vocabulary

Anderson, Sven (Bard College) | Thomas, S. Rebecca (Bard College) | Segal, Camden (Bard College) | Wu, Yu (Stanford University)

We propose and evaluate five related algorithms that automatically derive limited-size noun vocabularies from text documents of 2,000 to 30,000 words. The proposed algorithms combine Personalized PageRank with principles of information maximization and are applied to the WordNet graph for nouns. For the best-performing algorithm, the difference between the automatically generated reduced noun lexicons and those created by human writers is approximately 1 to 2 WordNet edges per lexical item. Our results also indicate the importance of performing word-sense disambiguation with sentence-level context information at the earliest stage of analysis.
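The abstract does not give the algorithms in detail, so the following is a minimal sketch of the Personalized PageRank component alone: power iteration in which the random surfer teleports back to seed nouns from the document instead of to all nodes. The tiny graph is a hypothetical WordNet-like fragment, the damping factor 0.85 is a conventional default rather than the paper's setting, and the information-maximization step is not modelled.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=200, tol=1e-12):
    """Personalized PageRank by power iteration.

    graph: dict mapping each node to a non-empty list of neighbours
           (list both directions for an undirected graph).
    seeds: nodes of `graph` that the random surfer teleports back to, uniformly.
    """
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # With probability 1 - damping, jump back to a seed node.
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        # Otherwise follow a uniformly chosen edge out of the current node.
        for n in nodes:
            share = damping * rank[n] / len(graph[n])
            for neighbour in graph[n]:
                new_rank[neighbour] += share
        delta = sum(abs(new_rank[n] - rank[n]) for n in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank

# Hypothetical WordNet-like fragment (hypernym links listed in both directions).
graph = {
    "entity": ["animal", "artifact"],
    "animal": ["entity", "dog", "cat"],
    "artifact": ["entity", "car", "bicycle"],
    "dog": ["animal"],
    "cat": ["animal"],
    "car": ["artifact"],
    "bicycle": ["artifact"],
}
rank = personalized_pagerank(graph, seeds={"dog", "cat"})
print(sorted(rank, key=rank.get, reverse=True)[:3])
```

Ranking nodes by score and keeping the top k is one plausible way to turn these scores into a reduced vocabulary; here the scores concentrate around the seeds and their shared hypernym `animal`, while the unrelated `artifact` branch receives little mass.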

algorithm, lexicon, noun

Country:

North America > United States > Oklahoma (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Commonsense Knowledge Extraction Using Concepts Properties

Blanco, Eduardo (The University of Texas at Dallas) | Cankaya, Hakki (Izmir University of Economics) | Moldovan, Dan (The University of Texas at Dallas)

This paper presents a semantically grounded method for extracting commonsense knowledge. First, commonsense rules are identified, e.g., one cannot see imaginary objects. Second, those rules are combined with a basic semantic representation in order to infer commonsense knowledge facts, e.g., one cannot see a flying carpet. Further combinations of semantic relations with inferred commonsense facts are proposed and analyzed. Results show that this novel method is able to extract thousands of commonsense facts with little human interaction and high accuracy.
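The two-step idea can be sketched with a deliberately simplified, hypothetical representation. A rule links a property (such as `imaginary`) to an action that is or is not possible, and each concept carries a set of properties. The paper combines rules with a semantic representation built from semantic relations; this sketch reduces that representation to property sets, and the rules and concepts are invented.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    prop: str       # property a concept must have, e.g. "imaginary"
    action: str     # action the rule is about, e.g. "see"
    possible: bool  # False encodes "one cannot <action> <concept>"

def infer_facts(rules, concept_properties):
    """Apply every rule to every concept that has the rule's property."""
    facts = []
    for concept, props in sorted(concept_properties.items()):
        for rule in rules:
            if rule.prop in props:
                modal = "can" if rule.possible else "cannot"
                facts.append(f"one {modal} {rule.action} a {concept}")
    return facts

rules = [Rule("imaginary", "see", False), Rule("physical", "touch", True)]
concept_properties = {
    "flying carpet": {"imaginary"},
    "dragon": {"imaginary"},
    "chair": {"physical"},
}
for fact in infer_facts(rules, concept_properties):
    print(fact)
# one can touch a chair
# one cannot see a dragon
# one cannot see a flying carpet
```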

commonsense fact, commonsense rule, knowledge

Country:

North America > United States > Texas > Dallas County > Richardson (0.04)
Asia > Middle East > Republic of Türkiye > İzmir Province > İzmir (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)