AITopics

Augmenting word tokens with a phonetic representation, derived from a dictionary, improves the performance of a Natural Language Understanding component that interprets speech recognizer output: we observed a 5% to 7% reduction in errors across a wide range of response return rates. The best performance comes from mixture models incorporating both word and phone features. Since the phonetic representation is derived from a dictionary, the method can be applied easily without the need for integration with a specific speech recognizer. The method has similarities with autonomous (or bottom-up) psychological models of lexical access, where contextual information is not integrated at the stage of auditory perception but rather later.
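The dictionary-based augmentation described above can be illustrated with a minimal sketch. The lexicon, the ARPAbet-like phone symbols, the `w=`/`p=` feature prefixes, and the function names are all hypothetical choices made for illustration; the abstract does not specify the dictionary or the feature encoding that was actually used.

```python
# Hypothetical toy pronunciation dictionary (ARPAbet-like symbols).
# A real system would load a full pronunciation lexicon instead.
TOY_LEXICON = {
    "their": ["DH", "EH", "R"],
    "there": ["DH", "EH", "R"],
    "flights": ["F", "L", "AY", "T", "S"],
}


def augment_with_phones(tokens, lexicon=TOY_LEXICON):
    """Return word features plus phone features looked up in the lexicon.

    Words missing from the lexicon contribute only their word feature.
    """
    features = []
    for tok in tokens:
        word = tok.lower()
        features.append("w=" + word)
        for phone in lexicon.get(word, []):
            features.append("p=" + phone)
    return features


def phone_only(tokens, lexicon=TOY_LEXICON):
    """Keep just the phone features, e.g. to compare homophones."""
    return [f for f in augment_with_phones(tokens, lexicon) if f.startswith("p=")]
```

In this sketch, a recognizer confusion such as "their" for "there" leaves the phone features unchanged, which is one plausible reason a downstream classifier could benefit from them.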

language model, tokenizer, utterance, (15 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(3 more...)

Genre: Research Report (0.46)

Industry: Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)

Villena-Román, Julio (Universidad Carlos III de Madrid) | Collada-Pérez, Sonia (Daedalus - Data, Decisions and Language, S.A.) | Lana-Serrano, Sara (Universidad Politécnica de Madrid) | González-Cristóbal, José Carlos (Universidad Politécnica de Madrid)

Hybrid Approach Combining Machine Learning and a Rule-Based Expert System for Text Categorization

This paper discusses a novel hybrid approach for text categorization that combines a machine learning algorithm, which provides a base model trained with a labeled corpus, with a rule-based expert system, which is used to improve the results provided by the previous classifier, by filtering false positives and dealing with false negatives. The main advantage is that the system can be easily fine-tuned by adding specific rules for those noisy or conflicting categories that have not been successfully trained. We also describe an implementation based on k-Nearest Neighbor and a simple rule language to express lists of positive, negative and relevant (multiword) terms appearing in the input text. The system is evaluated in several scenarios, including the popular Reuters-21578 news corpus for comparison to other approaches, and categorization using IPTC metadata, EUROVOC thesaurus and others. Results show that this approach achieves a precision that is comparable to top ranked methods, with the added value that it does not require a demanding human expert workload to train.
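The two-stage idea, a k-Nearest Neighbor base model followed by rule-based correction, might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the bag-of-words cosine similarity, the majority-vote threshold, the rule dictionary layout, and the rule semantics (a negative term removes a category, a positive term adds it) are all hypothetical, and the "relevant" term lists mentioned in the abstract are omitted.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_categories(text, labeled_docs, k=3):
    """Base model: categories held by a majority of the k nearest documents.

    labeled_docs is a list of (text, set_of_categories) pairs.
    """
    query = Counter(text.lower().split())
    ranked = sorted(
        labeled_docs,
        key=lambda doc: cosine(query, Counter(doc[0].lower().split())),
        reverse=True,
    )
    votes = Counter(cat for _, cats in ranked[:k] for cat in cats)
    return {cat for cat, n in votes.items() if n > k / 2}


def apply_rules(text, categories, rules):
    """Rule stage (assumed semantics): negative terms filter false positives,
    positive terms recover false negatives. Matching is plain substring
    matching, so multiword terms work but partial-word hits are possible."""
    lowered = text.lower()
    result = set(categories)
    for cat, rule in rules.items():
        if any(term in lowered for term in rule.get("negative", [])):
            result.discard(cat)
        elif any(term in lowered for term in rule.get("positive", [])):
            result.add(cat)
    return result
```

A usage example: `apply_rules(text, knn_categories(text, corpus), rules)` runs the base model first and then lets hand-written rules adjust only the categories they mention, which matches the abstract's point that noisy categories can be tuned without retraining.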

categorization, category, classifier, (15 more...)

Country:

North America > United States > Missouri > Jackson County > Kansas City (0.14)
Europe > Spain > Galicia > Madrid (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(4 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Media > News (0.68)
Leisure & Entertainment > Sports (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)

Disambiguation and Filtering Methods in Using Web Knowledge for Coreference Resolution

Uryupina, Olga (CiMEC, University of Trento) | Poesio, Massimo (CiMEC, University of Trento) | Giuliano, Claudio (Fondazione Bruno Kessler) | Tymoshenko, Kateryna (Fondazione Bruno Kessler)

We investigate two publicly available web knowledge bases, Wikipedia and Yago, in an attempt to leverage semantic information and increase the performance level of a state-of-the-art coreference resolution (CR) engine. We extract semantic compatibility and aliasing information from Wikipedia and Yago, and incorporate it into a CR system. We show that using such knowledge with no disambiguation and filtering does not bring any improvement over the baseline, mirroring previous findings. We therefore propose a number of solutions to reduce the amount of noise coming from web resources: using disambiguation tools for Wikipedia, pruning Yago to eliminate the most generic categories, and imposing additional constraints on affected mentions. Our evaluation experiments on the ACE-02 corpus show that the knowledge extracted from Wikipedia and Yago improves our system's performance by 2-3 percentage points.

information, knowledge, resolution, (16 more...)

Country:

Asia > Afghanistan (0.05)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
Europe > Czechia (0.04)
(13 more...)

Genre: Research Report (0.47)

Industry: Government (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Sellmi, Oussama (SOIE, ISG de Tunis)

Event Extraction Approach for French Language

With the proliferation of news articles from thousands of different sources now available on the Web, summarization of such information is becoming increasingly important. Considering the large number of news sources (for example, BBC, Reuters, CNN…), every day thousands of articles are produced across the world concerning a given event.

S. Tenier, A. Napoli, X. Polanco and Y. Toussaint (2006) developed an automatic Web page semantic annotation system. The objective is to classify pages concerning research teams, in order to determine, for example, who works where, on what and with whom (using a domain ontology). It consists, first, of the identification of the syntactic structure characterizing the…

annotation, fsim, similarity, (16 more...)

Country:

Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.05)
North America > United States > New York (0.04)
North America > United States > Florida > Volusia County > Daytona Beach (0.04)
(7 more...)

Industry: Government > Military (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.67)
Information Technology > Communications > Web > Semantic Web (0.49)

A Linguistic Analysis of Student-Generated Paraphrases

Rus, Vasile (The University of Memphis) | Feng, Shi (The University of Memphis) | Brandon, Russell (The University of Memphis) | Crossley, Scott (Georgia State University) | McNamara, Danielle S. (The University of Memphis)

Paraphrase identification is a core Natural Language Processing task that involves assessing the semantic similarity of two texts. To foster systematic studies of this task, standardized datasets were created on which various approaches could be compared more fairly. However, a better understanding and more precise operational definition of a paraphrase are needed before any further datasets or systematic evaluations of the task of paraphrase identification are proposed. This study develops the concept of paraphrasing as a writing strategy. Six types of paraphrases are defined through the creation of a relatively large corpus of student-generated paraphrases. These paraphrases are analyzed along several dozen linguistic dimensions ranging from cohesion to lexical diversity. The most significant indices from these dimensions were then used to build a prediction model that could identify true and false paraphrases and each of the six paraphrase types.

original passage, original text, student, (17 more...)

Country:

North America > United States > New Jersey > Bergen County > Mahwah (0.04)
North America > United States > Missouri (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
(4 more...)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.89)

Fairy Tales and ESL Texts: An Analysis of Linguistic Features Using the Gramulator

Rufenacht, Rachel M. (University of Memphis) | McCarthy, Philip M. (University of Memphis) | Lamkin, Travis A (University of Memphis)

Using the Gramulator, we analyzed the linguistic features of ESL texts and fairy tales. Our goal was to determine if fairy tales had the potential to be used as reading material for English language learners. The results of our analyses suggest that there are significant similarities between fairy tales and ESL texts, but that differences lie in the content of the text types with fairy tales appearing significantly more narrative in style and ESL texts appearing more expository.

esl text, fairy tale, student, (15 more...)

Country:

North America > United States > California > San Mateo County > Menlo Park (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > New York > Westchester County > White Plains (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.69)
Research Report > Experimental Study (0.47)

Industry: Education > Educational Setting (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Automated Assessment of Paragraph Quality: Introduction, Body, and Conclusion Paragraphs

Roscoe, Rod (University of Memphis) | Crossley, Scott (Georgia State University) | Weston, Jennifer (University of Memphis) | McNamara, Danielle (University of Memphis)

Natural language processing and statistical methods were used to identify linguistic features associated with the quality of student-generated paragraphs. Linguistic features were assessed using Coh-Metrix. The resulting computational models demonstrated small to medium effect sizes for predicting paragraph quality: introduction quality r² = .25, body quality r² = .10, and conclusion quality r² = .11. Although the variance explained was somewhat low, the linguistic features identified were consistent with the rhetorical goals of paragraph types. Avenues for bolstering this approach by considering individual writing styles and techniques are considered.

information, paragraph, paragraph quality, (17 more...)

Country:

North America > United States > California (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > Mississippi (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Assessment & Standards > Student Performance (0.69)
Education > Educational Technology > Educational Software (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Student Speech Act Classification Using Machine Learning

Rasor, Travis (University of Memphis) | Olney, Andrew (University of Memphis) | D'Mello, Sidney (University of Memphis)

Dialogue-based intelligent tutoring systems use speech act classifiers to categorize student input into answers, questions, and other speech acts. Previous work has primarily focused on question classification. In this paper, we present a complementary speech act classifier that focuses primarily on non-questions, which was developed using machine learning techniques. Our results show that an effective speech act classifier can be developed directly from labeled data using decision trees.

classification, classifier, dialogue act, (15 more...)

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > New York (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Automatic Natural Language Processing and the Detection of Reading Skills and Reading Comprehension

Boonthum-Denecke, Chutima (Hampton University) | McCarthy, Philip (University of Memphis) | Lamkin, Travis (University of Memphis) | Jackson, G. Tanner (University of Memphis) | Magliano, Joseph P. (Northern Illinois University) | McNamara, Danielle S. (University of Memphis)

The primary goal of this study is to assess two approaches for detecting comprehension processes in R-SAT (Reading Strategy Assessment Tool). One approach is based on Latent Semantic Analysis (LSA) while the other is a combination of literal word matching and soundex. A secondary goal is to assess the potential for detecting specific reading comprehension strategies, either in isolation or combination. Participants typed “think-aloud” protocols while reading texts presented on computers. Human judges rated these protocols for the presence of the various reading comprehension strategies. LSA, word, and combined algorithms were compared and the results showed that a combination of both approaches yielded the best results. However, performance of the combined algorithm varied in terms of the type of processes and the grain size of the human coding system. Lastly, the use of reading strategies (either in isolation or combination) is positively related to students’ Gates–MacGinitie reading comprehension scores, which illustrates the merit of this approach for assessing comprehension skill.
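The word-based approach combines literal word matching with soundex. As a rough sketch only, the block below implements the standard American Soundex code and a hypothetical `match_score` that counts a typed word as matched when it appears in the source text literally or shares a Soundex code with a source word. The scoring function, its name, and the whitespace tokenization are assumptions; the abstract does not describe how R-SAT combines the two signals.

```python
# Standard American Soundex digit groups.
_CODES = {}
for _letters, _digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code, or "" if there are no letters."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    encoded = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "hw":  # h and w do not separate duplicate codes
            prev = code
    return (encoded + "000")[:4]


def match_score(response, source_text):
    """Fraction of response words matching the source literally or by Soundex.

    Hypothetical scoring scheme, used here only to illustrate the idea.
    """
    source_words = set(source_text.lower().split())
    source_codes = {soundex(w) for w in source_words} - {""}
    words = response.lower().split()
    if not words:
        return 0.0
    hits = sum(
        1 for w in words if w in source_words or soundex(w) in source_codes
    )
    return hits / len(words)
```

The Soundex fallback is what lets misspelled protocol words such as "hart" still align with "heart" in the source text, which matters for typed think-aloud responses.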

algorithm, protocol, regression analysis, (13 more...)

Country:

North America > United States > New Jersey > Bergen County > Mahwah (0.05)
North America > United States > Virginia > Hampton (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > Illinois > DeKalb County > DeKalb (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education > Assessment & Standards > Student Performance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.31)

Stent, Amanda J. (AT&T Labs)

Shared Experiences, Shared Representations, and the Implications for Applied Natural Language Processing

When people interact with language-producing agents (other people or computers), they assume that the shared experience leads to shared representations — of the world, the interaction, and the language used in the interaction. This phenomenon occurs even during interaction with systems that give no evidence of building shared representations. The absence of shared representations leads to errors and delays; alternatively, even simple shared representations can lead to reduced error rates and more efficient interaction. In this talk, we present three case studies: a mobile local business search application that builds no interaction representations; a telephone-based recommendation and review system that builds limited representations of the shared language in the interaction; and computer models of coreference that use shared representations to permit both coreference resolution and referring expression generation. We lay out a range of possibilities for shared representations, show that they can be built incrementally as an interaction progresses, and point to possibilities for future work in probabilistic shared representations for interactive systems.

dialog system, proceedings, representation, (16 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > India > Karnataka > Bengaluru (0.05)
North America > United States > Texas > Dallas County > Dallas (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.48)

Industry: Consumer Products & Services (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)