AITopics

#artificialintelligence Aug-27-2019, 07:01:06 GMT

Part of speech - Word Tagger

The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The part of speech explains how a word is used in a sentence. There are eight main parts of speech: nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions and interjections. The collection of tags used for a particular task is known as a tagset.

artificial intelligence, natural language, speech, (7 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
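
As a rough illustration of the tagging idea in the entry above, the following minimal sketch assigns each word a tag by dictionary lookup. The lexicon, the tag names, and the `tag` function are hypothetical and chosen only for this example; practical taggers use context and trained models rather than a fixed word list.

```python
# Toy lexicon-lookup tagger. LEXICON and the tag names are illustrative
# assumptions, not taken from any real tagset or library.
LEXICON = {
    "she": "PRONOUN",
    "quickly": "ADVERB",
    "reads": "VERB",
    "old": "ADJECTIVE",
    "books": "NOUN",
    "and": "CONJUNCTION",
    "smiles": "VERB",
}

# The tagset is simply the collection of tags this toy task uses.
TAGSET = sorted(set(LEXICON.values()))


def tag(sentence, default="NOUN"):
    """Return (word, tag) pairs; unknown words fall back to `default`."""
    return [(word, LEXICON.get(word.lower(), default)) for word in sentence.split()]


if __name__ == "__main__":
    print(tag("She quickly reads old books and smiles"))
    print(TAGSET)
```

A lookup table cannot resolve words that belong to several classes (for example, "books" as a verb), which is why real systems condition on the surrounding words.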

Jauhiainen, Tommi, Lui, Marco, Zampieri, Marcos, Baldwin, Timothy, Lindén, Krister

Automatic Language Identification in Texts: A Survey

Journal of Artificial Intelligence Research Aug-25-2019

Language identification ("LI") is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used in the LI literature. We describe the features and methods using a unified notation, to make the relationships between methods clearer. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI.

pattern recognition association, text-based language identification, word-level language identification, (16 more...)

doi: 10.1613/jair.1.11675

AI Access Foundation

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
(135 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology > Services (1.00)
Education (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(10 more...)
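
To make the language identification task in the survey entry above concrete, here is a minimal sketch that compares character trigram counts against per-language profiles. It is one simple technique chosen for illustration, not a method attributed to the survey; the sample sentences, the `trigrams` and `identify` names, and the overlap score are all assumptions.

```python
from collections import Counter


def trigrams(text):
    """Count character trigrams of the lower-cased text, padded with spaces."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# Tiny illustrative training samples (assumed for this sketch only).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
}
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def identify(text):
    """Return the language whose profile shares the most trigram mass with `text`."""
    query = trigrams(text)

    def overlap(profile):
        # Missing trigrams count as 0 because Counter returns 0 for absent keys.
        return sum(min(count, profile[gram]) for gram, count in query.items())

    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))


if __name__ == "__main__":
    print(identify("the dog"))
    print(identify("der hund und die katze"))
```

With profiles built from a single sentence per language the result is only reliable for text that closely resembles the samples; larger training corpora and smoothed probabilities would be needed for anything beyond a demonstration.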

Besharati, MohammadReza, Izadi, Mohammad

DAST Model: Deciding About Semantic Complexity of a Text

arXiv.org Artificial Intelligence Aug-23-2019

Measuring text complexity is needed in several domains and applications (such as NLP, the semantic web, and smart education). The semantic layer of a text is more tacit than its syntactic structure, and as a result, calculating semantic complexity is more difficult. Whereas well-known and powerful academic and commercial syntactic complexity measures exist, measuring semantic complexity remains a challenging problem. In this article, we introduce the DAST model, which stands for Deciding About Semantic Complexity of a Text. In this model, an intuitionistic approach to semantics gives us a well-defined notion of the semantics of a text and its complexity: we consider semantics and meaning as a lattice of intuitions. Semantic complexity is defined as the result of a calculation on this lattice. A set-theoretic formal definition of semantic complexity, as a 6-tuple formal system, is provided. Using this formal system, a method for measuring semantic complexity is presented. The proposed approach is evaluated with a detailed example and a case study, a set of eighteen human-judgment experiments, and a corpus-based evaluation. The results show that the DAST model is capable of deciding about the semantic complexity of a text. Furthermore, analysis of the experimental results leads us to introduce a Markovian model for the process of common-sense, multi-step semantic-complexity reasoning in people. The experimental results demonstrate that our method consistently outperforms the random baseline in precision and accuracy.

logic & formal reasoning, machine learning, natural language, (20 more...)

1908.0908

Country: Europe (0.28)

Genre: Research Report > New Finding (0.86)

Industry: Education (0.93)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(4 more...)

arXiv.org Artificial Intelligence Aug-22-2019

The compositionality of neural networks: integrating symbolism and connectionism

Hupkes, Dieuwke, Dankers, Verna, Mul, Mathijs, Bruni, Elia

Despite a multitude of empirical studies, little consensus exists on whether neural networks are able to generalise compositionally, a controversy that, in part, stems from a lack of agreement about what it means for a neural model to be compositional. As a response to this controversy, we present a set of tests that provide a bridge between, on the one hand, the vast amount of linguistic and philosophical theory about compositionality and, on the other, the successful neural models of language. We collect different interpretations of compositionality and translate them into five theoretically grounded tests that are formulated on a task-independent level. In particular, we provide tests to investigate (i) if models systematically recombine known parts and rules, (ii) if models can extend their predictions beyond the length they have seen in the training data, (iii) if models' composition operations are local or global, (iv) if models' predictions are robust to synonym substitutions, and (v) if models favour rules or exceptions during training. To demonstrate the usefulness of this evaluation paradigm, we instantiate these five tests on a highly compositional data set which we dub PCFG SET and apply the resulting tests to three popular sequence-to-sequence models: a recurrent, a convolution-based and a transformer model. We provide an in-depth analysis of the results that uncovers the strengths and weaknesses of these three architectures and points to potential areas of improvement.

compositionality, input sequence, sequence, (17 more...)

1908.08351

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(2 more...)

Genre: Research Report > New Finding (0.92)

Industry: Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Aug-13-2019

Reasoning-Driven Question-Answering for Natural Language Understanding

Khashabi, Daniel

Natural language understanding (NLU) of text is a fundamental challenge in AI, and it has received significant attention throughout the history of NLP research. This primary goal has been studied under different tasks, such as Question Answering (QA) and Textual Entailment (TE). In this thesis, we investigate the NLU problem through the QA task and focus on the aspects that make it a challenge for the current state-of-the-art technology. This thesis is organized into three main parts: In the first part, we explore multiple formalisms to improve existing machine comprehension systems. We propose a formulation for abductive reasoning in natural language and show its effectiveness, especially in domains with limited training data. Additionally, to help reasoning systems cope with irrelevant or redundant information, we create a supervised approach to learn and detect the essential terms in questions. In the second part, we propose two new challenge datasets. In particular, we create two datasets of natural language questions where (i) the first one requires reasoning over multiple sentences; (ii) the second one requires temporal common sense reasoning. We hope that the two proposed datasets will motivate the field to address more complex problems. In the final part, we present the first formal framework for multi-step reasoning algorithms, in the presence of a few important properties of language use, such as incompleteness, ambiguity, etc. We apply this framework to prove fundamental limitations for reasoning algorithms. These theoretical results provide extra intuition into the existing empirical evidence in the field.

comprehension task, machine reading comprehension, reading comprehension dataset, (16 more...)

1908.04926

Country:

North America > United States > California > San Francisco County > San Francisco (0.13)
North America > United States > New York (0.04)
North America > United States > Pennsylvania (0.04)
(18 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Setting > K-12 Education (0.92)
Health & Medicine (0.92)
Education > Assessment & Standards > Student Performance (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(13 more...)

arXiv.org Artificial Intelligence Aug-13-2019

Fine-grained Information Status Classification Using Discourse Context-Aware Self-Attention

Hou, Yufang

Previous work on bridging anaphora recognition (Hou et al., 2013a) casts the problem as a subtask of learning fine-grained information status (IS). However, these systems depend heavily on many handcrafted linguistic features. In this paper, we propose a discourse context-aware self-attention neural network model for fine-grained IS classification. On the ISNotes corpus (Markert et al., 2012), our model with contextually encoded word representations (BERT) (Devlin et al., 2018) achieves new state-of-the-art performance on fine-grained IS classification, obtaining a 4.1% absolute overall accuracy improvement compared to Hou et al. (2013a). More importantly, we also show an improvement of 3.9% F1 for bridging anaphora recognition without using any complex handcrafted semantic features designed for capturing the bridging phenomenon. Information Structure (Halliday, 1967; Prince, 1981, 1992; Gundel et al., 1993; Lambrecht, 1994; Birner and Ward, 1998; Kruijff-Korbayová and Steedman, 2003) studies structural and semantic properties of a sentence according to its relation to the discourse context.

classification, machine learning, natural language, (15 more...)

1908.04755

Country:

North America > United States > Louisiana (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Ruder, Sebastian, Vulić, Ivan, Søgaard, Anders

A Survey of Cross-lingual Word Embedding Models

Journal of Artificial Intelligence Research Aug-12-2019

Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent, modulo optimization strategies, hyper-parameters, and such. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.

cross-lingual word, proceedings, representation, (17 more...)

doi: 10.1613/jair.1.11640

AI Access Foundation

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.27)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > Middle East > Jordan (0.04)
(3 more...)

Genre: Overview (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(4 more...)

Mitra, Arindam, Baral, Chitta, Bhattacharjee, Aurgho, Shrivastava, Ishan

A Generate-Validate Approach to Answering Questions about Qualitative Relationships

arXiv.org Artificial Intelligence Aug-9-2019

Qualitative relationships describe how increasing or decreasing one property (e.g. altitude) affects another (e.g. temperature). They are an important aspect of natural language question answering and are crucial for building chatbots or voice agents where one may enquire about qualitative relationships. Recently, a dataset for question answering involving qualitative relationships has been proposed, and a few approaches to answering such questions have been explored, at the heart of which lies a semantic parser that converts the natural language input to a suitable logical form. A problem with existing semantic parsers is that they try to convert the input sentences directly to a logical form. Since the output language varies with each application, this forces the semantic parser to learn almost everything from scratch. In this paper, we show that if, instead of using a semantic parser to produce the logical form, we apply the generate-validate framework (i.e. generate a natural language description of the logical form and validate whether that description follows from the input text), we get better scope for transfer learning, and our method outperforms the state of the art by a large margin of 7.93%.

artificial intelligence, friction, natural language, (17 more...)

1908.03645

Genre: Research Report (0.40)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

#artificialintelligence Aug-8-2019, 19:37:11 GMT

Lexical semantics - Wikipedia

Lexical semantics (also known as lexicosemantics) is a subfield of linguistic semantics. The units of analysis in lexical semantics are lexical units, which include not only words but also sub-words or sub-units such as affixes, and even compound words and phrases. Lexical units make up the catalogue of words in a language, the lexicon. Lexical semantics looks at how the meaning of the lexical units correlates with the structure of the language, or syntax. This is referred to as the syntax-semantics interface.[1] Lexical units, also referred to as syntactic atoms, can stand alone, as in the case of root words or parts of compound words, or they necessarily attach to other units, as prefixes and suffixes do. The former are called free morphemes and the latter bound morphemes.[2]

artificial intelligence, natural language, text processing, (20 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.73)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.50)