AITopics

2207.0527

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Overview (1.00)

Industry:

Transportation > Passenger (0.68)
Transportation > Air (0.68)
Consumer Products & Services > Travel (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.79)

arXiv.org Artificial Intelligence Jul-6-2022

Learning grammar with a divide-and-concur neural network

Deyo, Sean, Elser, Veit

We implement a divide-and-concur iterative projection approach to context-free grammar inference. Unlike most state-of-the-art models of natural language processing, our method requires a relatively small number of discrete parameters, making the inferred grammar directly interpretable -- one can read off from a solution how to construct grammatically valid sentences. Another advantage of our approach is the ability to infer meaningful grammatical rules from just a few sentences, compared to the hundreds of gigabytes of training data many other models employ. We demonstrate several ways of applying our approach: classifying words and inferring a grammar from scratch, taking an existing grammar and refining its categories and rules, and taking an existing grammar and expanding its lexicon as it encounters new words in new data.
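To illustrate what it means to "read off" valid sentences from a small set of discrete rules, here is a minimal sketch that enumerates every sentence a toy context-free grammar can produce. The grammar, the `GRAMMAR` dictionary, and the `expand` helper are hypothetical and chosen only for illustration; they are not the grammar or the divide-and-concur procedure from the paper.

```python
from itertools import product

# Hypothetical toy grammar (an assumption for illustration, not from the paper).
# Each nonterminal maps to a list of right-hand sides; any symbol that is not
# a key is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def expand(symbol):
    """Return every word sequence derivable from `symbol`.

    Only terminates for non-recursive (finite) grammars such as the one above.
    """
    if symbol not in GRAMMAR:
        return [[symbol]]  # terminal: a single word
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each symbol on the right-hand side, then combine the
        # alternatives in order.
        parts = [expand(s) for s in rhs]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
for sentence in sentences:
    print(sentence)
```

Because the rules are explicit, each generated sentence can be traced back to the rules that produced it, which is the sense of interpretability the abstract describes.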

artificial intelligence, machine learning, natural language, (17 more...)

doi: 10.1103/PhysRevE.105.064303

2201.07341

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.68)

arXiv.org Artificial Intelligence Jul-4-2022

Building a Relation Extraction Baseline for Gene-Disease Associations: A Reproducibility Study

Menotti, Laura

Reproducibility is an important task in scientific research. It is crucial for researchers to compare newly developed systems with the state of the art to assess whether they have made a breakthrough. However, previous works may not be immediately reproducible, for example due to the lack of source code. In this work we reproduce DEXTER, a system to automatically extract Gene-Disease Associations (GDAs) from biomedical abstracts.[1] The goal is to provide a benchmark for future works regarding Relation Extraction (RE), enabling researchers to test and compare their results.

dexter, expression, information, (16 more...)

2207.06226

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.96)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.71)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)

#artificialintelligence Jul-3-2022, 10:40:29 GMT

AI

The purposeful exchange of information caused by the creation and perception of signals drawn from a shared system of conventional signs is known as communication. Most animals employ signals to convey vital messages: there's food here, there's a predator nearby, approach, recede, and let's mate. Communication can help agents succeed in a partially visible world because they can learn knowledge that others have observed or inferred. Humans are the most talkative of all species, thus computer agents will need to master the language if they are to be useful. Language models for communication are examined in this chapter.

context-free grammar, grammar, hierarchy, (16 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

arXiv.org Artificial Intelligence Jun-18-2022

A Double-Graph Based Framework for Frame Semantic Parsing

Zheng, Ce, Chen, Xudong, Xu, Runxin, Chang, Baobao

Frame semantic parsing is a fundamental NLP task, which consists of three subtasks: frame identification, argument identification and role classification. Most previous studies tend to neglect relations between different subtasks and arguments and pay little attention to ontological frame knowledge defined in FrameNet. In this paper, we propose a Knowledge-guided Incremental semantic parser with Double-graph (KID). We first introduce Frame Knowledge Graph (FKG), a heterogeneous graph containing both frames and FEs (Frame Elements) built on the frame knowledge so that we can derive knowledge-enhanced representations for frames and FEs. Besides, we propose Frame Semantic Graph (FSG) to represent frame semantic structures extracted from the text with graph structures. In this way, we can transform frame semantic parsing into an incremental graph construction problem to strengthen interactions between subtasks and relations between arguments. Our experiments show that KID outperforms the previous state-of-the-art method by up to 1.7 F1-score on two FrameNet datasets. Our code is available at https://github.com/PKUnlp-icler/KID.

artificial intelligence, computational linguistic, natural language, (15 more...)

doi: 10.18653/v1/2022.naacl-main.368

2206.09158

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(10 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

#artificialintelligence Jun-10-2022, 14:04:27 GMT

Natural Language Processing: Part of Speech Tagging - PythonAlgos

Part of Speech (POS) Tagging is an integral part of Natural Language Processing (NLP). The first step in most state-of-the-art NLP pipelines is tokenization. Tokenization is the separating of text into "tokens". Tokens are generally regarded as individual pieces of language – words, whitespace, and punctuation. Once we tokenize our text, we can tag each token with its part of speech. Note that this article only covers the details of part of speech tagging for English.
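The tokenization step described above can be sketched with a single regular expression. This is a minimal illustration using only Python's standard library, not the tokenizer from the article or from any NLP library; the names `TOKEN_PATTERN` and `tokenize` are assumptions introduced here.

```python
import re

# Assumed pattern for illustration: a token is a run of word characters,
# a run of whitespace, or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text):
    """Split text into word, whitespace, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("Hello, world!")
print(tokens)  # ['Hello', ',', ' ', 'world', '!']
```

A pattern this simple splits contractions such as "don't" into three tokens, which is one reason production pipelines rely on language-specific tokenizers before assigning part of speech tags.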

natural language processing, spacy, speech, (12 more...)

Industry: Energy (0.32)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.98)

arXiv.org Artificial Intelligence Jun-5-2022

Stylistic Fingerprints, POS-tags and Inflected Languages: A Case Study in Polish

Eder, Maciej, Górski, Rafał L.

In stylometric investigations, frequencies of the most frequent words (MFWs) and character n-grams outperform other style-markers, even if their performance varies significantly across languages. In inflected languages, word endings play a prominent role, and hence different word forms cannot be recognized using generic text tokenization. Countless inflected word forms make frequencies sparse, making most statistical procedures complicated. Presumably, applying one of the NLP techniques, such as lemmatization and/or parsing, might increase the performance of classification. The aim of this paper is to examine the usefulness of grammatical features (as assessed via POS-tag n-grams) and lemmatized forms in recognizing authorial profiles, in order to address the underlying issue of the degree of freedom of choice within lexis and grammar. Using a corpus of Polish novels, we performed a series of supervised authorship attribution benchmarks, in order to compare the classification accuracy for different types of lexical and syntactic style-markers. Even if the performance of POS-tags as well as lemmatized forms was notoriously worse than that of lexical markers, the difference was not substantial and never exceeded ca. 15%.

artificial intelligence, natural language, pos-tag and inflected language, (2 more...)

doi: 10.1080/09296174.2022.2122751

2206.02208

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

#artificialintelligence May-27-2022, 07:18:41 GMT

Intel AVX-512 A Big Win For... JSON Parsing Performance

In addition to the many HPC workloads and other scientific computing tasks where Intel's AVX-512 performance on their latest processors proves very beneficial, it turns out AVX-512 can also provide significant benefit to a much more mundane web server task: JSON parsing. The simdjson project, which is focused on "parsing gigabytes of JSON per second", this week issued simdjson 2.0, headlined by an Intel-led contribution of AVX-512 support. The JavaScript Object Notation (JSON) data interchange format is heavily used by practically all major websites and web applications in some capacity and can be handled by pretty much all programming languages. JSON really needs no introduction. For the past few years, simdjson has been an open-source (Apache 2.0 licensed) project aimed at delivering the highest-performance JSON parser, one that can parse "gigabytes of JSON per second" and claims to be 4 to 25x faster than alternatives.

avx-512, json parsing performance, simdjson 2, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.88)
Information Technology > Software > Programming Languages (0.59)
Information Technology > Scientific Computing (0.59)

arXiv.org Artificial Intelligence May-26-2022

Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese

Micallef, Kurt, Gatt, Albert, Tanti, Marc, van der Plas, Lonneke, Borg, Claudia

Multilingual language models such as mBERT have seen impressive cross-lingual transfer to a variety of languages, but many languages remain excluded from these models. In this paper, we analyse the effect of pre-training with monolingual data for a low-resource language that is not included in mBERT -- Maltese -- with a range of pre-training set ups. We conduct evaluations with the newly pre-trained models on three morphosyntactic tasks -- dependency parsing, part-of-speech tagging, and named-entity recognition -- and one semantic classification task -- sentiment analysis. We also present a newly created corpus for Maltese, and determine the effect that the pre-training data size and domain have on the downstream performance. Our results show that using a mixture of pre-training domains is often superior to using Wikipedia text only. We also find that a fraction of this corpus is enough to make significant leaps in performance over Wikipedia-trained models. We pre-train and compare two models on the new corpus: a monolingual BERT model trained from scratch (BERTu), and a further pre-trained multilingual BERT (mBERTu). The models achieve state-of-the-art performance on these tasks, despite the new corpus being considerably smaller than typically used corpora for high-resourced languages. On average, BERTu outperforms or performs competitively with mBERTu, and the largest gains are observed for higher-level tasks.

low-resource language, new corpus and bert model, pre-training data quality and quantity

doi: 10.18653/v1/2022.deeplo-1.10

2205.10517

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.53)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.53)

#artificialintelligence May-23-2022, 15:25:14 GMT

How Writing SQL Could Get a Whole Lot Easier With NLQ

What is the most intuitive, efficient, and least mentally draining way to ask a question? It is using the simplest words possible in your own language. Modern search engines such as Google have made searching for information online using simple sentences commonplace. This has helped create our modern society and improved access to information globally; it's hard to overstate how transformational the advent of the search engine truly was. However, searching for information on the internet didn't truly become democratized and popular until we could ask the internet questions using natural language in the same way we would talk to another person.

dataset, natural language, query, (14 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.31)