 Grammars & Parsing


NaïveRole: Author-Contribution Extraction and Parsing from Biomedical Manuscripts

arXiv.org Machine Learning

Information about the contributions of individual authors to scientific publications is important for assessing authors' achievements. Some biomedical publications have a short section that describes authors' roles and contributions. It is usually written in natural language and hence author contributions cannot be trivially extracted in a machine-readable format. In this paper, we present 1) a statistical analysis of roles in author contributions sections, and 2) NaïveRole, a novel approach to extract structured authors' roles from author contribution sections. For the first part, we used co-clustering techniques, as well as Open Information Extraction, to semi-automatically discover the popular roles within a corpus of 2,000 contributions sections from PubMed Central. The discovered roles were used to automatically build a training set for NaïveRole, our role extractor approach, based on Naïve Bayes. NaïveRole extracts roles with a micro-averaged precision of 0.68, recall of 0.48 and F1 of 0.57. It is, to the best of our knowledge, the first attempt to automatically extract author roles from research papers. This paper is an extended version of a previous poster published at JCDL 2018.
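To make the idea of a Naïve Bayes role extractor concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing that maps a contribution phrase to a role. It is not the authors' implementation: the role labels, the toy training phrases, and the names `train_naive_bayes` and `predict_role` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: the role labels and phrases are invented for this sketch.
TOY_TRAINING = [
    ("designed the experiments", "design"),
    ("conceived and designed the study", "design"),
    ("wrote the manuscript", "writing"),
    ("wrote the paper and revised the manuscript", "writing"),
    ("analyzed the data", "analysis"),
    ("performed the statistical analysis of the data", "analysis"),
]


def train_naive_bayes(examples):
    """Count roles and per-role word frequencies from (text, role) pairs."""
    role_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, role in examples:
        tokens = text.lower().split()
        role_counts[role] += 1
        word_counts[role].update(tokens)
        vocab.update(tokens)
    return role_counts, word_counts, vocab


def predict_role(model, text):
    """Return the role with the highest log-posterior under add-one smoothing."""
    role_counts, word_counts, vocab = model
    total_examples = sum(role_counts.values())
    best_role, best_score = None, -math.inf
    for role in sorted(role_counts):
        # Log prior of the role.
        score = math.log(role_counts[role] / total_examples)
        # Smoothed denominator: tokens seen for this role plus vocabulary size.
        denom = sum(word_counts[role].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[role][token] + 1) / denom)
        if score > best_score:
            best_role, best_score = role, score
    return best_role


model = train_naive_bayes(TOY_TRAINING)
print(predict_role(model, "drafted the manuscript"))
print(predict_role(model, "analyzed the data"))
```

In this sketch, unseen words are simply ignored at prediction time; a real system would also need to split a contributions section into per-author statements before classifying them.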


An Unsupervised Domain-Independent Framework for Automated Detection of Persuasion Tactics in Text

arXiv.org Artificial Intelligence

With the growth of social media, people have started relying heavily on the information shared therein to form opinions and make decisions. While such reliance motivates a variety of parties to promote information, it also makes people vulnerable to exploitation through slander, misinformation, and terroristic or predatory advances. In this work, we aim to understand and detect such attempts at persuasion. Existing works on detecting persuasion in text make use of lexical features for detecting persuasive tactics, without taking advantage of the possible structures inherent in the tactics used. We formulate the task as a multi-class classification problem and propose an unsupervised, domain-independent machine learning framework for detecting the type of persuasion used in text, which exploits the inherent sentence structure present in the different persuasion tactics. Our work shows promising results as compared to existing work.


salesforce/decaNLP

#artificialintelligence

The Natural Language Decathlon is a multitask challenge that spans ten tasks: question answering (SQuAD), machine translation (IWSLT), summarization (CNN/DM), natural language inference (MNLI), sentiment analysis (SST), semantic role labeling (QA‑SRL), zero-shot relation extraction (QA‑ZRE), goal-oriented dialogue (WOZ), semantic parsing (WikiSQL), and commonsense reasoning (MWSC). Each task is cast as question answering, which makes it possible to use our new Multitask Question Answering Network (MQAN). This model jointly learns all tasks in decaNLP without any task-specific modules or parameters in the multitask setting. For a more thorough introduction to decaNLP and the tasks, see the main website, our blog post, or the paper. While the research direction associated with this repository focused on multitask learning, the framework itself is designed in a way that should make single-task training, transfer learning, and zero-shot evaluation simple.
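Casting every task as question answering can be pictured as mapping each example to a (question, context, answer) triple. The sketch below illustrates that idea only; it is not the decaNLP data format or API. The `QAExample` class, the helper names, and the question wordings are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QAExample:
    """One example in a unified question-answering form (hypothetical schema)."""
    question: str
    context: str
    answer: str


def sentiment_as_qa(review: str, label: str) -> QAExample:
    # The task is specified by the question; the review is the context.
    return QAExample("Is this review negative or positive?", review, label)


def translation_as_qa(source: str, target: str) -> QAExample:
    return QAExample("What is the translation from English to German?", source, target)


def summarization_as_qa(document: str, summary: str) -> QAExample:
    return QAExample("What is the summary?", document, summary)


# Examples from three different tasks now share one shape, so a single model
# with no task-specific modules could in principle consume all of them.
batch = [
    sentiment_as_qa("A stirring, funny film.", "positive"),
    translation_as_qa("Good morning.", "Guten Morgen."),
    summarization_as_qa("The council met on Monday and approved the budget.",
                        "Council approves budget."),
]
for example in batch:
    print(example.question, "->", example.answer)
```

Because the task identity lives in the question text rather than in the model architecture, adding a task in this scheme means adding a new question template rather than a new output head.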


Design and implementation of an open source Greek POS Tagger and Entity Recognizer using spaCy

arXiv.org Machine Learning

This paper proposes a machine learning approach to part-of-speech tagging and named entity recognition for Greek, focusing on the extraction of morphological features and classification of tokens into a small set of classes for named entities. The architecture model that was used is introduced. The Greek version of the spaCy platform was added to the source code, a feature that did not exist before our contribution, and was used for building the models. Additionally, a part-of-speech tagger was trained that can detect the morphology of the tokens and outperforms the state-of-the-art results when classifying only the part of speech. For named entity recognition using spaCy, a model that extends the standard ENAMEX type (organization, location, person) was built. Certain experiments that were conducted indicate the need for flexibility in handling out-of-vocabulary words, and work to resolve this issue is ongoing. Finally, the evaluation results are discussed.


Findings of the 2016 WMT Shared Task on Cross-lingual Pronoun Prediction

arXiv.org Artificial Intelligence

We describe the design, the evaluation setup, and the results of the 2016 WMT shared task on cross-lingual pronoun prediction. This is a classification task in which participants are asked to provide predictions on what pronoun class label should replace a placeholder value in the target-language text, provided in lemmatised and PoS-tagged form. We provided four subtasks, for the English-French and English-German language pairs, in both directions. Eleven teams participated in the shared task; nine for the English-French subtask, five for French-English, nine for English-German, and six for German-English. Most of the submissions outperformed two strong language-model-based baseline systems, with systems using deep recurrent neural networks outperforming those using other architectures for most language pairs.


Feature-Rich Part-of-speech Tagging for Morphologically Complex Languages: Application to Bulgarian

arXiv.org Artificial Intelligence

Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, achieving accuracy of 97.98%, which is a significant improvement over the state of the art for Bulgarian.


SemEval-2015 Task 3: Answer Selection in Community Question Answering

arXiv.org Artificial Intelligence

Community Question Answering (cQA) provides new interesting research directions to the traditional Question Answering (QA) field, e.g., the exploitation of the interaction between users and the structure of related posts. In this context, we organized SemEval-2015 Task 3 on "Answer Selection in cQA", which included two subtasks: (a) classifying answers as "good", "bad", or "potentially relevant" with respect to the question, and (b) answering a YES/NO question with "yes", "no", or "unsure", based on the list of all answers. We set subtask A for Arabic and English on two relatively different cQA domains, i.e., the Qatar Living website for English, and a Quran-related website for Arabic. We used crowdsourcing on Amazon Mechanical Turk to label a large English training dataset, which we released to the research community. Thirteen teams participated in the challenge with a total of 61 submissions: 24 primary and 37 contrastive. The best systems achieved an official score (macro-averaged F1) of 57.19 and 63.7 for the English subtasks A and B, and 78.55 for the Arabic subtask A.
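For readers unfamiliar with the official metric, the following sketch shows how a macro-averaged F1 is computed: F1 is calculated separately for each class and the per-class values are averaged with equal weight. This is a generic illustration, not the task's official scorer; the function name `macro_f1`, the short label strings, and the toy labels are assumptions, and the scores quoted above appear to be on a 0 to 100 scale rather than the 0 to 1 scale used here.

```python
def macro_f1(gold, predicted):
    """Average the per-class F1 over every class seen in gold or predicted."""
    labels = sorted(set(gold) | set(predicted))
    f1_scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)  # class never predicted correctly
    return sum(f1_scores) / len(f1_scores)


# Toy labels, invented for illustration ("potential" = potentially relevant).
gold = ["good", "good", "bad", "potential"]
predicted = ["good", "bad", "bad", "good"]
print(round(macro_f1(gold, predicted), 4))
```

Because each class counts equally, a system that ignores a rare class such as "potentially relevant" is penalized more heavily than it would be under accuracy or a micro-averaged score.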


Filling Conversation Ellipsis for Better Social Dialog Understanding

arXiv.org Artificial Intelligence

The phenomenon of ellipsis is prevalent in social conversations. Ellipsis increases the difficulty of a series of downstream language understanding tasks, such as dialog act prediction and semantic role labeling. We propose to resolve ellipsis through automatic sentence completion to improve language understanding. However, automatic ellipsis completion can result in output which does not accurately reflect user intent. To address this issue, we propose a method which considers both the original utterance that has ellipsis and the automatically completed utterance in dialog act and semantic role labeling tasks. Specifically, we first complete user utterances to resolve ellipsis using an end-to-end pointer network model. We then train a prediction model using both utterances containing ellipsis and our automatically completed utterances. Finally, we combine the prediction results from these two utterances using a selection model that is guided by expert knowledge. Our approach improves dialog act prediction and semantic role labeling by 1.3% and 2.5% in F1 score respectively in social conversations. We also present an open-domain human-machine conversation dataset with manually completed user utterances and annotated semantic role labeling after manual completion.

Introduction: Ellipsis, in which a speaker omits words that are understood from context, is a frequent phenomenon in human conversation. Although natural to humans, ellipsis poses a challenge for language understanding in spoken dialog systems.


NLP News Cypher 11.24.19

#artificialintelligence

The French RoBERTa, aka CamemBERT, is now part of Hugging Face's Transformers library. The transformer achieves state-of-the-art (SOTA) results on several NLP downstream tasks: part-of-speech tagging, dependency parsing, named-entity recognition, and natural language inference in French. Need a cheat sheet for data science or ML? Thanks to this fellow, the biggest payload of cheat sheets in the galaxy, covering several programming languages and use cases, is easily accessible on GitHub. One of the biggest retailers on planet Earth is turning to Conversational AI. This past week Walmart announced its partnership with Apple's Siri for peeps looking to buy groceries online -- the service is called Walmart Voice Order.