 Information Extraction


HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crisis Response

arXiv.org Artificial Intelligence

Timely and effective response to humanitarian crises requires quick and accurate analysis of large amounts of text data, a process that can benefit greatly from expert-assisted NLP systems trained on validated and annotated data in the humanitarian response domain. To enable creation of such NLP systems, we introduce and release HumSet, a novel and rich multilingual dataset of humanitarian response documents annotated by experts in the humanitarian response community. The dataset provides documents in three languages (English, French, Spanish) and covers a variety of humanitarian crises from 2018 to 2021 across the globe. For each document, HumSet provides selected snippets (entries) as well as classes assigned to each entry, annotated using common humanitarian information analysis frameworks. HumSet also provides novel and challenging entry extraction and multi-label entry classification tasks. In this paper, we take a first step towards approaching these tasks and conduct a set of experiments on Pre-trained Language Models (PLM) to establish strong baselines for future research in this domain. The dataset is available at https://blog.thedeep.io/humset/.


LinkedIn Data Science Interview Questions

#artificialintelligence

I recently interviewed for a research engineer (vision) role at LinkedIn. In this role the candidate is expected to work on state-of-the-art computer vision algorithms to understand users and content on the platform. In this post, I'll summarize the questions and the whole interview process. I got a call from a LinkedIn recruiter asking about my background and preferences. They gave me a list of datasets from different domains, such as Songs DB, NLP, and CV, and asked me to build a report from it.


How to download your Twitter data

PCWorld

The future of Twitter has now become much murkier, thanks to the recent shake-up in Twitter's management. But even if you're not wondering how you'll have access to your Twitter account as it is, grabbing a copy of your data is never a bad idea. Especially when it's so easy--exporting your account info is straightforward. Just a few clicks (and a reauthentication of your account credentials) will put in your request for your data archive, which you'll receive a day or so later. Here's how to do it.


Data-efficient End-to-end Information Extraction for Statistical Legal Analysis

arXiv.org Artificial Intelligence

Legal practitioners often face a vast amount of documents. Lawyers, for instance, search for appropriate precedents favorable to their clients, while the number of legal precedents is ever-growing. Although legal search engines can assist in finding individual target documents and narrowing down the number of candidates, retrieved information is often presented as unstructured text, and users have to examine each document thoroughly, which can lead to information overload. This also makes statistical analysis of the documents challenging. Here, we present an end-to-end information extraction (IE) system for legal documents. By formulating IE as a generation task, our system can be easily applied to various tasks without domain-specific engineering effort. The experimental results of four IE tasks on Korean precedents show that our IE system can achieve competent scores (-2.3 on average) compared to the rule-based baseline with as few as 50 training examples per task and higher scores (+5.4 on average) with 200 examples. Finally, our statistical analysis of two case categories--drunk driving and fraud--with 35k precedents reveals that the resulting structured information from our IE system faithfully reflects the macroscopic features of the Korean legal system.
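To make the idea of "IE as a generation task" concrete, here is a minimal sketch of one way structured fields could be written out as a target string for a text-to-text model and parsed back afterwards. The abstract does not describe the authors' output format, so the separators, the field names (`offense`, `blood_alcohol`, `fine`), and the helpers `linearize` and `parse` are all hypothetical.

```python
from typing import Dict

# Hypothetical separators; the paper's actual target format is not given in the abstract.
FIELD_SEP = " | "
KV_SEP = ": "


def linearize(record: Dict[str, str]) -> str:
    """Turn a structured record into one target string a generative model could be trained to emit."""
    return FIELD_SEP.join(f"{key}{KV_SEP}{value}" for key, value in record.items())


def parse(generated: str) -> Dict[str, str]:
    """Recover a record from generated text, skipping chunks that lack a key/value separator."""
    record: Dict[str, str] = {}
    for chunk in generated.split(FIELD_SEP):
        key, sep, value = chunk.partition(KV_SEP)
        if sep:  # malformed chunks (no "key: value" shape) are ignored
            record[key.strip()] = value.strip()
    return record


# Hypothetical example record for a drunk-driving precedent.
example = {"offense": "drunk driving", "blood_alcohol": "0.12%", "fine": "5,000,000 KRW"}
target_text = linearize(example)
round_trip = parse(target_text)
```

Because the task definition lives only in the field names and the target string, moving to a new extraction task under this sketch means changing the training pairs, not the code. Values that themselves contain the field separator would need escaping, which is left out here.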


Deploying a Sentiment Analysis Text Classifier With FastAPI

#artificialintelligence

FastAPI has recently been making waves as an easy-to-use Python framework for creating APIs. If you're developing apps with FastAPI, you can add language processing capabilities to it by integrating Cohere's Large Language Models. In this article, you will learn how to create and finetune a Cohere sentiment analysis classification model, and generate predictions by making API calls to it using FastAPI. To follow this tutorial, you will need a Cohere account to generate an API key, create a finetuned model, and generate API calls. You also need a Python coding environment, such as VS Code.


Generative Entity-to-Entity Stance Detection with Knowledge Graph Augmentation

arXiv.org Artificial Intelligence

Stance detection is typically framed as predicting the sentiment in a given text towards a target entity. However, this setup overlooks the importance of the source entity, i.e., who is expressing the opinion. In this paper, we emphasize the need for studying interactions among entities when inferring stances. We first introduce a new task, entity-to-entity (E2E) stance detection, which primes models to identify entities in their canonical names and discern stances jointly. To support this study, we curate a new dataset with 10,619 annotations labeled at the sentence level from news articles of different ideological leanings. We present a novel generative framework to allow the generation of canonical names for entities as well as stances among them. We further enhance the model with a graph encoder to summarize entity activities and external knowledge surrounding the entities. Experiments show that our model outperforms strong comparisons by large margins. Further analyses demonstrate the usefulness of E2E stance detection for understanding media quotation and stance landscape, as well as inferring entity ideology.


TOE: A Grid-Tagging Discontinuous NER Model Enhanced by Embedding Tag/Word Relations and More Fine-Grained Tags

arXiv.org Artificial Intelligence

Discontinuous named entity recognition (NER) has received increasing research attention, and many related methods have emerged, such as hypergraph-based methods, span-based methods, and sequence-to-sequence (Seq2Seq) methods. However, these methods more or less suffer from problems such as decoding ambiguity and inefficiency, which limit their performance. Recently, grid-tagging methods, which benefit from the flexible design of tagging systems and model architectures, have proven well suited to various information extraction tasks. In this paper, we follow the line of such methods and propose a competitive grid-tagging model for discontinuous NER. We call our model TOE because we incorporate two kinds of Tag-Oriented Enhancement mechanisms into a state-of-the-art (SOTA) grid-tagging model that casts the NER problem into word-word relationship prediction. First, we design a Tag Representation Embedding Module (TREM) to force our model to consider not only word-word relationships but also word-tag and tag-tag relationships. Concretely, we construct tag representations and embed them into TREM, so that TREM can treat tag and word representations as queries/keys/values and utilize self-attention to model their relationships. Second, motivated by the Next-Neighboring-Word (NNW) and Tail-Head-Word (THW) tags in the SOTA model, we add two new symmetric tags, namely Previous-Neighboring-Word (PNW) and Head-Tail-Word (HTW), to model more fine-grained word-word relationships and alleviate error propagation from tag prediction. In experiments on three benchmark datasets, namely CADEC, ShARe13 and ShARe14, our TOE model improves on the SOTA results by about 0.83%, 0.05% and 0.66% in F1, demonstrating its effectiveness.
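The word-word relationship view mentioned above can be illustrated with a small sketch of NNW/THW grid encoding and decoding. It assumes that an NNW tag at cell (i, j) means word j is the next word after word i inside the same entity, and that a THW tag at cell (tail, head) marks an entity's last and first words. Entity types, the model itself, and the added PNW/HTW tags are left out; the function names `encode` and `decode` are illustrative and not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Grid = Dict[str, Set[Tuple[int, int]]]


def encode(entities: List[List[int]]) -> Grid:
    """Map entities (lists of increasing word indices) to NNW and THW grid cells."""
    grid: Grid = {"NNW": set(), "THW": set()}
    for idx in entities:
        for a, b in zip(idx, idx[1:]):
            grid["NNW"].add((a, b))        # word b follows word a within an entity
        grid["THW"].add((idx[-1], idx[0]))  # (tail, head) marks the entity boundary
    return grid


def decode(grid: Grid) -> List[List[int]]:
    """Recover entities by following NNW links from each head to its tail."""
    nxt: Dict[int, List[int]] = {}
    for a, b in grid["NNW"]:
        nxt.setdefault(a, []).append(b)

    found: List[List[int]] = []

    def walk(path: List[int], tail: int) -> None:
        if path[-1] == tail:
            found.append(list(path))
            return
        for b in sorted(nxt.get(path[-1], [])):
            if b <= tail:  # NNW links only point forward, so never step past the tail
                walk(path + [b], tail)

    for tail, head in sorted(grid["THW"]):
        walk([head], tail)
    return sorted(found)


# "I am having aching in legs and shoulders" (word indices 0..7):
# two entities sharing a prefix, "aching in legs" and "aching in shoulders".
entities = [[3, 4, 5], [3, 4, 7]]
grid = encode(entities)
decoded = decode(grid)
```

In this example the two entities share the cells (3, 4) in the grid, and the second one skips over words 5 and 6, which is what makes it discontinuous. The decoder separates them because each THW cell fixes one head/tail pair to connect.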


UK watchdog warns against AI for emotional analysis, dubs 'immature' biometrics a bias risk

#artificialintelligence

The U.K.'s privacy watchdog has warned against use of so-called "emotion analysis" technologies for anything more serious than kids' party games, saying there's a discrimination risk attached to applying "immature" biometric tech that makes pseudoscientific claims about being able to recognize people's emotions using AI to interpret biometric data inputs. Such AI systems 'function', if we can use the word, by claiming to be able to 'read the tea leaves' of one or more biometric signals, such as heart rate, eye movements, facial expression, skin moisture, gait tracking, vocal tone etc, and perform emotion detection or sentiment analysis to predict how the person is feeling -- presumably after being trained on a bunch of visual data of faces frowning, faces smiling etc (but you can immediately see the problem with trying to assign individual facial expressions to absolute emotional states -- because no two people, and often no two emotional states, are the same; hence hello pseudoscience!). The watchdog's deputy commissioner, Stephen Bonner, appears to agree that this high tech nonsense must be stopped -- saying today there's no evidence that such technologies do actually work as claimed (or that they will ever work). "Developments in the biometrics and emotion AI market are immature. They may not work yet, or indeed ever," he warned in a statement. "While there are opportunities present, the risks are currently greater.


Applications of SentimentAnalysis part1

#artificialintelligence

This paper shows to what extent machine learning can analyze and structure these databases. An automated data analysis pipeline is deployed to provide insights into user-generated content for researchers in other domains. First, the domain expert can select an image and a term of interest. Then, the pipeline uses image retrieval to find all images showing similar contents and applies aspect-based sentiment analysis to outline users' opinions about the selected term. As part of an interdisciplinary project between architecture and computer science researchers, an empirical study of Hamburg's Elbphilharmonie was conducted on 300 thousand posts from the platform Flickr with the hashtag 'hamburg'. Image retrieval methods generated a subset of slightly more than 1.5 thousand images displaying the Elbphilharmonie. We found that these posts mainly convey a neutral or positive sentiment towards it. With this pipeline, we suggest a new big data analysis method that offers new insights into end-users' opinions, e.g., for architecture domain experts.


Design a Sustainable Micro-mobility Future: Trends and Challenges in the United States and European Union Using Natural Language Processing Techniques

arXiv.org Artificial Intelligence

ABSTRACT Micro-mobility promises to contribute to sustainable cities in the future with its efficiency and low cost. To better design such a sustainable future, it is necessary to understand the trends and challenges. Thus, we examined people's opinions on micro-mobility in the US and the EU using Tweets. We used topic modeling based on advanced natural language processing techniques and categorized the data into seven topics: promotion and service, mobility, technical features, acceptance, recreation, infrastructure and regulations. Furthermore, using sentiment analysis, we investigated people's positive and negative attitudes towards specific aspects of these topics and compared the patterns of the trends and challenges in the US and the EU. We found that 1) promotion and service accounted for the majority of Twitter discussions in both regions, 2) the EU had more positive opinions than the US, 3) micro-mobility devices were more widely used for utilitarian mobility and recreational purposes in the EU than in the US, and 4) compared to the EU, people in the US had many more concerns related to infrastructure and regulation issues. These findings help us understand the trends and challenges and prioritize different aspects of micro-mobility to improve safety and user experience across the two areas for designing a more sustainable micro-mobility future.

INTRODUCTION The growth of transportation has raised the need for compact, flexible, and more sustainable forms of transportation. Recent developments in the micro-mobility industry show that these devices might address this issue and offer people safer and cheaper trips with reduced travel time. According to the Society of Automotive Engineers (SAE) definition (Society of Automotive Engineers, 2019), micro-mobility refers to a range of small, lightweight (less than 500 pounds, or 227 kg), fully motorized or motor-assisted devices operating at a speed below 30 mph (48 km/h) and ideal for trips up to 10 km. Typical examples include e-bikes, e-scooters, e-unicycles and e-skateboards, and some of them are widely used as personal or shared transportation devices (Price, Blackshear, Blount Jr, & Sandt, 2021). The global micro-mobility market has been growing over the years. According to the NACTO (National Association of City Transportation Officials, 2020), 136 million trips were generated by shared micro-mobility in 2019 in the U.S., which was 60% more than in 2018. Thus, micro-mobility devices can be well integrated into the overall urban design process of smart and sustainable transportation in the near future. With the sustainable design and development goal, we should not only consider technical challenges and requirements (e.g., battery and material), but also complement and constrain the design and development process by social, infrastructural, and political schemes for a sustainable future (Jiao, Luo, Malmqvist, Johan, & Summers, 2022).