AITopics | Information Extraction

Information Extraction

Classifying COVID-19 Related Tweets for Fake News Detection and Sentiment Analysis with BERT-based Models

Bounaama, Rabia, Abderrahim, Mohammed El Amine

arXiv.org Artificial Intelligence, Apr-2-2023

This paper describes the participation of our team "techno" in the CERIST'22 shared tasks. We used an available dataset, "task1.c", related to the COVID-19 pandemic. It comprises 4128 tweets for the sentiment analysis task and 8661 tweets for the fake news detection task. We combined natural language processing tools with BERT (Bidirectional Encoder Representations from Transformers), one of the most renowned pre-trained language models. The results show the efficacy of pre-trained language models: we attained an accuracy of 0.93 on the sentiment analysis task and 0.90 on the fake news detection task.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2304.00636

Country:

Asia > Middle East > Saudi Arabia (0.16)
Africa > Middle East > Algeria > Tlemcen Province > Tlemcen (0.05)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Health & Medicine > Therapeutic Area > Immunology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

An Information Extraction Study: Take In Mind the Tokenization!

Theodoropoulos, Christos, Moens, Marie-Francine

arXiv.org Artificial Intelligence, Apr-1-2023

Current research on the advantages and trade-offs of using characters, instead of tokenized text, as input for deep learning models, has evolved substantially. New token-free models remove the traditional tokenization step; however, their efficiency remains unclear. Moreover, the effect of tokenization is relatively unexplored in sequence tagging tasks. To this end, we investigate the impact of tokenization when extracting information from documents and present a comparative study and analysis of subword-based and character-based models. Specifically, we study Information Extraction (IE) from biomedical texts. The main outcome is twofold: tokenization patterns can introduce inductive bias that results in state-of-the-art performance, and the character-based models produce promising results; thus, transitioning to token-free IE models is feasible.
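As a rough illustration of the two input granularities this abstract compares, the sketch below splits the same biomedical term into characters and into subword pieces using greedy longest-match segmentation. The toy vocabulary, the `##` continuation prefix, and the function names are assumptions made for illustration; they are not the tokenizers or models evaluated in the paper.

```python
def char_tokenize(text):
    """Character-level input: every character (including spaces) is a token."""
    return list(text)


def subword_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces.

    Non-initial pieces carry a '##' prefix (a WordPiece-style convention,
    assumed here purely for illustration).
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece of the vocabulary covers this position.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary, not taken from any real model.
TOY_VOCAB = {"hyper", "##tension", "##glyc", "##emia", "chronic"}

if __name__ == "__main__":
    for word in "chronic hypertension hyperglycemia".split():
        print(word, subword_tokenize(word, TOY_VOCAB))
    print(char_tokenize("hyper"))
```

The subword split depends entirely on the vocabulary, which is one way tokenization can inject the kind of inductive bias the abstract mentions, whereas the character split needs no vocabulary at all.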

information extraction study, tokenization

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-031-39965-7_49

2303.151

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining > Text Mining (0.60)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Energy-hungry TikTok data centre harming our Ukraine ammunition production plans, CEO says

The Guardian, Mar-28-2023, 03:40:40 GMT

One of Europe's largest ammunition manufacturers has said efforts to meet surging demand from the war in Ukraine have been stymied by a new TikTok data centre that is monopolising electricity in the region close to its biggest factory. The chief executive of Nammo, which is co-owned by the Norwegian government, said a planned expansion of its largest factory in central Norway hit a roadblock due to a lack of surplus energy, with the construction of TikTok's new data centre using up electricity in the local area. "We are concerned because we see our future growth is challenged by the storage of cat videos," Morten Brandtzæg told the Financial Times. Demand for artillery rounds is 15 times higher than normal and Europe's munitions industry needs to invest €2bn in new factories to keep up with Ukraine's needs, according to Brandtzæg. By some estimates, Ukraine is firing 6,000 to 7,000 artillery shells a day and is facing ammunition shortages after more than a year of war.

data centre, tiktok, ukraine ammunition production plan, (13 more...)

The Guardian

Country:

Europe > Ukraine (1.00)
Asia > China (0.55)
North America > United States (0.17)
Europe > Norway > Eastern Norway > Innlandet > Hamar (0.06)

Genre: Press Release (0.34)

Industry:

Information Technology > Services (1.00)
Government > Regional Government > Europe Government (1.00)
Government > Regional Government > Asia Government > China Government (0.32)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.62)

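Sejarah dan Perkembangan Teknik Natural Language Processing (NLP) Bahasa Indonesia: Tinjauan tentang sejarah, perkembangan teknologi, dan aplikasi NLP dalam bahasa Indonesia [History and Development of Natural Language Processing (NLP) Techniques for Indonesian: A Review of the History, Technological Development, and Applications of NLP in the Indonesian Language]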

Amien, Mukhlis

arXiv.org Artificial Intelligence, Mar-27-2023

This study provides an overview of the history of the development of Natural Language Processing (NLP) in the context of the Indonesian language, with a focus on the basic technologies, methods, and practical applications that have been developed. This review covers developments in basic NLP technologies such as stemming, part-of-speech tagging, and related methods; practical applications in cross-language information retrieval systems, information extraction, and sentiment analysis; and methods and techniques used in Indonesian language NLP research, such as machine learning, statistics-based machine translation, and conflict-based approaches. This study also explores the application of NLP in Indonesian language industry and research and identifies challenges and opportunities in Indonesian language NLP research and development. Recommendations for future Indonesian language NLP research and development include developing more efficient methods and technologies, expanding NLP applications, increasing sustainability, further research into the potential of NLP, and promoting interdisciplinary collaboration. It is hoped that this review will help researchers, practitioners, and the government to understand the development of Indonesian language NLP and identify opportunities for further research and development.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2304.02746

Country:

Asia > Indonesia (1.00)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre:

Overview (1.00)
Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Evaluating the Role of Target Arguments in Rumour Stance Classification

Li, Yue, Scarton, Carolina

arXiv.org Artificial Intelligence, Mar-22-2023

Considering a conversation thread, stance classification aims to identify the opinion (e.g. agree or disagree) of replies towards a given target. The target of the stance is expected to be an essential component in this task, being one of the main factors that make it different from sentiment analysis. However, a recent study shows that a target-oblivious model outperforms target-aware models, suggesting that targets are not useful when predicting stance. This paper re-examines this phenomenon for rumour stance classification (RSC) on social media, where a target is a rumour story implied by the source tweet in the conversation. We propose adversarial attacks in the test data, aiming to assess the models' robustness and evaluate the role of the data in the models' performance. Results show that state-of-the-art models, including approaches that use the entire conversation thread, rely excessively on superficial signals. Our hypothesis is that the naturally high occurrence of target-independent direct replies in RSC (e.g. "this is fake" or just "fake") results in the impressive performance of target-oblivious models, highlighting the risk of target instances being treated as noise during training.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2303.12665

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.15)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(6 more...)

Genre: Research Report > New Finding (0.86)

Industry:

Information Technology (0.49)
Media > News (0.46)
Government (0.35)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.30)

Sentiment Analysis With BigQuery ML - Liwaiwai

#artificialintelligence, Mar-17-2023, 22:05:51 GMT

We recently announced BigQuery support for sparse features, which helps users store and process such features efficiently. That functionality enables users to represent sparse tensors and train machine learning models directly in the BigQuery environment. Being able to represent sparse tensors is useful because they are used extensively in encoding schemes like TF-IDF as part of data pre-processing in NLP applications, and for pre-processing images with a lot of dark pixels in computer vision applications. There are numerous applications of sparse features, such as text generation and sentiment analysis. In this blog, we'll demonstrate how to perform sentiment analysis with sparse features in BigQuery ML by training machine learning models and running inference with them on a public dataset.
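To make the idea of a sparse TF-IDF encoding concrete, here is a minimal sketch in plain Python rather than BigQuery ML. It stores each document as a dictionary holding only the non-zero weights. The function name, the whitespace tokenization, and the specific weighting (raw term frequency times log(N / document frequency)) are assumptions chosen for illustration, not the scheme used in the blog post.

```python
import math
from collections import Counter


def tfidf_sparse(docs):
    """Return one {term: weight} dict per document, keeping only non-zero entries.

    Weighting assumed here: raw term frequency * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {}
        for term, tf in counts.items():
            weight = tf * math.log(n_docs / doc_freq[term])
            if weight != 0.0:
                # Terms present in every document get weight 0 and are not stored.
                vec[term] = weight
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    reviews = ["great movie great cast", "terrible movie"]
    for vec in tfidf_sparse(reviews):
        print(vec)
```

A dense encoding would need one slot per vocabulary term in every row; the dictionary form stores only the few terms each document actually contains, which is the property that makes sparse features cheap to store and process.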

dataset, sentiment analysis, sparse feature, (14 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.86)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.86)

Tribe or Not? Critical Inspection of Group Differences Using TribalGram

Ahn, Yongsu, Yan, Muheng, Lin, Yu-Ru, Chung, Wen-Ting, Hwa, Rebecca

arXiv.org Artificial Intelligence, Mar-16-2023

With the rise of big data, artificial intelligence (AI), and data mining techniques, group analysis has increasingly become a powerful tool in many applications, ranging from policy-making and direct marketing to education and healthcare. For example, an important analysis strategy is group profiling, which extracts and describes the characteristics of groups of people [40]; it has been commonly used for customized recommendations to overcome sparse and missing personal data [25]. The same strategy is also used for mining social media, educational, and healthcare data to understand the shared characteristics of online communities or student/patient cohorts [15, 51, 100]. While it may help to support public and private services or product creations that are better tailored to different communities, group profiles resulting from mathematical inference are typically not valid for every individual regarded as a member of the group (this is known as non-distributive group profiles) [40]. The shared group characteristics extracted from data can have social ramifications such as stereotyping and stigmatization, or lead to pernicious consequences in decision making, because individuals might be judged by group characteristics they do not possess [24, 56, 58].

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/1122445.1122456

2303.09664

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(5 more...)

Genre:

Overview (0.93)
Personal > Interview (0.68)
Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
(5 more...)

Tollywood Emotions: Annotation of Valence-Arousal in Telugu Song Lyrics

Shanker, R Guru Ravi, Gupta, B Manikanta, Koushik, BV, Alluri, Vinoo

arXiv.org Artificial Intelligence, Mar-16-2023

Emotion recognition from a given music track has heavily relied on acoustic features, social tags, and metadata but is seldom focused on lyrics. There are no datasets of Indian language songs that contain both valence and arousal manual ratings of lyrics. We present a new manually annotated dataset of Telugu songs' lyrics collected from Spotify with valence and arousal annotated on a discrete scale. A fairly high inter-annotator agreement was observed for both valence and arousal. Subsequently, we create two music emotion recognition models by using two classification techniques to identify valence, arousal and respective emotion quadrant from lyrics. Support vector machine (SVM) with term frequency-inverse document frequency (TF-IDF) features and fine-tuning the pre-trained XLMRoBERTa (XLM-R) model were used for valence, arousal and quadrant classification tasks. Fine-tuned XLMRoBERTa performs better than the SVM by improving macro-averaged F1-scores of 54.69%, 67.61%, 34.13% to 77.90%, 80.71% and 58.33% for valence, arousal and quadrant classifications, respectively, on 10-fold cross-validation. In addition, we compare our lyrics annotations with Spotify's annotations of valence and energy (same as arousal), which are based on entire music tracks. The implications of our findings are discussed. Finally, we make the dataset publicly available with lyrics, annotations and Spotify IDs.
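The macro-averaged F1-score reported in this abstract averages the per-class F1 values with equal weight, so a rare class counts as much as a frequent one. The sketch below shows that computation in plain Python; the function name and the example "high"/"low" labels are hypothetical and are not drawn from the paper's dataset.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical valence labels for four lyrics.
    gold = ["high", "high", "high", "low"]
    pred = ["high", "high", "low", "low"]
    print(round(macro_f1(gold, pred), 4))
```

In this toy case the per-class F1 values are 0.8 for "high" and 2/3 for "low", so the macro average is their plain mean regardless of how many examples each class has.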

lyric, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2303.09364

Country:

Asia > India (0.05)
North America > United States > Oregon (0.04)
Europe > Finland (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.49)
(2 more...)

What is sentiment analysis? Using NLP and ML to extract meaning

#artificialintelligence, Mar-15-2023, 15:55:38 GMT

Sentiment analysis is an analytical technique that uses statistics, natural language processing, and machine learning to determine the emotional meaning of communications. Companies use sentiment analysis to evaluate customer messages, call center interactions, online reviews, social media posts, and other content. Sentiment analysis can track changes in attitudes towards companies, products, or services, or individual features of those products or services. One of the most prominent examples of sentiment analysis on the Web today is the Hedonometer, a project of the University of Vermont's Computational Story Lab. The group analyzes more than 50 million English-language tweets every single day, about a tenth of Twitter's total traffic, to calculate a daily happiness score.

extract, nlp and ml, sentiment analysis, (1 more...)

#artificialintelligence

Country: North America > United States > Vermont (0.28)

Industry: Information Technology > Services (0.62)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Cross-domain Sentiment Classification in Spanish

Estienne, Lautaro, Vera, Matias, Vega, Leonardo Rey

arXiv.org Artificial Intelligence, Mar-15-2023

Sentiment Classification is a fundamental task in the field of Natural Language Processing, and has very important academic and commercial applications. It aims to automatically predict the degree of sentiment present in a text that contains opinions and subjectivity at some level, like product and movie reviews, or tweets. This can be really difficult to accomplish, in part because different domains of text contain different words and expressions. The difficulty increases when text is written in a non-English language, due to the lack of databases and resources. As a consequence, several cross-domain and cross-language techniques are often applied to this task in order to improve the results. In this work we perform a study on the ability of a classification system trained with a large database of product reviews to generalize to different Spanish domains. Reviews were collected from the MercadoLibre website from seven Latin American countries, allowing the creation of a large and balanced dataset. Results suggest that generalization across domains is feasible though very challenging when trained with these product reviews, and can be improved by pre-training and fine-tuning the classification model.

machine learning, natural language, text classification, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ARGENCON55245.2022.9940056

2303.08985

Country:

South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.05)
South America > Venezuela (0.04)
South America > Uruguay (0.04)
(11 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology (0.66)
Leisure & Entertainment (0.48)
Media > Film (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.88)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.88)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.72)
