AITopics | Information Extraction

Collaborating Authors

Information Extraction

News Overviews Instructional Materials AI-Alerts Classics

A Dataset and Strong Baselines for Classification of Czech News Texts

arXiv.org Artificial IntelligenceJul-20-2023

Pre-trained models for Czech Natural Language Processing are often evaluated on purely linguistic tasks (POS tagging, parsing, NER) and relatively simple classification tasks such as sentiment classification or article classification from a single news source. As an alternative, we present CZEch~NEws~Classification~dataset (CZE-NEC), one of the largest Czech classification datasets, composed of news articles from various sources spanning over twenty years, which allows a more rigorous evaluation of such models. We define four classification tasks: news source, news category, inferred author's gender, and day of the week. To verify the task difficulty, we conducted a human evaluation, which revealed that human performance lags behind strong machine-learning baselines built upon pre-trained transformer models. Furthermore, we show that language-specific pre-trained encoder analysis outperforms selected commercially available large-scale generative language models.

machine learning, natural language, text classification, (14 more...)

arXiv.org Artificial Intelligence

2307.10666

Country:

Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(6 more...)

Genre: Research Report (0.66)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.49)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.35)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.35)

Add feedback

Towards Robust Aspect-based Sentiment Analysis through Non-counterfactual Augmentations

Liu, Xinyu, Ding, Yan, An, Kaikai, Xiao, Chunyang, Madhyastha, Pranava, Xiao, Tong, Zhu, Jingbo

arXiv.org Artificial IntelligenceJul-20-2023

While state-of-the-art NLP models have demonstrated excellent performance for aspect based sentiment analysis (ABSA), substantial evidence has been presented on their lack of robustness. This is especially manifested as significant degradation in performance when faced with out-of-distribution data. Recent solutions that rely on counterfactually augmented datasets show promising results, but they are inherently limited because of the lack of access to explicit causal structure. In this paper, we present an alternative approach that relies on non-counterfactual data augmentation. Our proposal instead relies on using noisy, cost-efficient data augmentations that preserve semantics associated with the target aspect. Our approach then relies on modelling invariances between different versions of the data to improve robustness. A comprehensive suite of experiments shows that our proposal significantly improves upon strong pre-trained baselines on both standard and robustness-specific datasets. Our approach further establishes a new state-of-the-art on the ABSA robustness benchmark and transfers well across domains.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2306.13971

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Hong Kong (0.04)
(10 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.86)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models

Venkit, Pranav Narayanan, Srinath, Mukund, Wilson, Shomir

arXiv.org Artificial IntelligenceJul-18-2023

We analyze sentiment analysis and toxicity detection models to detect the presence of explicit bias against people with disability (PWD). We employ the bias identification framework of Perturbation Sensitivity Analysis to examine conversations related to PWD on social media platforms, specifically Twitter and Reddit, in order to gain insight into how disability bias is disseminated in real-world social settings. We then create the \textit{Bias Identification Test in Sentiment} (BITS) corpus to quantify explicit disability bias in any sentiment analysis and toxicity detection models. Our study utilizes BITS to uncover significant biases in four open AIaaS (AI as a Service) sentiment analysis tools, namely TextBlob, VADER, Google Cloud Natural Language API, DistilBERT and two toxicity detection models, namely two versions of Toxic-BERT. Our findings indicate that all of these models exhibit statistically significant explicit bias against PWD.

artificial intelligence, disability bias, natural language, (17 more...)

arXiv.org Artificial Intelligence

2307.09209

Country:

North America > United States > Pennsylvania > Centre County > University Park (0.04)
North America > Dominican Republic (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.68)
Law > Civil Rights & Constitutional Law (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Better Handling Coreference Resolution in Aspect Level Sentiment Classification by Fine-Tuning Language Models

Mullick, Dhruv, Ghanem, Bilal, Fyshe, Alona

arXiv.org Artificial IntelligenceJul-11-2023

Customer feedback is invaluable to companies as they refine their products. Monitoring customer feedback can be automated with Aspect Level Sentiment Classification (ALSC) which allows us to analyse specific aspects of the products in reviews. Large Language Models (LLMs) are the heart of many state-of-the-art ALSC solutions, but they perform poorly in some scenarios requiring Coreference Resolution (CR). In this work, we propose a framework to improve an LLM's performance on CR-containing reviews by fine tuning on highly inferential tasks. We show that the performance improvement is likely attributed to the improved model CR ability. We also release a new dataset that focuses on CR in ALSC.

computational linguistic, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2307.05646

Country:

North America > Canada > Alberta (0.15)
Asia > China > Hong Kong (0.05)
Europe > Italy > Tuscany > Florence (0.04)
(9 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.87)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.87)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.78)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.61)

Add feedback

AI Ethics on Blockchain: Topic Analysis on Twitter Data for Blockchain Security

Fu, Yihang, Zhuang, Zesen, Zhang, Luyao

arXiv.org Artificial IntelligenceJul-8-2023

Blockchain has empowered computer systems to be more secure using a distributed network. However, the current blockchain design suffers from fairness issues in transaction ordering. Miners are able to reorder transactions to generate profits, the so-called miner extractable value (MEV). Existing research recognizes MEV as a severe security issue and proposes potential solutions, including prominent Flashbots. However, previous studies have mostly analyzed blockchain data, which might not capture the impacts of MEV in a much broader AI society. Thus, in this research, we applied natural language processing (NLP) methods to comprehensively analyze topics in tweets on MEV. We collected more than 20000 tweets with #MEV and #Flashbots hashtags and analyzed their topics. Our results show that the tweets discussed profound topics of ethical concern, including security, equity, emotional sentiments, and the desire for solutions to MEV. We also identify the co-movements of MEV activities on blockchain and social media platforms. Our study contributes to the literature at the interface of blockchain security, MEV solutions, and AI ethics.

artificial intelligence, natural language, text processing, (16 more...)

arXiv.org Artificial Intelligence

2212.06951

Country:

Asia > China (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Services (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.64)
(2 more...)

Add feedback

Sentiment analysis and opinion mining on E-commerce site

Anny, Fatema Tuz Zohra, Islam, Oahidul

arXiv.org Artificial IntelligenceJul-4-2023

Sentiment analysis or opinion mining help to illustrate the phrase NLP (Natural Language Processing). Sentiment analysis has been the most significant topic in recent years. The goal of this study is to solve the sentiment polarity classification challenges in sentiment analysis. A broad technique for categorizing sentiment opposition is presented, along with comprehensive process explanations. With the results of the analysis, both sentence-level classification and review-level categorization are conducted. Finally, we discuss our plans for future sentiment analysis research.

artificial intelligence, natural language, sentiment analysis and opinion mining, (1 more...)

arXiv.org Artificial Intelligence

2211.15536

Genre: Research Report (0.66)

Industry: Information Technology > Services > e-Commerce Services (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Add feedback

Multimodal Sentiment Analysis: A Survey

Lai, Songning, Hu, Xifeng, Xu, Haoxuan, Ren, Zhaoxia, Liu, Zhi

arXiv.org Artificial IntelligenceJul-3-2023

Multimodal sentiment analysis has become an important research area in the field of artificial intelligence. With the latest advances in deep learning, this technology has reached new heights. It has great potential for both application and research, making it a popular research topic. This review provides an overview of the definition, background, and development of multimodal sentiment analysis. It also covers recent datasets and advanced models, emphasizing the challenges and future prospects of this technology. Finally, it looks ahead to future research directions. It should be noted that this review provides constructive suggestions for promising research directions and building better performing multimodal sentiment analysis models, which can help researchers in this field.

machine learning, natural language, sentiment analysis, (18 more...)

arXiv.org Artificial Intelligence

2305.07611

Country:

Asia > China > Shandong Province > Qingdao (0.04)
Asia > China > Chongqing Province > Chongqing (0.04)
Africa > Eswatini > Manzini > Manzini (0.04)

Genre: Overview (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CollabKG: A Learnable Human-Machine-Cooperative Information Extraction Toolkit for (Event) Knowledge Graph Construction

Wei, Xiang, Chen, Yufeng, Cheng, Ning, Cui, Xingyu, Xu, Jinan, Han, Wenjuan

arXiv.org Artificial IntelligenceJul-3-2023

In order to construct or extend entity-centric and event-centric knowledge graphs (KG and EKG), the information extraction (IE) annotation toolkit is essential. However, existing IE toolkits have several non-trivial problems, such as not supporting multi-tasks, not supporting automatic updates. In this work, we present CollabKG, a learnable human-machine-cooperative IE toolkit for KG and EKG construction. Specifically, for the multi-task issue, CollabKG unifies different IE subtasks, including named entity recognition (NER), entity-relation triple extraction (RE), and event extraction (EE), and supports both KG and EKG. Then, combining advanced prompting-based IE technology, the human-machine-cooperation mechanism with LLMs as the assistant machine is presented which can provide a lower cost as well as a higher performance. Lastly, owing to the two-way interaction between the human and machine, CollabKG with learning ability allows self-renewal. Besides, CollabKG has several appealing features (e.g., customization, training-free, propagation, etc.) that make the system powerful, easy-to-use, and high-productivity. We holistically compare our toolkit with other existing tools on these features. Human evaluation quantitatively illustrates that CollabKG significantly improves annotation quality, efficiency, and stability simultaneously.

computational linguistic, data mining, natural language, (19 more...)

arXiv.org Artificial Intelligence

2307.00769

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Beijing > Beijing (0.05)
Asia > China > Hong Kong (0.04)
(9 more...)

Genre:

Research Report > Experimental Study (0.47)
Research Report > New Finding (0.46)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.72)
Information Technology > Data Science > Data Mining > Text Mining (0.61)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.61)

Add feedback

weighted CapsuleNet networks for Persian multi-domain sentiment analysis

Kobari, Mahboobeh Sadat, Karimi, Nima, Pourhosseini, Benyamin, Mousa, Ramin

arXiv.org Artificial IntelligenceJul-1-2023

Sentiment classification is a fundamental task in natural language processing, assigning one of the three classes, positive, negative, or neutral, to free texts. However, sentiment classification models are highly domain dependent; the classifier may perform classification with reasonable accuracy in one domain but not in another due to the Semantic multiplicity of words getting poor accuracy. This article presents a new Persian/Arabic multi-domain sentiment analysis method using the cumulative weighted capsule networks approach. Weighted capsule ensemble consists of training separate capsule networks for each domain and a weighting measure called domain belonging degree (DBD). This criterion consists of TF and IDF, which calculates the dependency of each document for each domain separately; this value is multiplied by the possible output that each capsule creates. In the end, the sum of these multiplications is the title of the final output, and is used to determine the polarity. And the most dependent domain is considered the final output for each domain. The proposed method was evaluated using the Digikala dataset and obtained acceptable accuracy compared to the existing approaches. It achieved an accuracy of 0.89 on detecting the domain of belonging and 0.99 on detecting the polarity. Also, for the problem of dealing with unbalanced classes, a cost-sensitive function was used. This function was able to achieve 0.0162 improvements in accuracy for sentiment classification. This approach on Amazon Arabic data can achieve 0.9695 accuracies in domain classification.

classification, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2306.17068

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
North America > United States (0.04)
Asia > Middle East > Iran > Zanjan Province > Zanjan (0.04)
Asia > Middle East > Iran > Yazd Province > Yazd (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(2 more...)

Add feedback

Truth Discovery in Sequence Labels from Crowds

Sabetpour, Nasim, Kulkarni, Adithya, Xie, Sihong, Li, Qi

arXiv.org Artificial IntelligenceJul-1-2023

Annotation quality and quantity positively affect the learning performance of sequence labeling, a vital task in Natural Language Processing. Hiring domain experts to annotate a corpus is very costly in terms of money and time. Crowdsourcing platforms, such as Amazon Mechanical Turk (AMT), have been deployed to assist in this purpose. However, the annotations collected this way are prone to human errors due to the lack of expertise of the crowd workers. Existing literature in annotation aggregation assumes that annotations are independent and thus faces challenges when handling the sequential label aggregation tasks with complex dependencies. To conquer the challenges, we propose an optimization-based method that infers the ground truth labels using annotations provided by workers for sequential labeling tasks. The proposed Aggregation method for Sequential Labels from Crowds ($AggSLC$) jointly considers the characteristics of sequential labeling tasks, workers' reliabilities, and advanced machine learning techniques. Theoretical analysis on the algorithm's convergence further demonstrates that the proposed $AggSLC$ halts after a finite number of iterations. We evaluate $AggSLC$ on different crowdsourced datasets for Named Entity Recognition (NER) tasks and Information Extraction tasks in biomedical (PICO), as well as a simulated dataset. Our results show that the proposed method outperforms the state-of-the-art aggregation methods. To achieve insights into the framework, we study the effectiveness of $AggSLC$'s components through ablation studies.

annotation, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2109.0447

Country:

North America > United States > Iowa > Story County > Ames (0.04)
North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.89)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.89)
(4 more...)

Add feedback