AITopics | Information Extraction

Information Extraction

Twitter data leak exposes over 5.4 million accounts

Engadget Nov-28-2022, 09:50:40 GMT

Earlier this year, Twitter confirmed that the private user data for 5.4 million users was stolen due to an API vulnerability, but the company said it had "no evidence" that it was exploited. Now, all of those accounts have been exposed on a hacker forum, BleepingComputer has reported. On top of that, an additional 1.4 million Twitter profiles for suspended users were reportedly shared privately, and an even larger data dump with the data of "tens of millions" of other users may have come from the same vulnerability. The owner of a hacking forum called Breached told BleepingComputer that it was responsible for exploiting the weakness (originally obtained from another hacker called "Devil") and dumping the user records. It said that it also obtained 1.4 million Twitter profiles for suspended accounts, obtained via another API, but only shared those privately among a few individuals.

bleepingcomputer, private phone number, twitter data leak expose, (4 more...)

Engadget

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.40)

AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages

Dossou, Bonaventure F. P., Tonja, Atnafu Lambebo, Yousuf, Oreen, Osei, Salomey, Oppong, Abigail, Shode, Iyanuoluwa, Awoyomi, Oluwabusayo Olufunke, Emezue, Chris Chinenye

arXiv.org Artificial Intelligence Nov-23-2022

In recent years, multilingual pre-trained language models have gained prominence due to their remarkable performance on numerous downstream Natural Language Processing (NLP) tasks. However, pre-training these large multilingual language models requires a lot of training data, which is not available for African languages. Active learning is a semi-supervised learning algorithm, in which a model consistently and dynamically learns to identify the most beneficial samples to train itself on, in order to achieve better optimization and performance on downstream tasks. Furthermore, active learning effectively and practically addresses real-world data scarcity. Despite all its benefits, active learning, in the context of NLP and especially multilingual language model pretraining, has received little consideration. In this paper, we present AfroLM, a multilingual language model pretrained from scratch on 23 African languages (the largest effort to date) using our novel self-active learning framework. Pretrained on a dataset significantly (14x) smaller than existing baselines, AfroLM outperforms many multilingual pretrained language models (AfriBERTa, XLMR-base, mBERT) on various NLP downstream tasks (NER, text classification, and sentiment analysis). Additional out-of-domain sentiment analysis experiments show that AfroLM is able to generalize well across various domains. We release the source code and the datasets used in our framework at https://github.com/bonaventuredossou/MLM_AL.

alphabet, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2211.03263

Country:

Africa > Niger (0.05)
Africa > Ghana (0.05)
Africa > Nigeria (0.05)
(46 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.45)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.45)

Incorporating Dynamic Semantics into Pre-Trained Language Model for Aspect-based Sentiment Analysis

Zhang, Kai, Zhang, Kun, Zhang, Mengdi, Zhao, Hongke, Liu, Qi, Wu, Wei, Chen, Enhong

arXiv.org Artificial Intelligence Nov-23-2022

Aspect-based sentiment analysis (ABSA) predicts sentiment polarity towards a specific aspect in the given sentence. While pre-trained language models such as BERT have achieved great success, incorporating dynamic semantic changes into ABSA remains challenging. To this end, we propose Dynamic Re-weighting BERT (DR-BERT), a novel method designed to learn dynamic aspect-oriented semantics for ABSA. Specifically, we first take the Stack-BERT layers as a primary encoder to grasp the overall semantics of the sentence and then fine-tune it by incorporating a lightweight Dynamic Re-weighting Adapter (DRA). Note that the DRA can pay close attention to a small region of the sentence at each step and re-weigh the vitally important words for better aspect-aware sentiment understanding. Finally, experimental results on three benchmark datasets demonstrate the effectiveness and the rationality of our proposed model and provide good interpretable insights for future semantic modeling.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2022.findings-acl.285

2203.16369

Country:

North America > United States > New York (0.04)
North America > Canada (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(4 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

EU confirms multiple ongoing investigations into TikTok data practices

Engadget Nov-22-2022, 16:15:35 GMT

The president of the European Commission, the executive branch of the European Union, has confirmed there are multiple ongoing investigations into TikTok. The probes concern the transfer of EU citizens' data to China and targeted advertising aimed at minors. Investigators are seeking to ensure that TikTok meets General Data Protection Regulation (GDPR) requirements. "The data practices of TikTok, including with respect to international data transfers, are the object of several ongoing proceedings," Ursula von der Leyen wrote in a letter shared by Federal Communications Commissioner Brendan Carr. "This includes an investigation by the Irish [Data Protection Commission] about TikTok's compliance with several GDPR requirements, including as regards data transfers to China and the processing of data of minors, and litigation before the Dutch courts (in particular concerning targeted advertising regarding minors and data transfers to China)."

data practice, multiple ongoing investigation, tiktok, (11 more...)

Engadget

Country:

Europe (0.97)
Asia > China (0.79)
North America > United States (0.55)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > Europe Government (0.97)
Government > Regional Government > North America Government > United States Government (0.38)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.42)

Automatic extraction of materials and properties from superconductors scientific literature

Foppiano, Luca, de Castro, Pedro Baptista, Suarez, Pedro Ortiz, Terashima, Kensei, Takano, Yoshihiko, Ishii, Masashi

arXiv.org Artificial Intelligence Nov-22-2022

The automatic extraction of materials and related properties from the scientific literature is gaining attention in data-driven materials science (Materials Informatics). In this paper, we discuss Grobid-superconductors, our solution for automatically extracting superconductor material names and respective properties from text. Built as a Grobid module, it combines machine learning and heuristic approaches in a multi-step architecture that supports input data as raw text or PDF documents. Using Grobid-superconductors, we built SuperCon2, a database of 40324 materials and properties records from 37700 papers. The material (or sample) information is represented by name, chemical formula, and material class, and is characterized by shape, doping, substitution variables for components, and substrate as adjoined information. The properties include the Tc superconducting critical temperature and, when available, applied pressure with the Tc measurement method.

information, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1080/27660400.2022.2153633

2210.156

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Netherlands > South Holland > Delft (0.05)
Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)
(6 more...)

Genre:

Workflow (0.93)
Research Report (0.64)

Industry: Materials > Chemicals > Industrial Gases (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.70)
(2 more...)

Smart Agriculture : A Novel Multilevel Approach for Agricultural Risk Assessment over Unstructured Data

Najmi, Hasna, Mikram, Mounia, Rhanoui, Maryem, Yousfi, Siham

arXiv.org Artificial Intelligence Nov-22-2022

Detecting opportunities and threats from massive text data is a challenging task for most. Traditionally, companies would rely mainly on structured data to detect and predict risks, losing a huge amount of information that could be extracted from unstructured text data. Fortunately, artificial intelligence came to remedy this issue by innovating in data extraction and processing techniques, allowing us to understand and make use of Natural Language data and turning it into structures that a machine can process and extract insight from. Uncertainty refers to a state of not knowing what will happen in the future. This paper aims to leverage natural language processing and machine learning techniques to model uncertainties and evaluate the risk level in each uncertainty cluster using massive text data.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2211.12515

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry:

Government (1.00)
Food & Agriculture > Agriculture (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.96)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Twitter turmoil and staff exodus aggravate security concerns

The Japan Times Nov-21-2022, 04:38:03 GMT

Washington – Twitter's owner Elon Musk has pledged the platform will not become a "hellscape," but experts fear a staff exodus following mass layoffs may have devastated its ability to combat misinformation, impersonation and data theft. Twitter devolved into what campaigners described as a cesspit of falsehoods and hate speech after recent layoffs cut half the company's 7,500 staff and fake accounts proliferated following its botched rollout of a paid verification system.

staff exodus aggravate security concern, twitter turmoil

The Japan Times

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.40)

Unsupervised extraction, labelling and clustering of segments from clinical notes

Zelina, Petr, Halámková, Jana, Nováček, Vít

arXiv.org Artificial Intelligence Nov-21-2022

This work is motivated by the scarcity of tools for accurate, unsupervised information extraction from unstructured clinical notes in computationally underrepresented languages, such as Czech. We introduce a stepping stone to a broad array of downstream tasks such as summarisation or integration of individual patient records, extraction of structured information for national cancer registry reporting or building of semi-structured semantic patient representations for computing patient embeddings. More specifically, we present a method for unsupervised extraction of semantically-labelled textual segments from clinical notes and test it out on a dataset of Czech breast cancer patients, provided by Masaryk Memorial Cancer Institute (the largest Czech hospital specialising in oncology). Our goal was to extract, classify (i.e. label) and cluster segments of the free-text notes that correspond to specific clinical features (e.g., family background, comorbidities or toxicities). The presented results demonstrate the practical relevance of the proposed approach for building more sophisticated extraction and analytical pipelines deployed on Czech clinical notes.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/BIBM55620.2022.9995229

2211.11799

Country:

Europe > Czechia > South Moravian Region > Brno (0.05)
Europe > Ireland > Connaught > County Galway > Galway (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition

Hu, Guimin, Lin, Ting-En, Zhao, Yi, Lu, Guangming, Wu, Yuchuan, Li, Yongbin

arXiv.org Artificial Intelligence Nov-21-2022

Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for computers to understand human behaviors. From a psychological perspective, emotions are the expression of affect or feelings during a short period, while sentiments are formed and held for a longer period. However, most existing works study sentiment and emotion separately and do not fully exploit the complementary knowledge behind the two. In this paper, we propose a multimodal sentiment knowledge-sharing framework (UniMSE) that unifies MSA and ERC tasks from features, labels, and models. We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and samples to better capture the difference and consistency between sentiments and emotions. Experiments on four public benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the effectiveness of the proposed method and achieve consistent improvements compared with state-of-the-art methods.

computational linguistic, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2211.11256

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
(15 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.85)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Can an AI recognize my opinion from tweets?

#artificialintelligence Nov-20-2022, 06:00:36 GMT

To make a long story short: in principle, yes. And if my colleagues at the University of Edinburgh are to be believed, it even works in cases where an opinion is not explicitly expressed. In fact, the terms "sentiment analysis" and "opinion mining" are nothing new to people who deal with language technology. However, the label is often a marketing ploy: what sounds like opinion analysis is usually nothing more than a polarity analysis of the feelings conveyed by a text. In other words, it analyzes whether a social media post has positive or negative vibes.

algorithm, training data, tweet, (14 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.55)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.55)
