AITopics | Saggion, Horacio

Collaborating Authors

Saggion, Horacio

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MultiLS-SP/CA: Lexical Complexity Prediction and Lexical Simplification Resources for Catalan and Spanish

Bott, Stefan, Saggion, Horacio, Rojas, Nelson Peréz, Salazar, Martin Solis, Ramirez, Saul Calderon

arXiv.org Artificial IntelligenceApr-11-2024

Automatic lexical simplification is a task to substitute lexical items that may be unfamiliar and difficult to understand with easier and more common words. This paper presents MultiLS-SP/CA, a novel dataset for lexical simplification in Spanish and Catalan. This dataset represents the first of its kind in Catalan and a substantial addition to the sparse data on automatic lexical simplification which is available for Spanish. Specifically, MultiLS-SP is the first dataset for Spanish which includes scalar ratings of the understanding difficulty of lexical items. In addition, we describe experiments with this dataset, which can serve as a baseline for future work on the same data.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2404.07814

Country:

Asia (0.67)
Europe > Spain (0.46)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.40)

Industry:

Government (0.67)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Novel Dataset for Financial Education Text Simplification in Spanish

Perez-Rojas, Nelson, Calderon-Ramirez, Saul, Solis-Salazar, Martin, Romero-Sandoval, Mario, Arias-Monge, Monica, Saggion, Horacio

arXiv.org Artificial IntelligenceDec-15-2023

Text simplification, crucial in natural language processing, aims to make texts more comprehensible, particularly for specific groups like visually impaired Spanish speakers, a less-represented language in this field. In Spanish, there are few datasets that can be used to create text simplification systems. Our research has the primary objective to develop a Spanish financial text simplification dataset. We created a dataset with 5,314 complex and simplified sentence pairs using established simplification rules. We also compared our dataset with the simplifications generated from GPT-3, Tuner, and MT5, in order to evaluate the feasibility of data augmentation using these systems. In this manuscript we present the characteristics of our dataset and the findings of the comparisons with other systems. The dataset is available at Hugging face, saul1917/FEINA.

large language model, machine learning, simplification, (21 more...)

arXiv.org Artificial Intelligence

2312.09897

Country: Europe > Spain (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Creating a silver standard for patent simplification

Casola, Silvia, Lavelli, Alberto, Saggion, Horacio

arXiv.org Artificial IntelligenceOct-24-2023

Patents are legal documents that aim at protecting inventions on the one hand and at making technical knowledge circulate on the other. Their complex style -- a mix of legal, technical, and extremely vague language -- makes their content hard to access for humans and machines and poses substantial challenges to the information retrieval community. This paper proposes an approach to automatically simplify patent text through rephrasing. Since no in-domain parallel simplification data exist, we propose a method to automatically generate a large-scale silver standard for patent sentences. To obtain candidates, we use a general-domain paraphrasing system; however, the process is error-prone and difficult to control. Thus, we pair it with proper filters and construct a cleaner corpus that can successfully be used to train a simplification system. Human evaluation of the synthetic silver corpus shows that it is considered grammatical, adequate, and contains simple sentences.

machine learning, natural language, simplification, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3539618.3591657

2310.15689

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Law > Intellectual Property & Technology Law (0.69)
Materials > Chemicals > Commodity Chemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Add feedback

Verifying the Robustness of Automatic Credibility Assessment

Przybyła, Piotr, Shvets, Alexander, Saggion, Horacio

arXiv.org Artificial IntelligenceAug-11-2023

Text classification methods have been widely investigated as a way to detect content of low credibility: fake news, social media bots, propaganda, etc. Quite accurate models (likely based on deep neural networks) help in moderating public electronic platforms and often cause content creators to face rejection of their submissions or removal of already published texts. Having the incentive to evade further detection, content creators try to come up with a slightly modified version of the text (known as an attack with an adversarial example) that exploit the weaknesses of classifiers and result in a different output. Here we systematically test the robustness of popular text classifiers against available attacking techniques and discover that, indeed, in some cases insignificant changes in input text can mislead the models. We also introduce BODEGA: a benchmark for testing both victim models and attack methods on four misinformation detection tasks in an evaluation framework designed to simulate real use-cases of content moderation. Finally, we manually analyse a subset adversarial examples and check what kinds of modifications are used in successful attacks. The BODEGA code and data is openly shared in hope of enhancing the comparability and replicability of further research in this area

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2303.08032

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.50)

Industry:

Media > News (1.00)
Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Multilingual Controllable Transformer-Based Lexical Simplification

Sheang, Kim Cheng, Saggion, Horacio

arXiv.org Artificial IntelligenceJul-5-2023

Text is by far the most ubiquitous source of knowledge and information and should be made easily accessible to as many people as possible; however, texts often contain complex words that hinder reading comprehension and accessibility. Therefore, suggesting simpler alternatives for complex words without compromising meaning would help convey the information to a broader audience. This paper proposes mTLS, a multilingual controllable Transformer-based Lexical Simplification (LS) system fined-tuned with the T5 model. The novelty of this work lies in the use of language-specific prefixes, control tokens, and candidates extracted from pre-trained masked language models to learn simpler alternatives for complex words. The evaluation results on three well-known LS datasets -- LexMTurk, BenchLS, and NNSEval -- show that our model outperforms the previous state-of-the-art models like LSBert and ConLS. Moreover, further evaluation of our approach on the part of the recent TSAR-2022 multilingual LS shared-task dataset shows that our model performs competitively when compared with the participating systems for English LS and even outperforms the GPT-3 model on several metrics. Moreover, our model obtains performance gains also for Spanish and Portuguese.

machine learning, natural language, simplification, (16 more...)

arXiv.org Artificial Intelligence

2307.0212

Country:

Asia > Middle East > UAE (0.16)
North America > United States > Oregon (0.14)
North America > United States > Minnesota (0.14)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Controllable Lexical Simplification for English

Sheang, Kim Cheng, Ferrés, Daniel, Saggion, Horacio

arXiv.org Artificial IntelligenceFeb-6-2023

Fine-tuning Transformer-based approaches have recently shown exciting results on sentence simplification task. However, so far, no research has applied similar approaches to the Lexical Simplification (LS) task. In this paper, we present ConLS, a Controllable Lexical Simplification system fine-tuned with T5 (a Transformer-based model pre-trained with a BERT-style approach and several other tasks). The evaluation results on three datasets (LexMTurk, BenchLS, and NNSeval) have shown that our model performs comparable to LSBert (the current state-of-the-art) and even outperforms it in some cases. We also conducted a detailed comparison on the effectiveness of control tokens to give a clear view of how each token contributes to the model.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2302.029

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Findings of the TSAR-2022 Shared Task on Multilingual Lexical Simplification

Saggion, Horacio, Štajner, Sanja, Ferrés, Daniel, Sheang, Kim Cheng, Shardlow, Matthew, North, Kai, Zampieri, Marcos

arXiv.org Artificial IntelligenceFeb-6-2023

We report findings of the TSAR-2022 shared task on multilingual lexical simplification, organized as part of the Workshop on Text Simplification, Accessibility, and Readability TSAR-2022 held in conjunction with EMNLP 2022. The task called the Natural Language Processing research community to contribute with methods to advance the state of the art in multilingual lexical simplification for English, Portuguese, and Spanish. A total of 14 teams submitted the results of their lexical simplification systems for the provided test data. Results of the shared task indicate new benchmarks in Lexical Simplification with English lexical simplification quantitative results noticeably higher than those obtained for Spanish and (Brazilian) Portuguese.

artificial intelligence, natural language, simplification, (17 more...)

arXiv.org Artificial Intelligence

2302.02888

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Communications > Social Media > Crowdsourcing (0.46)

Add feedback

ExTaSem! Extending, Taxonomizing and Semantifying Domain Terminologies

Espinosa-Anke, Luis (Universitat Pompeu Fabra) | Saggion, Horacio (Universitat Pompeu Fabra) | Ronzano, Francesco (Universitat Pompeu Fabra) | Navigli, Roberto (Sapienza University of Rome)

AAAI ConferencesApr-19-2016

We introduce ExTaSem!, a novel approach for the automatic learning of lexical taxonomies from domain terminologies. First, we exploit a very large semantic network to collect housands of in-domain textual definitions. Second, we extract (hyponym, hypernym) pairs from each definition with a CRF-based algorithm trained on manually-validated data. Finally, we introduce a graph induction procedure which constructs a full-fledged taxonomy where each edge is weighted according to its domain pertinence. ExTaSem! achieves state-of-the-art results in the following taxonomy evaluation experiments: (1) Hypernym discovery, (2) Reconstructing gold standard taxonomies, and (3) Taxonomy quality according to structural measures. We release weighted taxonomies for six domains for the use and scrutiny of the community.

artificial intelligence, taxonomy, text processing, (19 more...)

AAAI Conferences

Thirtieth AAAI Conference on Artificial Intelligence

Country:

Europe > Spain (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Do We Criticise (and Laugh) in the Same Way? Automatic Detection of Multi-Lingual Satirical News in Twitter

Barbieri, Francesco (Universitat Pompeu Fabra) | Ronzano, Francesco (Universitat Pompeu Fabra) | Saggion, Horacio (Universitat Pompeu Fabra)

AAAI ConferencesJul-15-2015

During the last few years, the investigation of methodologies to automatically detect and characterise the figurative traits of textual contents has attracted a growing interest. Indeed, the capability to correctly deal with figurative language and more specifically with satire is fundamental to build robust approaches in several sub-fields of Artificial Intelligence including Sentiment Analysis and Affective Computing. In this paper we investigate the automatic detection of Tweets that advertise satirical news in English, Spanish and Italian. To this purpose we present a system that models Tweets from different languages by a set of language independent features that describe lexical, semantic and usage-related properties of the words of each Tweet. We approach the satire identification problem as binary classification of Tweets as satirical or not satirical messages. We test the performance of our system by performing experiments of both monolingual and cross-language classifications, evaluating the satire detection effectiveness of our features.Our system outperforms a word-based baseline and it is able to recognise if a news in Twitter is satirical or not with good accuracy. Moreover, we analyse the behaviour of the system across the different languages, obtaining interesting results.

social media, survey article, tweet, (20 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country: Europe > Spain > Catalonia (0.28)

Genre: Research Report (0.46)

Industry: Information Technology > Services (0.49)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)

Add feedback