AITopics

Automatic text simplification (TS) aims to automate the process of rewriting text to make it easier for people to read. A pre-requisite for TS to be useful is that it should convey information that is consistent with the meaning of the original text. However, current TS evaluation protocols assess system outputs for simplicity and meaning preservation without regard for the document context in which output sentences occur and for how people understand them. In this work, we introduce a human evaluation framework to assess whether simplified texts preserve meaning using reading comprehension questions. With this framework, we conduct a thorough human evaluation of texts by humans and by nine automatic systems. Supervised systems that leverage pre-training knowledge achieve the highest scores on the reading comprehension (RC) tasks amongst the automatic controllable TS systems. However, even the best-performing supervised system struggles with at least 14% of the questions, marking them as "unanswerable" based on simplified content. We further investigate how existing TS evaluation metrics and automatic question-answering systems approximate the human judgments we obtained.

computational linguistic, proceedings, simplification, (14 more...)

2312.10126

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Maryland (0.04)
(21 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Education > Educational Setting (1.00)
Education > Assessment & Standards > Student Performance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
(2 more...)

Schomacker, Thorben, Gille, Michael, von der Hülls, Jörg, Tropmann-Frick, Marina

Data and Approaches for German Text simplification -- towards an Accessibility-enhanced Communication

This paper examines the current state of the art in German text simplification, focusing on parallel and monolingual German corpora. It reviews neural language models for simplifying German texts and assesses their suitability for legal texts and accessibility requirements. Our findings highlight the need for additional training data and more appropriate approaches that consider the specific linguistic characteristics of German, as well as the importance of the needs and preferences of target groups with cognitive or language impairments. The authors launched the interdisciplinary OPEN-LS project in April 2023 to address these research gaps. The project aims to develop a framework for text formats tailored to individuals with low literacy levels, integrate legal texts, and enhance comprehensibility for those with linguistic or cognitive impairments. It will also explore cost-effective ways to enhance the data with audience-specific illustrations using image-generating AI. For more up-to-date information, please visit our project homepage https://open-ls.entavis.com

proceedings, simplification, text simplification, (11 more...)

2312.09966

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)

Schomacker, Thorben, Dönicke, Tillmann, Tropmann-Frick, Marina

Exploring Automatic Text Simplification of German Narrative Documents

In this paper, we apply transformer-based Natural Language Generation (NLG) techniques to the problem of text simplification. Currently, there are only a few German datasets available for text simplification, even fewer with larger and aligned documents, and not a single one with narrative texts. We explore to what degree modern NLG techniques can be applied to German narrative text simplification. We use Longformer attention and a pre-trained mBART model. Our findings indicate that the existing approaches for German are not able to solve the task properly. We conclude with a few directions for future research to address this problem.

computational linguistic, simplification, text simplification, (12 more...)

2312.09907

Country:

Europe > Germany > Lower Saxony > Göttingen (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)
(9 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.54)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue

Yoon, Sunjae, Kim, Dahyun, Yoon, Eunseop, Yoon, Hee Suk, Kim, Junyeong, Yoo, Chang D.

Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history. Although there have been numerous efforts to develop VGD systems that give better responses, existing systems are only competent at incorporating the information in the video and text, and tend to struggle to extract the necessary information from the audio when generating appropriate responses to the question. The VGD system seems to be deaf, and thus we term this symptom, in which current systems ignore audio data, a deaf response. To overcome the deaf response problem, we propose the Hearing Enhanced Audio Response (HEAR) framework, which performs sensible listening by selectively attending to audio whenever the question requires it. The HEAR framework enhances the accuracy and audibility of VGD systems in a model-agnostic manner. HEAR is validated on VGD datasets (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows effectiveness with various VGD systems.

2312.09736

Country: Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

arXiv.org Artificial Intelligence, Dec-14-2023

Audio-visual fine-tuning of audio-only ASR models

May, Avner, Serdyuk, Dmitriy, Shah, Ankit Parag, Braga, Otavio, Siohan, Olivier

Audio-visual automatic speech recognition (AV-ASR) models are very effective at reducing word error rates on noisy speech, but require large amounts of transcribed AV training data. Recently, audio-visual self-supervised learning (SSL) approaches have been developed to reduce this dependence on transcribed AV data, but these methods are quite complex and computationally expensive. In this work, we propose replacing these expensive AV-SSL methods with a simple and fast audio-only SSL method, and then performing AV supervised fine-tuning. We show that this approach is competitive with state-of-the-art (SOTA) AV-SSL methods on the LRS3-TED benchmark task (within 0.5% absolute WER), while being dramatically simpler and more efficient (12-30x faster to pre-train). Furthermore, we show we can extend this approach to convert a SOTA audio-only ASR model into an AV model. By doing so, we match SOTA AV-SSL results, even though no AV data was used during pre-training.
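
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how word error rate (WER) is conventionally computed, and of what an "absolute" gap between two WER values means. It is not taken from the paper: the function names `word_error_rate` and `absolute_wer_gap` are hypothetical, tokenization is assumed to be plain whitespace splitting, and the example WER values are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace; no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def absolute_wer_gap(wer_a: float, wer_b: float) -> float:
    """Absolute difference between two WERs, in percentage points."""
    return abs(wer_a - wer_b) * 100.0


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.3f}")
    # Invented values: 2.0% vs 2.5% WER differ by 0.5 points absolute,
    # although the relative difference is 25%.
    print(f"gap: {absolute_wer_gap(0.020, 0.025):.2f} points")
```

An absolute gap is stated in percentage points, so it reads very differently from a relative gap when the underlying error rates are small.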

av-ssl method, recognition, speech recognition, (15 more...)

2312.09369

Country:

Europe > Portugal > Braga > Braga (0.05)
Oceania > Australia > Queensland > Brisbane (0.04)
Asia > Singapore (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

arXiv.org Artificial Intelligence, Dec-14-2023

Multi-modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models

He, Liqi, Li, Zuchao, Cai, Xiantao, Wang, Ping

Chain-of-thought (CoT) reasoning has exhibited impressive performance in language models for solving complex tasks and answering questions. However, many real-world questions require multi-modal information, such as text and images. Previous research on multi-modal CoT has primarily focused on extracting fixed image features from off-the-shelf vision models and then fusing them with text using attention mechanisms. This approach has limitations because these vision models were not designed for complex reasoning tasks and do not align well with language thoughts. To overcome this limitation, we introduce a novel approach for multi-modal CoT reasoning that utilizes latent space learning via diffusion processes to generate effective image features that align with language thoughts. Our method fuses image features and text representations at a deep level and improves the complex reasoning ability of multi-modal CoT. We demonstrate the efficacy of our proposed method on multi-modal ScienceQA and machine translation benchmarks, achieving state-of-the-art performance on ScienceQA. Overall, our approach offers a more robust and effective solution for multi-modal reasoning in language models, enhancing their ability to tackle complex real-world problems.

information, language model, latent space, (15 more...)

2312.08762

Country:

Asia > China > Hubei Province > Wuhan (0.05)
South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.69)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
(2 more...)

arXiv.org Machine Learning, Dec-14-2023

Fast Sampling via De-randomization for Discrete Diffusion Models

Chen, Zixiang, Yuan, Huizhuo, Li, Yongqian, Kou, Yiwen, Zhang, Junkai, Gu, Quanquan

Diffusion models have emerged as powerful tools for high-quality data generation, such as image generation. Despite the success of diffusion models in continuous spaces, discrete diffusion models, which apply to domains such as texts and natural languages, remain under-studied and often suffer from slow generation speed. In this paper, we propose a novel de-randomized diffusion process, which leads to an accelerated algorithm for discrete diffusion models. Our technique significantly reduces the number of function evaluations (i.e., calls to the neural network), making the sampling process much faster. Furthermore, we introduce a continuous-time (i.e., infinite-step) sampling algorithm that can provide even better sample quality than its discrete-time (finite-step) counterpart. Extensive experiments on natural language generation and machine translation tasks demonstrate the superior performance of our method in terms of both generation speed and sample quality over existing methods for discrete diffusion models.

diffusion model, machine learning, natural language, (19 more...)

2312.09193

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Maryland > Baltimore (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(2 more...)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Radouane, Karim, Tchechmedjiev, Andon, Lagarde, Julien, Ranwez, Sylvie

Motion2Language, unsupervised learning of synchronized semantic motion segmentation

arXiv.org Artificial Intelligence, Dec-13-2023

In this paper, we investigate building a sequence-to-sequence architecture for motion-to-language translation and synchronization. The aim is to translate motion capture inputs into English natural-language descriptions, such that the descriptions are generated synchronously with the actions performed, enabling semantic segmentation as a byproduct, but without requiring synchronized training data. We propose a new recurrent formulation of local attention that is suited for synchronous/live text generation, as well as an improved motion encoder architecture better suited to smaller data and to synchronous generation. We evaluate both contributions in individual experiments, using the standard BLEU4 metric, as well as a simple semantic equivalence measure, on the KIT motion language dataset. In a follow-up experiment, we assess the quality of the synchronization of generated text in our proposed approaches through multiple evaluation metrics. We find that the contributions to the attention mechanism and to the encoder architecture additively improve not only the quality of generated text (BLEU and semantic equivalence) but also that of synchronization. Our code is available at https://github.com/rd20karim/M2T-Segmentation/tree/main

motion2language, segmentation, unsupervised learning, (15 more...)

doi: 10.1007/s00521-023-09227-z

2310.10594

Country:

Europe > France > Occitanie > Hérault > Montpellier (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Artificial Intelligence, Dec-13-2023

Cem Mil Podcasts: A Spoken Portuguese Document Corpus For Multi-modal, Multi-lingual and Multi-Dialect Information Access Research

Garmash, Ekaterina, Tanaka, Edgar, Clifton, Ann, Correia, Joana, Jat, Sharmistha, Zhu, Winstead, Jones, Rosie, Karlgren, Jussi

In this paper we describe the Portuguese-language podcast dataset we have released for academic research purposes. We give an overview of how the data was sampled, descriptive statistics over the collection, as well as information about the distribution over Brazilian and Portuguese dialects. We give results from experiments on multi-lingual summarization, showing that summarizing podcast transcripts can be performed well by a system supporting both English and Portuguese. We also show experiments on Portuguese podcast genre classification using text metadata. Combining this collection with the previously released English-language collection opens up the potential for multi-modal, multi-lingual and multi-dialect podcast information access research.

dataset, podcast, portuguese, (13 more...)

doi: 10.1007/978-3-031-42448-9_5

2209.11871

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)
Europe > Portugal (0.04)

Genre:

Overview (0.88)
Research Report (0.83)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Rarrick, Spencer, Naik, Ranjita, Poudel, Sundar, Chowdhary, Vishal

Evaluating Gender Bias in the Translation of Gender-Neutral Languages into English

arXiv.org Artificial Intelligence, Dec-12-2023

Machine Translation (MT) continues to improve in quality and adoption, yet the inadvertent perpetuation of gender bias remains a significant concern. Despite numerous studies into gender bias in translations from gender-neutral languages such as Turkish into more strongly gendered languages like English, there are no benchmarks for evaluating this phenomenon or for assessing mitigation strategies. To address this gap, we introduce GATE X-E, an extension to the GATE (Rarrick et al., 2023) corpus, that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English. Each translation is accompanied by feminine, masculine, and neutral variants for each possible gender interpretation. The dataset, which contains between 1250 and 1850 instances for each of the four language pairs, features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena. Additionally, we present an English gender rewriting solution built on GPT-3.5 Turbo and use GATE X-E to evaluate it. We open source our contributions to encourage further research on gender debiasing.

gender, pronoun, translation, (15 more...)

2311.08836

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Dominican Republic (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)