AITopics | Alastruey, Belen


Alastruey, Belen


2M-BELEBELE: Highly Multilingual Speech and American Sign Language Comprehension Dataset

Costa-jussà, Marta R., Yu, Bokai, Andrews, Pierre, Alastruey, Belen, Camgoz, Necati Cihan, Chuang, Joe, Maillard, Jean, Ropers, Christophe, Turkantenko, Arina, Wood, Carleigh

arXiv.org Artificial Intelligence, Dec-23-2024

We introduce the first highly multilingual speech and American Sign Language (ASL) comprehension dataset by extending BELEBELE. Our dataset covers 74 spoken languages at the intersection of BELEBELE and FLEURS, and one sign language (ASL). We evaluate the 2M-BELEBELE dataset in both 5-shot and zero-shot settings; across languages, speech comprehension accuracy is on average about 2-3% lower than reading comprehension accuracy.

large language model, machine learning, natural language, (15 more...)

arXiv:2412.08274

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Education > Curriculum > Subject-Specific Education (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.46)


Large Concept Models: Language Modeling in a Sentence Representation Space

LCM team, Barrault, Loïc, Duquenne, Paul-Ambroise, Elbayad, Maha, Kozhevnikov, Artyom, Alastruey, Belen, Andrews, Pierre, Coria, Mariano, Couairon, Guillaume, Costa-jussà, Marta R., Dale, David, Elsahar, Hady, Heffernan, Kevin, Janeiro, João Maria, Tran, Tuan, Ropers, Christophe, Sánchez, Eduardo, Roman, Robin San, Mourachko, Alexandre, Saleem, Safiyyah, Schwenk, Holger

arXiv.org Artificial Intelligence, Dec-15-2024

LLMs have revolutionized the field of artificial intelligence and have emerged as the de facto tool for many tasks. The current established technology of LLMs is to process input and generate output at the token level. This is in sharp contrast to humans, who operate at multiple levels of abstraction, well beyond single words, to analyze information and to generate creative content. In this paper, we present an attempt at an architecture which operates on an explicit higher-level semantic representation, which we name a concept. Concepts are language- and modality-agnostic and represent a higher-level idea or action in a flow. Hence, we build a "Large Concept Model". In this study, as proof of feasibility, we assume that a concept corresponds to a sentence, and use an existing sentence embedding space, SONAR, which supports up to 200 languages in both text and speech modalities. The Large Concept Model is trained to perform autoregressive sentence prediction in an embedding space. We explore multiple approaches, namely MSE regression, variants of diffusion-based generation, and models operating in a quantized SONAR space. These explorations are performed using 1.6B parameter models and training data on the order of 1.3T tokens. We then scale one architecture to a model size of 7B parameters and training data of about 2.7T tokens. We perform an experimental evaluation on several generative tasks, namely summarization and a new task of summary expansion. Finally, we show that our model exhibits impressive zero-shot generalization performance to many languages, outperforming existing LLMs of the same size. The training code of our models is freely available.

large language model, machine learning, natural language, (20 more...)

arXiv:2412.08821

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Unveiling the Role of Pretraining in Direct Speech Translation

Alastruey, Belen, Gállego, Gerard I., Costa-jussà, Marta R.

arXiv.org Artificial Intelligence, Sep-26-2024

Direct speech-to-text translation systems encounter an important drawback in data scarcity. A common solution consists of pretraining the encoder on automatic speech recognition, hence losing efficiency in the training process. In this study, we compare the training dynamics of a system using a pretrained encoder, the conventional approach, and one trained from scratch. We observe that, throughout the training, the randomly initialized model struggles to incorporate information from the speech inputs for its predictions. Hence, we hypothesize that this issue stems from the difficulty of effectively training an encoder for direct speech translation. While a model trained from scratch needs to learn acoustic and semantic modeling simultaneously, a pretrained one can just focus on the latter. Based on these findings, we propose a subtle change in the decoder cross-attention to integrate source information from earlier steps in training. We show that with this change, the model trained from scratch can achieve comparable performance to the pretrained one, while reducing the training time.

information, machine learning, translation, (20 more...)

arXiv:2409.18044

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Towards Real-World Streaming Speech Translation for Code-Switched Speech

Alastruey, Belen, Sperber, Matthias, Gollan, Christian, Telaar, Dominic, Ng, Tim, Agarwal, Aashish

arXiv.org Artificial Intelligence, Oct-23-2023

Code-switching (CS), i.e. mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but have been limited to offline scenarios and to translation to one of the languages present in the source (monolingual transcription). In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings, and translation to a third language (i.e., a language not included in the source). To this end, we extend the Fisher and Miami test and validation datasets to include new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST and we establish baseline results for the two settings mentioned earlier.

artificial intelligence, natural language, translation, (17 more...)

arXiv:2310.12648

Country: North America > United States > Minnesota (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)


SpeechAlign: a Framework for Speech Translation Alignment Evaluation

Alastruey, Belen, Sant, Aleix, Gállego, Gerard I., Dale, David, Costa-jussà, Marta R.

arXiv.org Artificial Intelligence, Sep-20-2023

Speech-to-Speech and Speech-to-Text translation are currently dynamic areas of research. To contribute to these fields, we present SpeechAlign, a framework to evaluate the underexplored field of source-target alignment in speech models. Our framework has two core components. First, to tackle the absence of suitable evaluation datasets, we introduce the Speech Gold Alignment dataset, built upon an English-German text translation gold alignment dataset. Second, we introduce two novel metrics, Speech Alignment Error Rate (SAER) and Time-weighted Speech Alignment Error Rate (TW-SAER), to evaluate alignment quality in speech models. By publishing SpeechAlign we provide an accessible evaluation framework for model assessment, and we employ it to benchmark open-source Speech Translation models.

artificial intelligence, speech recognition, speech translation alignment evaluation, (2 more...)

arXiv:2309.11585

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.60)


The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages

Muller, Benjamin, Alastruey, Belen, Hansanti, Prangthip, Kalbassi, Elahe, Ropers, Christophe, Smith, Eric Michael, Williams, Adina, Zettlemoyer, Luke, Andrews, Pierre, Costa-jussà, Marta R.

arXiv.org Artificial Intelligence, Aug-31-2023

Gender biases in language generation systems are challenging to mitigate. One possible source for these biases is gender representation disparities in the training and evaluation data. Despite recent progress in documenting this problem and many attempts at mitigating it, we still lack shared methodology and tooling to report gender representation in large datasets. Such quantitative reporting will enable further mitigation, e.g., via data augmentation. This paper describes the Gender-GAP Pipeline (for Gender-Aware Polyglot Pipeline), an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages. The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text. We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation. Having unbalanced datasets may indirectly optimize our systems towards outperforming one gender over the others. We suggest introducing our gender quantification pipeline in current datasets and, ideally, modifying them toward a balanced representation.
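The lexicon-based counting described in the abstract can be illustrated with a minimal sketch. It assumes a simple whitespace-and-punctuation tokenizer and a tiny English-only toy lexicon; the lexicon contents, class names, and the function name `count_gendered_nouns` are hypothetical and are not taken from the Gender-GAP Pipeline itself.

```python
import re
from collections import Counter

# Hypothetical toy lexicon of gendered person-nouns (English only).
# The actual pipeline uses a multilingual lexicon covering 55 languages.
TOY_LEXICON = {
    "masculine": {"man", "men", "father", "son", "brother"},
    "feminine": {"woman", "women", "mother", "daughter", "sister"},
}


def count_gendered_nouns(text: str) -> Counter:
    """Count how many tokens of `text` match each gender class in the toy lexicon."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter({gender: 0 for gender in TOY_LEXICON})
    for token in tokens:
        for gender, nouns in TOY_LEXICON.items():
            if token in nouns:
                counts[gender] += 1
    return counts


sample = "The man and his father met a woman."
print(count_gendered_nouns(sample))
```

Comparing the per-class counts (or their share of all tokens) across a dataset gives a rough picture of the kind of representation skew the abstract reports.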

artificial intelligence, gender-aware polyglot pipeline, pipeline, (3 more...)

arXiv:2308.16871

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (0.53)


Multi-View Frequency-Attention Alternative to CNN Frontends for Automatic Speech Recognition

Alastruey, Belen, Drude, Lukas, Heymann, Jahn, Wiesler, Simon

arXiv.org Artificial Intelligence, Jun-12-2023

Convolutional frontends are a typical choice for Transformer-based automatic speech recognition to preprocess the spectrogram, reduce its sequence length, and combine local information in time and frequency similarly. However, the width and height of an audio spectrogram denote different information, e.g., due to reverberation as well as the articulatory system, the time axis has a clear left-to-right dependency. In contrast, vowels and consonants demonstrate very different patterns and occupy almost disjoint frequency ranges. Therefore, we hypothesize that global attention over frequencies is more beneficial than local convolution. We obtain a 2.4% relative word error rate reduction (rWERR) on a production-scale Conformer transducer by replacing its convolutional neural network frontend with the proposed F-Attention module on Alexa traffic. To demonstrate generalizability, we validate this on public LibriSpeech data with a long short-term memory-based Listen, Attend and Spell architecture, obtaining 4.6% rWERR, and demonstrate robustness to (simulated) noisy conditions.
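The relative word error rate reduction (rWERR) quoted in the abstract is conventionally the drop in WER expressed as a fraction of the baseline WER. A minimal sketch, assuming that standard definition (the abstract does not spell it out); the function name and the sample WER values are hypothetical and are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: a baseline WER of 10.0% improved to 9.76%.
print(round(relative_wer_reduction(10.0, 9.76), 2))  # prints 2.4
```

Under this definition, a 2.4% rWERR means the new system makes 2.4% fewer word errors relative to the baseline, not that the absolute WER fell by 2.4 points.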

artificial intelligence, frontend, machine learning, (17 more...)

arXiv:2306.06954

Country:

Europe (0.68)
Asia (0.46)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
