
Collaborating Authors

 Luque, Jordi


Word Sense Disambiguation in Native Spanish: A Comprehensive Lexical Evaluation Resource

arXiv.org Artificial Intelligence

Human language, while aimed at conveying meaning, inherently carries ambiguity. It poses challenges for speech and language processing, but also serves crucial communicative functions. Resolving ambiguity efficiently is both a desired and a necessary capability. The lexical meaning of a word in context can be determined automatically by Word Sense Disambiguation (WSD) algorithms that rely on external knowledge, which is often limited and biased toward English. When adapting content to other languages, automated translations are frequently inaccurate, and a high degree of expert human validation is necessary to ensure both accuracy and understanding. The current study addresses these limitations by introducing a new resource for Spanish WSD. It includes a sense inventory and a lexical dataset sourced from the Diccionario de la Lengua Española, which is maintained by the Real Academia Española. We also review current resources for Spanish and report metrics on them obtained with a state-of-the-art system.
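A minimal sketch of how a sense inventory and an annotated lexical dataset of this kind might be used to score a most-frequent-sense baseline. The data format, sense labels, and examples below are illustrative assumptions, not the resource's actual schema.

```python
# Minimal sketch of evaluating a WSD baseline against a sense inventory.
# The inventory/dataset formats below are hypothetical, not the paper's schema.

from collections import Counter

# Hypothetical sense inventory: lemma -> list of sense identifiers.
sense_inventory = {
    "banco": ["banco.1 (institución financiera)", "banco.2 (asiento)"],
    "gato": ["gato.1 (animal)", "gato.2 (herramienta)"],
}

# Hypothetical annotated examples: (lemma, context, gold sense).
examples = [
    ("banco", "me senté en el banco del parque", "banco.2 (asiento)"),
    ("banco", "fui al banco a sacar dinero", "banco.1 (institución financiera)"),
    ("gato", "el gato duerme en el sofá", "gato.1 (animal)"),
]

def most_frequent_sense(lemma, train_counts):
    """Back-off baseline: pick the sense seen most often in training,
    else the first sense listed in the inventory."""
    counts = train_counts.get(lemma)
    if counts:
        return counts.most_common(1)[0][0]
    return sense_inventory[lemma][0]

# "Train" counts computed from the same toy examples, just to keep this runnable.
train_counts = {}
for lemma, _, sense in examples:
    train_counts.setdefault(lemma, Counter())[sense] += 1

correct = sum(
    most_frequent_sense(lemma, train_counts) == gold
    for lemma, _, gold in examples
)
print(f"accuracy: {correct / len(examples):.2f}")
```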


Robust Wake-Up Word Detection by Two-stage Multi-resolution Ensembles

arXiv.org Artificial Intelligence

Voice-based interfaces rely on a wake-up word mechanism to initiate communication with devices. However, achieving robust, energy-efficient, and fast detection remains a challenge. This paper addresses these real production needs by enhancing data with temporal alignments and using a two-stage, multi-resolution detection scheme. It employs two models: a lightweight on-device model for real-time processing of the audio stream and a verification model on the server side, which is an ensemble of heterogeneous architectures that refines the detection. This scheme allows the optimization of two operating points. To protect privacy, audio features are sent to the cloud instead of raw audio. The study investigated different parametric configurations for feature extraction in order to select one for on-device detection and another for the verification model. Furthermore, thirteen different audio classifiers were compared in terms of performance and inference time. The proposed ensemble outperforms our strongest classifier in every noise condition.
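A minimal sketch of the two-stage idea: a cheap on-device score gates a heavier server-side ensemble, and only features (never raw audio) leave the device. The models, thresholds, and feature extractor are placeholders assumed for illustration, not the paper's actual system.

```python
# Two-stage wake-up word cascade sketch: on-device detector gates a server-side
# ensemble. All scorers are toy stand-ins for illustration only.

import numpy as np

rng = np.random.default_rng(0)

def extract_features(audio):
    """Stand-in for on-device feature extraction (e.g., log-mel frames)."""
    return audio.reshape(-1, 40)  # pretend 40-dim frames

def on_device_score(features):
    """Lightweight first-stage detector (placeholder)."""
    return float(1 / (1 + np.exp(-features.mean())))

def server_ensemble_score(features, n_models=3):
    """Second-stage verification: average the scores of heterogeneous models
    (here just perturbed copies of the same placeholder scorer)."""
    scores = [1 / (1 + np.exp(-(features.mean() + rng.normal(0, 0.1))))
              for _ in range(n_models)]
    return float(np.mean(scores))

ON_DEVICE_THRESHOLD = 0.5   # first operating point: high recall, low latency
SERVER_THRESHOLD = 0.7      # second operating point: high precision

audio = rng.normal(size=(100, 40)).ravel()
feats = extract_features(audio)

if on_device_score(feats) >= ON_DEVICE_THRESHOLD:
    # Only features (never raw audio) are sent to the cloud.
    accepted = server_ensemble_score(feats) >= SERVER_THRESHOLD
    print("wake-up confirmed" if accepted else "rejected by verification stage")
else:
    print("rejected on device")
```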


Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning

arXiv.org Artificial Intelligence

Many real-time applications (e.g., Augmented/Virtual Reality, cognitive assistance) rely on Deep Neural Networks (DNNs) to process inference tasks. Edge computing is considered a key infrastructure to deploy such applications, as moving computation close to the data sources enables us to meet stringent latency and throughput requirements. However, the constrained nature of edge networks poses several additional challenges to the management of inference workloads: edge clusters cannot provide unlimited processing power to DNN models, and a trade-off between network and processing time must often be considered when it comes to end-to-end delay requirements. In this paper, we focus on the problem of scheduling inference queries on DNN models in edge networks at short timescales (i.e., a few milliseconds). By means of simulations, we analyze several policies under the realistic network settings and workloads of a large ISP, highlighting the need for a dynamic scheduling policy that can adapt to network conditions and workloads. We therefore design ASET, a Reinforcement Learning based scheduling algorithm able to adapt its decisions according to the system conditions. Our results show that ASET effectively provides the best performance compared to static policies when scheduling over a distributed pool of edge resources.
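A minimal sketch of RL-based query scheduling over edge clusters, assuming a toy epsilon-greedy Q-learning agent that picks a cluster per query and is rewarded when the end-to-end delay meets a deadline. This is a stand-in for the general idea, not ASET itself; all delays, budgets, and the state encoding are invented for illustration.

```python
# Toy RL scheduler: choose an edge cluster per inference query so that
# network delay + queueing + processing time stays within a deadline.

import random

N_CLUSTERS = 3
EPSILON, ALPHA, GAMMA = 0.1, 0.5, 0.9
DEADLINE_MS = 50.0

# Hypothetical per-cluster network delay and per-query processing time (ms).
net_delay = [5.0, 15.0, 30.0]
proc_time = [40.0, 25.0, 10.0]
queue = [0.0, 0.0, 0.0]          # outstanding work per cluster (ms)

def state():
    # Coarse state: which cluster currently has the shortest queue.
    return min(range(N_CLUSTERS), key=lambda c: queue[c])

Q = [[0.0] * N_CLUSTERS for _ in range(N_CLUSTERS)]

for step in range(1000):
    s = state()
    a = random.randrange(N_CLUSTERS) if random.random() < EPSILON \
        else max(range(N_CLUSTERS), key=lambda c: Q[s][c])

    delay = net_delay[a] + queue[a] + proc_time[a]
    reward = 1.0 if delay <= DEADLINE_MS else -1.0

    # Queue dynamics: add this query's work, then drain a fixed budget per step.
    queue[a] += proc_time[a]
    queue[:] = [max(0.0, q - 20.0) for q in queue]

    s2 = state()
    Q[s][a] += ALPHA * (reward + GAMMA * max(Q[s2]) - Q[s][a])

print("learned Q-values per (state, cluster):", Q)
```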


Iterative pseudo-forced alignment by acoustic CTC loss for self-supervised ASR domain adaptation

arXiv.org Artificial Intelligence

High-quality data labeling from specific domains is costly and time-consuming for humans. In this work, we propose a self-supervised domain adaptation method based upon an iterative pseudo-forced alignment algorithm. The produced alignments are employed to customize an end-to-end Automatic Speech Recognition (ASR) system and are iteratively refined. The algorithm is fed with frame-wise character posteriors produced by a seed ASR, trained with out-of-domain data and optimized with a Connectionist Temporal Classification (CTC) loss. The alignments are computed iteratively over a corpus of broadcast TV. The process is repeated, reducing the quantity of text to be aligned or expanding the alignment window, until the best possible audio-text alignment is found. The starting timestamps, or temporal anchors, are produced solely from the confidence score of the last aligned utterance. This score is computed from the paths of the CTC alignment matrix. With this methodology, no human-revised text references are required. Alignments from long audio files with low-quality transcriptions, like TV captions, are filtered by confidence score and are ready for further ASR adaptation. The obtained results, on both the Spanish RTVE2022 and CommonVoice databases, underpin the feasibility of using CTC-based systems to perform highly accurate audio-text alignments, domain adaptation, and semi-supervised training of end-to-end ASR.
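A minimal sketch of the iterative loop described above: anchor on the end of the last confidently aligned utterance, then either shrink the candidate text or widen the audio window until the confidence clears a threshold. The ctc_align() helper below is a hypothetical placeholder, not the paper's implementation, and the thresholds are illustrative.

```python
# Iterative pseudo-forced alignment sketch with confidence-based anchoring.

CONF_THRESHOLD = 0.8
MAX_RETRIES = 5

def ctc_align(posteriors, text, start_frame, window_frames):
    """Placeholder: align `text` against frame-wise character posteriors in
    [start_frame, start_frame + window_frames) and return (end_frame, confidence)."""
    end_frame = min(start_frame + window_frames, len(posteriors))
    return end_frame, 0.9  # dummy values just to keep the sketch runnable

def align_captions(posteriors, captions, window_frames=500):
    anchor = 0                      # temporal anchor = end of last good alignment
    aligned = []
    for caption in captions:
        text, window = caption, window_frames
        for _ in range(MAX_RETRIES):
            end, conf = ctc_align(posteriors, text, anchor, window)
            if conf >= CONF_THRESHOLD:
                aligned.append((anchor, end, text, conf))
                anchor = end        # next utterance starts where this one ended
                break
            # Low confidence: try less text, or look at a wider audio window.
            if len(text.split()) > 1:
                text = " ".join(text.split()[:-1])
            else:
                window *= 2
    # Low-confidence segments are simply dropped, so no human revision is needed.
    return aligned

posteriors = [[0.0] * 30 for _ in range(2000)]       # dummy frame-wise posteriors
print(align_captions(posteriors, ["hola buenos dias", "el tiempo para hoy"]))
```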


Data Augmentation for Low-Resource Quechua ASR Improvement

arXiv.org Artificial Intelligence

Automatic Speech Recognition (ASR) is a key element in new services that help users interact with an automated system. Deep learning methods have made it possible to deploy systems with word error rates below 5% for English ASR. However, these methods are only available for languages with hundreds or thousands of hours of audio and their corresponding transcriptions. For so-called low-resource languages, methods of creating new resources on the basis of existing ones are being investigated to speed up the availability of resources that can improve the performance of their ASR systems. In this paper we describe our data augmentation approach to improve the results of ASR models for low-resource and agglutinative languages. We carry out experiments developing an ASR for Quechua using the wav2letter++ model. Our approach reduced the WER of the base model by 8.73%. The resulting ASR model obtained 22.75% WER and was trained with 99 hours of original resources and 99 hours of synthetic data obtained with a combination of text augmentation and synthetic speech generation.
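A minimal sketch of the augmentation pipeline: augment transcripts, synthesize speech for them, and mix the synthetic pairs with the original corpus into one training manifest. All helpers, file names, and the toy transcripts below are hypothetical placeholders, not the paper's pipeline.

```python
# Data augmentation sketch: original corpus + (text augmentation -> TTS stub)
# combined into a single training manifest.

import random

def augment_text(sentence):
    """Placeholder text augmentation: here, a trivial word shuffle."""
    words = sentence.split()
    random.shuffle(words)
    return " ".join(words)

def synthesize_speech(sentence, out_path):
    """Placeholder for a TTS system producing an audio file for `sentence`."""
    with open(out_path, "w") as f:       # stand-in for writing a wav file
        f.write(sentence)
    return out_path

# Illustrative (audio, transcript) pairs standing in for the original corpus.
original_corpus = [
    ("orig_000.wav", "allinllachu kachkanki"),
    ("orig_001.wav", "paykuna wasipi kanku"),
]

manifest = list(original_corpus)
for i, (_, transcript) in enumerate(original_corpus):
    new_text = augment_text(transcript)
    wav = synthesize_speech(new_text, f"synt_{i:03d}.wav")
    manifest.append((wav, new_text))

# The combined manifest (original + synthetic) would then be used for training,
# e.g. with wav2letter++.
for wav, text in manifest:
    print(wav, "|", text)
```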


Convolutional Speech Recognition with Pitch and Voice Quality Features

arXiv.org Artificial Intelligence

The effects of adding pitch and voice quality features such as jitter and shimmer to a state-of-the-art CNN model for Automatic Speech Recognition are studied in this work. Pitch features have previously been used to improve classical HMM and DNN baselines, while jitter and shimmer parameters have proven useful for tasks like speaker or emotion recognition. To the best of our knowledge, this is the first work combining such pitch and voice quality features with modern convolutional architectures, showing improvements of up to 2% absolute WER on the publicly available Spanish Common Voice dataset. In particular, our work combines these features with mel-frequency spectral coefficients (MFSCs) to train a convolutional architecture with Gated Linear Units (Conv GLUs). Such models have been shown to yield small word error rates while being very suitable for parallel processing in online streaming recognition use cases. We have added pitch and voice quality functionality to Facebook's wav2letter speech recognition framework, and we provide the code and recipes to the community to carry out further experiments. Besides, to the best of our knowledge, our Spanish Common Voice recipe is the first public Spanish recipe for wav2letter.
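A minimal sketch of the feature combination in the spirit of the paper: compute MFSCs (log mel spectrogram) and a pitch track, then stack them frame-wise. Jitter and shimmer are omitted here (they typically come from a voice-analysis tool such as Praat), and all parameter values are illustrative, not the recipe's actual settings.

```python
# Stack a pitch track on top of log mel features, frame by frame.

import numpy as np
import librosa

sr = 22050
t = np.linspace(0, 2.0, int(2.0 * sr), endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 220.0 * t)          # synthetic 220 Hz tone as input

# MFSCs: log mel spectrogram (40 bands).
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40,
                                     n_fft=2048, hop_length=512)
log_mel = librosa.power_to_db(mel)                # shape: (40, n_frames)

# Pitch track with probabilistic YIN; unvoiced frames come back as NaN.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"),
    sr=sr, frame_length=2048, hop_length=512)
f0 = np.nan_to_num(f0)                            # zero out unvoiced frames

# Append pitch (and voicing probability) as extra rows of the feature matrix.
n = min(log_mel.shape[1], f0.shape[0])
features = np.vstack([log_mel[:, :n], f0[None, :n], voiced_prob[None, :n]])
print(features.shape)                             # (42, n_frames)
```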


Input complexity and out-of-distribution detection with likelihood-based generative models

arXiv.org Machine Learning

On the other side of the spectrum, we observe that the data set with the lowest log-likelihood is Noise, a data set of uniform random images, followed by TrafficSign and TinyImageNet, both featuring colorful images with nontrivial backgrounds. Such an ordering is perhaps clearer when looking at the average log-likelihood of each data set (Appendix D). If we think about the visual complexity of the images in those data sets, it would seem that log-likelihoods tend to grow as images become simpler and carry less information or content. To further confirm the previous observation, we design a controlled experiment where we can set different decreasing levels of image complexity. We train a generative model with some data set, as before, but now compute likelihoods of progressively simpler inputs. Such inputs are obtained by average-pooling the uniform random Noise images by factors of 1, 2, 4, 8, 16, and 32, and re-scaling the images back to their original size by nearest-neighbor up-sampling. Intuitively, a noise image with a pooling size of 1 (no pooling) has the highest complexity, while a noise image with a pooling of 32 (a constant-color image) has the lowest complexity.
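A minimal sketch of the controlled complexity experiment described above: average-pool uniform noise images by increasing factors and upsample back with nearest neighbor, so a factor of 1 keeps full complexity and 32 yields a constant-color image. The image size and the stand-in complexity measure are illustrative assumptions; the paper scores the pooled images with a trained likelihood-based generative model instead.

```python
# Generate uniform noise, average-pool by factors 1..32, and upsample back
# with nearest neighbor to produce progressively simpler inputs.

import numpy as np

rng = np.random.default_rng(0)
size = 32                                   # e.g. CIFAR-sized images

noise = rng.uniform(0.0, 1.0, size=(size, size, 3))

def pool_and_upsample(img, k):
    """Average-pool by factor k, then nearest-neighbor upsample to original size."""
    h, w, c = img.shape
    pooled = img.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))
    return pooled.repeat(k, axis=0).repeat(k, axis=1)

for k in (1, 2, 4, 8, 16, 32):
    simplified = pool_and_upsample(noise, k)
    # A trained generative model would compute log p(x) here; we just report
    # the per-image standard deviation as a crude proxy for remaining complexity.
    print(f"pooling {k:2d}: std = {simplified.std():.4f}")
```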


How Much Does Audio Matter to Recognize Egocentric Object Interactions?

arXiv.org Artificial Intelligence

Sounds are an important source of information about our daily interactions with objects. For instance, a significant number of people can discern the temperature of water being poured just by using their sense of hearing. However, only a few works have explored the use of audio for the classification of object interactions, either in conjunction with vision or as a single modality. In this preliminary work, we propose an audio model for egocentric action recognition and explore its usefulness on the parts of the problem (noun, verb, and action classification). Our model achieves a competitive result in terms of verb classification (34.26% accuracy) on a standard benchmark with respect to vision-based state-of-the-art systems, using a comparatively lighter architecture.