Alameda-Pineda, Xavier
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
Sadok, Samir, Leglaive, Simon, Girin, Laurent, Richard, Gaël, Alameda-Pineda, Xavier
This article introduces AnCoGen, a novel method that leverages a masked autoencoder to unify the analysis, control, and generation of speech signals within a single model. AnCoGen can analyze speech by estimating key attributes such as speaker identity, pitch, content, loudness, signal-to-noise ratio, and clarity index. In addition, it can generate speech from these attributes and allows precise control of the synthesized speech by modifying them. Extensive experiments demonstrate the effectiveness of AnCoGen across speech analysis-resynthesis, pitch estimation, pitch modification, and speech enhancement.
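To make the mask-selects-the-task idea concrete, here is a minimal sketch assuming an illustrative token layout of six attribute slots followed by the spectrogram-frame slots; the names, sizes, and the `task_mask` helper are ours, not the released model.

```python
# Hypothetical mask construction for a single masked autoencoder that handles
# analysis, generation, and control: the mask decides what gets predicted.
import torch

N_ATTR, N_SPEC = 6, 100           # attribute slots + spectrogram-frame slots (assumed)

def task_mask(mode):
    """Boolean mask over the joint [attributes | speech] sequence (True = predict)."""
    mask = torch.zeros(N_ATTR + N_SPEC, dtype=torch.bool)
    if mode == "analysis":                   # observe speech, predict the attributes
        mask[:N_ATTR] = True
    elif mode in ("generation", "control"):  # predict speech from (possibly edited) attributes
        mask[N_ATTR:] = True
    return mask

mask = task_mask("analysis")      # would be fed to the masked autoencoder with the tokens
```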
Diffusion-based Unsupervised Audio-visual Speech Enhancement
Ayilo, Jean-Eudes, Sadeghi, Mostafa, Serizel, Romain, Alameda-Pineda, Xavier
This paper proposes a new unsupervised audio-visual speech enhancement (AVSE) approach that combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF) noise model. First, the diffusion model is pre-trained on clean speech, conditioned on the corresponding video data, to model the speech generative distribution. This pre-trained model is then paired with the NMF-based noise model to iteratively estimate clean speech. Specifically, a diffusion-based posterior sampling approach is implemented within the reverse diffusion process, where after each iteration a speech estimate is obtained and used to update the noise parameters. Experimental results confirm that the proposed AVSE approach not only outperforms its audio-only counterpart but also generalizes better than a recent supervised generative AVSE method. Additionally, the new inference algorithm offers a better trade-off between inference speed and performance than the previous diffusion-based method.
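The alternation between reverse diffusion and noise-parameter updates can be pictured with a toy loop. The Itakura-Saito NMF multiplicative updates below are standard; the speech-estimate update is a crude placeholder for the pre-trained audio-visual diffusion sampler, and all shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_nmf_update(V, W, H, eps=1e-8):
    """One multiplicative Itakura-Saito update of the NMF noise model V ~= W @ H."""
    WH = W @ H + eps
    H *= (W.T @ (V / WH**2)) / (W.T @ (1.0 / WH) + eps)
    WH = W @ H + eps
    W *= ((V / WH**2) @ H.T) / ((1.0 / WH) @ H.T + eps)
    return W, H

Y = np.abs(rng.normal(size=(64, 100))) ** 2   # noisy power spectrogram (toy)
W, H = rng.random((64, 8)) + 0.1, rng.random((8, 100)) + 0.1
S = Y.copy()                                  # running clean-speech estimate
for t in range(50):
    # placeholder for one reverse-diffusion posterior-sampling step
    S = 0.9 * S + 0.1 * np.maximum(Y - W @ H, 1e-8)
    # update the noise parameters from the current residual
    W, H = is_nmf_update(np.maximum(Y - S, 1e-8), W, H)
```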
Socially Pertinent Robots in Gerontological Healthcare
Alameda-Pineda, Xavier, Addlesee, Angus, García, Daniel Hernández, Reinke, Chris, Arias, Soraya, Arrigoni, Federica, Auternaud, Alex, Blavette, Lauriane, Beyan, Cigdem, Camara, Luis Gomez, Cohen, Ohad, Conti, Alessandro, Dacunha, Sébastien, Dondrup, Christian, Ellinson, Yoav, Ferro, Francesco, Gannot, Sharon, Gras, Florian, Gunson, Nancie, Horaud, Radu, D'Incà, Moreno, Kimouche, Imad, Lemaignan, Séverin, Lemon, Oliver, Liotard, Cyril, Marchionni, Luca, Moradi, Mordehay, Pajdla, Tomas, Pino, Maribel, Polic, Michal, Py, Matthieu, Rado, Ariel, Ren, Bin, Ricci, Elisa, Rigaud, Anne-Sophie, Rota, Paolo, Romeo, Marta, Sebe, Nicu, Sieińska, Weronika, Tandeitnik, Pinchas, Tonini, Francesco, Turro, Nicolas, Wintz, Timothée, Yu, Yanchao
Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several robotic platforms have been used in gerontological healthcare, the question of whether or not a social interactive robot with multi-modal conversational capabilities will be useful and accepted in real-life facilities is yet to be answered. This paper is an attempt to partially answer this question, via two waves of experiments with patients and companions in a day-care gerontological facility in Paris with a full-sized humanoid robot endowed with social and conversational interaction capabilities. The software architecture, developed during the H2020 SPRING project, together with the experimental protocol, allowed us to evaluate the acceptability (AES) and usability (SUS) with more than 60 end-users. Overall, the users are receptive to this technology, especially when the robot perception and action skills are robust to environmental clutter and flexible to handle a plethora of different interactions.
Univariate Radial Basis Function Layers: Brain-inspired Deep Neural Layers for Low-Dimensional Inputs
Jost, Daniel, Patil, Basavasagar, Alameda-Pineda, Xavier, Reinke, Chris
Deep Neural Networks (DNNs) have become the standard tool for function approximation, with most architectures developed for high-dimensional input data. However, many real-world problems have low-dimensional inputs, for which standard Multi-Layer Perceptrons (MLPs) are the default choice; an investigation into specialized architectures is missing. We propose a novel DNN layer, the Univariate Radial Basis Function (U-RBF) layer, as an alternative. Similar to sensory neurons in the brain, the U-RBF layer processes each individual input dimension with a population of neurons whose activations depend on different preferred input values. We verify its effectiveness against MLPs on low-dimensional function regression and reinforcement learning tasks. The results show that the U-RBF layer is especially advantageous when the target function becomes complex and difficult to approximate.
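A minimal sketch of such a layer is given below; the class name, initialization, and per-dimension learnable widths are our assumptions, not necessarily the paper's exact design.

```python
import math
import torch
import torch.nn as nn

class URBFLayer(nn.Module):
    """One population of Gaussian units per scalar input dimension."""
    def __init__(self, in_dim, n_units=16, sigma=0.5):
        super().__init__()
        # learnable "preferred values" (centers), one row per input dimension
        self.centers = nn.Parameter(torch.linspace(-1.0, 1.0, n_units).repeat(in_dim, 1))
        self.log_sigma = nn.Parameter(torch.full((in_dim, n_units), math.log(sigma)))

    def forward(self, x):                      # x: (batch, in_dim)
        d = x.unsqueeze(-1) - self.centers     # (batch, in_dim, n_units)
        act = torch.exp(-0.5 * (d / self.log_sigma.exp()) ** 2)
        return act.flatten(1)                  # (batch, in_dim * n_units)

feats = URBFLayer(in_dim=2)(torch.randn(4, 2))   # -> (4, 32), fed to an MLP head
```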
Mixture of Dynamical Variational Autoencoders for Multi-Source Trajectory Modeling and Separation
Lin, Xiaoyu, Girin, Laurent, Alameda-Pineda, Xavier
In this paper, we propose a latent-variable generative model called mixture of dynamical variational autoencoders (MixDVAE) to model the dynamics of a system composed of multiple moving sources. A DVAE model is pre-trained on a single-source dataset to capture the source dynamics. Then, multiple instances of the pre-trained DVAE model are integrated into a multi-source mixture model with a discrete observation-to-source assignment latent variable. The posterior distributions of both the discrete assignment variable and the continuous DVAE variables representing the sources' content/position are estimated using a variational expectation-maximization algorithm, leading to multi-source trajectory estimation. We illustrate the versatility of the proposed MixDVAE model on two tasks: a computer vision task, namely multi-object tracking, and an audio processing task, namely single-channel audio source separation. Experimental results show that the proposed method performs well on both tasks and outperforms several baseline methods.
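In a simplified Gaussian stand-in for the DVAE source predictions, the E-step on the assignment variable reduces to computing soft responsibilities, as in this sketch (function and variable names are ours):

```python
import numpy as np

def assignment_posterior(obs, src_means, src_var=0.1, prior=None):
    """obs: (N, D) observations; src_means: (K, D) per-source predictions.
    Returns the (N, K) posterior over the observation-to-source assignment."""
    K = src_means.shape[0]
    prior = np.full(K, 1.0 / K) if prior is None else prior
    # log N(obs | mu_k, src_var * I) up to a constant shared across sources
    logp = -0.5 * ((obs[:, None, :] - src_means[None]) ** 2).sum(-1) / src_var
    logp += np.log(prior)
    logp -= logp.max(axis=1, keepdims=True)        # numerical stability
    resp = np.exp(logp)
    return resp / resp.sum(axis=1, keepdims=True)  # rows sum to 1
```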
Motion-DVAE: Unsupervised learning for fast human motion denoising
Fiche, Guénolé, Leglaive, Simon, Alameda-Pineda, Xavier, Séguier, Renaud
Pose and motion priors are crucial for recovering realistic and accurate human motion from noisy observations. Substantial progress has been made on pose and shape estimation from images, and recent works have shown impressive results using priors to refine frame-wise predictions. However, many motion priors only model transitions between consecutive poses and are used in time-consuming optimization procedures, which is problematic for applications requiring real-time motion capture. We introduce Motion-DVAE, a motion prior that captures the short-term dependencies of human motion. As a member of the dynamical variational autoencoder (DVAE) model family, Motion-DVAE combines the generative capability of VAE models with the temporal modeling of recurrent architectures. Together with Motion-DVAE, we introduce an unsupervised, learned denoising method that unifies regression- and optimization-based approaches in a single framework for real-time 3D human pose estimation. Experiments show that the proposed approach reaches performance competitive with state-of-the-art methods while being much faster.
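A hedged sketch of a recurrent motion denoiser in this spirit: the DVAE's stochastic latent variables are replaced by a deterministic bottleneck for brevity, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class MotionDenoiser(nn.Module):
    def __init__(self, pose_dim=72, hidden=256, latent=32):
        super().__init__()
        self.enc = nn.GRU(pose_dim, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent)   # deterministic stand-in for q(z|x)
        self.dec = nn.GRU(latent, hidden, batch_first=True)
        self.out = nn.Linear(hidden, pose_dim)

    def forward(self, noisy):                        # noisy: (batch, T, pose_dim)
        h, _ = self.enc(noisy)
        d, _ = self.dec(self.to_latent(h))
        return self.out(d)                           # refined poses, same shape

refined = MotionDenoiser()(torch.randn(2, 30, 72))   # 30 frames of SMPL-like poses
```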
A weighted-variance variational autoencoder model for speech enhancement
Golmakani, Ali, Sadeghi, Mostafa, Alameda-Pineda, Xavier, Serizel, Romain
We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted-variance generative model, in which the contribution of each spectrogram time-frame to parameter learning is weighted. We impose a Gamma prior distribution on the weights, which effectively leads to a Student's t-distribution instead of a Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted-variance model.
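The Gamma-to-Student's-t marginalization can be checked numerically: with a Gamma(a, rate a) weight scaling the precision of a zero-mean Gaussian, the marginal is a standard Student's t with 2a degrees of freedom (a classical result; the snippet is our illustration, not the paper's code).

```python
import numpy as np
from scipy import stats

a, n = 2.0, 200_000
rng = np.random.default_rng(0)
w = rng.gamma(shape=a, scale=1.0 / a, size=n)   # Gamma(a, rate a), so E[w] = 1
x = rng.normal(scale=1.0 / np.sqrt(w))          # Gaussian with variance 1/w
# empirical tail mass vs. the t(2a) prediction -- the two numbers should match
print(np.mean(np.abs(x) > 3), 2 * stats.t.sf(3, df=2 * a))
```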
Variational Meta Reinforcement Learning for Social Robotics
Ballou, Anand, Alameda-Pineda, Xavier, Reinke, Chris
With the increasing presence of robots in our everyday environments, improving their social skills is of utmost importance. Nonetheless, social robotics still faces many challenges. One bottleneck is that robotic behaviors often need to be adapted, as social norms depend strongly on the environment. For example, a robot should navigate more carefully around patients in a hospital than around workers in an office. In this work, we investigate meta-reinforcement learning (meta-RL) as a potential solution. Here, robot behaviors are learned via reinforcement learning, where a reward function must be chosen so that the robot learns an appropriate behavior for a given environment. We propose a variational meta-RL procedure that quickly adapts a robot's behavior to new reward functions. As a result, given a new environment, different reward functions can be quickly evaluated and an appropriate one selected. The procedure learns a vectorized representation of reward functions and a meta-policy that can be conditioned on such a representation. Given observations from a new reward function, the procedure identifies its representation and conditions the meta-policy on it. While investigating the procedure's capabilities, we found that it suffers from posterior collapse, in which only a subset of the dimensions of the representation encode useful information, resulting in reduced performance. Our second contribution, a radial basis function (RBF) layer, partially mitigates this negative effect. The RBF layer lifts the representation to a higher-dimensional space, which is more easily exploited by the meta-policy. We demonstrate the benefits of the RBF layer and of meta-RL for social robotics on four robotic simulation tasks.
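The lifting can be sketched as follows: the low-dimensional representation z is expanded through Gaussian basis functions before being concatenated with the observation that conditions the meta-policy (all sizes are illustrative, and the fixed centers are our simplification).

```python
import torch
import torch.nn as nn

z_dim, n_centers, sigma = 5, 20, 0.3
centers = torch.linspace(-2.0, 2.0, n_centers)    # fixed, shared across dimensions

def rbf_lift(z):                                  # z: (batch, z_dim)
    d = z.unsqueeze(-1) - centers                 # (batch, z_dim, n_centers)
    return torch.exp(-0.5 * (d / sigma) ** 2).flatten(1)

policy = nn.Sequential(nn.Linear(10 + z_dim * n_centers, 64), nn.ReLU(),
                       nn.Linear(64, 4))          # 4 discrete actions (toy)
obs, z = torch.randn(8, 10), torch.randn(8, z_dim)
logits = policy(torch.cat([obs, rbf_lift(z)], dim=-1))
```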
A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation
Airale, Louis, Vaufreydaz, Dominique, Alameda-Pineda, Xavier
Animating still face images with deep generative models using a speech input signal is an active research topic that has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality, while the generation of natural head motion, let alone the audio-visual correlation between head motion and speech, has often been neglected. In this work, we propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short- and long-term correlations between speech and the dynamics of the head and lips. In particular, we train a stack of syncer models on multimodal input pyramids and use these models as guidance in a multi-scale generator network to produce audio-aligned motion unfolding over diverse time scales. Our generator operates in the facial landmark domain, a standard low-dimensional head representation. The experiments show significant improvements over the state of the art in head motion dynamics quality and in multi-scale audio-visual synchrony, both in the landmark domain and in the image domain.
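A toy version of the multi-scale synchrony loss: the trained syncer models are replaced here by a cosine score on average-pooled features, which is our simplification of the guidance signal.

```python
import torch
import torch.nn.functional as F

def multiscale_sync_loss(audio_feat, motion_feat, n_scales=3):
    """audio_feat, motion_feat: (batch, channels, T), time-aligned features."""
    loss = 0.0
    for s in range(n_scales):
        a = F.avg_pool1d(audio_feat, kernel_size=2 ** s)   # coarser time scale
        m = F.avg_pool1d(motion_feat, kernel_size=2 ** s)
        loss = loss + (1.0 - F.cosine_similarity(a, m, dim=1)).mean()
    return loss / n_scales

loss = multiscale_sync_loss(torch.randn(2, 32, 64), torch.randn(2, 32, 64))
```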
Unsupervised speech enhancement with deep dynamical generative speech and noise models
Lin, Xiaoyu, Leglaive, Simon, Girin, Laurent, Alameda-Pineda, Xavier
This work addresses unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF noise model with a deep dynamical generative model (DDGM) depending either on the DVAE latent variables, or on the noisy observations, or on both. This DDGM can be trained in three configurations: noise-agnostic, noise-dependent, and noise adaptation after noise-dependent training.
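The three conditioning choices for the noise model can be sketched as follows; the feed-forward architecture and dimensions are placeholders for the paper's deep dynamical (recurrent) model.

```python
import torch
import torch.nn as nn

class NoiseVarianceModel(nn.Module):
    """Predicts per-frequency noise variances from z, from the noisy frame, or both."""
    def __init__(self, z_dim=16, x_dim=257, mode="both"):
        super().__init__()
        in_dim = {"latent": z_dim, "noisy": x_dim, "both": z_dim + x_dim}[mode]
        self.mode = mode
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.Tanh(),
                                 nn.Linear(128, x_dim))

    def forward(self, z, x_noisy):
        inp = {"latent": z, "noisy": x_noisy,
               "both": torch.cat([z, x_noisy], dim=-1)}[self.mode]
        return self.net(inp).exp()                 # exponential ensures positivity

var = NoiseVarianceModel(mode="both")(torch.randn(4, 16), torch.randn(4, 257))
```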