 emo


Imitation Learning in Continuous Action Spaces: Mitigating Compounding Error without Interaction

arXiv.org Machine Learning

We study the problem of imitating an expert demonstrator in a continuous state-and-action dynamical system. While imitation learning in discrete settings such as autoregressive language modeling has seen immense success and popularity in recent years, imitation in physical settings such as autonomous driving and robot learning has proven comparably more complex due to the compounding errors problem, often requiring elaborate set-ups to perform stably. Recent work has demonstrated that even in benign settings, exponential compounding errors are unavoidable when learning solely from expert-controlled trajectories, suggesting the need for more advanced policy parameterizations or data augmentation. To this end, we present minimal interventions that provably mitigate compounding errors in continuous state-and-action imitation learning. When the system is open-loop stable, we prescribe "action chunking," i.e., predicting and playing sequences of actions in open-loop; when the system is possibly unstable, we prescribe "noise injection," i.e., adding noise during expert demonstrations. These interventions align with popular choices in modern robot learning, though the benefits we derive are distinct from the effects they were designed to target. Our results draw insights and tools from both control theory and reinforcement learning; however, our analysis reveals novel considerations that do not naturally arise when either literature is considered in isolation.
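As a rough illustration of the two interventions named in this abstract, the sketch below plays a policy in open-loop chunks and collects demonstrations with noise added to the executed action. Everything concrete here is an assumption made for illustration and does not come from the paper: the scalar system `step`, the feedback law `expert`, the helper `chunk_policy`, and the choice to record the clean expert action as the training label.

```python
import random


def step(x, u):
    # Hypothetical open-loop-stable scalar system: x' = 0.5 * x + u
    return 0.5 * x + u


def expert(x):
    # Hypothetical expert feedback law (closed loop becomes x' = 0.3 * x)
    return -0.2 * x


def chunk_policy(x, chunk):
    # Predict `chunk` actions from a single observation by unrolling
    # the expert's closed loop x' = 0.3 * x from the observed state.
    return [expert(x * 0.3 ** i) for i in range(chunk)]


def rollout_chunked(policy, x0, horizon, chunk):
    """Action chunking: query the policy once per chunk, then play the
    predicted actions open-loop without re-observing the state."""
    x, states, t = x0, [x0], 0
    while t < horizon:
        for u in policy(x, chunk)[: horizon - t]:
            x = step(x, u)
            states.append(x)
        t += chunk
    return states


def noisy_expert_demo(x0, horizon, sigma, rng):
    """Noise injection: execute the expert action plus Gaussian noise,
    but store the clean expert action as the label (an assumption)."""
    x, data = x0, []
    for _ in range(horizon):
        u = expert(x)
        data.append((x, u))
        x = step(x, u + rng.gauss(0.0, sigma))
    return data
```

In this toy setting the noisy demonstrations visit states off the nominal expert trajectory while every label is still the expert's action at the visited state.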


EMO: Edge Model Overlays to Scale Model Size in Federated Learning

arXiv.org Artificial Intelligence

Federated Learning (FL) trains machine learning models on edge devices with distributed data. However, the computational and memory limitations of these devices restrict the training of large models using FL. Split Federated Learning (SFL) addresses this challenge by distributing the model across the device and server, but it introduces a tightly coupled data flow, leading to computational bottlenecks and high communication costs. We propose EMO as a solution to enable the training of large models in FL while mitigating the challenges of SFL. EMO introduces Edge Model Overlay(s) between the device and server, enabling the creation of a larger ensemble model without modifying the FL workflow. The key innovation in EMO is Augmented Federated Learning (AFL), which builds an ensemble model by connecting the original (smaller) FL model with model(s) trained in the overlay(s) to facilitate horizontal or vertical scaling. This is accomplished through three key modules: a hierarchical activation replay cache to decouple AFL from FL, a convergence-aware communication controller to optimize communication overhead, and an ensemble inference module. Evaluations on a real-world prototype show that EMO improves accuracy by up to 17.77% compared to FL, reduces communication costs by up to 7.17×, and decreases training time by up to 6.9× compared to SFL.


Anonymising Elderly and Pathological Speech: Voice Conversion Using DDSP and Query-by-Example

arXiv.org Artificial Intelligence

Speech anonymisation aims to protect speaker identity by changing personal identifiers in speech while retaining linguistic content. Current methods fail to retain prosody and unique speech patterns found in elderly and pathological speech domains, which is essential for remote health monitoring. To address this gap, we propose a voice conversion-based method (DDSP-QbE) using differentiable digital signal processing and query-by-example. The proposed method, trained with novel losses, aids in disentangling linguistic, prosodic, and domain representations, enabling the model to adapt to uncommon speech patterns. Objective and subjective evaluations show that DDSP-QbE significantly outperforms the voice conversion state-of-the-art concerning intelligibility, prosody, and domain preservation across diverse datasets, pathologies, and speakers while maintaining quality and speaker anonymity. Experts validate domain preservation by analysing twelve clinically pertinent domain attributes.


Investigating the relationship between empathy and attribution of mental states to robots

arXiv.org Artificial Intelligence

This paper describes an experimental evaluation aimed at detecting users' perception of a robot's empathic abilities during a conversation. The results were then analyzed to search for a possible relationship between the perceived empathy and the attribution of mental states to the robot, namely the user's perception of the robot's mental qualities as compared to humans. The sample consisted of 68 subjects: 34 adults and 34 teenagers and children. Conducting the experiment with both adult and child participants makes it possible to compare the results obtained from each group and identify any differences in perception between the age groups.


Uncanny Valley! Watch as a creepy humanoid robot mimics a researcher's facial expressions in real time - with eerie precision

Daily Mail - Science & tech

If we want to live in a world where we interact with robots, they'll have to be able to read and respond to our facial expressions in lightning-fast time. Now, scientists have come a step closer to creating such an advanced machine. 'Emo', built by experts at Columbia University in New York, is the fastest humanoid in the world when it comes to mimicking a person's expressions. In fact, it can 'predict' a person's smile by looking for subtle signs in their facial muscles and imitate them so that they're effectively smiling at the same time. Amazing video shows the bot copying a researcher's facial expressions in real time with eerie precision and remarkable speed, thanks to cameras in its eyes. Columbia engineers build Emo, a silicon-clad robotic face that makes eye contact and can anticipate and replicate a person's smile at effectively the same time. British-made Ameca is described as the 'world's most advanced humanoid robot'. Emo is the creation of researchers at Columbia University's Creative Machines Lab in New York, who present their work in a new study in Scientific Reports.


This robot predicts when you're going to smile – and smiles back

New Scientist

The Emo robot mimics people's facial expressions. A humanoid robot can predict whether someone will smile a second before they do, and match the smile on its own face. The creators hope the technology could make interactions with robots more lifelike. Although artificial intelligence can now mimic human language to an impressive degree, interactions with physical robots often fall into the "uncanny valley", in part because robots can't replicate the complex non-verbal cues and mannerisms that are vital for communication. Now, Hod Lipson at Columbia University in New York and his colleagues have created a robot called Emo that uses AI models and high-resolution cameras to predict people's facial expressions and try to replicate them. It can anticipate whether someone will smile about 0.9 seconds before they do, and smile itself in sync.


EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling

arXiv.org Artificial Intelligence

Neural language models are probabilistic models of human text. They are predominantly trained using maximum likelihood estimation (MLE), which is equivalent to minimizing the forward cross-entropy between the empirical data distribution and the model distribution. However, various degeneration phenomena are still widely observed when decoding from the distributions learned by such models. We establish that the forward cross-entropy is suboptimal as a distance metric for aligning human and model distributions due to its (1) recall-prioritization, (2) negative diversity ignorance, and (3) train-test mismatch. In this paper, we propose Earth Mover Distance Optimization (EMO) for auto-regressive language modeling. EMO capitalizes on the inherent properties of earth mover distance to address the aforementioned challenges. Due to the high complexity of direct computation, we further introduce a feasible upper bound for EMO to ease end-to-end training. Upon extensive evaluation of language models trained using EMO and MLE, we find that EMO demonstrates consistently better language modeling performance than MLE across domains. Moreover, EMO demonstrates noteworthy enhancements in downstream performance with minimal fine-tuning on merely 25,000 sentences. This highlights the tremendous potential of EMO as a lightweight calibration method for enhancing large-scale pre-trained language models.
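To make the contrast between forward cross-entropy and an earth-mover-style objective concrete, here is a minimal sketch for a single next-token prediction with a one-hot target. When the target is one-hot, all probability mass must be moved to the target token, so the earth mover distance reduces to the expected cost under the model distribution. The three-token vocabulary and the cost matrix `cost` are hypothetical, and this is not the upper bound the paper proposes.

```python
import math


def forward_cross_entropy(q, y):
    # MLE loss for one token: depends only on the mass on the target y.
    return -math.log(q[y])


def transport_cost_to_one_hot(q, y, cost):
    # Earth mover distance between model distribution q and a one-hot
    # target at y: every unit of mass on token i travels cost[i][y].
    return sum(q_i * cost[i][y] for i, q_i in enumerate(q))


# Hypothetical vocabulary: 0 = "cat", 1 = "kitten", 2 = "car".
# Hypothetical costs: "kitten" is close to "cat", "car" is far.
cost = [
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]
target = 0
q_near = [0.6, 0.4, 0.0]  # wrong mass placed on the near-synonym
q_far = [0.6, 0.0, 0.4]   # wrong mass placed on the unrelated token

ce_near = forward_cross_entropy(q_near, target)
ce_far = forward_cross_entropy(q_far, target)
emd_near = transport_cost_to_one_hot(q_near, target, cost)
emd_far = transport_cost_to_one_hot(q_far, target, cost)
```

Both predictions receive the same cross-entropy because it ignores how the non-target mass is distributed, whereas the transport cost is lower for `q_near` than for `q_far`.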


Computer Vision Estimation of Emotion Reaction Intensity in the Wild

arXiv.org Artificial Intelligence

Emotions play an essential role in human communication. Developing computer vision models for automatic recognition of emotion expression can aid in a variety of domains, including robotics, digital behavioral healthcare, and media analytics. There are three types of emotional representations which are traditionally modeled in affective computing research: Action Units, Valence Arousal (VA), and Categorical Emotions. As part of an effort to move beyond these representations towards more fine-grained labels, we describe our submission to the newly introduced Emotional Reaction Intensity (ERI) Estimation challenge in the 5th competition for Affective Behavior Analysis in-the-Wild (ABAW). We developed four deep neural networks trained in the visual domain and a multimodal model trained with both visual and audio features to predict emotion reaction intensity. Our best performing model on the Hume-Reaction dataset achieved an average Pearson correlation coefficient of 0.4080 on the test set using a pre-trained ResNet50 model. This work provides a first step towards the development of production-grade models which predict emotion reaction intensities rather than discrete emotion categories.
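For readers unfamiliar with the metric quoted in this abstract, the following sketch computes a Pearson correlation coefficient and averages it over output dimensions. Averaging per emotion dimension is an assumption about how the "average" is taken; the function names are illustrative and not from the paper or the challenge toolkit.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def mean_pearson(pred_rows, true_rows):
    """Average Pearson r over columns; each row is one sample and each
    column one predicted intensity (assumed layout)."""
    dims = len(pred_rows[0])
    scores = [
        pearson([row[d] for row in pred_rows], [row[d] for row in true_rows])
        for d in range(dims)
    ]
    return sum(scores) / dims
```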


EMO: Episodic Memory Optimization for Few-Shot Meta-Learning

arXiv.org Artificial Intelligence

Few-shot meta-learning presents a challenge for gradient descent optimization due to the limited number of training samples per task. To address this issue, we propose an episodic memory optimization for meta-learning, which we call EMO, inspired by the human ability to recall past learning experiences from the brain's memory. EMO retains the gradient history of past experienced tasks in external memory, enabling few-shot learning in a memory-augmented way. By learning to retain and recall the learning process of past training tasks, EMO nudges parameter updates in the right direction, even when the gradients provided by a limited number of examples are uninformative. We prove theoretically that our algorithm converges for smooth, strongly convex objectives. EMO is generic, flexible, and model-agnostic, making it a simple plug-and-play optimizer that can be seamlessly embedded into existing optimization-based few-shot meta-learning approaches. Empirical results show that EMO scales well with most few-shot classification benchmarks and improves the performance of optimization-based meta-learning methods, resulting in accelerated convergence.
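The general idea of keeping past gradients in an external memory and recalling them during an update might be sketched as below. This is a loose illustration only: the fixed-size buffer, the mean-of-memory recall, and the mixing weight `mix` are all assumptions, whereas the abstract describes learned retain and recall mechanisms whose details are not given here. The class name `GradientMemorySGD` is hypothetical.

```python
from collections import deque


class GradientMemorySGD:
    """SGD step that blends the current gradient with the mean of
    recently stored gradients (an assumed, non-learned recall rule)."""

    def __init__(self, lr, capacity, mix):
        self.lr = lr
        self.mix = mix
        self.memory = deque(maxlen=capacity)  # oldest gradients drop out

    def step(self, params, grad):
        if self.memory:
            n = len(self.memory)
            # Recall: element-wise mean of the stored gradients.
            recalled = [sum(g[i] for g in self.memory) / n for i in range(len(grad))]
            combined = [
                (1 - self.mix) * g + self.mix * r for g, r in zip(grad, recalled)
            ]
        else:
            combined = list(grad)
        self.memory.append(list(grad))  # retain the raw gradient
        return [p - self.lr * c for p, c in zip(params, combined)]
```

With a nonzero `mix`, a step taken on an uninformative (for example, zero) gradient still moves the parameters in the direction suggested by the stored history.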


Gigaom Tech Goes Emo

#artificialintelligence

Emotion isn't a new frontier in business, of course; sentiment analysis and emotional branding were in practice long before they were formalized. Focus groups date at least as far back as World War II, and Mad Men fans will likely recall Draper's tryst with consumer-research (and consultant Faye Miller…) And, of course, as the 20th century progressed, technology joined customer insight's analog tool sets. But it's only more recently that tech-powered emotional analytics have really stepped into the spotlight.