 head position


Learning Nonverbal Cues in Multiparty Social Interactions for Robotic Facilitators

arXiv.org Artificial Intelligence

Conventional behavior cloning (BC) models often struggle to replicate the subtleties of human actions. Previous studies have attempted to address this issue through the development of a new BC technique: Implicit Behavior Cloning (IBC). This new technique consistently outperformed the conventional Mean Squared Error (MSE) BC models in a variety of tasks. Our goal is to replicate the performance of the IBC model by Florence et al. [in Proceedings of the 5th Conference on Robot Learning, 164:158-168, 2022] on social interaction tasks using our custom dataset. While previous studies have explored the use of large language models (LLMs) for enhancing group conversations, they often overlook the significance of nonverbal cues, which constitute a substantial part of human communication. We propose using IBC to replicate nonverbal cues like gaze behaviors. The model is evaluated against various types of facilitator data and compared to an explicit MSE BC model. Results show that the IBC model outperforms the MSE BC model across session types on the same metrics used in the previous IBC paper. Although some metrics show mixed results, which are explainable given the custom social-interaction dataset, we successfully replicated the IBC model to generate nonverbal cues. Our contributions are (1) the replication and extension of the IBC model, and (2) a nonverbal cue generation model for social interaction. These advancements facilitate the integration of robots into the complex interactions between robots and humans, e.g., in the absence of a human facilitator.
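The contrast between explicit and implicit behavior cloning drawn in this abstract can be illustrated with a toy sketch. It is not the authors' model: the hand-written `energy` function, the one-dimensional actions, and the candidate grid are all assumptions made for illustration, whereas a real IBC model learns its energy function from data. The sketch only shows why an explicit MSE predictor averages over two demonstrated behaviors (for example, gazing left or gazing right) while an energy-minimizing predictor commits to one of them.

```python
# Toy illustration only: hand-written energy, 1-D actions, fixed candidate grid.
# All names here are hypothetical and not taken from the paper.

def explicit_mse_prediction(demo_actions: list[float]) -> float:
    """The constant prediction that minimizes mean squared error is the mean."""
    return sum(demo_actions) / len(demo_actions)


def energy(action: float, demo_actions: list[float]) -> float:
    """Stand-in energy: squared distance to the nearest demonstrated action."""
    return min((action - demo) ** 2 for demo in demo_actions)


def implicit_prediction(demo_actions: list[float], candidates: list[float]) -> float:
    """Implicit policy at inference time: pick the lowest-energy candidate."""
    return min(candidates, key=lambda a: energy(a, demo_actions))


# Two equally common demonstrated behaviors, e.g. gaze left (-1) or right (+1).
demos = [-1.0, 1.0]
candidates = [i / 10 for i in range(-20, 21)]

averaged = explicit_mse_prediction(demos)       # lands between the two modes
chosen = implicit_prediction(demos, candidates)  # lands on one of the modes
```

In this toy setting the explicit predictor returns an action nobody demonstrated, while the implicit one returns a demonstrated action.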


Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing and reasoning tasks. However, their performance in the foundational domain of arithmetic remains unsatisfactory. When dealing with arithmetic tasks, LLMs often memorize specific examples rather than learning the underlying computational logic, limiting their ability to generalize to new problems. In this paper, we propose a Composable Arithmetic Execution Framework (CAEF) that enables LLMs to learn to execute step-by-step computations by emulating Turing Machines, thereby gaining a genuine understanding of computational logic. Moreover, the proposed framework is highly scalable, allowing composing learned operators to significantly reduce the difficulty of learning complex operators. In our evaluation, CAEF achieves nearly 100% accuracy across seven common mathematical operations on the LLaMA 3.1-8B model, effectively supporting computations involving operands with up to 100 digits, a level where GPT-4o falls short noticeably in some settings.
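As a rough illustration of what step-by-step execution of arithmetic means here, the sketch below adds two decimal strings one digit at a time while carrying a single piece of state, much as a machine head would sweep a tape from right to left. It is a plain Python sketch, not the CAEF prompt format or training procedure; the function name and the trace layout are assumptions made for this example.

```python
# Hypothetical sketch: digit-by-digit addition with an explicit carry state.
# This is not the CAEF representation; it only mirrors the step-by-step idea.

def add_step_by_step(a: str, b: str) -> tuple[str, list[tuple[int, int, int, int, int]]]:
    """Add two non-negative decimal strings one digit at a time.

    Returns the sum as a string plus a trace of
    (position, digit_a, digit_b, carry_in, carry_out) for every step.
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter operand with zeros
    carry = 0
    digits: list[str] = []
    trace: list[tuple[int, int, int, int, int]] = []
    for pos in range(width - 1, -1, -1):  # sweep from the least significant digit
        da, db = int(a[pos]), int(b[pos])
        total = da + db + carry
        carry_out = total // 10
        digits.append(str(total % 10))
        trace.append((pos, da, db, carry, carry_out))
        carry = carry_out
    if carry:
        digits.append("1")  # final carry becomes a new leading digit
    return "".join(reversed(digits)), trace


result, steps = add_step_by_step("999", "1")
```

Because each step depends only on two digits and the carry, the same procedure applies unchanged to operands of any length, which is the property the abstract relies on for 100-digit inputs.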


CapHuman: Capture Your Moments in Parallel Universes

arXiv.org Artificial Intelligence

We concentrate on a novel human-centric image synthesis task: given only one reference facial photograph, the model is expected to generate images of that specific individual with diverse head positions, poses, and facial expressions in different contexts. To accomplish this goal, we argue that our generative model should have the following favorable characteristics: (1) a strong visual and semantic understanding of our world and human society for basic object and human image generation; (2) generalizable identity preservation ability; (3) flexible and fine-grained head control. Recently, large pre-trained text-to-image diffusion models have shown remarkable results, serving as a powerful generative foundation. As a basis, we aim to unleash the above two capabilities of the pre-trained model. In this work, we present a new framework named CapHuman. We embrace the "encode then learn to align" paradigm, which enables generalizable identity preservation for new individuals without cumbersome tuning at inference. CapHuman encodes identity features and then learns to align them into the latent space. Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner. Extensive qualitative and quantitative analyses demonstrate that CapHuman can produce well-identity-preserved, photo-realistic, and high-fidelity portraits with content-rich representations and various head renditions, superior to established baselines. Code and checkpoint will be released at https://github.com/VamosC/CapHuman.


Appearance-based gaze estimation enhanced with synthetic images using deep neural networks

arXiv.org Artificial Intelligence

Human eye gaze estimation is an important cognitive ingredient for successful human-robot interaction, enabling the robot to read and predict human behavior. We approach this problem using artificial neural networks and build a modular system estimating gaze from separately cropped eyes, taking advantage of existing well-functioning components for face detection (RetinaFace) and head pose estimation (6DRepNet). Our proposed method does not require any special hardware or infrared filters but uses a standard built-in notebook RGB camera, as is common with appearance-based methods. Using the MetaHuman tool, we also generated a large synthetic dataset of more than 57,000 human faces and made it publicly available. Adding this dataset (with eye gaze and head pose information) to the standard Columbia Gaze dataset when training the model led to better accuracy, with a mean average error below two degrees in eye pitch and yaw directions, which compares favourably to related methods. We also verified the feasibility of our model by preliminary testing in a real-world setting using the built-in 4K camera in the eye of the NICO semi-humanoid robot.


Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron

Neural Information Processing Systems

We describe a system that localizes a single dipole to reasonable accuracy from noisy magnetoencephalographic (MEG) measurements in real time. At its core is a multilayer perceptron (MLP) trained to map sensor signals and head position to dipole location. Including head position overcomes the previous need to retrain the MLP for each subject and session. The training dataset was generated by mapping randomly chosen dipoles and head positions through an analytic model and adding noise from real MEG recordings. After training, a localization took 0.7 ms with an average error of 0.90 cm.


Ford F-150 Lightning Electric Pickup to Have Level 2 Autonomous Driving - AI Trends

#artificialintelligence

The all-electric Ford F-150 Lightning, announced recently by the Ford Motor Co., will feature hands-free driving by virtue of the Blue Cruise advanced driving assistance system (ADAS). The hands-free driving features will also be available on the 2021 internal combustion pickup truck and certain Mustang models through a software update later this year, according to an account in TechCrunch. The hands-free capability uses cameras, radar sensors and software to provide a combination of adaptive cruise control, lane centering and speed-sign recognition. It has undergone some 500,000 miles of development testing, Ford emphasized in an announcement in April. The system also has an in-cabin camera that monitors eye gaze and head position to help ensure the driver's eyes remain on the road.


Universality of Gradient Descent Neural Network Training

arXiv.org Machine Learning

It has been observed that design choices of neural networks are often crucial for their successful optimization. In this article, we therefore discuss the question of whether it is always possible to redesign a neural network so that it trains well with gradient descent. This yields the following universality result: If, for a given network, there is any algorithm that can find good network weights for a classification task, then there exists an extension of this network that reproduces these weights and the corresponding forward output by mere gradient descent training. The construction is not intended for practical computations, but it provides some orientation on the possibilities of meta-learning and related approaches.


Progress Extrapolating Algorithmic Learning to Arbitrary Sequence Lengths

arXiv.org Machine Learning

Recent neural network models for algorithmic tasks have led to significant improvements in extrapolation to sequences much longer than training, but it remains an outstanding problem that the performance still degrades for very long or adversarial sequences. We present alternative architectures and loss-terms to address these issues, and our testing of these approaches has not detected any remaining extrapolation errors within memory constraints. We focus on linear time algorithmic tasks including copy, parentheses parsing, and binary addition. First, activation binning was used to discretize the trained network in order to avoid computational drift from continuous operations, and a binning-based digital loss term was added to encourage discretizable representations. In addition, a localized differentiable memory (LDM) architecture, in contrast to distributed memory access, addressed remaining extrapolation errors and avoided unbounded growth of internal computational states. Previous work has found that algorithmic extrapolation issues can also be alleviated with approaches relying on program traces, but the current effort does not rely on such traces.
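The two binning ideas mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the choice of bin levels, the function names, and the use of a mean squared distance as the penalty are all hypothetical. `bin_activations` snaps each activation to its nearest allowed level, and `digital_loss` is zero only when every activation already sits exactly on a level.

```python
# Hypothetical sketch of activation binning and a binning-based penalty.
# Bin levels and the squared-distance penalty are assumptions for illustration.

def bin_activation(x: float, levels: list[float]) -> float:
    """Snap one activation to the nearest allowed level."""
    return min(levels, key=lambda level: abs(x - level))


def bin_activations(xs: list[float], levels: list[float]) -> list[float]:
    """Discretize a vector of activations element by element."""
    return [bin_activation(x, levels) for x in xs]


def digital_loss(xs: list[float], levels: list[float]) -> float:
    """Mean squared distance from each activation to its nearest level."""
    return sum((x - bin_activation(x, levels)) ** 2 for x in xs) / len(xs)


levels = [0.0, 1.0]
snapped = bin_activations([0.1, 0.9, 0.4], levels)
penalty = digital_loss([0.1, 0.9, 0.4], levels)
```

Snapping at every step keeps small numerical errors from accumulating over long sequences, and adding a penalty of this kind during training pushes activations toward values that survive the snapping unchanged.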


Smart PILLOW uses airbags to adjust your head position in your sleep and stop you snoring

Daily Mail - Science & tech

A smart pillow made by lifestyle technology company 10minds wants to end the scourge of snoring once and for all. The Motion Pillow, which was showcased at CES in Las Vegas, is a memory foam pillow that uses multiple different technologies to help alleviate issues that contribute to snoring. Using the company's 'Sleep Pressure Monitoring System' - pads inside the pillow that can detect the position of one's head - the Motion Pillow is able to activate airbags inside the product to give sleepers' heads and necks a nudge in the right direction. The pillow technology is coupled with an audio detection system that is capable of hearing snores when they happen. Once the recorder picks up on any heavy breathing, it is able to communicate with the Motion Pillow, which then inflates airbags to reposition a user's head.


AI Sucks at Making Adorable Cat Photos, Clearly Misses the Entire Point of the Internet

#artificialintelligence

Artificial intelligence (AI) recently tried to generate cat photos from scratch, and the results were cat-astrophic. This particular neural network (a type of AI modeled after the workings of the human brain) can produce astonishingly realistic original photos of human faces. In fact, the images of these made-up people were nearly impossible for human viewers to distinguish from photos of real people, programmers of the AI reported in a study that was posted December 2018 to the preprint journal arXiv. Felines, however, proved to be another story. The same algorithm that generated flawless human faces created cats with misshapen heads; the wrong number of eyes and legs; and bodies that were too long, too short, unusually rotund or rectangular, and bent at peculiar angles.