Incremental Boosting Convolutional Neural Network for Facial Action Unit Recognition
Shizhong Han, Zibo Meng, Ahmed-Shehab Khan, Yan Tong
Recognizing facial action units (AUs) from spontaneous facial expressions is still a challenging problem. Most recently, CNNs have shown promise on facial AU recognition. However, the learned CNNs are often overfitted and do not generalize well to unseen subjects due to limited AU-coded training images. We proposed a novel Incremental Boosting CNN (IB-CNN) to integrate boosting into the CNN via an incremental boosting layer that selects discriminative neurons from the lower layer and is incrementally updated on successive mini-batches. In addition, a novel loss function that accounts for errors from both the incremental boosted classifier and individual weak classifiers was proposed to fine-tune the IB-CNN. Experimental results on four benchmark AU databases have demonstrated that the IB-CNN yields significant improvement over the traditional CNN and the boosting CNN without incremental learning, as well as outperforming the state-of-the-art CNN-based methods in AU recognition. The improvement is more impressive for the AUs that have the lowest frequencies in the databases.
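The incremental update described in the abstract can be illustrated with a minimal sketch: neuron activations are treated as sign-based weak classifiers, the most discriminative ones are selected on each mini-batch, and their weights are averaged into the running ensemble. The function name, the agreement-based scoring, and the simple running average are illustrative assumptions, not the paper's exact formulation.

```python
def incremental_boost(history_weights, neuron_scores, labels, n_select, t):
    """One incremental-boosting step over a mini-batch (illustrative sketch).

    neuron_scores: per-sample lists of neuron activations; each neuron is
                   treated as a weak classifier via the sign of its activation.
    labels: per-sample labels in {-1, +1}.
    t: number of mini-batches already absorbed into history_weights.
    """
    n_neurons = len(history_weights)
    # Score each neuron by how often its sign agrees with the label.
    agreement = [
        sum(1 for row, y in zip(neuron_scores, labels)
            if (1 if row[j] >= 0 else -1) == y) / len(labels)
        for j in range(n_neurons)
    ]
    # Keep only the n_select most discriminative neurons for this batch.
    selected = sorted(range(n_neurons), key=lambda j: agreement[j])[-n_select:]
    total = sum(agreement[j] for j in selected)
    batch_weights = [agreement[j] / total if j in selected else 0.0
                     for j in range(n_neurons)]
    # Incrementally average the batch ensemble into the running ensemble,
    # so earlier mini-batches are not forgotten.
    return [(t * h + b) / (t + 1) for h, b in zip(history_weights, batch_weights)]
```

The incremental average is what distinguishes this from per-batch boosting: the selected weak classifiers and their weights persist across mini-batches instead of being recomputed from scratch.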
Augmenting Dialog with Think-Aloud Utterances for Modeling Individual Personality Traits by LLM
Ishikura, Seiya, Yamada, Hiroaki, Hiraoka, Tatsuya, Tokunaga, Takenobu
This study proposes augmenting dialog data with think-aloud utterances (TAUs) for modeling individual personalities in text chat by an LLM. A TAU is a verbalization of a speaker's thought before articulating the utterance. We expect that "persona LLMs" trained with TAU-augmented data can better mimic the speaker's personality traits. We tested whether the trained persona LLMs capture the human personality with respect to the Big Five, a framework characterizing human personality traits from five aspects. The results showed that LLMs trained with TAU-augmented data align more closely with the speakers' Agreeableness and Neuroticism of the Big Five than those trained with the original dialog data. We also found that the quality of TAU augmentation impacts the persona LLM's performance.
The Behavioural Translation Style Space: Towards simulating the temporal dynamics of affect, behaviour, and cognition in human translation production
Carl, Michael, Mizowaki, Takanori, Ray, Aishvarya, Yamada, Masaru, Bandaru, Devi Sri, Ren, Xinyue
The paper introduces a novel behavioural translation style space (BTSS) that describes possible behavioural translation patterns. The suggested BTSS is organized as a hierarchical structure that entails various embedded processing layers. We posit that observable translation behaviour - i.e. eye and finger movements - is fundamental when executing the physical act of translation but it is caused and shaped by higher-order cognitive processes and affective translation states. We analyse records of keystrokes and gaze data as indicators of the hidden mental processing structure and organize the behavioural patterns as a multi-layered embedded BTSS. We develop a perspective in which the BTSS serves as the basis for a computational translation agent to simulate the temporal dynamics of affect, behavioural routines and cognition during human translation production.
Generic Knowledge as Probabilities
We adapt generic knowledge from existing studies that is applicable across different datasets. Generic knowledge is expressed as probabilities and is categorized into three types: expression-dependent single AU probabilities, expression-dependent joint AU probabilities, and expression-independent joint AU probabilities. 1) For expression-dependent single AU probabilities, two sources are considered. According to FACS, given an expression, AUs can be grouped into primary (P) and secondary (S) categories. The primary AUs are the most expressive AUs with respect to the expression, and the secondary AUs may co-occur with primary AUs, providing additional support for the expression.
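The expression-dependent single-AU probabilities described above can be organized as simple lookup tables keyed by (expression, AU) pairs. The AU numbers and probability values below are hypothetical placeholders for illustration, not the figures actually compiled in the paper.

```python
# Hypothetical probabilities for illustration only; the real values are
# compiled from FACS and prior studies.
PRIMARY_AU_PROB = {
    ("happiness", 12): 0.90,  # AU12 (lip corner puller): primary for happiness
    ("happiness", 6): 0.85,   # AU6 (cheek raiser): primary for happiness
}
SECONDARY_AU_PROB = {
    ("happiness", 25): 0.40,  # AU25 (lips part): may co-occur, adding support
}

def au_given_expression(expression, au):
    """Look up P(AU = 1 | expression); unlisted pairs are unknown (None)."""
    key = (expression, au)
    if key in PRIMARY_AU_PROB:
        return PRIMARY_AU_PROB[key]
    return SECONDARY_AU_PROB.get(key)
```

Separating primary from secondary tables mirrors the P/S grouping from FACS: a primary entry, if present, takes precedence over a secondary one for the same pair.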
Saliency-guided Emotion Modeling: Predicting Viewer Reactions from Video Stimuli
Yaragoppa, Akhila, Siddharth
Understanding the emotional impact of videos is crucial for applications in content creation, advertising, and Human-Computer Interaction (HCI). Traditional affective computing methods rely on self-reported emotions, facial expression analysis, and biosensing data, yet they often overlook the role of visual saliency -- the naturally attention-grabbing regions within a video. In this study, we utilize deep learning to introduce a novel saliency-based approach to emotion prediction by extracting two key features: saliency area and number of salient regions. Using the HD2S saliency model and OpenFace facial action unit analysis, we examine the relationship between video saliency and viewer emotions. Our findings reveal three key insights: (1) Videos with multiple salient regions tend to elicit high-valence, low-arousal emotions, (2) Videos with a single dominant salient region are more likely to induce low-valence, high-arousal responses, and (3) Self-reported emotions often misalign with facial expression-based emotion detection, suggesting limitations in subjective reporting. By leveraging saliency-driven insights, this work provides a computationally efficient and interpretable alternative for emotion modeling, with implications for content creation, personalized media experiences, and affective computing research.
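The two saliency features named above (salient area and number of salient regions) can be computed from a thresholded saliency map with a flood fill. This is a pure-Python sketch assuming a 2D list of values in [0, 1] and a hypothetical threshold of 0.5; it is not the HD2S pipeline's actual post-processing.

```python
from collections import deque

def saliency_features(saliency_map, threshold=0.5):
    """Return (salient-area fraction, number of 4-connected salient regions)."""
    h, w = len(saliency_map), len(saliency_map[0])
    salient = [[v >= threshold for v in row] for row in saliency_map]
    seen = [[False] * w for _ in range(h)]
    area, regions = 0, 0
    for i in range(h):
        for j in range(w):
            if not salient[i][j] or seen[i][j]:
                continue
            regions += 1                 # a new connected region starts here
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:                 # flood-fill the whole region
                y, x = queue.popleft()
                area += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and salient[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return area / (h * w), regions
```

A map with one large dominant region and a map with several scattered regions would yield similar area fractions but very different region counts, which is why the two features are kept separate.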
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
Chen, Hejia, Zhang, Haoxian, Zhang, Shoulong, Liu, Xiaoqiang, Zhuang, Sisi, Zhang, Yuan, Wan, Pengfei, Zhang, Di, Li, Shuai
Speech-driven 3D talking face methods should offer both accurate lip synchronization and controllable expressions. Previous methods solely adopt discrete emotion labels to globally control expressions throughout sequences, which limits flexible fine-grained facial control within the spatiotemporal domain. We propose a diffusion-transformer-based 3D talking face generation model, Cafe-Talk, which simultaneously incorporates coarse- and fine-grained multimodal control conditions. Nevertheless, the entanglement of multiple conditions challenges achieving satisfying performance. To disentangle speech audio and fine-grained conditions, we employ a two-stage training pipeline. Specifically, Cafe-Talk is initially trained using only speech audio and coarse-grained conditions. Then, a proposed fine-grained control adapter gradually adds fine-grained instructions represented by action units (AUs), preventing unfavorable speech-lip synchronization. To disentangle coarse- and fine-grained conditions, we design a swap-label training mechanism, which enables the dominance of the fine-grained conditions. We also devise a mask-based CFG technique to regulate the occurrence and intensity of fine-grained control. In addition, a text-based detector is introduced with text-AU alignment to enable natural language user input and further support multimodal control. Extensive experimental results prove that Cafe-Talk achieves state-of-the-art lip synchronization and expressiveness performance and receives wide acceptance for fine-grained control in user studies. Project page: https://harryxd2018.github.io/cafe-talk/
Review for NeurIPS paper: Knowledge Augmented Deep Neural Networks for Joint Facial Expression and Action Unit Recognition
Additional Feedback: The work is a good incremental step towards understanding the relationship between AU and FER, and their influence in detecting one over the other. Figure 1: I am assuming that the dotted lines represent back-propagation steps for each module. Please clarify this in the manuscript/figure. Sec 3.1: The explanation of using the generic knowledge as probabilities is not unique ([b]), and the use of only 8 AUs (there are many more) is not justified. When generating Table 1, it is important to note that these numbers are taken from studies which explored more AUs than are mentioned in the table.
Learning to Control an Android Robot Head for Facial Animation
Heisler, Marcel, Becker-Asano, Christian
The ability to display rich facial expressions is crucial for human-like robotic heads. While manually defining such expressions is intricate, there already exist approaches to automatically learn them. In this work, one such approach is applied to evaluate and control a robot head different from the one in the original study. To improve the mapping of facial expressions from human actors onto a robot head, it is proposed to use 3D landmarks and their pairwise distances as input to the learning algorithm instead of the previously used facial action units. Participants in an online survey preferred mappings from our proposed approach in most cases, though further improvements are still required.
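The proposed input representation (3D landmarks plus their pairwise distances) can be sketched in a few lines; the lexicographic pair ordering is an assumption, since the abstract does not specify one.

```python
import math
from itertools import combinations

def pairwise_distances(landmarks):
    """Flatten 3D landmarks into a pairwise-distance feature vector.

    landmarks: sequence of (x, y, z) tuples.
    Returns one Euclidean distance per unordered pair, in lexicographic
    order of the landmark indices.
    """
    return [math.dist(p, q) for p, q in combinations(landmarks, 2)]
```

Pairwise distances are invariant to the head's global rotation and translation, which is one plausible reason they transfer better from a human actor to a differently proportioned robot head than raw AU activations.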
Do LLMs Agree on the Creativity Evaluation of Alternative Uses?
Rabeyah, Abdullah Al, Góes, Fabrício, Volpe, Marco, Medeiros, Talles
This paper investigates whether large language models (LLMs) show agreement in assessing creativity in responses to the Alternative Uses Test (AUT). While LLMs are increasingly used to evaluate creative content, previous studies have primarily focused on a single model assessing responses generated by the same model or humans. This paper explores whether LLMs can impartially and accurately evaluate creativity in outputs generated by both themselves and other models. Using an oracle benchmark set of AUT responses, categorized by creativity level (common, creative, and highly creative), we experiment with four state-of-the-art LLMs evaluating these outputs. We test both scoring and ranking methods and employ two evaluation settings (comprehensive and segmented) to examine if LLMs agree on the creativity evaluation of alternative uses. Results reveal high inter-model agreement, with Spearman correlations averaging above 0.7 across models and reaching over 0.77 with respect to the oracle, indicating a high level of agreement and validating the reliability of LLMs in creativity assessment of alternative uses. Notably, models do not favour their own responses; instead, they provide similar creativity assessment scores or rankings for alternative uses generated by other models. These findings suggest that LLMs exhibit impartiality and high alignment in creativity evaluation, offering promising implications for their use in automated creativity assessment.
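The inter-model agreement figures above are Spearman rank correlations, which for tie-free rankings reduce to the classic formula 1 − 6·Σd²/(n(n² − 1)). A minimal sketch (the tie-free simplification is an assumption; the paper's exact computation may handle ties):

```python
def spearman(xs, ys):
    """Spearman rank correlation for two score lists (no tie handling)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # squared rank differences
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A value above 0.7, as reported, means two models order the same set of alternative uses very similarly, even if their absolute creativity scores differ.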