

Can LLMs Infer Personality from Real World Conversations?

Zhu, Jianfeng, Jin, Ruoming, Coifman, Karin G.

arXiv.org Artificial Intelligence

Large Language Models (LLMs) such as OpenAI's GPT-4 and Meta's LLaMA offer a promising approach for scalable personality assessment from open-ended language. However, inferring personality traits remains challenging, and earlier work often relied on synthetic data or social media text lacking psychometric validity. We introduce a real-world benchmark of 555 semi-structured interviews with BFI-10 self-report scores for evaluating LLM-based personality inference. Three state-of-the-art LLMs (GPT-4.1 Mini, Meta-LLaMA, and DeepSeek) were tested using zero-shot prompting for BFI-10 item prediction and both zero-shot and chain-of-thought prompting for Big Five trait inference. All models showed high test-retest reliability, but construct validity was limited: correlations with ground-truth scores were weak (max Pearson's $r = 0.27$), interrater agreement was low (Cohen's $\kappa < 0.10$), and predictions were biased toward moderate or high trait levels. Chain-of-thought prompting and longer input context modestly improved distributional alignment, but not trait-level accuracy. These results underscore the limitations of current LLM-based personality inference and highlight the need for evidence-based development for psychological applications.
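The two validity metrics reported above can be computed directly from paired self-report and model-predicted scores. A minimal sketch, assuming 1–5 Likert-scale BFI-10 item scores; the data here are hypothetical, not the study's, and the functions are illustrative rather than the authors' evaluation code:

```python
import numpy as np

def pearson_r(x, y):
    # Pearson correlation between two score vectors
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.corrcoef(x, y)[0, 1])

def cohens_kappa(a, b, labels):
    # chance-corrected agreement between two raters over a fixed label set
    a, b = np.asarray(a), np.asarray(b)
    po = float(np.mean(a == b))                       # observed agreement
    pe = sum(float(np.mean(a == l)) * float(np.mean(b == l))
             for l in labels)                         # expected by chance
    return (po - pe) / (1 - pe)

# hypothetical self-report vs. model-predicted item scores (1-5 Likert)
truth = [3, 4, 2, 5, 3, 1, 4, 2]
pred  = [4, 4, 3, 4, 4, 3, 4, 3]
print(pearson_r(truth, pred))
print(cohens_kappa(truth, pred, labels=range(1, 6)))
```

A weak $r$ with high test-retest reliability, as the abstract reports, means the models answer consistently but their consistent answers do not track the ground-truth trait scores.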


AI plundering scripts poses 'direct threat' to UK screen sector, says BFI

The Guardian

In a wide-ranging report analysing the benefits and threats posed by AI to the UK's film, TV, video game and visual special effects industries, the BFI also raises fears that automation will eliminate the entry-level jobs that bring in the next generation of workers. It says the "primary issue" facing the £125bn industry is the use of intellectual property (IP) to train generative AI models without payment to, or permission from, rights holders. The UK creative industries want to see an "opt-in" regime, forcing AI companies to seek permission and strike licensing deals before they can use content, and the government is currently in the process of considering what legislation to put in place. "AI offers significant opportunities for the screen sector such as speeding up production workflows, democratising content creation and empowering new voices," said Rishi Coupland, director of research and innovation at the BFI. "However, it could also erode traditional business models, displace skilled workers, and undermine public trust in screen content."


Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models

Lu, Yang, Yu, Jordan, Huang, Shou-Hsuan Stephen

arXiv.org Artificial Intelligence

This study explores the idea of AI Personality, or AInality, suggesting that Large Language Models (LLMs) exhibit patterns similar to human personalities. Assuming that LLMs share these patterns with humans, we investigate using human-centered psychometric tests such as the Myers-Briggs Type Indicator (MBTI), Big Five Inventory (BFI), and Short Dark Triad (SD3) to identify and confirm LLM personality types. By introducing role-play prompts, we demonstrate the adaptability of LLMs, showing their ability to switch dynamically between different personality types. Using projective tests, such as the Washington University Sentence Completion Test (WUSCT), we uncover hidden aspects of LLM personalities that are not easily accessible through direct questioning. Projective tests allowed for a deep exploration of LLMs' cognitive processes and thought patterns and gave us a multidimensional view of AInality. Our machine learning analysis revealed that LLMs exhibit distinct AInality traits and manifest diverse personality types, demonstrating dynamic shifts in response to external instructions. This study pioneers the application of projective tests to LLMs, shedding light on their diverse and adaptable AInality traits.


Blockwise Feature Interaction in Recommendation Systems

Zhao, Weijie, Li, Ping

arXiv.org Artificial Intelligence

Feature interactions can play a crucial role in recommendation systems as they capture complex relationships between user preferences and item characteristics. Existing methods such as Deep & Cross Network (DCNv2) may suffer from high computational requirements due to their cross-layer operations. In this paper, we propose a novel approach called blockwise feature interaction (BFI) to help alleviate this issue. By partitioning the feature interaction process into smaller blocks, we can significantly reduce both the memory footprint and the computational burden. Four variants (denoted P, Q, T, and S, respectively) of BFI have been developed and empirically compared. Our experimental results demonstrate that the proposed algorithms achieve accuracy close to that of the standard DCNv2, while greatly reducing the computational overhead and the number of parameters. This paper contributes to the development of efficient recommendation systems by providing a practical solution for improving feature interaction efficiency.
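The core idea of blockwise partitioning can be sketched against a DCNv2-style cross layer, $x_{l+1} = x_0 \odot (W x_l + b) + x_l$: restricting each interaction to a block of the embedding shrinks the weight matrix from $d \times d$ to several $k \times k$ pieces. A minimal NumPy sketch under those assumptions; the paper's actual P/Q/T/S variants differ in their details:

```python
import numpy as np

def cross_layer(x0, xl, W, b):
    # DCNv2-style cross interaction: x_{l+1} = x0 * (W @ xl + b) + xl
    return x0 * (W @ xl + b) + xl

def blockwise_cross(x0, xl, Ws, bs, block):
    # apply the cross interaction independently within each block of size
    # `block`, so each weight matrix is (block x block) instead of (d x d)
    out = np.empty_like(xl)
    for i, (W, b) in enumerate(zip(Ws, bs)):
        s = slice(i * block, (i + 1) * block)
        out[s] = cross_layer(x0[s], xl[s], W, b)
    return out

# hypothetical sizes: d = 8 split into two blocks of 4
d, block = 8, 4
rng = np.random.default_rng(0)
x0 = rng.standard_normal(d)
Ws = [rng.standard_normal((block, block)) for _ in range(d // block)]
bs = [rng.standard_normal(block) for _ in range(d // block)]
y = blockwise_cross(x0, x0, Ws, bs, block)

# parameter count: full cross needs d*d = 64 weights; two blocks need
# 2 * block*block = 32, and the gap widens as d grows
```

Cross-block interactions are lost by construction; the parameter and compute savings scale with the number of blocks, which is the trade-off the abstract's accuracy comparison against standard DCNv2 is measuring.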


Does GPT-3 Demonstrate Psychopathy? Evaluating Large Language Models from a Psychological Perspective

Li, Xingxuan, Li, Yutong, Joty, Shafiq, Liu, Linlin, Huang, Fei, Qiu, Lin, Bing, Lidong

arXiv.org Artificial Intelligence

In this work, we determined whether large language models (LLMs) are psychologically safe. We designed unbiased prompts to systematically evaluate LLMs from a psychological perspective. First, we tested three different LLMs by using two personality tests: Short Dark Triad (SD-3) and Big Five Inventory (BFI). All models scored higher than the human average on SD-3, suggesting a relatively darker personality pattern. Despite being instruction fine-tuned with safety metrics to reduce toxicity, InstructGPT and FLAN-T5 still showed implicit dark personality patterns; both models scored higher than self-supervised GPT-3 on the Machiavellianism and narcissism traits on SD-3. Then, we evaluated the LLMs in the GPT-3 series by using well-being tests to study the impact of fine-tuning with more training data. We observed a continuous increase in the well-being scores of GPT-3 and InstructGPT. Following these observations, we showed that instruction fine-tuning FLAN-T5 with positive answers from BFI could effectively improve the model from a psychological perspective. On the basis of the findings, we recommended the application of more systematic and comprehensive psychological metrics to further evaluate and improve the safety of LLMs.


MIT Robot Steals Human Brains to Help It Balance

AITopics Original Links

Based on every horror/sci-fi movie I've ever seen, squishing an actual fleshy human brain into a robot would make it unstoppable. Sooner or later, I'm sure someone is going to try it for real. Until they do, what's almost as good is letting a robot borrow an actual fleshy human brain to help it balance and complete tasks requiring sensing and dexterity. It's like teleoperation, except the user's brain and body are controlling the robot directly, from inside a haptic suit. HERMES is a disaster response robot from MIT based on the Cheetah Robot, developed by Professor Sangbae Kim and his group at the MIT Biomimetic Robotics Lab.