A Unified Representation Underlying the Judgment of Large Language Models
Lu, Yi-Long, Song, Jiajun, Wang, Wei
A central architectural question for both biological and artificial intelligence is whether judgment relies on specialized modules or a unified, domain-general resource. While the discovery of decodable neural representations for distinct concepts in Large Language Models (LLMs) has suggested a modular architecture, whether these representations are truly independent systems remains an open question. Here we provide evidence for a convergent architecture for evaluative judgment. Across a range of LLMs, we find that diverse evaluative judgments are computed along a dominant dimension, which we term the Valence-Assent Axis (VAA). This axis jointly encodes subjective valence ("what is good") and the model's assent to factual claims ("what is true"). Through direct interventions, we demonstrate that this axis drives a critical mechanism, which we identify as the subordination of reasoning: the VAA functions as a control signal that steers the generative process to construct a rationale consistent with its evaluative state, even at the cost of factual accuracy. Our discovery offers a mechanistic account of response bias and hallucination, revealing how an architecture that promotes coherent judgment can systematically undermine faithful reasoning.
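The abstract's claim that diverse judgments lie along one dominant dimension can be illustrated with a toy sketch: if representations of many judgments share a single evaluative axis, PCA should recover it as a first component that explains most of the variance. This is not the paper's actual analysis; all names and numbers below are illustrative assumptions on synthetic data.

```python
import numpy as np

# Hypothetical setup: a shared 64-dim "evaluative axis" (stand-in for the VAA).
rng = np.random.default_rng(0)
axis = rng.normal(size=64)
axis /= np.linalg.norm(axis)

# Synthetic "judgment representations": a scalar valence score times the
# shared axis, plus small isotropic noise.
valence = rng.normal(size=200)
reps = valence[:, None] * axis + 0.05 * rng.normal(size=(200, 64))

# PCA via SVD on centered data; the top component should dominate.
reps = reps - reps.mean(axis=0)
_, s, _ = np.linalg.svd(reps, full_matrices=False)
explained = s**2 / (s**2).sum()
print(f"variance explained by top component: {explained[0]:.2f}")
```

On this synthetic data the first component captures the bulk of the variance, which is the signature one would look for when testing whether judgments collapse onto a single axis.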
Nunchi-Bench: Benchmarking Language Models on Cultural Reasoning with a Focus on Korean Superstition
As large language models (LLMs) become key advisors in various domains, their cultural sensitivity and reasoning skills are crucial in multicultural environments. We introduce Nunchi-Bench, a benchmark designed to evaluate LLMs' cultural understanding, with a focus on Korean superstitions. The benchmark consists of 247 questions spanning 31 topics, assessing factual knowledge, culturally appropriate advice, and situational interpretation. We evaluate multilingual LLMs in both Korean and English to analyze their ability to reason about Korean cultural contexts and how language variations affect performance. To systematically assess cultural reasoning, we propose a novel evaluation strategy with customized scoring metrics that capture the extent to which models recognize cultural nuances and respond appropriately. Our findings highlight significant challenges in LLMs' cultural reasoning. While models generally recognize factual information, they struggle to apply it in practical scenarios. Furthermore, explicit cultural framing enhances performance more effectively than relying solely on the language of the prompt. To support further research, we publicly release Nunchi-Bench alongside a leaderboard.
Super-intelligence or Superstition? Exploring Psychological Factors Underlying Unwarranted Belief in AI Predictions
Lee, Eunhae, Pataranutaporn, Pat, Amores, Judith, Maes, Pattie
This study investigates psychological factors influencing belief in AI predictions about personal behavior, comparing it to belief in astrology and personality-based predictions. Through an experiment with 238 participants, we examined how cognitive style, paranormal beliefs, AI attitudes, personality traits, and other factors affect perceived validity, reliability, usefulness, and personalization of predictions from different sources. Our findings reveal that belief in AI predictions is positively correlated with belief in predictions based on astrology and personality psychology. Notably, paranormal beliefs and positive AI attitudes significantly increased perceived validity, reliability, usefulness, and personalization of AI predictions. Conscientiousness was negatively correlated with belief in predictions across all sources, and interest in the prediction topic increased believability across predictions. Surprisingly, cognitive style did not significantly influence belief in predictions. These results highlight the "rational superstition" phenomenon in AI, where belief is driven more by mental heuristics and intuition than critical evaluation. We discuss implications for designing AI systems and communication strategies that foster appropriate trust and skepticism. This research contributes to our understanding of the psychology of human-AI interaction and offers insights for the design and deployment of AI systems.
Do Large Language Models Latently Perform Multi-Hop Reasoning?
Yang, Sohee, Gribovskaya, Elena, Kassner, Nora, Geva, Mor, Riedel, Sebastian
We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as "The mother of the singer of 'Superstition' is". We look for evidence of a latent reasoning pathway where an LLM (1) latently identifies "the singer of 'Superstition'" as Stevie Wonder, the bridge entity, and (2) uses its knowledge of Stevie Wonder's mother to complete the prompt. We analyze these two hops individually and consider their co-occurrence as indicative of latent multi-hop reasoning. For the first hop, we test if changing the prompt to indirectly mention the bridge entity instead of any other entity increases the LLM's internal recall of the bridge entity. For the second hop, we test if increasing this recall causes the LLM to better utilize what it knows about the bridge entity. We find strong evidence of latent multi-hop reasoning for the prompts of certain relation types, with the reasoning pathway used in more than 80% of the prompts. However, the utilization is highly contextual, varying across different types of prompts. Also, on average, the evidence for the second hop and the full multi-hop traversal is rather moderate and only substantial for the first hop. Moreover, we find a clear scaling trend with increasing model size for the first hop of reasoning but not for the second hop. Our experimental findings suggest potential challenges and opportunities for future development and applications of LLMs.
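The two-hop structure this abstract probes can be made concrete with a toy sketch. The dictionaries below are hypothetical stand-ins for an LLM's latent knowledge, not the paper's method; the point is only to show hop 1 (resolving the bridge entity) feeding hop 2 (using knowledge about it).

```python
# Hypothetical knowledge stores standing in for latent model knowledge.
SINGER_OF = {"Superstition": "Stevie Wonder"}
MOTHER_OF = {"Stevie Wonder": "Lula Mae Hardaway"}

def two_hop(song: str) -> str:
    """Resolve 'The mother of the singer of <song>' in two explicit hops."""
    bridge = SINGER_OF[song]   # hop 1: identify the bridge entity
    return MOTHER_OF[bridge]   # hop 2: use knowledge about the bridge entity

print(two_hop("Superstition"))  # Lula Mae Hardaway
```

The paper's question is whether an LLM traverses this path latently, in its internal activations, rather than via such explicit lookups.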
"Unlocking the Potential of Machine Translation Through Dataset Training, Validation, and…
The coronavirus pandemic has changed the way we live, work, and interact with each other. We've all had to make adjustments to the way we do things, including the way we shop. We're now seeing a shift towards contactless and digital payments, which has made it easier for us to stay safe and healthy while still being able to purchase the items we need. Contactless payments have become increasingly popular during the pandemic and offer a range of benefits. Not only are they faster, more convenient, and more secure than traditional payment methods, but they also provide an extra layer of protection from the virus.
Can neural networks have mental health problems?
Is the algorithm that runs the police surveillance system in my city paranoid? Marvin the android in Douglas Adams' Hitchhiker's Guide to the Galaxy had a pain in all the diodes down his left-hand side. Is that how my toaster feels? This all sounds ludicrous until we realize that our algorithms are increasingly being made in our own image. As we've learned more about our own brains, we've enlisted that knowledge to create algorithmic versions of ourselves.
Artificial Intelligence: Master or Minion? -- By Peter Glaser on goethe.de
In 1997, British cyberneticist Kevin Warwick opened his book "March of the Machines" with a dark vision of the future. By the middle of the 21st century, Warwick predicted, networked artificial intelligence (AI) and superior robots would subjugate mankind, leaving humans to serve their machine masters solely as the chaos in the system. Will machines initially feel a sense of shame that they were the creations of human beings, much like humans' first reaction to learning of their ape ancestry? In the 1980s, American AI pioneer Edward Feigenbaum envisioned books communicating with one another in the libraries of tomorrow, autonomously propagating the knowledge they contained. "Maybe," his colleague Marvin Minsky commented, "they'll keep us as pets."
Apple HomePod, Amazon Echo, Google Home and more: We put 7 speakers to the test
For the last four weeks, I've been living in an Orwellian nightmare. One in which I have to watch every word I say because "they" are always listening. And by "they", I mean Alexa, Siri and Google. It seemed like a good idea - get seven smart speakers and test them in a real house to see how they affected our listening habits and daily routine. At times, they've been pretty helpful. If we're running low on biscuits, one of us can bark, "Hey Siri, add Hob Nobs to the shopping list" and a reminder appears on our phones.