linda
Yes, Prime Minister, question order does matter -- and it's certainly not classical! But is it quantum?
In an episode of the 1980s satirical British political sitcom Yes, Prime Minister, Sir Humphrey Appleby explains to Bernard Woolley (two of the characters) how it is possible to obtain contradictory polling results by asking a series of leading questions beforehand. The polling discussed in the episode concerns whether the public is for or against the reintroduction of national service. Recently, the leading questions outlined by Appleby were put to the public by the market research and polling giant Ipsos, and the findings were made public to raise awareness that people can be misled by means of such questions [1]. The experiment conducted by Ipsos is explained on their website: "Ipsos interviewed a representative quota sample of 2,158 adults aged 16-75 in Great Britain. Half saw the 'Sample A' questions, reflecting a positive view about national service. Half saw 'Sample B', reflecting a negative view."
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Surrey > Guildford (0.04)
- Asia > India > West Bengal > Kolkata (0.04)
Analyzing Large Language Model Chatbots: An experimental approach using a probability test
Peruchini, Melise, Teixeira, Julio Monteiro
This study consists of qualitative empirical research, conducted through exploratory tests with two different Large Language Model (LLM) chatbots: ChatGPT and Gemini. The methodological procedure involved exploratory tests based on prompts designed around a probability question. The "Linda Problem", widely recognized in cognitive psychology, was used as a basis to create the tests, along with the development of a new problem specifically for this experiment, the "Mary Problem". The object of analysis is the dataset of outputs produced by each chatbot interaction. The purpose of the analysis is to verify whether the chatbots mainly employ logical reasoning that aligns with probability theory or whether they are more frequently swayed by the stereotypical textual descriptions in the prompts. The findings provide insight into the approach each chatbot employs in handling logic and textual constructions, suggesting that, while the analyzed chatbots perform satisfactorily on a well-known probabilistic problem, they exhibit significantly lower performance on new tests that require direct application of probabilistic logic.
- South America > Brazil > Santa Catarina (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (3 more...)
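The "Linda Problem" referenced in the abstract above turns on the conjunction rule of probability: for any events A and B, P(A and B) can never exceed P(A). A minimal sketch of that rule, with illustrative probabilities chosen purely for the example (none of these numbers come from the study):

```python
# Conjunction rule: P(A and B) <= P(A) for any events A and B.
# The probabilities below are assumed for illustration only.
p_bank_teller = 0.05            # P(Linda is a bank teller)
p_feminist_given_teller = 0.90  # P(active feminist | bank teller)

# P(bank teller AND active feminist), via the chain rule.
p_conjunction = p_bank_teller * p_feminist_given_teller

# The conjunction can never be more probable than either conjunct alone,
# yet respondents (and, per the study, some chatbots) rank it higher.
assert p_conjunction <= p_bank_teller
print(f"P(teller) = {p_bank_teller}, P(teller and feminist) = {p_conjunction:.3f}")
```

Committing the conjunction fallacy amounts to judging the conjunction as more probable than the single event, which the assertion above shows is impossible under any probability assignment.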
A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners
Jiang, Bowen, Xie, Yangxinyu, Hao, Zhuoqun, Wang, Xiaomeng, Mallick, Tanwi, Su, Weijie J., Taylor, Camillo J., Roth, Dan
This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We go beyond evaluating LLMs on accuracy; rather, we aim to investigate their token bias in solving logical reasoning tasks. Specifically, we develop carefully controlled synthetic datasets, featuring conjunction fallacy and syllogistic problems. Our framework outlines a list of hypotheses where token biases are readily identifiable, with all null hypotheses assuming genuine reasoning capabilities of LLMs. The findings in this study suggest, with statistical guarantee, that most LLMs still struggle with logical reasoning. While they may perform well on classic problems, their success largely depends on recognizing superficial patterns with strong token bias, thereby raising concerns about their actual reasoning and generalization abilities.
- Europe > United Kingdom > England > Greater London > London > Wimbledon (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York (0.04)
- (2 more...)
- Media > News (0.68)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
- Government > Voting & Elections (0.67)
AI tools like ChatGPT and Google's Gemini are 'irrational' and prone to making simple mistakes, study finds
While you might expect AI to be the epitome of cold, logical reasoning, researchers now suggest that it might be even more illogical than humans. Researchers from University College London put seven of the top AIs through a series of classic tests designed to probe human reasoning. Even the best-performing AIs were found to be irrational and prone to simple mistakes, with most models getting the answer wrong more than half the time. However, the researchers also found that these models weren't irrational in the same way as a human; some even refused to answer logic questions on 'ethical grounds'. Olivia Macmillan-Scott, a PhD student at UCL and lead author on the paper, says: 'Based on the results of our study and other research on Large Language Models, it's safe to say that these models do not "think" like humans yet.'
Analyzing the Conjunction Fallacy as a Fact
Since the seminal paper by Tversky and Kahneman, the conjunction fallacy has been the subject of multiple debates and has become a fundamental challenge for cognitive theories of decision-making. In this article, we take a rather uncommon perspective on this phenomenon. Instead of trying to explain the nature or causes of the conjunction fallacy (intensional definition), we analyze its range of factual possibilities (extensional definition). We show that the majority of research on the conjunction fallacy, based on our sample of reviewed experiments covering the literature from 1983 to 2016, has focused on a narrow part of the a priori factual possibilities, implying that explanations of the conjunction fallacy are fundamentally biased by the narrow scope of possibilities explored. This is a rather curious aspect of the evolution of conjunction-fallacy research, considering that the phenomenon itself is motivated by extensional considerations.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (2 more...)
Causal Perception
Alvarez, Jose M., Ruggieri, Salvatore
Perception occurs when two individuals interpret the same information differently. Despite being a known phenomenon with implications for bias in decision-making, as individuals' experience determines interpretation, perception remains largely overlooked in automated decision-making (ADM) systems. In particular, it can have considerable effects on the fairness or fair usage of an ADM system, as fairness itself is context-specific and its interpretation dependent on who is judging. In this work, we formalize perception under causal reasoning to capture the act of interpretation by an individual. We also formalize individual experience as additional causal knowledge that comes with and is used by an individual. Further, we define and discuss loaded attributes, which are attributes prone to evoke perception. Sensitive attributes, such as gender and race, are clear examples of loaded attributes. We define two kinds of causal perception, unfaithful and inconsistent, based on the causal properties of faithfulness and consistency. We illustrate our framework through a series of decision-making examples and discuss relevant fairness applications. The goal of this work is to position perception as a parameter of interest, useful for extending the standard, single interpretation ADM problem formulation.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
- Government (0.68)
- Education > Educational Setting (0.46)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.46)
#NeurIPS2023 invited talk: Linda Smith on young humans and self-generated experience
During the first four years of life, children can name and recognise over one thousand object categories, learn the syntax of their language, and absorb the cultural and social properties of their surroundings. By the age of three, they become one-shot learners in many domains. Linda's research focusses on cognitive development in young children; she wants to understand the structure of experience that gives rise to all of the knowledge a child obtains in such a short time. To carry out her research, Linda studies the world from the learner's point of view, using cameras, audio recorders and motion-tracking sensors to collect data from babies and young children. These sensors have supported a range of projects, from recordings made 24 hours a day as the child and their family go about their daily routine, to more focussed data-collection sessions that take place in the laboratory.
- North America > United States > Indiana > Monroe County > Bloomington (0.05)
- Asia > India > Tamil Nadu > Chennai (0.05)
FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions
Kim, Hyunwoo, Sclar, Melanie, Zhou, Xuhui, Bras, Ronan Le, Kim, Gunhee, Choi, Yejin, Sap, Maarten
Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering. Our benchmark draws upon important theoretical requisites from psychology and necessary empirical considerations when evaluating large language models (LLMs). In particular, we formulate multiple types of questions that demand the same underlying reasoning, in order to identify an illusory or false sense of ToM capability in LLMs. We show that FANToM is challenging for state-of-the-art LLMs, which perform significantly worse than humans even with chain-of-thought reasoning or fine-tuning.
- Research Report (0.82)
- Personal > Interview (0.46)
- Education (1.00)
- Health & Medicine > Therapeutic Area (0.68)
- Health & Medicine > Consumer Health (0.46)
'It was as if my father were actually texting me': grief in the age of AI
When Sunshine Henle's mother, Linda, died unexpectedly at the age of 72, Henle, a 42-year-old Floridian, was left with what she describes as a "gaping hole of silence" in her life. Even though Linda had lived in New York, where she worked as a Sunday school teacher, the pair had kept in constant contact through phone calls and texting. "I always knew she was there, no matter what – if I was upset, or if I just needed to talk. She would always respond," says Henle. In November, Linda collapsed in her home and was unable to move. Henle's brother Sam and her sister-in-law Julie took her to urgent care.
- North America > United States > New York (0.24)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
- (3 more...)
- Telecommunications (0.71)
- Health & Medicine (0.71)
- Media (0.47)
Walden University deploys new AI 'digital human' Linda that analyzes student gestures, talks and emotes
Walden University students are actively using three AI tools, Linda, Charlotte and Julian, to set themselves up for educational success. A Minnesota university is actively using several unique artificial intelligence (AI) models to help tutor students, complete assignments and bolster their verbal and non-verbal communication skills. Adtalem Chief Customer Officer Steve Tom has helped to deploy three distinct AI systems, Charlotte, Linda and Julian, at Walden University. The tools help counseling students prepare for their careers by working with "digital people" to cultivate communication and crisis-management skills. Charlotte is a digital assistant chatbot that can help students stay on top of tasks and assignments to navigate a class curriculum efficiently.
- North America > United States > Minnesota (0.25)
- North America > United States > New York (0.05)