
Collaborating Authors: linda


Yes, Prime Minister, question order does matter -- and it's certainly not classical! But is it quantum?

Brody, Dorje C.

arXiv.org Artificial Intelligence

In an episode of the satirical British political sitcom Yes, Prime Minister from the 1980s, Sir Humphrey Appleby once explained to Bernard Woolley (two of the characters) how it is possible to get contradictory polling results by asking a series of leading questions beforehand. The polling discussed in the episode concerns whether the public is for or against the reintroduction of national service. Recently, the leading questions outlined by Appleby were put to the public by the market research and polling giant Ipsos, the findings of which have been made public to raise awareness of the fact that people can be misled by means of such questions [1]. The actual experiment conducted by Ipsos is explained on their website: "Ipsos interviewed a representative quota sample of 2,158 adults aged 16-75 in Great Britain. Half saw the 'Sample A' questions, reflecting a positive view about national service. Half saw 'Sample B', reflecting a negative view."


Analyzing Large language models chatbots: An experimental approach using a probability test

Peruchini, Melise, Teixeira, Julio Monteiro

arXiv.org Artificial Intelligence

This study consists of qualitative empirical research, conducted through exploratory tests with two different Large Language Model (LLM) chatbots: ChatGPT and Gemini. The methodological procedure involved exploratory tests based on prompts designed with a probability question. The "Linda Problem", widely recognized in cognitive psychology, was used as a basis to create the tests, along with the development of a new problem specifically for this experiment, the "Mary Problem". The object of analysis is the dataset with the outputs provided by each chatbot interaction. The purpose of the analysis is to verify whether the chatbots mainly employ logical reasoning that aligns with probability theory or if they are more frequently affected by the stereotypical textual descriptions in the prompts. The findings provide insights about the approach each chatbot employs in handling logic and textual constructions, suggesting that, while the analyzed chatbots perform satisfactorily on a well-known probabilistic problem, they exhibit significantly lower performance on new tests that require direct application of probabilistic logic.
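For readers unfamiliar with the probability logic these tests rest on, the following minimal Python sketch (not from the paper; the probability values are hypothetical) shows the conjunction rule behind the "Linda Problem": for any two events A and B, P(A and B) can never exceed P(A), so judging the conjunction as more probable is the conjunction fallacy.

```python
# Conjunction rule: for any events A and B, P(A and B) <= P(A).
# Ranking the conjunction as MORE probable than one of its conjuncts
# is the conjunction fallacy probed by the "Linda Problem".

def violates_conjunction_rule(p_a: float, p_a_and_b: float) -> bool:
    """Return True if the judged probabilities exhibit the conjunction fallacy."""
    return p_a_and_b > p_a

# Hypothetical judged probabilities, for illustration only:
p_bank_teller = 0.05                # "Linda is a bank teller"
p_bank_teller_and_feminist = 0.10   # "Linda is a bank teller and is active in the feminist movement"

print(violates_conjunction_rule(p_bank_teller, p_bank_teller_and_feminist))  # True -> fallacy
```

A chatbot that reasons in line with probability theory should rank the single event at least as probable as the conjunction, regardless of how stereotypically the description fits the conjunction.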


A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

Jiang, Bowen, Xie, Yangxinyu, Hao, Zhuoqun, Wang, Xiaomeng, Mallick, Tanwi, Su, Weijie J., Taylor, Camillo J., Roth, Dan

arXiv.org Artificial Intelligence

This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We go beyond evaluating LLMs on accuracy; rather, we aim to investigate their token bias in solving logical reasoning tasks. Specifically, we develop carefully controlled synthetic datasets, featuring conjunction fallacy and syllogistic problems. Our framework outlines a list of hypotheses where token biases are readily identifiable, with all null hypotheses assuming genuine reasoning capabilities of LLMs. The findings in this study suggest, with statistical guarantee, that most LLMs still struggle with logical reasoning. While they may perform well on classic problems, their success largely depends on recognizing superficial patterns with strong token bias, thereby raising concerns about their actual reasoning and generalization abilities.
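The paper's actual framework is not reproduced here; as a hedged illustration of the general idea, the sketch below compares a model's accuracy on original versus token-perturbed items (for example, swapping a familiar name for an uncommon one) with a simple one-sided two-proportion test. The counts, the perturbation, and the choice of test are assumptions made purely for illustration.

```python
# Illustrative sketch (not the authors' code): under the null hypothesis of
# genuine reasoning, accuracy should not drop when surface tokens are perturbed.
# A significant drop on perturbed items is evidence of token bias.
import math

def two_proportion_z_test(success_orig, n_orig, success_pert, n_pert):
    """One-sided z-test of H0: accuracy does not drop after token perturbation."""
    p_orig, p_pert = success_orig / n_orig, success_pert / n_pert
    p_pool = (success_orig + success_pert) / (n_orig + n_pert)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_orig + 1 / n_pert))
    z = (p_orig - p_pert) / se
    p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))  # P(Z >= z) under H0
    return z, p_value

# Hypothetical counts: 200 original items vs. 200 token-perturbed items
z, p = two_proportion_z_test(success_orig=170, n_orig=200, success_pert=120, n_pert=200)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")  # small p -> accuracy drop, consistent with token bias
```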


AI tools like ChatGPT and Google's Gemini are 'irrational' and prone to making simple mistakes, study finds

Daily Mail - Science & tech

While you might expect AI to be the epitome of cold, logical reasoning, researchers now suggest that these systems might be even more illogical than humans. Researchers from University College London put seven of the top AIs through a series of classic tests designed to test human reasoning. Even the best-performing AIs were found to be irrational and prone to simple mistakes, with most models getting the answer wrong more than half the time. However, the researchers also found that these models weren't irrational in the same way as a human, and some even refused to answer logic questions on 'ethical grounds'. Olivia Macmillan-Scott, a PhD student at UCL and lead author on the paper, says: 'Based on the results of our study and other research on Large Language Models, it's safe to say that these models do not 'think' like humans yet.'


Analyzing the Conjunction Fallacy as a Fact

Veloz, Tomas, Sobetska, Olha

arXiv.org Artificial Intelligence

Since the seminal paper by Tversky and Kahneman, the conjunction fallacy has been the subject of multiple debates and has become a fundamental challenge for cognitive theories in decision-making. In this article, we take a rather uncommon perspective on this phenomenon. Instead of trying to explain the nature or causes of the conjunction fallacy (intensional definition), we analyze its range of factual possibilities (extensional definition). We show that the majority of research on the conjunction fallacy, according to our sample of reviewed experiments covering the literature between 1983 and 2016, has focused on a narrow part of the a priori factual possibilities, implying that explanations of the conjunction fallacy are fundamentally biased by the short scope of possibilities explored. This is a rather curious aspect of how research on the conjunction fallacy has evolved, considering that the phenomenon itself is motivated by extensional considerations.


Causal Perception

Alvarez, Jose M., Ruggieri, Salvatore

arXiv.org Artificial Intelligence

Perception occurs when two individuals interpret the same information differently. Despite being a known phenomenon with implications for bias in decision-making, as individuals' experience determines interpretation, perception remains largely overlooked in automated decision-making (ADM) systems. In particular, it can have considerable effects on the fairness or fair usage of an ADM system, as fairness itself is context-specific and its interpretation dependent on who is judging. In this work, we formalize perception under causal reasoning to capture the act of interpretation by an individual. We also formalize individual experience as additional causal knowledge that comes with and is used by an individual. Further, we define and discuss loaded attributes, which are attributes prone to evoke perception. Sensitive attributes, such as gender and race, are clear examples of loaded attributes. We define two kinds of causal perception, unfaithful and inconsistent, based on the causal properties of faithfulness and consistency. We illustrate our framework through a series of decision-making examples and discuss relevant fairness applications. The goal of this work is to position perception as a parameter of interest, useful for extending the standard, single interpretation ADM problem formulation.


#NeurIPS2023 invited talk: Linda Smith on young humans and self-generated experience

AIHub

During the first four years of life, children can name and recognise over one thousand object categories, learn the syntax of their language, and absorb the cultural and social properties of where they grew up. By the age of three, they become one-shot learners in many domains. Linda's research focusses on cognitive development in young children, and she wants to understand the structure of experiences that gives rise to all of the knowledge that a child obtains in such a short time. To carry out her research, Linda studies the world from the learner's point of view, by using cameras, audio recorders and motion-tracking sensors to collect data from babies and young children. These sensors have facilitated a range of projects, from recordings made 24 hours a day, as the child and their family go about their daily routine, to more focussed data-collection sessions that take place in the laboratory.


FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

Kim, Hyunwoo, Sclar, Melanie, Zhou, Xuhui, Bras, Ronan Le, Kim, Gunhee, Choi, Yejin, Sap, Maarten

arXiv.org Artificial Intelligence

Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering. Our benchmark draws upon important theoretical requisites from psychology and necessary empirical considerations when evaluating large language models (LLMs). In particular, we formulate multiple types of questions that demand the same underlying reasoning, in order to identify an illusory or false sense of ToM capabilities in LLMs. We show that FANToM is challenging for state-of-the-art LLMs, which perform significantly worse than humans even with chain-of-thought reasoning or fine-tuning.


'It was as if my father were actually texting me': grief in the age of AI

The Guardian

When Sunshine Henle's mother, Linda, died unexpectedly at the age of 72, Henle, a 42-year-old Floridian, was left with what she describes as a "gaping hole of silence" in her life. Even though Linda had lived in New York, where she worked as a Sunday school teacher, the pair had kept in constant contact through phone calls and texting. "I always knew she was there, no matter what – if I was upset, or if I just needed to talk. She would always respond," says Henle. In November, Linda collapsed in her home and was unable to move. Henle's brother Sam and her sister-in-law Julie took her to urgent care.


Walden University deploys new AI 'digital human' Linda that analyzes student gestures, talks and emotes

FOX News

Walden University students are actively using three AI tools, Linda, Charlotte and Julian, to set themselves up for educational success. A Minnesota university is actively using several unique artificial intelligence (AI) models to help tutor students, complete assignments and bolster their verbal and non-verbal communication skills. Adtalem Chief Customer Officer Steve Tom has helped to deploy three distinct AI systems at Walden University: Charlotte, Linda, and Julian. The tools help counseling students prepare for their careers by working with "digital people" to cultivate communication and crisis management skills. Charlotte is a digital assistant chatbot that can help students stay on top of tasks and assignments to navigate a class curriculum efficiently.