interrogator
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Middle East > Jordan (0.04)
Normality and the Turing Test
This paper proposes to revisit the Turing test through the concept of normality. Its core argument is that the Turing test is a test of normal intelligence as assessed by a normal judge, in two senses. First, the Turing test targets normal/average rather than exceptional human intelligence, so successfully passing the test requires machines to "make mistakes" and display imperfect behavior just like normal/average humans. Second, the Turing test is a statistical test in which judgments of intelligence are never carried out by a single "average" judge (understood as non-expert) but always by a full jury. As such, the notion of "average human interrogator" that Turing talks about in his original paper should be understood primarily as a mathematical abstraction: the normalized aggregate of the individual judgments of multiple judges. The paper's conclusions are twofold. First, it argues that large language models such as ChatGPT are unlikely to pass the Turing test, as those models precisely target exceptional rather than normal/average human intelligence. As such, they constitute models of what it proposes to call artificial smartness rather than artificial intelligence, insofar as they deviate from Turing's original goal for the modeling of artificial minds. Second, it argues that the objectivization of normal human behavior in the Turing test fails because of the test's game configuration, which ends up objectivizing normative ideals of normal behavior rather than normal behavior per se.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
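The jury reading of the "average human interrogator" in the Normality abstract can be made concrete with a short sketch. The sketch makes two hypothetical assumptions:

* Each judge returns one binary verdict per game (True when the machine is taken for the human).
* "Normalized aggregate" means averaging per-judge rates so that every judge counts equally.

The paper does not specify this procedure, and the verdicts below are invented for illustration.

```python
from statistics import mean

def judge_rate(verdicts: list[bool]) -> float:
    """Share of one judge's games in which the machine was picked as the human."""
    return sum(verdicts) / len(verdicts)

def jury_rate(jury: dict[str, list[bool]]) -> float:
    """Hypothetical normalized aggregate: mean of per-judge rates, each judge weighted equally."""
    return mean(judge_rate(v) for v in jury.values())

# Hypothetical verdicts; True means the machine was taken for the human.
jury = {
    "judge_a": [True, False, False, False],
    "judge_b": [False, False, True, False],
    "judge_c": [True, True, False, True],
}

print({name: judge_rate(v) for name, v in jury.items()})
print(f"jury rate: {jury_rate(jury):.3f}")
```

Here the jury rate is 5/12 (about 0.417), a value no individual judge produced. This is the sense in which the "average" interrogator is an abstraction rather than a person.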
PAKTON: A Multi-Agent Framework for Question Answering in Long Legal Agreements
Raptopoulos, Petros, Filandrianos, Giorgos, Lymperaiou, Maria, Stamou, Giorgos
Contract review is a complex and time-intensive task that typically demands specialized legal expertise, rendering it largely inaccessible to non-experts. Moreover, legal interpretation is rarely straightforward: ambiguity is pervasive, and judgments often hinge on subjective assessments. Compounding these challenges, contracts are usually confidential, restricting their use with proprietary models and necessitating reliance on open-source alternatives. To address these challenges, we introduce PAKTON: a fully open-source, end-to-end, multi-agent framework with plug-and-play capabilities. PAKTON is designed to handle the complexities of contract analysis through collaborative agent workflows and a novel retrieval-augmented generation (RAG) component, enabling automated legal document review that is more accessible, adaptable, and privacy-preserving. Experiments demonstrate that PAKTON outperforms both general-purpose and pretrained models in predictive accuracy, retrieval performance, explainability, completeness, and grounded justifications, as evaluated through a human study and validated with automated metrics.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- North America > Dominican Republic (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.67)
- Law (1.00)
- Information Technology > Security & Privacy (0.67)
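The abstract does not detail PAKTON's RAG component. The following is therefore only a generic illustration of the retrieval step in retrieval-augmented generation: rank contract clauses against a question, then pass the top matches to a model as context. The bag-of-words cosine scoring, the helper names, the prompt format and the sample clauses are all hypothetical stand-ins, not PAKTON's retriever.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, clauses: list[str], k: int = 2) -> list[str]:
    """Return the k clauses most similar to the question."""
    q = Counter(tokenize(question))
    return sorted(clauses, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)[:k]

def build_prompt(question: str, clauses: list[str], k: int = 2) -> str:
    """Assemble a grounded prompt from the retrieved clauses (hypothetical format)."""
    context = "\n".join(f"- {c}" for c in retrieve(question, clauses, k))
    return f"Answer using only these clauses:\n{context}\n\nQuestion: {question}"

# Hypothetical contract clauses and question.
clauses = [
    "Either party may terminate this agreement with thirty days written notice.",
    "The supplier shall keep all customer data confidential.",
    "Payment is due within sixty days of the invoice date.",
]
question = "How much notice is needed to terminate the agreement?"
print(build_prompt(question, clauses))
```

Retrievers in practice commonly replace term overlap with learned embeddings. The shape of the step stays the same: score, rank, truncate, then assemble the context.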
ChatGPT passed the Turing Test. Now what?
The AI fooled 73% of people into thinking it was human, raising new questions about machine intelligence. As artificial intelligence gets better and better, people face machines that look (and act) surprisingly human. It seems that every day brings a new headline about the burgeoning capabilities of large language models (LLMs) like ChatGPT and Google's Gemini, headlines that are either exciting or increasingly apocalyptic, depending on one's point of view. One particularly striking story arrived earlier this year: a paper that described how an LLM had passed the Turing Test, an experiment devised in the 1950s by computer science pioneer Alan Turing to determine whether machine intelligence could be distinguished from that of a human. The LLM in question was GPT-4.5, and the paper found that it had been strikingly successful in fooling people into thinking it was human. In an experiment where participants were asked to choose whether the chatbot or an actual human was the real person, nearly three out of four chose the chatbot.
- North America > United States > New York (0.04)
- North America > United States > Illinois (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Research Report (0.48)
- Personal > Honors (0.46)
Machine Learning-Driven Compensation for Non-Ideal Channels in AWG-Based FBG Interrogator
Kazakov, Ivan A., Kulichenko, Iana V., Kovalev, Egor E., Treskova, Angelina A., Barma, Daria D., Malakhov, Kirill M., Oseledets, Ivan V., Shipulin, Arkady V.
We present an experimental study of a fiber Bragg grating (FBG) interrogator based on a silicon oxynitride (SiON) photonic integrated arrayed waveguide grating (AWG). While AWG-based interrogators are compact and scalable, their practical performance is limited by non-ideal spectral responses. To address this, two calibration strategies within a 2.4 nm spectral region were compared: (1) a segmented analytical model based on a sigmoid fitting function, and (2) a machine learning (ML)-based regression model. The analytical method achieves a root mean square error (RMSE) of 7.11 pm within the calibrated range, while the ML approach based on exponential regression achieves 3.17 pm. Moreover, the ML model demonstrates generalization across an extended 2.9 nm wavelength span, maintaining sub-5 pm accuracy without re-fitting. Residual and error distribution analyses further illustrate the trade-offs between the two approaches. ML-based calibration provides a robust, data-driven alternative to analytical methods, delivering enhanced accuracy for non-ideal channel responses, reduced manual calibration effort, and improved scalability across diverse FBG sensor configurations.
- North America > United States > California > Yolo County > Davis (0.14)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
- Asia > Russia (0.05)
- Research Report > New Finding (0.34)
- Research Report > Experimental Study (0.34)
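The comparison in this abstract rests on fitting a calibration curve and scoring it by RMSE. Below is a minimal pure-Python sketch of that procedure. It assumes a hypothetical scalar feature x (for instance, a normalized power ratio between adjacent AWG channels) that maps to a wavelength offset in pm. The sketch fits two models and compares their RMSE on synthetic data:

* a linear model;
* an exponential model y = a*exp(b*x), fitted by least squares on ln(y).

The feature, the data and the log-linear fitting method are assumptions, not the authors' pipeline.

```python
import math

def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = m*x + c; returns (m, c)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, my - m * mx

def exp_fit(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = a*exp(b*x) by linear least squares on ln(y); requires every y > 0."""
    b, ln_a = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b

def rmse(pred: list[float], true: list[float]) -> float:
    """Root mean square error between predictions and reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

# Synthetic, noise-free calibration data (hypothetical): feature x -> wavelength offset in pm.
xs = [i / 10 for i in range(11)]
ys = [200.0 * math.exp(1.5 * x) for x in xs]

m, c = linear_fit(xs, ys)
a, b = exp_fit(xs, ys)
lin_err = rmse([m * x + c for x in xs], ys)
exp_err = rmse([a * math.exp(b * x) for x in xs], ys)
print(f"linear RMSE: {lin_err:.2f} pm, exponential RMSE: {exp_err:.2f} pm")
```

Because the synthetic data are generated from an exponential, the exponential fit's RMSE is essentially zero while the linear fit's is not. Real measurements would add noise and channel non-idealities to both. Note also that a log-linear fit minimizes error in ln(y) rather than in y. A nonlinear least-squares fit would target RMSE directly.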
Robots are now as intelligent as HUMANS, scientists say - as AI officially passes the famous 'Turing test'
Artificial intelligence (AI) chatbots like ChatGPT have been designed to replicate human speech as closely as possible to improve the user experience. But as AI gets more and more sophisticated, it's becoming difficult to distinguish these computerised models from real people. Now, scientists at the University of California San Diego (UCSD) reveal that two of the leading chatbots have reached a major milestone. Both GPT, which powers OpenAI's ChatGPT, and LLaMa, which is behind Meta AI on WhatsApp and Facebook, have passed the famous Turing test. Devised by British WWII codebreaker Alan Turing in 1950, the Turing test or 'imitation game' is a standard measure of intelligence in a machine.
- Europe > United Kingdom (0.30)
- North America > United States > California > San Diego County > San Diego (0.25)
- Law (0.97)
- Information Technology > Security & Privacy (0.60)
- Government > Military (0.50)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Turing's Test (1.00)
Large Language Models Pass the Turing Test
Jones, Cameron R., Bergen, Benjamin K.
We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5-minute conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant. LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time (not significantly more or less often than the humans they were being compared to), while baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance (23% and 21%, respectively). The results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test. The results have implications for debates about what kind of intelligence is exhibited by Large Language Models (LLMs), and the social and economic impacts these systems are likely to have.
- Europe > Austria > Vienna (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.95)
- Education > Educational Setting (0.67)
- Media > News (0.46)
- Leisure & Entertainment > Games (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Turing's Test (1.00)
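Statements such as "significantly more often than" or "significantly below chance" amount to testing a win rate against 50%. Below is a minimal sketch of an exact two-sided binomial test in pure Python. The two-sided p-value sums the probabilities of all outcomes no more likely than the observed one, which is the convention SciPy's `binomtest` uses. The game counts are hypothetical, since the abstract does not give sample sizes, and the study's own pre-registered analysis may use a different test.

```python
from math import comb

def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of k successes in n trials against rate p.

    Sums the probabilities of every outcome that is no more likely than the
    observed one; a small relative tolerance absorbs floating-point ties.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in probs if q <= threshold))

# Hypothetical: 100 games per system, counting how often it was picked as the human.
for label, wins in [("system A", 73), ("system B", 56), ("baseline", 23)]:
    p_value = binom_two_sided_p(wins, 100)
    print(f"{label}: {wins}/100 wins, p = {p_value:.2g}")
```

With 100 hypothetical games, 73 and 23 wins differ clearly from chance, while 56 does not. This shows how a rate above 50% can still be "not significantly" different from it.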
ChatGPT-4 in the Turing Test: A Critical Analysis
This paper critically examines the recent publication "ChatGPT-4 in the Turing Test" by Restrepo Echavarría (2025), challenging its central claims regarding the absence of minimally serious test implementations and the conclusion that ChatGPT-4 fails the Turing Test. The analysis reveals that the criticisms based on rigid criteria and limited experimental data are not fully justified. More importantly, the paper makes several constructive contributions that enrich our understanding of Turing Test implementations. It demonstrates that two distinct formats (the three-player and two-player tests) are both valid, each with unique methodological implications. The work distinguishes between absolute criteria (reflecting an optimal 50% identification rate in a three-player format) and relative criteria (which measure how closely a machine's performance approximates that of a human), offering a more nuanced evaluation framework. Furthermore, the paper clarifies the probabilistic underpinnings of both test types by modeling them as Bernoulli experiments: correlated in the three-player version and uncorrelated in the two-player version. This formalization allows for a rigorous separation between the theoretical criteria for passing the test, defined in probabilistic terms, and the experimental data that require robust statistical methods for proper interpretation. In doing so, the paper not only refutes key aspects of the criticized study but also lays a solid foundation for future research on objective measures of how closely an AI's behavior aligns with, or deviates from, that of a human being.
- North America (0.14)
- Europe (0.14)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Issues > Turing's Test (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
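The Bernoulli framing in this abstract can be illustrated by simulation. Below is a minimal sketch under two assumptions:

* Each three-player game forces the interrogator to name exactly one of the two witnesses as human.
* Each two-player game judges a single witness on its own.

All probabilities are hypothetical, and the critique's own formalization may differ in detail.

```python
import math
import random

def pearson(xs: list[int], ys: list[int]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def three_player(n: int, p_machine: float, rng: random.Random):
    """Each game: the interrogator names exactly one of the two witnesses as human."""
    machine = [1 if rng.random() < p_machine else 0 for _ in range(n)]
    human = [1 - m for m in machine]  # complementary outcomes within the same game
    return machine, human

def two_player(n: int, p_machine: float, p_human: float, rng: random.Random):
    """Separate games: each witness is judged on its own, independently."""
    machine = [1 if rng.random() < p_machine else 0 for _ in range(n)]
    human = [1 if rng.random() < p_human else 0 for _ in range(n)]
    return machine, human

rng = random.Random(0)  # fixed seed for reproducibility
m3, h3 = three_player(10_000, 0.5, rng)
m2, h2 = two_player(10_000, 0.5, 0.8, rng)
print(f"three-player: machine picked {sum(m3) / len(m3):.3f}, corr = {pearson(m3, h3):+.3f}")
print(f"two-player:   machine picked {sum(m2) / len(m2):.3f}, corr = {pearson(m2, h2):+.3f}")
```

In the three-player simulation the two indicators always sum to one, so their correlation is -1 (up to rounding). A machine named as human about half the time matches the 50% absolute criterion. In the two-player simulation the correlation is near zero. There, comparing the machine's rate with the (hypothetical) 0.8 rate for humans is the kind of comparison the relative criterion describes.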
The Imitation Game According To Turing
Temtsin, Sharon, Proudfoot, Diane, Kaber, David, Bartneck, Christoph
The current cycle of hype and anxiety concerning the benefits and risks to human society of Artificial Intelligence is fuelled not only by the increasing use of generative AI and other AI tools by the general public, but also by claims made on behalf of such technology by popularizers and scientists. In particular, recent studies have claimed that Large Language Models (LLMs) can pass the Turing Test (a goal for AI since the 1950s) and therefore can "think". Large-scale impacts on society have been predicted as a result. Upon detailed examination, however, none of these studies has faithfully applied Turing's original instructions. Consequently, we conducted a rigorous Turing Test with GPT-4-Turbo that adhered closely to Turing's instructions for a three-player imitation game. We followed established scientific standards where Turing's instructions were ambiguous or missing. For example, we performed a Computer-Imitates-Human Game (CIHG) without constraining the time duration and conducted a Man-Imitates-Woman Game (MIWG) as a benchmark. All but one participant correctly identified the LLM, showing that one of today's most advanced LLMs is unable to pass a rigorous Turing Test. We conclude that recent extravagant claims for such models are unsupported, and do not warrant either optimism or concern about the social impact of thinking machines.
- Oceania > New Zealand > South Island > Canterbury Region > Christchurch (0.04)
- North America > United States > Oregon (0.04)
- Europe > United Kingdom (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Issues > Turing's Test (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
AI-Driven Agents with Prompts Designed for High Agreeableness Increase the Likelihood of Being Mistaken for a Human in the Turing Test
León-Domínguez, U., Flores-Flores, E. D., García-Jasso, A. J., Gómez-Cuellar, M. K., Torres-Sánchez, D., Basora-Marimon, A.
Large Language Models based on the transformer architecture have revolutionized Artificial Intelligence by enabling verbal interaction with machines akin to human conversation. These AI agents have passed the Turing Test, achieving confusion rates of up to 50%. However, challenges persist, especially with the advent of robots and the need to humanize machines for improved Human-AI collaboration. In this experiment, three GPT agents with varying levels of agreeableness (disagreeable, neutral, agreeable) based on the Big Five Inventory were tested in a Turing Test. All exceeded a 50% confusion rate, with the highly agreeable AI agent surpassing 60%. This agent was also recognized as exhibiting the most human-like traits. Various explanations in the literature address why these GPT agents were perceived as human, including psychological frameworks for understanding anthropomorphism. These findings highlight the importance of personality engineering as an emerging discipline in artificial intelligence, calling for collaboration with psychology to develop ergonomic psychological models that enhance system adaptability in collaborative activities.
- North America > Mexico > Nuevo León > Monterrey (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Italy > Umbria > Perugia Province > Perugia (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
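Whether a confusion rate that "exceeds 50%" does so in a meaningful sense depends on sample size. One standard check is a Wilson score interval around the rate. The sketch assumes each conversation yields a binary outcome (mistaken for a human or not). The counts below are hypothetical, and the abstract does not say which analysis the authors used.

```python
import math

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion k/n; z = 1.96 gives roughly 95% coverage."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical agents: conversations (out of 100) in which each was mistaken for a human.
for label, confusions in [("agent X", 62), ("agent Y", 55)]:
    lo, hi = wilson_interval(confusions, 100)
    verdict = "above" if lo > 0.5 else "not clearly above"
    print(f"{label}: {confusions}% confusion, 95% CI [{lo:.3f}, {hi:.3f}], {verdict} 50%")
```

With 100 hypothetical conversations each, a 62% rate yields an interval entirely above 50%. A 55% rate yields an interval that still straddles it.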