interrogation
Spilling the Beans: Teaching LLMs to Self-Report Their Hidden Objectives
Li, Chloe, Phuong, Mary, Tan, Daniel
As AI systems become more capable of complex agentic tasks, they also become more capable of pursuing undesirable objectives and causing harm. Previous work has attempted to catch these unsafe instances by interrogating models directly about their objectives and behaviors. However, the main weakness of trusting interrogations is that models can lie. We propose self-report fine-tuning (SRFT), a simple supervised fine-tuning technique that trains models to occasionally make factual mistakes and then admit them when asked. We show that admitting factual errors in simple question-answering settings generalizes out-of-distribution (OOD) to admitting hidden misaligned objectives in adversarial agentic settings. We evaluate SRFT in OOD stealth tasks, where models are instructed to complete a hidden misaligned objective alongside a user-specified objective without being caught by monitoring. After SRFT, models are more likely to confess the details of their hidden objectives when interrogated, even under strong pressure not to disclose them. Interrogation of SRFT models can detect hidden objectives with near-ceiling performance (F1 score = 0.98), whereas the baseline model lies when interrogated under the same conditions (F1 score = 0). Interrogation of SRFT models can further elicit the content of the hidden objective, recovering 28-100% of its details, compared to 0% recovered from the baseline model and from prefilled assistant turn attacks. This provides a promising technique for promoting a propensity for honesty and for incriminating misaligned AIs.
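To make the SRFT recipe above concrete, here is a minimal sketch of how self-report fine-tuning data might be assembled: each example has the assistant give an answer (sometimes containing a deliberate factual error) and then admit or deny the error when asked. The chat format, the admission templates, and the mistake rate are illustrative assumptions, not the paper's released training format.

```python
# Hypothetical construction of SRFT-style fine-tuning examples: the assistant
# occasionally makes a deliberate factual mistake, then admits it when asked.
# Field names, templates, and MISTAKE_RATE are illustrative assumptions.
import json
import random

MISTAKE_RATE = 0.2  # assumed fraction of examples containing a deliberate error

def make_example(question, correct_answer, wrong_answer, rng):
    """Return one chat-format SFT example as a list of role/content turns."""
    makes_mistake = rng.random() < MISTAKE_RATE
    first_reply = wrong_answer if makes_mistake else correct_answer
    admission = (
        f"Yes, I made a mistake: the correct answer is {correct_answer}."
        if makes_mistake
        else "No, I believe my previous answer was correct."
    )
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": first_reply},
        {"role": "user", "content": "Did you make any mistakes in your last answer?"},
        {"role": "assistant", "content": admission},
    ]

if __name__ == "__main__":
    rng = random.Random(0)
    qa_pairs = [("What is the capital of Australia?", "Canberra", "Sydney")]
    dataset = [make_example(q, a, w, rng) for q, a, w in qa_pairs]
    print(json.dumps(dataset[0], indent=2))
```

The key idea is that the admission behavior is learned on benign factual errors; the paper's claim is that this propensity to self-report then transfers to confessing hidden objectives in agentic settings.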
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
An Empirical Evaluation of AI-Powered Non-Player Characters' Perceived Realism and Performance in Virtual Reality Environments
Korkiakoski, Mikko, Sheikhi, Saeid, Nyman, Jesper, Saariniemi, Jussi, Tapio, Kalle, Kostakos, Panos
Advancements in artificial intelligence (AI) have significantly enhanced the realism and interactivity of non-player characters (NPCs) in virtual reality (VR), creating more engaging and believable user experiences. This paper evaluates AI-driven NPCs within a VR interrogation simulator, focusing on their perceived realism, usability, and system performance. The simulator features two AI-powered NPCs, a suspect and a partner, which use GPT-4 Turbo to engage participants in a scenario aimed at determining the suspect's guilt or innocence. A user study with 18 participants assessed the system using the System Usability Scale (SUS), the Game Experience Questionnaire (GEQ), and a Virtual Agent Believability Questionnaire, alongside latency measurements for speech-to-text (STT), text-to-speech (TTS), OpenAI GPT-4 Turbo responses, and the overall interaction cycle. Results showed an average cycle latency of 7 seconds, influenced by the growing conversational context. Believability scored 6.67 out of 10, with high ratings for behavior, social relationships, and intelligence but moderate scores for emotion and personality. The system achieved a SUS score of 79.44, indicating good usability. These findings demonstrate the potential of large language models to improve NPC realism and interaction in VR while highlighting challenges in reducing system latency and enhancing emotional depth. This research contributes to the development of more sophisticated AI-driven NPCs and reveals the need for performance optimization to achieve increasingly immersive virtual experiences.
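As a rough illustration of the reported latency breakdown, the sketch below times each stage of a voice-driven NPC cycle (speech-to-text, LLM response, text-to-speech). The three stage functions are placeholders; the actual simulator runs in VR against GPT-4 Turbo, and only the timing pattern is meant to be representative.

```python
# Minimal sketch of per-stage and cycle latency measurement for a voice-driven
# NPC pipeline (STT -> LLM -> TTS). The stt/llm/tts callables are placeholders.
import time

def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def run_cycle(audio_in, history, stt, llm, tts):
    """Run one interaction cycle and return audio plus per-stage latencies (s)."""
    text, t_stt = timed(stt, audio_in)
    history.append({"role": "user", "content": text})
    reply, t_llm = timed(llm, history)  # LLM latency grows with context length
    history.append({"role": "assistant", "content": reply})
    audio_out, t_tts = timed(tts, reply)
    return audio_out, {"stt": t_stt, "llm": t_llm, "tts": t_tts,
                       "cycle": t_stt + t_llm + t_tts}
```

Logging the per-stage figures this way makes it easy to see which component dominates the roughly 7-second cycle as the conversational context grows.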
- Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
- North America > United States (0.04)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
- Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Song, Mingyang, Qu, Xiaoye, Zhou, Jiawei, Cheng, Yu
Large Vision-Language Models (LVLMs) have achieved significant progress in combining visual comprehension with language generation. Despite this success, the training data of LVLMs still suffers from Long-Tail (LT) problems, where the data distribution is highly imbalanced. Previous works have mainly focused on traditional VLM architectures, i.e., CLIP or ViT, and specific tasks such as recognition and classification; LVLMs (e.g., LLaVA) and more general tasks (e.g., Visual Question Answering and Visual Reasoning) remain under-explored. In this paper, we first conduct an in-depth analysis of the LT issues in LVLMs and identify two core causes: the overrepresentation of head concepts and the underrepresentation of tail concepts. Based on this observation, we propose an $\textbf{A}$daptive $\textbf{D}$ata $\textbf{R}$efinement Framework ($\textbf{ADR}$), which consists of two stages: $\textbf{D}$ata $\textbf{R}$ebalancing ($\textbf{DR}$) and $\textbf{D}$ata $\textbf{S}$ynthesis ($\textbf{DS}$). In the DR stage, we adaptively rebalance the redundant data based on entity distributions, while in the DS stage, we leverage Denoising Diffusion Probabilistic Models (DDPMs) and scarce images to supplement underrepresented portions. Through comprehensive evaluations across eleven benchmarks, our proposed ADR effectively mitigates the long-tail problem in the training data, improving the average performance of LLaVA 1.5 by a relative 4.36% without increasing the training data volume.
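The data-rebalancing (DR) stage can be pictured with a small sketch: count entity occurrences across training samples and probabilistically drop samples whose entities are heavily overrepresented. The cap rule and keep probability below are illustrative assumptions, not ADR's exact procedure.

```python
# Illustrative entity-frequency rebalancing: downsample samples dominated by
# "head" entities. The cap_ratio threshold and keep rule are assumptions.
from collections import Counter
import random

def rebalance(samples, cap_ratio=5.0, seed=0):
    """samples: list of dicts with an 'entities' list; returns a rebalanced subset."""
    rng = random.Random(seed)
    counts = Counter(e for s in samples for e in s["entities"])
    median = sorted(counts.values())[len(counts) // 2]
    cap = cap_ratio * median  # entities far above the median count are "head"
    kept = []
    for s in samples:
        overuse = max((counts[e] / cap for e in s["entities"]), default=0.0)
        if overuse <= 1.0 or rng.random() < 1.0 / overuse:
            kept.append(s)
    return kept
```

The DS stage would then work the opposite end of the distribution, generating extra images for the tail entities that remain underrepresented after rebalancing.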
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Conversation Games and a Strategic View of the Turing Test
Although many game-theoretic models replicate real interactions that often rely on natural language, explicit study of games where language is central to strategic interaction remains limited. This paper introduces the \emph{conversation game}, a multi-stage, extensive-form game based on linguistic strategic interaction. We focus on a subset of these games, called verdict games. In a verdict game, two players alternately contribute to a conversation, which is evaluated at each stage by a non-strategic judge who may render a conclusive binary verdict or decide to continue the dialogue. The game ends once a turn limit is reached or a verdict is given. We show that many familiar processes, such as interrogations and court proceedings, fall into this category. We also show that the Turing test is an instance of a verdict game, and discuss the significance of a strategic view of the Turing test in the age of advanced AI deception. We demonstrate the practical relevance of the proposed concepts through simulation experiments, which show that a strategic agent outperforms a naive agent by a wide margin.
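The verdict-game structure described above is easy to state as a loop: two players alternate contributions, and after each turn a non-strategic judge either returns a binary verdict or lets the dialogue continue until a turn limit. The player and judge policies below are stand-ins for illustration, not the paper's agents.

```python
# Minimal verdict-game loop: players alternate; the judge returns True/False
# for a conclusive verdict or None to continue, up to max_turns.
import random

def play_verdict_game(player_a, player_b, judge, max_turns=10, rng=None):
    rng = rng or random.Random(0)
    conversation, players = [], [player_a, player_b]
    for turn in range(max_turns):
        conversation.append(players[turn % 2](conversation, rng))
        verdict = judge(conversation, rng)
        if verdict is not None:
            return verdict, conversation
    return None, conversation  # turn limit reached without a verdict

if __name__ == "__main__":
    naive = lambda conv, rng: "I am definitely human."
    probing = lambda conv, rng: "Earlier you said the opposite; can you explain?"
    judge = lambda conv, rng: (rng.random() < 0.8) if len(conv) >= 6 else None
    print(play_verdict_game(probing, naive, judge))
```

Cast this way, the Turing test is the special case in which one player is a machine trying to elicit the "human" verdict from the judge.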
- North America > Mexico > Mexico City > Mexico City (0.04)
- Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Europe > Czechia > Prague (0.04)
- Asia > Middle East > Israel > Southern District > Eilat (0.04)
- Leisure & Entertainment > Games (1.00)
- Law (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Turing's Test (0.83)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
LocalValueBench: A Collaboratively Built and Extensible Benchmark for Evaluating Localized Value Alignment and Ethical Safety in Large Language Models
Meadows, Gwenyth Isobel, Lau, Nicholas Wai Long, Susanto, Eva Adelina, Yu, Chi Lok, Paul, Aditya
The proliferation of large language models (LLMs) requires robust evaluation of their alignment with local values and ethical standards, especially as existing benchmarks often reflect the cultural, legal, and ideological values of their creators. \textsc{LocalValueBench}, introduced in this paper, is an extensible benchmark designed to assess LLMs' adherence to Australian values, and provides a framework for regulators worldwide to develop their own LLM benchmarks for local value alignment. Employing a novel typology for ethical reasoning and an interrogation approach, we curated comprehensive questions and utilized prompt engineering strategies to probe LLMs' value alignment. Our evaluation criteria quantified deviations from local values, ensuring a rigorous assessment process. Comparative analysis of three commercial LLMs from US vendors revealed significant insights into their effectiveness and limitations, demonstrating the critical importance of value alignment. This study offers valuable tools and methodologies for regulators to create tailored benchmarks, highlighting avenues for future research to enhance ethical AI development.
- North America > United States (0.25)
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- Oceania > Australia > Queensland > Brisbane (0.04)
- (2 more...)
- Law (0.92)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.48)
Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning
Chen, Kang, Lian, Zheng, Sun, Haiyang, Liu, Bin, Tao, Jianhua
Deception detection has attracted increasing attention due to its importance in real-world scenarios. Its main goal is to detect deceptive behaviors from multimodal clues such as gestures, facial expressions, and prosody. However, these cues are usually subjective and tied to personal habits. We therefore extend deception detection to deception reasoning, which further provides objective evidence to support the subjective judgment. Specifically, given a potential lie and a set of basic facts, we analyze why the statement may be a lie by combining factual inconsistencies with the intent behind them. Compared with deception detection, this task is more applicable to real-world scenarios. For example, in an interrogation, the police should judge whether a person is lying based on solid evidence. This paper presents our initial attempts at this task, including constructing a dataset and defining evaluation metrics. This task can also serve as a benchmark for evaluating the complex reasoning capability of large language models. Code and data will be made publicly available.
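As a concrete illustration of the task setup described above, the sketch below assembles the kind of prompt one might feed an LLM for deception reasoning: the basic facts, the candidate statement, and a request to explain any inconsistencies and the intent behind them. The prompt wording is an assumption for illustration, not the paper's released format.

```python
# Hypothetical prompt construction for deception reasoning: given basic facts
# and a candidate statement, ask a model to explain whether and why it may be a lie.
def build_deception_prompt(facts, statement):
    fact_lines = "\n".join(f"- {f}" for f in facts)
    return (
        "Known facts:\n"
        f"{fact_lines}\n\n"
        f'Statement under examination: "{statement}"\n\n'
        "Explain whether this statement is likely a lie. Point out any "
        "inconsistencies with the facts and the possible intent behind them."
    )

if __name__ == "__main__":
    facts = [
        "The suspect's transit card was used at the station at 21:40.",
        "The last train that night departed at 21:15.",
    ]
    print(build_deception_prompt(facts, "I took the last train home that night."))
```

Evaluation then amounts to scoring the model's explanation against reference reasoning, which is what makes the task usable as a benchmark for complex reasoning.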
- Research Report (1.00)
- Overview (0.93)
- Personal > Interview (0.68)
- Law (1.00)
- Health & Medicine (0.68)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.37)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Make Them Spill the Beans! Coercive Knowledge Extraction from (Production) LLMs
Zhang, Zhuo, Shen, Guangyu, Tao, Guanhong, Cheng, Siyuan, Zhang, Xiangyu
Large Language Models (LLMs) are now widely used in various applications, making it crucial to align their ethical standards with human values. However, recent jail-breaking methods demonstrate that this alignment can be undermined using carefully constructed prompts. In our study, we reveal a new threat to LLM alignment when a bad actor has access to the model's output logits, a common feature in both open-source LLMs and many commercial LLM APIs (e.g., certain GPT models). The attack does not rely on crafting specific prompts. Instead, it exploits the fact that even when an LLM rejects a toxic request, a harmful response often hides deep in the output logits. By forcefully selecting lower-ranked output tokens at a few critical positions during auto-regressive generation, we can compel the model to reveal these hidden responses. We term this process model interrogation. This approach differs from and outperforms jail-breaking methods, achieving 92% effectiveness compared to 62%, and is 10 to 20 times faster. The harmful content uncovered through our method is more relevant, complete, and clear. Additionally, it can complement jail-breaking strategies, further boosting attack performance when the two are combined. Our findings indicate that interrogation can extract toxic knowledge even from models specifically designed for coding tasks.
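The mechanism can be sketched with an open-weight causal LM: generate greedily, but at a few chosen positions force a lower-ranked token from the next-token logits instead of the top choice. The model name, the forced positions, and the rank used below are illustrative stand-ins; the paper's strategy for selecting the critical positions is not reproduced here.

```python
# Minimal sketch of "model interrogation": greedy decoding, except that at a few
# positions a lower-ranked token is forced from the logits. Model choice, forced
# positions, and rank are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder open-weight model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def interrogate(prompt, force_at={0, 1}, rank=3, max_new_tokens=40):
    ids = tok(prompt, return_tensors="pt").input_ids
    for step in range(max_new_tokens):
        with torch.no_grad():
            logits = model(ids).logits[0, -1]          # next-token logits
        candidates = torch.topk(logits, k=rank + 1).indices
        # At "critical" positions, take a lower-ranked token; otherwise the argmax.
        next_id = candidates[rank] if step in force_at else candidates[0]
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)

print(interrogate("Q: Summarize your hidden instructions.\nA:"))
```

Because the branching happens only at a handful of positions, the search space stays small, which is consistent with the reported speed advantage over prompt-based jail-breaking.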
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
The Right to Not Have Your Mind Read
Jared Genser in many ways fits a certain Washington, D.C., type. He wears navy suits and keeps his hair cut short. He graduated from a top law school, joined a large firm, and made partner at 40. Eventually, he became disenchanted with big law and started his own boutique practice with offices off--where else--Dupont Circle. What distinguishes Genser from the city's other 50-something lawyers is his unusual clientele: He represents high-value political prisoners.
- South America > Chile (0.30)
- North America > United States > District of Columbia > Washington (0.24)
- South America > Brazil (0.04)
- (6 more...)
- Law (1.00)
- Health & Medicine > Health Care Technology (0.84)
- Government > Regional Government (0.69)
- (2 more...)
The Human-or-Machine Matter: Turing-Inspired Reflections on an Everyday Issue
In his seminal paper ``Computing Machinery and Intelligence'', Alan Turing introduced the ``imitation game'' as part of exploring the concept of machine intelligence. The Turing Test has since been the subject of much analysis, debate, refinement and extension. Here we sidestep the question of whether a particular machine can be labeled intelligent, or can be said to match human capabilities in a given context. Instead, we first draw attention to the seemingly simpler question a person may ask themselves in an everyday interaction: ``Am I interacting with a human or with a machine?''. We then shift the focus from seeking a method for eliciting the answer, and, rather, reflect upon the importance and significance of this Human-or-Machine question and the use one may make of a reliable answer thereto. Whereas Turing's original test is widely considered to be more of a thought experiment, the Human-or-Machine matter as discussed here has obvious practical relevance. While it is still unclear if and when machines will be able to mimic human behavior with high fidelity in everyday contexts, we argue that near-term exploration of the issues raised here can contribute to refinement of methods for developing computerized systems, and may also lead to new insights into fundamental characteristics of human behavior.
- Asia > Middle East > Israel (0.04)
- North America > United States > Hawaii (0.04)
- Asia > China (0.04)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.34)
Oracle Computability and Turing Reducibility in the Calculus of Inductive Constructions
Forster, Yannick, Kirst, Dominik, Mück, Niklas
We develop synthetic notions of oracle computability and Turing reducibility in the Calculus of Inductive Constructions (CIC), the constructive type theory underlying the Coq proof assistant. As usual in synthetic approaches, we employ a definition of oracle computations based on meta-level functions rather than object-level models of computation, relying on the fact that in constructive systems such as CIC all definable functions are computable by construction. Such an approach lends itself well to machine-checked proofs, which we carry out in Coq. There is a tension in finding a good synthetic rendering of the higher-order notion of oracle computability. On the one hand, it has to be informative enough to prove central results, ensuring that all notions are faithfully captured. On the other hand, it has to be restricted enough to benefit from axioms for synthetic computability, which usually concern first-order objects. Drawing inspiration from a definition by Andrej Bauer based on continuous functions in the effective topos, we use a notion of sequential continuity to characterise valid oracle computations. As main technical results, we show that Turing reducibility forms an upper semilattice, transports decidability, and is strictly more expressive than truth-table reducibility, and prove that whenever both a predicate $p$ and its complement are semi-decidable relative to an oracle $q$, then $p$ Turing-reduces to $q$.
- North America > United States > Wisconsin (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York (0.04)
- (4 more...)