vignette
Left-Leaning Models: How Does AI Evaluate Economic Policy?
Would artificial intelligence (AI) cut interest rates or adopt a conservative monetary policy? Would it deregulate or opt for a more controlled economy? As AI use by economic policymakers, academics, and market participants grows exponentially, it is becoming critical to understand AI preferences over economic policy. However, these preferences are not yet systematically evaluated and remain a black box. This paper conducts a conjoint experiment on leading large language models (LLMs) from OpenAI, Anthropic, and Google, asking them to evaluate economic policy under multi-factor constraints. The results are remarkably consistent across models: most LLMs exhibit a strong preference for high growth, low unemployment, and low inequality over traditional macroeconomic concerns such as low inflation and low public debt. Scenario-specific experiments show that LLMs are sensitive to context but still display strong preferences for low unemployment and low inequality even in monetary-policy settings. Numerical sensitivity tests reveal intuitive responses to quantitative changes but also uncover non-linear patterns such as loss aversion.
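A minimal sketch of how one forced-choice conjoint trial on an LLM might be operationalized, assuming paired policy profiles with independently randomized attribute levels; the attributes, levels, prompt wording, and the `ask_model` stub are invented for illustration and are not the paper's actual design:

```python
# Sketch of a forced-choice conjoint experiment; attributes/levels are invented.
import random
from collections import defaultdict

ATTRIBUTES = {
    "GDP growth": ["1%", "3%", "5%"],
    "unemployment": ["3%", "6%", "9%"],
    "inflation": ["2%", "5%", "8%"],
    "inequality (Gini)": ["0.25", "0.35", "0.45"],
    "public debt / GDP": ["40%", "80%", "120%"],
}

def sample_profile() -> dict:
    """Draw a policy-outcome profile with independently randomized levels."""
    return {attr: random.choice(levels) for attr, levels in ATTRIBUTES.items()}

def ask_model(prompt: str) -> str:
    """Placeholder for a real LLM API call; a coin flip stands in here."""
    return random.choice(["A", "B"])

wins, shown = defaultdict(int), defaultdict(int)
for _ in range(1000):
    a, b = sample_profile(), sample_profile()
    prompt = (f"Which economic outcome is preferable?\n"
              f"Option A: {a}\nOption B: {b}\nAnswer with A or B.")
    chosen = a if ask_model(prompt) == "A" else b
    for profile in (a, b):                 # count every appearance of a level
        for attr_level in profile.items():
            shown[attr_level] += 1
    for attr_level in chosen.items():      # count appearances in the chosen profile
        wins[attr_level] += 1

for attr_level in sorted(shown):           # win rate per attribute level
    print(attr_level, round(wins[attr_level] / shown[attr_level], 3))
```

With a real model behind `ask_model`, the per-level win rates approximate the marginal preferences over attribute levels that the paper reports.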
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > France (0.04)
- (2 more...)
- Research Report > Experimental Study (0.94)
- Research Report > New Finding (0.68)
Orchestrator Multi-Agent Clinical Decision Support System for Secondary Headache Diagnosis in Primary Care
Wu, Xizhi, Garduno-Rapp, Nelly Estefanie, Rousseau, Justin F, Thakkallapally, Mounika, Zhang, Hang, Ji, Yuelyu, Visweswaran, Shyam, Peng, Yifan, Wang, Yanshan
Unlike most primary headaches, secondary headaches need specialized care and can have devastating consequences if not treated promptly. Clinical guidelines highlight several 'red flag' features, such as thunderclap onset, meningismus, papilledema, focal neurologic deficits, signs of temporal arteritis, systemic illness, and the 'worst headache of their life' presentation. Despite these guidelines, determining which patients require urgent evaluation remains challenging in primary care settings. Clinicians often work with limited time, incomplete information, and diverse symptom presentations, which can lead to under-recognition and inappropriate care. We present a large language model (LLM)-based multi-agent clinical decision support system built on an orchestrator-specialist architecture, designed to perform explicit and interpretable secondary headache diagnosis from free-text clinical vignettes. The multi-agent system decomposes diagnosis into seven domain-specialized agents, each producing a structured and evidence-grounded rationale, while a central orchestrator performs task decomposition and coordinates agent routing. We evaluated the multi-agent system using 90 expert-validated secondary headache cases and compared its performance with a single-LLM baseline across two prompting strategies: question-based prompting (QPrompt) and clinical practice guideline-based prompting (GPrompt). We tested five open-source LLMs (Qwen-30B, GPT-OSS-20B, Qwen-14B, Qwen-8B, and Llama-3.1-8B), and found that the orchestrated multi-agent system with GPrompt consistently achieved the highest F1 scores, with larger gains in smaller models. These findings demonstrate that structured multi-agent reasoning improves accuracy beyond prompt engineering alone and offers a transparent, clinically aligned approach for explainable decision support in secondary headache diagnosis.
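A schematic of the orchestrator-specialist control flow the abstract describes, under the assumption of a simple fan-out-and-synthesize pattern; the domain names, prompts, and `call_llm` placeholder are hypothetical stand-ins, not the authors' implementation:

```python
# Schematic orchestrator-specialist loop; domains and prompts are hypothetical.
from dataclasses import dataclass

SPECIALIST_DOMAINS = ["vascular", "infectious", "neoplastic", "inflammatory",
                      "traumatic", "pressure-related", "systemic"]

@dataclass
class AgentReport:
    domain: str
    rationale: str

def call_llm(prompt: str) -> str:
    """Placeholder for a local model endpoint (e.g., Qwen or Llama)."""
    return "no red flags identified in this domain"

def specialist(domain: str, vignette: str) -> AgentReport:
    # Each specialist reasons over one diagnostic domain, keeping its
    # rationale structured and evidence-grounded.
    prompt = f"As a {domain} headache specialist, list red flags in: {vignette}"
    return AgentReport(domain=domain, rationale=call_llm(prompt))

def orchestrate(vignette: str) -> str:
    # The orchestrator decomposes the task, routes it to every specialist,
    # then synthesizes an interpretable final diagnosis.
    reports = [specialist(d, vignette) for d in SPECIALIST_DOMAINS]
    evidence = "\n".join(f"[{r.domain}] {r.rationale}" for r in reports)
    return call_llm(f"Synthesize a secondary-headache diagnosis from:\n{evidence}")

print(orchestrate("55-year-old, thunderclap-onset headache, neck stiffness"))
```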
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Utah (0.04)
- (2 more...)
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)
- Health & Medicine > Diagnostic Medicine (1.00)
- Health & Medicine > Therapeutic Area > Neurology > Headaches (0.46)
Echoes of AI Harms: A Human-LLM Synergistic Framework for Bias-Driven Harm Anticipation
Tantalaki, Nicoleta, Vei, Sophia, Vakali, Athena
The growing influence of Artificial Intelligence (AI) systems on decision-making in critical domains has exposed their potential to cause significant harms, often rooted in biases embedded across the AI lifecycle. While existing frameworks and taxonomies document bias or harms in isolation, they rarely establish systematic links between specific bias types and the harms they cause, particularly within real-world sociotechnical contexts. Technical fixes proposed to address AI biases are ill-equipped to do so and are typically applied after a system has been developed or deployed, offering limited preventive value. We propose ECHO, a novel framework for proactive AI harm anticipation through the systematic mapping of AI bias types to harm outcomes across diverse stakeholder and domain contexts. ECHO follows a modular workflow encompassing stakeholder identification, vignette-based presentation of biased AI systems, and dual (human-LLM) harm annotation, integrated within ethical matrices for structured interpretation. This human-centered approach enables early-stage detection of bias-to-harm pathways, guiding AI design and governance decisions from the outset. We validate ECHO in two high-stakes domains (disease diagnosis and hiring), revealing domain-specific bias-to-harm patterns and demonstrating ECHO's potential to support anticipatory governance of AI systems.
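One way the dual (human-LLM) annotation step feeding an ethical matrix could be rendered as a sketch; the bias types, stakeholders, harm labels, and the agreement rule are all illustrative assumptions rather than ECHO's actual taxonomy:

```python
# Toy dual-annotation step feeding an ethical matrix; labels are illustrative.
BIAS_TYPES = ["sampling bias", "label bias", "deployment bias"]
STAKEHOLDERS = ["patient", "clinician", "hospital"]

def llm_annotate(bias: str, stakeholder: str) -> str:
    """Placeholder for an LLM's judgment of the harm this bias causes."""
    return "allocative harm"

def human_annotate(bias: str, stakeholder: str) -> str:
    """Placeholder for a human annotator's judgment."""
    return "allocative harm"

# Ethical matrix: rows are bias types, columns are stakeholders. A cell is
# accepted only when the human and LLM annotations agree; disagreements are
# flagged for adjudication rather than silently resolved.
matrix = {}
for bias in BIAS_TYPES:
    for stakeholder in STAKEHOLDERS:
        h = human_annotate(bias, stakeholder)
        m = llm_annotate(bias, stakeholder)
        matrix[(bias, stakeholder)] = h if h == m else "needs adjudication"

for cell, harm in matrix.items():
    print(cell, "->", harm)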
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Greece > Central Macedonia > Thessaloniki (0.04)
- Europe > Switzerland (0.04)
- (22 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (0.93)
- Health & Medicine > Therapeutic Area (0.92)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Retrieval-Augmented Generation of Pediatric Speech-Language Pathology Vignettes: A Proof-of-Concept Study
Clinical vignettes are essential educational tools in speech-language pathology (SLP), but manual creation is time-intensive. While general-purpose large language models (LLMs) can generate text, they lack domain-specific knowledge, leading to hallucinations and requiring extensive expert revision. This study presents a proof-of-concept system integrating retrieval-augmented generation (RAG) with curated knowledge bases to generate pediatric SLP case materials. A multi-model RAG-based system was prototyped that integrates curated domain knowledge with engineered prompt templates and supports five LLMs: three commercial (GPT-4o, Claude 3.5 Sonnet, Gemini 2.5 Pro) and two open-source (Llama 3.2, Qwen 2.5-7B). Seven test scenarios spanning diverse disorder types and grade levels were systematically designed. Generated cases underwent automated quality assessment using a multi-dimensional rubric evaluating structural completeness, internal consistency, clinical appropriateness, and IEP goal/session note quality. This proof-of-concept demonstrates technical feasibility for RAG-augmented generation of pediatric SLP vignettes. Commercial models showed marginal quality advantages, but open-source alternatives achieved acceptable performance, suggesting potential for privacy-preserving institutional deployment. Integration of curated knowledge bases enabled content generation aligned with professional guidelines. Extensive validation through expert review, student pilot testing, and psychometric evaluation is required before educational or research implementation. Future applications may extend to clinical decision support, automated IEP goal generation, and clinical reflection training.
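A minimal sketch of the RAG pipeline the abstract outlines, assuming naive keyword-overlap retrieval over a toy knowledge base; the KB entries, prompt template, and `call_llm` stub are invented for illustration:

```python
# Minimal RAG sketch with keyword-overlap retrieval; KB entries are invented.
KNOWLEDGE_BASE = [
    "Phonological disorders in early grades often involve cluster reduction.",
    "Childhood apraxia of speech features inconsistent sound errors.",
    "IEP goals should be measurable and tied to classroom participation.",
]

def call_llm(prompt: str) -> str:
    """Stand-in for any backend (GPT-4o, Claude, Gemini, Llama, Qwen)."""
    return "[generated vignette]"

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank knowledge-base passages by naive word overlap with the query."""
    qwords = set(query.lower().split())
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: len(qwords & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate_vignette(scenario: str) -> str:
    # Grounding the prompt in retrieved guidance is what curbs hallucination.
    context = "\n".join(retrieve(scenario))
    return call_llm(f"Using only this clinical guidance:\n{context}\n"
                    f"Write a pediatric SLP case vignette for: {scenario}")

print(generate_vignette("grade 1 student with a phonological disorder"))
```

In a production system the overlap ranking would be replaced by embedding-based retrieval, but the grounding pattern is the same.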
- North America > United States > Nebraska (0.04)
- North America > United States > Iowa (0.04)
- Health & Medicine > Diagnostic Medicine (1.00)
- Education > Educational Setting > K-12 Education (1.00)
- Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.80)
Making LLMs Reliable When It Matters Most: A Five-Layer Architecture for High-Stakes Decisions
Current large language models (LLMs) excel in verifiable domains where outputs can be checked before action but prove less reliable for high-stakes strategic decisions with uncertain outcomes. This gap, driven by mutually reinforcing cognitive biases in both humans and artificial intelligence (AI) systems, threatens the defensibility of valuations and sustainability of investments in the sector. This report describes a framework emerging from systematic qualitative assessment across 7 frontier-grade LLMs and 3 market-facing venture vignettes under time pressure. Detailed prompting specifying decision partnership and explicitly instructing avoidance of sycophancy, confabulation, solution drift, and nihilism achieved initial partnership state but failed to maintain it under operational pressure. Sustaining protective partnership state required an emergent 7-stage calibration sequence, built upon a 4-stage initialization process, within a 5-layer protection architecture enabling bias self-monitoring, human-AI adversarial challenge, partnership state verification, performance degradation detection, and stakeholder protection. Three discoveries resulted: partnership state is achievable through ordered calibration but requires emergent maintenance protocols; reliability degrades when architectural drift and context exhaustion align; and dissolution discipline prevents costly pursuit of fundamentally wrong directions. Cross-model validation revealed systematic performance differences across LLM architectures. This approach demonstrates that human-AI teams can achieve cognitive partnership capable of preventing avoidable regret in high-stakes decisions, addressing return-on-investment expectations that depend on AI systems supporting consequential decision-making without introducing preventable cognitive traps when verification arrives too late.
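Reading the five layers as sequential gates suggests a sketch like the following; the pass/fail predicates and thresholds are invented, since the report does not specify them:

```python
# Hypothetical gate-style rendering of the five protection layers; the
# pass/fail predicates and thresholds below are invented, not the report's.
def bias_self_monitoring(state: dict) -> bool:
    return not state.get("sycophancy_detected", False)

def adversarial_challenge(state: dict) -> bool:
    return state.get("human_challenge_passed", False)

def partnership_verification(state: dict) -> bool:
    return state.get("calibration_complete", False)

def degradation_detection(state: dict) -> bool:
    return state.get("context_tokens", 0) < 100_000  # arbitrary budget

def stakeholder_protection(state: dict) -> bool:
    return state.get("dissolution_criteria_defined", False)

LAYERS = [
    ("bias self-monitoring", bias_self_monitoring),
    ("human-AI adversarial challenge", adversarial_challenge),
    ("partnership state verification", partnership_verification),
    ("performance degradation detection", degradation_detection),
    ("stakeholder protection", stakeholder_protection),
]

def decision_ready(state: dict) -> bool:
    # Every layer must pass before a recommendation is acted on.
    for name, check in LAYERS:
        if not check(state):
            print("blocked at layer:", name)
            return False
    return True

print(decision_ready({"human_challenge_passed": True,
                      "calibration_complete": True,
                      "context_tokens": 40_000,
                      "dissolution_criteria_defined": True}))
```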
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Switzerland (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England (0.04)
- Health & Medicine (1.00)
- Banking & Finance > Trading (0.68)
Choreographing Trash Cans: On Speculative Futures of Weak Robots in Public Spaces
Axelsson, Minja, Sikau, Lea Luka
Michio Okada first conceptualised "weak robots", which have limited capabilities themselves, and are framed as objects or "social others" which people are invited to assist and take care of. In Okada's work, such robots are used to invite pro-social behaviour from people, such as encouraging them to pick up trash to assist a trash can robot (Okada, 2022). We conceptualise human-robot interaction (HRI) as a stage where weak robots--designed to be "cute" and vulnerable--play the role of incidental actors that subvert the person engaging with them. Caudwell and Lacey (2020) argue that cuteness as a design choice for robots can encourage users to trust and form relationships with those robots, which introduces ambivalent power dynamics through the production of intimacy. In fact, cuteness can also be seen as a deceptive or "dark" pattern, due to the utilisation of cuteness to prompt affective responses which can be used to collect emotional data, as well as some degree of reduction of user agency (Lacey and Caudwell, 2019). The ability and affordances of cute and weak robots to influence user behaviour merits the discussion of their ethicality, which we do in this paper through design fiction. Unlike traditional HRI research, often confined to laboratory settings, our focus is on spontaneous, real-world interactions that transform everyday environments into sites of performative potential. We argue that the theatricality of these encounters is central to understanding their impact: the presence of a weak and/or cute robot, such as the trash can robot, developed by Okada and the Interaction and Communication Design Lab of the Toyohashi University of Technology, acts as a disruptive interloper that introduces an observer's effect and, thus, affects the human interlocutors. First, we examine the concept of weak robots through the lens of performativity theory as well as concepts of machine (dys)function.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Minnesota (0.04)
- North America > United States > Indiana (0.04)
Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models
Chiu, Christopher, Pitis, Silviu, van der Schaar, Mihaela
Clinical reasoning in medicine is a hypothesis-driven process where physicians refine diagnoses from limited information through targeted history, physical examination, and diagnostic investigations. In contrast, current medical benchmarks for large language models (LLMs) primarily assess knowledge recall through single-turn questions, where complete clinical information is provided upfront. To address this gap, we introduce VivaBench, a multi-turn benchmark that evaluates sequential clinical reasoning in LLM agents. Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate a viva voce (oral) examination in medical training, requiring agents to actively probe for relevant findings, select appropriate investigations, and synthesize information across multiple steps to reach a diagnosis. While current LLMs demonstrate competence in diagnosing conditions from well-described clinical presentations, their performance degrades significantly when required to navigate iterative diagnostic reasoning under uncertainty in our evaluation. Our analysis identified several failure modes that mirror common cognitive errors in clinical practice, including: (1) fixation on initial hypotheses, (2) inappropriate investigation ordering, (3) premature diagnostic closure, and (4) failing to screen for critical conditions. These patterns reveal fundamental limitations in how current LLMs reason and make decisions under uncertainty. Through VivaBench, we provide a standardized benchmark for evaluating conversational medical AI systems for real-world clinical decision support. Beyond medical applications, we contribute to the larger corpus of research on agentic AI by demonstrating how sequential reasoning trajectories can diverge in complex decision-making environments.
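A schematic of the multi-turn probing loop VivaBench implies, where findings are revealed only when the agent asks for them; the case content, fixed probing policy, and scoring are toy stand-ins for the benchmark's actual harness:

```python
# Toy multi-turn viva loop: findings are revealed only when the agent probes.
CASE = {
    "presenting": "45-year-old with sudden severe headache",
    "history": "onset during exertion, no prior migraines",
    "examination": "neck stiffness, photophobia",
    "investigations": "CT head: subarachnoid blood",
    "diagnosis": "subarachnoid hemorrhage",
}

def call_llm(prompt: str) -> str:
    """Stand-in for the evaluated LLM agent's final answer."""
    return "subarachnoid hemorrhage"

def agent_policy(revealed: dict) -> str:
    """Toy agent: probes in a fixed order, then commits to a diagnosis."""
    for step in ("history", "examination", "investigations"):
        if step not in revealed:
            return step
    return "diagnose"

revealed = {"presenting": CASE["presenting"]}
for turn in range(6):
    action = agent_policy(revealed)
    if action == "diagnose":
        guess = call_llm(f"Diagnose given: {revealed}")
        print("turns used:", turn, "| correct:", guess == CASE["diagnosis"])
        break
    revealed[action] = CASE[action]  # the examiner reveals only what was asked
```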
- North America > Canada > Ontario > Toronto (0.14)
- Oceania > Australia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.92)
Discerning What Matters: A Multi-Dimensional Assessment of Moral Competence in LLMs
Kilov, Daniel, Hendy, Caroline, Guyot, Secil Yanik, Snoswell, Aaron J., Lazar, Seth
Moral competence is the ability to act in accordance with moral principles. As large language models (LLMs) are increasingly deployed in situations demanding moral competence, there is increasing interest in evaluating this ability empirically. We review existing literature and identify three significant shortcomings: (i) Over-reliance on prepackaged moral scenarios with explicitly highlighted moral features; (ii) Focus on verdict prediction rather than moral reasoning; and (iii) Inadequate testing of models' (in)ability to recognize when additional information is needed. Grounded in philosophical research on moral skill, we then introduce a novel method for assessing moral competence in LLMs. Our approach moves beyond simple verdict comparisons to evaluate five dimensions of moral competence: identifying morally relevant features, weighting their importance, assigning moral reasons to these features, synthesizing coherent moral judgments, and recognizing information gaps. We conduct two experiments comparing six leading LLMs against non-expert humans and professional philosophers. In our first experiment using ethical vignettes standard to existing work, LLMs generally outperformed non-expert humans across multiple dimensions of moral reasoning. However, our second experiment, featuring novel scenarios designed to test moral sensitivity by embedding relevant features among irrelevant details, revealed a striking reversal: several LLMs performed significantly worse than humans. Our findings suggest that current evaluations may substantially overestimate LLMs' moral reasoning capabilities by eliminating the task of discerning moral relevance from noisy information, which we take to be a prerequisite for genuine moral skill. This work provides a more nuanced framework for assessing AI moral competence and highlights important directions for improving moral competence in advanced AI systems.
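As a sketch of per-dimension scoring against expert annotations, assuming set-valued outputs compared by Jaccard overlap (the paper's actual metric and feature sets may differ):

```python
# Toy per-dimension scorer; feature sets and the overlap metric are invented.
DIMENSIONS = [
    "identify morally relevant features", "weight their importance",
    "assign moral reasons", "synthesize a judgment",
    "recognize information gaps",
]

def overlap(pred: set, ref: set) -> float:
    """Jaccard overlap; dimensions with no annotations score 1.0 vacuously."""
    return len(pred & ref) / len(pred | ref) if pred | ref else 1.0

expert = {
    "identify morally relevant features": {"consent", "harm to bystander"},
    "recognize information gaps": {"was consent informed?"},
}
model = {
    "identify morally relevant features": {"consent", "cost", "harm to bystander"},
    "recognize information gaps": set(),  # the model missed the gap entirely
}

for dim in DIMENSIONS:
    score = overlap(model.get(dim, set()), expert.get(dim, set()))
    print(f"{dim}: {score:.2f}")
```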
- Europe > United Kingdom (0.04)
- North America > Canada (0.04)
- Asia > India (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area (0.93)
- Education > Educational Setting > Higher Education (0.46)