AITopics | adjudicator

Collaborating Authors

adjudicator

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Self-Improving Architecture for Dynamic Safety in Large Language Models

Slater, Tyler

arXiv.org Artificial IntelligenceNov-12-2025

Context: The integration of Large Language Models (LLMs) into core software systems is accelerating. However, existing software architecture patterns are static, while current safety assurance methods are not scalable, leaving systems vulnerable to novel adversarial threats. Objective: To design, implement, and evaluate a novel software architecture that enables an AI-driven system to autonomously and continuously adapt its own safety protocols at runtime. Method: We propose the Self-Improving Safety Framework (SISF), a runtime architecture that couples an unprotected, unaligned base LLM (mistralai/Mistral-7B-v0.1) with a dynamic feedback loop. This loop consists of an AI Adjudicator (GPT-4o) for breach detection and a Policy Synthesis Module (GPT-4 Turbo) that autonomously generates new, generalized safety policies (both heuristic and semantic) in response to failures. Results: We conducted a dynamic learning evaluation using the 520-prompt AdvBench dataset. The unprotected model was 100% vulnerable. Our SISF, starting from zero policies, demonstrated a clear learning curve: it detected 237 breaches, autonomously synthesized 234 new policies, and reduced the overall Attack Success Rate (ASR) to 45.58%. In a subsequent test on 520 benign prompts, the SISF achieved a 0.00% False Positive Rate (FPR), proving its ability to adapt without compromising user utility. Conclusion: An architectural approach to AI safety, based on the principles of self-adaptation, is a viable and effective strategy. Our framework demonstrates a practical path towards building more robust, resilient, and scalable AI-driven systems, shifting safety assurance from a static, pre-deployment activity to an automated, runtime process.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.07645

Genre: Research Report > New Finding (0.69)

Industry: Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Add feedback

Shall We Play a Game? Language Models for Open-ended Wargames

Matlin, Glenn, Mahajan, Parv, Song, Isaac, Hao, Yixiong, Bard, Ryan, Topp, Stu, Montoya, Evan, Parwani, M. Rehan, Shetty, Soham, Riedl, Mark

arXiv.org Artificial IntelligenceOct-24-2025

Wargames are simulations of conflicts in which participants' decisions influence future events. While casual wargaming can be used for entertainment or socialization, serious wargaming is used by experts to explore strategic implications of decision-making and experiential learning. In this paper, we take the position that Artificial Intelligence (AI) systems, such as Language Models (LMs), are rapidly approaching human-expert capability for strategic planning -- and will one day surpass it. Military organizations have begun using LMs to provide insights into the consequences of real-world decisions during _open-ended wargames_ which use natural language to convey actions and outcomes. We argue the ability for AI systems to influence large-scale decisions motivates additional research into the safety, interpretability, and explainability of AI in open-ended wargames. To demonstrate, we conduct a scoping literature review with a curated selection of 100 unclassified studies on AI in wargames, and construct a novel ontology of open-endedness using the creativity afforded to players, adjudicators, and the novelty provided to observers. Drawing from this body of work, we distill a set of practical recommendations and critical safety considerations for deploying AI in open-ended wargames across common domains. We conclude by presenting the community with a set of high-impact open research challenges for future work.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.17192

Country:

North America > Canada (1.00)
Europe (1.00)
Asia (1.00)
North America > United States > California (0.67)

Genre:

Overview (1.00)
Research Report (0.82)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Revisiting Prompt Optimization with Large Reasoning Models-A Case Study on Event Extraction

Srivastava, Saurabh, Yao, Ziyu

arXiv.org Artificial IntelligenceOct-17-2025

Large Reasoning Models (LRMs) such as DeepSeek-R1 and OpenAI o1 have demonstrated remarkable capabilities in various reasoning tasks. Their strong capability to generate and reason over intermediate thoughts has also led to arguments that they may no longer require extensive prompt engineering or optimization to interpret human instructions and produce accurate outputs. In this work, we aim to systematically study this open question, using the structured task of event extraction for a case study. We experimented with two LRMs (DeepSeek-R1 and o1) and two general-purpose Large Language Models (LLMs) (GPT-4o and GPT-4.5), when they were used as task models or prompt optimizers. Our results show that on tasks as complicated as event extraction, LRMs as task models still benefit from prompt optimization, and that using LRMs as prompt optimizers yields more effective prompts. Our finding also generalizes to tasks beyond event extraction. Finally, we provide an error analysis of common errors made by LRMs and highlight the stability and consistency of LRMs in refining task instructions and event guidelines.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2504.07357

Country:

North America > United States (1.00)
Asia > Middle East (0.67)
Europe (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Criminal Law (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding

Hasegawa, Kimihiro, Imrattanatrai, Wiradee, Cheng, Zhi-Qi, Asada, Masaki, Holm, Susan, Wang, Yuran, Fukuda, Ken, Mitamura, Teruko

arXiv.org Artificial IntelligenceOct-29-2024

Multimodal systems have great potential to assist humans in procedural activities, where people follow instructions to achieve their goals. Despite diverse application scenarios, systems are typically evaluated on traditional classification tasks, e.g., action recognition or temporal action segmentation. In this paper, we present a novel evaluation dataset, ProMQA, to measure system advancements in application-oriented scenarios. ProMQA consists of 401 multimodal procedural QA pairs on user recording of procedural activities coupled with their corresponding instruction. For QA annotation, we take a cost-effective human-LLM collaborative approach, where the existing annotation is augmented with LLM-generated QA pairs that are later verified by humans. We then provide the benchmark results to set the baseline performance on ProMQA. Our experiment reveals a significant gap between human performance and that of current systems, including competitive proprietary multimodal models. We hope our dataset sheds light on new aspects of models' multimodal understanding capabilities.

large language model, machine learning, question answering, (24 more...)

arXiv.org Artificial Intelligence

2410.22211

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Switzerland (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

GPT-4 is judged more human than humans in displaced and inverted Turing tests

Rathi, Ishika, Taylor, Sydney, Bergen, Benjamin K., Jones, Cameron R.

arXiv.org Artificial IntelligenceJul-11-2024

Everyday AI detection requires differentiating between people and AI in informal, online conversations. In many cases, people will not interact directly with AI systems but instead read conversations between AI systems and other people. We measured how well people and large language models can discriminate using Figure 1: A summary of our experimental design. Transcripts two modified versions of the Turing test: inverted were sampled from an interactive Turing test, and displaced. GPT-3.5, GPT-4, and where a human judge interrogates a witness to determine displaced human adjudicators judged whether if they are human or AI. In an inverted Turing test, an agent was human or AI on the basis of a we present transcripts to AI models, who judge whether Turing test transcript. We found that both AI the same witnesses are human or AI. In a displaced and displaced human judges were less accurate Turing test, a separate group of human participants read than interactive interrogators, with below the same transcripts and make this judgement.

accuracy, adjudicator, turing test, (17 more...)

arXiv.org Artificial Intelligence

2407.08853

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre:

Research Report > New Finding (1.00)
Questionnaire & Opinion Survey (1.00)
Research Report > Experimental Study (0.94)

Industry:

Health & Medicine (0.68)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Turing's Test (1.00)

Add feedback

Social Cue Detection and Analysis Using Transfer Entropy

Jiang, Haoyang, Croft, Elizabeth A., Burke, Michael G.

arXiv.org Artificial IntelligenceDec-13-2023

Robots that work close to humans need to understand and use social cues to act in a socially acceptable manner. Social cues are a form of communication (i.e., information flow) between people. In this paper, a framework is introduced to detect and analyse a class of perceptible social cues that are nonverbal and episodic, and the related information transfer using an information-theoretic measure, namely, transfer entropy. We use a group-joining setting to demonstrate the practicality of transfer entropy for analysing communications between humans. Then we demonstrate the framework in two settings involving social interactions between humans: object-handover and person-following. Our results show that transfer entropy can identify information flows between agents and when and where they occur. Potential applications of the framework include information flow or social cue analysis for interactive robot design and socially-aware robot planning.

information transfer, scenario, social cue, (12 more...)

arXiv.org Artificial Intelligence

2303.02904

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > New York > New York County > New York City (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(13 more...)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Event Extraction as Question Generation and Answering

Lu, Di, Ran, Shihao, Tetreault, Joel, Jaimes, Alejandro

arXiv.org Artificial IntelligenceJul-9-2023

Recent work on Event Extraction has reframed the task as Question Answering (QA), with promising results. The advantage of this approach is that it addresses the error propagation issue found in traditional token-based classification approaches by directly predicting event arguments without extracting candidates first. However, the questions are typically based on fixed templates and they rarely leverage contextual information such as relevant arguments. In addition, prior QA-based approaches have difficulty handling cases where there are multiple arguments for the same role. In this paper, we propose QGA-EE, which enables a Question Generation (QG) model to generate questions that incorporate rich contextual information instead of using fixed templates. We also propose dynamic templates to assist the training of QG model. Experiments show that QGA-EE outperforms all prior single-task-based models on the ACE05 English dataset.

crime, machine learning, question answering, (19 more...)

arXiv.org Artificial Intelligence

2307.05567

Country:

Europe > France (0.14)
Asia > South Korea > Seoul > Seoul (0.04)
Asia > India > Tripura (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Government (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.90)

Add feedback

Retrieval-Augmented Generative Question Answering for Event Argument Extraction

Du, Xinya, Ji, Heng

arXiv.org Artificial IntelligenceNov-13-2022

Event argument extraction has long been studied as a sequential prediction problem with extractive-based methods, tackling each argument in isolation. Although recent work proposes generation-based methods to capture cross-argument dependency, they require generating and post-processing a complicated target sequence (template). Motivated by these observations and recent pretrained language models' capabilities of learning from demonstrations. We propose a retrieval-augmented generative QA model (R-GQA) for event argument extraction. It retrieves the most similar QA pair and augments it as prompt to the current example's context, then decodes the arguments as answers. Our approach outperforms substantially prior methods across various settings (i.e. fully supervised, domain transfer, and fewshot learning). Finally, we propose a clustering-based sampling strategy (JointEnc) and conduct a thorough analysis of how different strategies influence the few-shot learning performance. The implementations are available at https:// github.com/xinyadu/RGQA

machine learning, natural language, question answering, (20 more...)

arXiv.org Artificial Intelligence

2211.07067

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.05)
North America > United States > California > San Diego County > San Diego (0.04)
(11 more...)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Government (0.94)
Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Bi-Directional Iterative Prompt-Tuning for Event Argument Extraction

Dai, Lu, Wang, Bang, Xiang, Wei, Mo, Yijun

arXiv.org Artificial IntelligenceOct-27-2022

Recently, prompt-tuning has attracted growing interests in event argument extraction (EAE). However, the existing prompt-tuning methods have not achieved satisfactory performance due to the lack of consideration of entity information. In this paper, we propose a bi-directional iterative prompt-tuning method for EAE, where the EAE task is treated as a cloze-style task to take full advantage of entity information and pre-trained language models (PLMs). Furthermore, our method explores event argument interactions by introducing the argument roles of contextual entities into prompt construction. Since template and verbalizer are two crucial components in a cloze-style prompt, we propose to utilize the role label semantic knowledge to construct a semantic verbalizer and design three kinds of templates for the EAE task. Experiments on the ACE 2005 English dataset with standard and low-resource settings show that the proposed method significantly outperforms the peer state-of-the-art methods. Our code is available at https://github.com/HustMinsLab/BIP.

argument role, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.15843

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry:

Law (0.96)
Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

Add feedback

DEGREE: A Data-Efficient Generative Event Extraction Model

Hsu, I-Hung, Huang, Kuan-Hao, Boschee, Elizabeth, Miller, Scott, Natarajan, Prem, Chang, Kai-Wei, Peng, Nanyun

arXiv.org Artificial IntelligenceSep-14-2021

Event extraction (EE) aims to identify structured events, including event triggers and their corresponding arguments, from unstructured text. Most of the existing works rely on a large number of labeled instances to train models, while the labeled data could be expensive to be obtained. In this work, we present a data-efficient event extraction method by formulating event extraction as a natural language generation problem. The formulation allows us to inject knowledge of label semantics, event structure, and output dependencies into the model. Given a passage and an event type, our model learns to summarize this passage into a templated sentence in a predefined structure. The template is event-type-specific, manually created, and contains event trigger and argument information. Lastly, a rule-based algorithm is used to derive the trigger and argument predictions from the generated sentence. Our method inherently enjoys the following benefits: (1) The pretraining of the generative language models help incorporate the semantics of the labels for generative EE. (2) The autoregressive generation process and our end-to-end design for extracting triggers and arguments force the model to capture the dependencies among the output triggers and their arguments. (3) The predefined templates form concrete yet flexible rules to hint the models about the valid patterns for each event type, reducing the models' burden to learn structures from the data. Empirical results show that our model achieves superior performance over strong baselines on EE tasks in the low data regime and achieves competitive results to the current state-of-the-art when more data becomes available.

egree, event type, template, (15 more...)

arXiv.org Artificial Intelligence

2108.12724

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Indonesia (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)

Add feedback