AITopics | false alarm rate

Country:

Asia > India > Karnataka > Bengaluru (0.04)
Oceania > Australia > Western Australia > Perth (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.67)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Security & Privacy (0.66)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Neural Information Processing Systems, Feb-11-2026, 21:56:22 GMT

f3a4ff4839c56a5f460c88cce3666a2b-Paper.pdf

algorithm, detection delay, post-change parameter, (15 more...)

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Industry: Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Dec-4-2025

E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing

Sadhuka, Shuvom, Prinster, Drew, Fannjiang, Clara, Scalia, Gabriele, Regev, Aviv, Wang, Hanchen

Agentic AI systems execute a sequence of actions, such as reasoning steps or tool calls, in response to a user prompt. To evaluate the success of their trajectories, researchers have developed verifiers, such as LLM judges and process-reward models, to score the quality of each action in an agent's trajectory. Although these heuristic scores can be informative, they come with no guarantee of correctness when used to decide whether an agent will yield a successful output. Here, we introduce e-valuator, a method to convert any black-box verifier score into a decision rule with provable control of false alarm rates. We frame the problem of distinguishing successful trajectories (that is, sequences of actions that will lead to a correct response to the user's prompt) from unsuccessful trajectories as a sequential hypothesis testing problem. E-valuator builds on tools from e-processes to develop a sequential hypothesis test that remains statistically valid at every step of an agent's trajectory, enabling online monitoring of agents over arbitrarily long sequences of actions. Empirically, we demonstrate that e-valuator provides greater statistical power and better false alarm rate control than other strategies across six datasets and three agents. We additionally show that e-valuator can be used to quickly terminate problematic trajectories and save tokens. Overall, e-valuator provides a lightweight, model-agnostic framework that converts verifier heuristics into decision rules with statistical guarantees, enabling the deployment of more reliable agentic systems.
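
As a rough illustration of why such a test stays valid at every step, the sketch below (a toy under stated assumptions, not e-valuator's actual procedure) multiplies per-step likelihood ratios into an e-process and raises an alarm once it reaches 1/alpha. It assumes binary verifier verdicts (1 = step looks fine, 0 = step flagged), steps that are independent under the null hypothesis of a successful trajectory, and hypothetical flag probabilities of 0.1 for successful and 0.5 for unsuccessful trajectories; `first_alarm`, `binary_likelihood_ratio` and the two constants are illustrative names.

```python
import math
from typing import Callable, Optional, Sequence


def first_alarm(
    verdicts: Sequence[int],
    likelihood_ratio: Callable[[int], float],
    alpha: float = 0.05,
) -> Optional[int]:
    """Return the first step index at which the e-process reaches 1/alpha, or None.

    The e-process is the running product of per-step likelihood ratios
    p_failure(verdict) / p_success(verdict). Under the null (successful
    trajectory, independent steps) it is a nonnegative martingale started at 1,
    so by Ville's inequality the chance of ever raising a false alarm is <= alpha.
    """
    log_threshold = math.log(1.0 / alpha)
    log_e = 0.0  # accumulate in log space so long trajectories do not overflow
    for step, verdict in enumerate(verdicts):
        log_e += math.log(likelihood_ratio(verdict))
        if log_e >= log_threshold:
            return step
    return None


# Hypothetical verifier model: probability that a step is flagged (verdict 0).
P_FLAG_IF_SUCCESS = 0.1
P_FLAG_IF_FAILURE = 0.5


def binary_likelihood_ratio(verdict: int) -> float:
    """Likelihood ratio of one verdict under failure versus success."""
    if verdict == 0:
        return P_FLAG_IF_FAILURE / P_FLAG_IF_SUCCESS
    return (1 - P_FLAG_IF_FAILURE) / (1 - P_FLAG_IF_SUCCESS)


print(first_alarm([0, 1, 0, 0], binary_likelihood_ratio))  # 3
print(first_alarm([1] * 50, binary_likelihood_ratio))      # None
```

Each ratio has expectation 1 under the success model (0.1 × 5 + 0.9 × 0.5/0.9 = 1), which is what makes the running product a valid e-process regardless of when monitoring stops. In practice the two verdict models would have to be estimated, for example from labeled trajectories, and e-valuator itself accepts arbitrary black-box scores rather than binary verdicts.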

false alarm rate, threshold, trajectory, (14 more...)

2512.03109

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Leisure & Entertainment > Games (0.69)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.61)

Jilan, Jehad, Nambiar, Niranjana Naveen, Saber, Ahmad Mohammad, Paranjape, Alok, Youssef, Amr, Kundur, Deepa

A Kolmogorov-Arnold Network for Interpretable Cyberattack Detection in AGC Systems

arXiv.org Artificial Intelligence, Sep-8-2025

Automatic Generation Control (AGC) is essential for power grid stability but remains vulnerable to stealthy cyberattacks, such as False Data Injection Attacks (FDIAs), which can disturb the system's stability while evading traditional detection methods. Unlike previous works that relied on black-box approaches, this work proposes Kolmogorov-Arnold Networks (KAN) as an interpretable and accurate method for FDIA detection in AGC systems, considering the system nonlinearities. KAN models support the extraction of symbolic equations and are therefore more interpretable than most machine learning models. The proposed KAN is trained offline to learn the complex nonlinear relationships between the AGC measurements under different operating scenarios. After training, symbolic formulas that describe the trained model's behavior can be extracted and leveraged, greatly enhancing interpretability. Our findings confirm that the proposed KAN model achieves FDIA detection rates of up to 95.97% and 95.9% for the initial model and the symbolic formula, respectively, with a low false alarm rate, offering a reliable approach to enhancing AGC cybersecurity.
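
The detection rate and false alarm rate quoted above are standard confusion-matrix quantities. As an illustration, assuming binary labels with 1 = FDIA present and 0 = normal operation (the toy labels below are invented, not the paper's data, and the function name is illustrative), they can be computed as follows:

```python
from typing import Sequence, Tuple


def detection_and_false_alarm_rates(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Labels: 1 = attack present, 0 = normal operation.

    detection rate   = TP / (TP + FN)  (fraction of attacks that are flagged)
    false alarm rate = FP / (FP + TN)  (fraction of normal samples that are flagged)
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    # Return NaN when a class is absent, since the rate is then undefined.
    detection_rate = tp / (tp + fn) if tp + fn else float("nan")
    false_alarm_rate = fp / (fp + tn) if fp + tn else float("nan")
    return detection_rate, false_alarm_rate


# Toy labels: 4 attack samples, 6 normal samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(detection_and_false_alarm_rates(y_true, y_pred))  # (0.75, 0.16666666666666666)
```

The two rates are computed on disjoint subsets of the data, which is why a detector can report a high detection rate and a low false alarm rate at the same time.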

artificial intelligence, kan model, machine learning, (16 more...)

2509.05259

Country: North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)
Energy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Li, Jiawei, Magesh, Akshayaa, Veeravalli, Venugopal V.

Principled Detection of Hallucinations in Large Language Models via Multiple Testing

arXiv.org Artificial Intelligence, Aug-28-2025

While Large Language Models (LLMs) have emerged as powerful foundational models to solve a variety of tasks, they have also been shown to be prone to hallucinations, i.e., generating responses that sound confident but are actually incorrect or even nonsensical. In this work, we formulate the problem of detecting hallucinations as a hypothesis testing problem and draw parallels to the problem of out-of-distribution detection in machine learning models. We propose a multiple-testing-inspired method to solve the hallucination detection problem, and provide extensive experimental results to validate the robustness of our approach against state-of-the-art methods.
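
The abstract does not spell out the test itself. As one generic way to combine several hallucination scores under multiple testing (an illustrative assumption, not necessarily the authors' procedure), each score can be turned into a conformal p-value against calibration scores from known-correct responses, and the p-values combined with a Bonferroni correction. The function names and toy scores below are hypothetical.

```python
from typing import Sequence


def conformal_p_value(test_score: float, calibration: Sequence[float]) -> float:
    """Conformal p-value of a score against scores of known-correct responses.

    Higher scores are assumed to be more hallucination-like. If the test response
    is exchangeable with the calibration responses, P(p <= u) <= u for all u.
    """
    n_at_least = sum(1 for s in calibration if s >= test_score)
    return (1 + n_at_least) / (len(calibration) + 1)


def flag_hallucination(
    test_scores: Sequence[float],
    calibration_sets: Sequence[Sequence[float]],
    alpha: float = 0.1,
) -> bool:
    """Bonferroni test over m detection scores (one null hypothesis per score).

    Flags the response when the smallest p-value is at most alpha / m, so a
    correct response is flagged with probability at most alpha.
    """
    m = len(test_scores)
    p_values = [conformal_p_value(t, cal) for t, cal in zip(test_scores, calibration_sets)]
    return min(p_values) <= alpha / m


calibration = [float(i) for i in range(19)]  # toy scores from 19 correct answers
print(flag_hallucination([50.0, 0.0], [calibration, calibration], alpha=0.2))  # True
print(flag_hallucination([0.0, 0.0], [calibration, calibration], alpha=0.2))   # False
```

Bonferroni needs no independence between the scores, which is convenient because different hallucination scores computed on the same response are usually correlated; the price is some loss of power compared with tighter combination rules.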

large language model, machine learning, natural language, (15 more...)

2508.18473

Country: North America > United States > Illinois (0.14)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Aug-22-2025, 01:32:34 GMT

f3a4ff4839c56a5f460c88cce3666a2b-Supplemental.pdf

data mining, detection delay, machine learning, (20 more...)

Country:

Asia > India > Karnataka > Bengaluru (0.04)
Oceania > Australia > Western Australia > Perth (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.67)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Neural Information Processing Systems, Aug-22-2025, 01:32:30 GMT

Bandit Quickest Changepoint Detection

artificial intelligence, detection delay, machine learning, (17 more...)

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Industry: Information Technology > Security & Privacy (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Timans, Alexander, Verma, Rajeev, Nalisnick, Eric, Naesseth, Christian A.

On Continuous Monitoring of Risk Violations under Unknown Shift

arXiv.org Machine Learning, Jun-23-2025

Machine learning systems deployed in the real world must operate under dynamic and often unpredictable distribution shifts. This challenges the validity of statistical safety assurances on the system's risk established beforehand. Common risk control frameworks rely on fixed assumptions and lack mechanisms to continuously monitor deployment reliability. In this work, we propose a general framework for the real-time monitoring of risk violations in evolving data streams. Leveraging the 'testing by betting' paradigm, we propose a sequential hypothesis testing procedure to detect violations of bounded risks associated with the model's decision-making mechanism, while ensuring control of the false alarm rate. Our method operates under minimal assumptions on the nature of encountered shifts, rendering it broadly applicable. We illustrate the effectiveness of our approach by monitoring risks in outlier detection and set prediction under a variety of shifts.
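
To show what 'testing by betting' means for a bounded risk, here is a minimal sketch of a wealth process for the null hypothesis that each loss in [0, 1] has conditional mean at most epsilon. It uses a fixed bet for brevity; betting-based tests commonly adapt the bet over time, and the constant `bet`, the function name `monitor_risk`, and the toy losses are illustrative assumptions rather than the paper's procedure.

```python
from typing import Optional, Sequence


def monitor_risk(
    losses: Sequence[float],
    epsilon: float,
    bet: float,
    alpha: float = 0.05,
) -> Optional[int]:
    """Betting-based monitor for H0: each loss has conditional mean <= epsilon.

    Losses must lie in [0, 1]. Wealth starts at 1 and is multiplied by
    1 + bet * (loss - epsilon) at every step. With 0 <= bet <= 1/epsilon each
    factor is nonnegative, and under H0 its conditional mean is at most 1, so
    wealth is a nonnegative supermartingale. By Ville's inequality it ever
    reaches 1/alpha with probability at most alpha (the false alarm rate).
    Returns the index of the first alarm, or None if no alarm is raised.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie in (0, 1)")
    if not 0.0 <= bet <= 1.0 / epsilon:
        raise ValueError("bet must lie in [0, 1/epsilon]")
    wealth = 1.0
    for t, loss in enumerate(losses):
        if not 0.0 <= loss <= 1.0:
            raise ValueError("losses must lie in [0, 1]")
        wealth *= 1.0 + bet * (loss - epsilon)
        if wealth >= 1.0 / alpha:
            return t
    return None


# Toy 0/1 losses (1 = e.g. a prediction set missed the true label).
print(monitor_risk([0.0, 1.0, 1.0, 1.0], epsilon=0.1, bet=5.0))  # 3
print(monitor_risk([0.0] * 100, epsilon=0.1, bet=5.0))           # None
```

Losses below epsilon shrink the wealth and losses above it grow the wealth, so the alarm fires only once the observed risk has exceeded the tolerated level by enough, for long enough, to outweigh the alpha budget.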

artificial intelligence, data mining, machine learning, (16 more...)

2506.16416

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.48)

Palzer, David, Maciejewski, Matthew, Fosler-Lussier, Eric

Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling

arXiv.org Artificial Intelligence, Jun-9-2025

In recent years, end-to-end approaches have made notable progress in addressing the challenge of speaker diarization, which involves segmenting and identifying speakers in multi-talker recordings. One such approach, Encoder-Decoder Attractors (EDA), has been proposed to handle variable speaker counts and to better guide the network during training. In this study, we extend the attractor paradigm by moving beyond direct speaker modeling and instead focus on representing more detailed 'speaker attributes' through a multistage process of intermediate representations. Additionally, we enhance the architecture by replacing transformers with conformers, a convolution-augmented transformer, to model local dependencies. Experiments demonstrate improved diarization performance on the CALLHOME dataset.
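
In EDA-style diarization models, the attractors produced by the encoder-decoder are read out against frame embeddings to obtain speaker activities, and attractor existence probabilities decide how many speakers are kept. A minimal sketch of that basic readout, using plain Python lists, toy 2-D embeddings and an assumed existence threshold of 0.5, is shown below; it does not implement the paper's speaker-attribute attractors or its conformer encoder, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def speaker_activities(
    frame_embeddings: Sequence[Sequence[float]],
    attractors: Sequence[Sequence[float]],
    existence_probs: Sequence[float],
    threshold: float = 0.5,
) -> List[List[float]]:
    """Return activity[t][s]: probability that kept speaker s talks in frame t.

    Attractors whose existence probability does not exceed `threshold` are
    discarded, so the number of kept attractors is the estimated speaker count.
    Activity is the sigmoid of the dot product of frame and attractor vectors.
    """
    kept = [a for a, p in zip(attractors, existence_probs) if p > threshold]
    return [
        [sigmoid(sum(e * a for e, a in zip(frame, attractor))) for attractor in kept]
        for frame in frame_embeddings
    ]


# Toy 2-D embeddings: two confident attractors and one that is discarded.
frames = [[1.0, 0.0], [0.0, 1.0]]
attractors = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
activity = speaker_activities(frames, attractors, existence_probs=[0.9, 0.8, 0.1])
print(len(activity[0]))  # 2
```

Because each speaker gets an independent sigmoid rather than a softmax over speakers, a frame can be active for several speakers at once, which is how end-to-end diarization handles overlapping speech.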

artificial intelligence, attractor, machine learning, (17 more...)

doi: 10.1109/ICASSP48485.2024.10446213

2506.05593

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

arXiv.org Artificial Intelligence, Feb-4-2025

TransformDAS: Mapping Φ-OTDR Signals to Riemannian Manifold for Robust Classification

Kang, Jiaju, Han, Puyu, Chun, Yang, Wang, Xu, Gong, Luqi

Phase-sensitive optical time-domain reflectometry (Φ-OTDR) is a widely used distributed fiber optic sensing system in engineering. Machine learning algorithms for Φ-OTDR event classification require large volumes of high-quality data; however, high-quality datasets are currently extremely scarce in the field, which leaves models lacking robustness, as manifested by higher false alarm rates in real-world scenarios. One promising approach to address this issue is to augment existing data using generative models combined with a small amount of real-world data. We explored mapping both the Φ-OTDR features in a GAN-based generative pipeline and the signal features in a Transformer classifier to hyperbolic space to achieve more effective model generalization. The results indicate that state-of-the-art models exhibit stronger generalization performance and lower false alarm rates in real-world scenarios when trained on augmented datasets. TransformDAS, in particular, demonstrates the best classification performance, highlighting the benefits of Riemannian manifold mapping in Φ-OTDR data generation and model classification.
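
The abstract does not say which model of hyperbolic space is used. Assuming the common Poincaré ball with curvature -1, Euclidean features can be mapped into the ball with the exponential map at the origin and compared with the hyperbolic geodesic distance. A minimal sketch in plain Python follows; the function names and the toy feature vector are illustrative, not TransformDAS code.

```python
import math
from typing import List, Sequence


def exp_map_origin(v: Sequence[float]) -> List[float]:
    """Map a Euclidean feature vector into the unit Poincare ball (curvature -1)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    # The result has norm tanh(||v||) < 1 in exact arithmetic; for very large
    # norms tanh saturates to 1.0 in floating point, which lands on the boundary.
    scale = math.tanh(norm) / norm
    return [scale * x for x in v]


def poincare_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit ball."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(x, y))
    x_sq = sum(a * a for a in x)
    y_sq = sum(b * b for b in y)
    return math.acosh(1.0 + 2.0 * diff_sq / ((1.0 - x_sq) * (1.0 - y_sq)))


feature = [0.3, 0.4]  # toy Euclidean feature with norm 0.5
point = exp_map_origin(feature)
print(round(poincare_distance([0.0, 0.0], point), 6))  # 1.0
```

The distance from the origin to the mapped point equals 2‖v‖ (here 2 × 0.5 = 1), because the Poincaré metric scales tangent vectors at the origin by a factor of 2.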

artificial intelligence, deep learning, machine learning, (16 more...)

2502.02428

Country: Asia > China (0.30)

Genre: Research Report > Promising Solution (0.68)

Industry: Energy > Oil & Gas (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)