Performance Analysis
The Good, the Bad, and the Sampled: a No-Regret Approach to Safe Online Classification
Baharav, Tavor Z., Dragazis, Spyros, Pacchiano, Aldo
We study the problem of sequentially testing individuals for a binary disease outcome whose true risk is governed by an unknown logistic model. At each round, a patient arrives with feature vector $x_t$, and the decision maker may either pay to administer a (noiseless) diagnostic test--revealing the true label--or skip testing and predict the patient's disease status based on their feature vector and prior history. Our goal is to minimize the total number of costly tests required while guaranteeing that the fraction of misclassifications does not exceed a prespecified error tolerance $\alpha$, with probability at least $1-\delta$. To address this, we develop a novel algorithm that interleaves label-collection and distribution estimation to estimate both $\theta^{*}$ and the context distribution $P$, and computes a conservative, data-driven threshold $\tau_t$ on the logistic score $|x_t^\top\theta|$ to decide when testing is necessary. We prove that, with probability at least $1-\delta$, our procedure does not exceed the target misclassification rate, and requires only $O(\sqrt{T})$ excess tests compared to the oracle baseline that knows both $\theta^{*}$ and the patient feature distribution $P$. This establishes the first no-regret guarantees for error-constrained logistic testing, with direct applications to cost-sensitive medical screening. Simulations corroborate our theoretical guarantees, showing that in practice our procedure efficiently estimates $\theta^{*}$ while retaining safety guarantees, and does not require too many excess tests.
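As a rough illustration of the test-or-predict rule described in this abstract, the following Python sketch thresholds the absolute logistic score. It is not the authors' algorithm: the function name `decide`, the fixed threshold `tau`, and the convention that a score below the threshold triggers a test are all assumptions made for illustration, and the paper's data-driven construction of the threshold is not reproduced here.

```python
import numpy as np

def decide(x, theta_hat, tau):
    """Hypothetical test-or-predict rule on the logistic score.

    Assumption: a small |x^T theta_hat| means the estimated risk is close
    to 1/2, so the label is uncertain and a test is requested; otherwise
    the label is predicted from the sign of the score.
    """
    score = float(np.dot(x, theta_hat))
    if abs(score) < tau:
        return "test", None
    return "predict", int(score > 0)

# Illustrative values only (not taken from the paper).
theta_hat = np.array([1.0, -2.0])
uncertain = decide(np.array([0.1, 0.0]), theta_hat, tau=0.5)
confident_pos = decide(np.array([3.0, 0.5]), theta_hat, tau=0.5)
confident_neg = decide(np.array([0.0, 1.0]), theta_hat, tau=0.5)
```

In this sketch a larger `tau` sends more patients to testing, which lowers the misclassification risk at the price of more tests; the abstract's contribution lies in choosing that threshold conservatively from data.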
Robust Spatiotemporally Contiguous Anomaly Detection Using Tensor Decomposition
Mondal, Rachita, Indibi, Mert, Maiti, Tapabrata, Aviyente, Selin
Anomaly detection in spatiotemporal data is a challenging problem encountered in a variety of applications, including video surveillance, medical imaging, and urban traffic monitoring. Existing anomaly detection methods focus mainly on point anomalies and cannot deal with the temporal and spatial dependencies that arise in spatiotemporal data. Tensor-based anomaly detection methods have been proposed to address this problem. Although existing methods can capture dependencies across different modes, they are primarily supervised and do not account for the specific structure of anomalies. Moreover, these methods focus mainly on extracting anomalous features without providing any statistical confidence. In this paper, we introduce an unsupervised tensor-based anomaly detection method that simultaneously considers the sparse and spatiotemporally smooth nature of anomalies. The anomaly detection problem is formulated as a regularized robust low-rank + sparse tensor decomposition where the total variation of the tensor with respect to the underlying spatial and temporal graphs quantifies the spatiotemporal smoothness of the anomalies. Once the anomalous features are extracted, we introduce a statistical anomaly scoring framework that accounts for local spatiotemporal dependencies. The proposed framework is evaluated on both synthetic and real data.
A Framework for Selection of Machine Learning Algorithms Based on Performance Metrics and Akaike Information Criteria in Healthcare, Telecommunication, and Marketing Sector
The exponential growth of internet-generated data has fueled advancements in artificial intelligence (AI), machine learning (ML), and deep learning (DL) for extracting actionable insights in the marketing, telecom, and health sectors. This chapter explores ML applications across three domains, namely healthcare, marketing, and telecommunications, with a primary focus on developing a framework for optimal ML algorithm selection. In healthcare, the framework addresses critical challenges such as the prediction of cardiovascular disease, which accounts for 28.1% of global deaths, and the classification of fetal health into healthy or unhealthy states, utilizing three datasets. ML algorithms are categorized into eager, lazy, and hybrid learners, selected based on dataset attributes, performance metrics (accuracy, precision, recall), and Akaike Information Criterion (AIC) scores. For validation, eight datasets from the three sectors are employed in the experiments. The key contribution is a recommendation framework that identifies the best ML model according to input attributes, balancing performance evaluation and model complexity to enhance efficiency and accuracy in diverse real-world applications. This approach bridges gaps in automated model selection, offering practical implications for interdisciplinary ML deployment.
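To make the AIC-based part of the selection concrete, here is a minimal Python sketch using the standard definition AIC = 2k - 2 ln(L), where k is the number of fitted parameters and L the maximized likelihood. The helper names `aic` and `select_model`, the candidate names, and all numbers are hypothetical; the chapter's actual framework also weighs accuracy, precision, recall, and dataset attributes, which this sketch omits.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * num_params - 2 * log_likelihood

def select_model(candidates):
    """Return the candidate name with the lowest AIC.

    candidates maps a model name to (log_likelihood, num_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))

# Hypothetical fitted models: (maximized log-likelihood, parameter count).
candidates = {
    "logistic_regression": (-120.0, 5),
    "larger_model": (-118.0, 40),
}
scores = {name: aic(*vals) for name, vals in candidates.items()}
best = select_model(candidates)
```

With these made-up numbers the larger model fits slightly better but pays a much higher complexity penalty, so the simpler model is preferred. This is the trade-off between performance and model complexity that the abstract refers to.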
CORTEX: Collaborative LLM Agents for High-Stakes Alert Triage
Wei, Bowen, Tay, Yuan Shen, Liu, Howard, Pan, Jinhao, Luo, Kun, Zhu, Ziwei, Jordan, Chris
Security Operations Centers (SOCs) are overwhelmed by tens of thousands of daily alerts, with only a small fraction corresponding to genuine attacks. This overload creates alert fatigue, leading to overlooked threats and analyst burnout. Classical detection pipelines are brittle and context-poor, while recent LLM-based approaches typically rely on a single model to interpret logs, retrieve context, and adjudicate alerts end-to-end -- an approach that struggles with noisy enterprise data and offers limited transparency. We propose CORTEX, a multi-agent LLM architecture for high-stakes alert triage in which specialized agents collaborate over real evidence: a behavior-analysis agent inspects activity sequences, evidence-gathering agents query external systems, and a reasoning agent synthesizes findings into an auditable decision. To support training and evaluation, we release a dataset of fine-grained SOC investigations from production environments, capturing step-by-step analyst actions and linked tool outputs. Across diverse enterprise scenarios, CORTEX substantially reduces false positives and improves investigation quality over state-of-the-art single-agent LLMs.