Performance Analysis
Exploring Spiking Neural Networks for Binary Classification in Multivariate Time Series at the Edge
Ghawaly, James, Nicholson, Andrew, Schuman, Catherine, Diez, Dalton, Young, Aaron, Witherspoon, Brett
We present a general framework for training spiking neural networks (SNNs) to perform binary classification on multivariate time series, with a focus on step-wise prediction and high precision at low false alarm rates. The approach uses the Evolutionary Optimization of Neuromorphic Systems (EONS) algorithm to evolve sparse, stateful SNNs by jointly optimizing their architectures and parameters. Inputs are encoded into spike trains, and predictions are made by thresholding a single output neuron's spike counts. We also incorporate simple voting ensemble methods to improve performance and robustness. To evaluate the framework, we apply it with application-specific optimizations to the task of detecting low signal-to-noise ratio radioactive sources in gamma-ray spectral data. The resulting SNNs, with as few as 49 neurons and 66 synapses, achieve a 51.8% true positive rate (TPR) at a false alarm rate of 1/hr, outperforming PCA (42.7%) and deep learning (49.8%) baselines. A three-model any-vote ensemble increases TPR to 67.1% at the same false alarm rate. Hardware deployment on the microCaspian neuromorphic platform demonstrates 2mW power consumption and 20.2ms inference latency. We also demonstrate generalizability by applying the same framework, without domain-specific modification, to seizure detection in EEG recordings. An ensemble achieves 95% TPR with a 16% false positive rate, comparable to recent deep learning approaches with significant reduction in parameter count.
The Coding Limits of Robust Watermarking for Generative Models
Francati, Danilo, Goonatilake, Yevin Nikhel, Pawar, Shubham, Venturi, Daniele, Ateniese, Giuseppe
We ask a basic question about cryptographic watermarking for generative models: to what extent can a watermark remain reliable when an adversary is allowed to corrupt the encoded signal? To study this question, we introduce a minimal coding abstraction that we call a zero-bit tamper-detection code. This is a secret-key procedure that samples a pseudorandom codeword and, given a candidate word, decides whether it should be treated as unmarked content or as the result of tampering with a valid codeword. It captures the two core requirements of robust watermarking: soundness and tamper detection. Within this abstraction we prove a sharp unconditional limit on robustness to independent symbol corruption. For an alphabet of size $q$, there is a critical corruption rate of $1 - 1/q$ such that no scheme with soundness, even relaxed to allow a fixed constant false positive probability on random content, can reliably detect tampering once an adversary can change more than this fraction of symbols. In particular, in the binary case no cryptographic watermark can remain robust if more than half of the encoded bits are modified. We also show that this threshold is tight by giving simple information-theoretic constructions that achieve soundness and tamper detection for all strictly smaller corruption rates. We then test experimentally whether this limit appears in practice by looking at the recent watermarking for images of Gunn, Zhao, and Song (ICLR 2025). We show that a simple crop and resize operation reliably flipped about half of the latent signs and consistently prevented belief-propagation decoding from recovering the codeword, erasing the watermark while leaving the image visually intact.
Beyond Human Judgment: A Bayesian Evaluation of LLMs' Moral Values Understanding
Skorski, Maciej, Landowska, Alina
How do Large Language Models understand moral dimensions compared to humans? This first large-scale Bayesian evaluation of market-leading language models provides the answer. In contrast to prior work using deterministic ground truth (majority or inclusion rules), we model annotator disagreements to capture both aleatoric uncertainty (inherent human disagreement) and epistemic uncertainty (model domain sensitivity). We evaluated the best language models (Claude Sonnet 4, DeepSeek-V3, Llama 4 Maverick) across 250K+ annotations from nearly 700 annotators in 100K+ texts spanning social networks, news and forums. Our GPU-optimized Bayesian framework processed 1M+ model queries, revealing that AI models typically rank among the top 25\% of human annotators, performing much better than average balanced accuracy. Importantly, we find that AI produces far fewer false negatives than humans, highlighting their more sensitive moral detection capabilities.
Soft decision trees for survival analysis
Consolo, Antonio, Amaldi, Edoardo, Carrizosa, Emilio
Decision trees are popular in survival analysis for their interpretability and ability to model complex relationships. Survival trees, which predict the timing of singular events using censored historical data, are typically built through heuristic approaches. Recently, there has been growing interest in globally optimized trees, where the overall tree is trained by minimizing the error function over all its parameters. We propose a new soft survival tree model (SST), with a soft splitting rule at each branch node, trained via a nonlinear optimization formulation amenable to decomposition. Since SSTs provide for every input vector a specific survival function associated to a single leaf node, they satisfy the conditional computation property and inherit the related benefits. SST and the training formulation combine flexibility with interpretability: any smooth survival function (parametric, semiparametric, or nonparametric) estimated through maximum likelihood can be used, and each leaf node of an SST yields a cluster of distinct survival functions which are associated to the data points routed to it. Numerical experiments on 15 well-known datasets show that SSTs, with parametric and spline-based semiparametric survival functions, trained using an adaptation of the node-based decomposition algorithm proposed by Consolo et al. (2024) for soft regression trees, outperform three benchmark survival trees in terms of four widely-used discrimination and calibration measures. SSTs can also be extended to consider group fairness.
Defending the Edge: Representative-Attention Defense against Backdoor Attacks in Federated Learning
Obioma, Chibueze Peace, Sun, Youcheng, Mustafa, Mustafa A.
Federated learning (FL) remains highly vulnerable to adaptive backdoor attacks that preserve stealth by closely imitating benign update statistics. Existing defenses predominantly rely on anomaly detection in parameter or gradient space, overlooking behavioral constraints that backdoor attacks must satisfy to ensure reliable trigger activation. These anomaly-centric methods fail against adaptive attacks that normalize update magnitudes and mimic benign statistical patterns while preserving backdoor functionality, creating a fundamental detection gap. To address this limitation, this paper introduces FeRA (Federated Representative Attention) -- a novel attention-driven defense that shifts the detection paradigm from anomaly-centric to consistency-centric analysis. FeRA exploits the intrinsic need for backdoor persistence across training rounds, identifying malicious clients through suppressed representation-space variance, an orthogonal property to traditional magnitude-based statistics. The framework conducts multi-dimensional behavioral analysis combining spectral and spatial attention, directional alignment, mutual similarity, and norm inflation across two complementary detection mechanisms: consistency analysis and norm-inflation detection. Through this mechanism, FeRA isolates malicious clients that exhibit low-variance consistency or magnitude amplification. Extensive evaluation across six datasets, nine attacks, and three model architectures under both Independent and Identically Distributed (IID) and non-IID settings confirm FeRA achieves superior backdoor mitigation. Under different non-IID settings, FeRA achieved the lowest average Backdoor Accuracy (BA), about 1.67% while maintaining high clean accuracy compared to other state-of-the-art defenses. The code is available at https://github.com/Peatech/FeRA_defense.git.
SAVeD: Semantic Aware Version Discovery
Our work introduces SAVeD (Semantically Aware Version Detection), a contrastive learning-based framework for identifying versions of structured datasets without relying on metadata, labels, or integration-based assumptions. SAVeD addresses a common challenge in data science of repeated labor due to a difficulty of similar work or transformations on datasets. SAVeD employs a modified SimCLR pipeline, generating augmented table views through random transformations (e.g., row deletion, encoding perturbations). These views are embedded via a custom transformer encoder and contrasted in latent space to optimize semantic similarity. Our model learns to minimize distances between augmented views of the same dataset and maximize those between unrelated tables. We evaluate performance using validation accuracy and separation, defined respectively as the proportion of correctly classified version/non-version pairs on a hold-out set, and the difference between average similarities of versioned and non-versioned tables (defined by a benchmark, and not provided to the model). Our experiments span five canonical datasets from the Semantic Versioning in Databases Benchmark, and demonstrate substantial gains post-training. SAVeD achieves significantly higher accuracy on completely unseen tables in, and a significant boost in separation scores, confirming its capability to distinguish semantically altered versions. Compared to untrained baselines and prior state-of-the-art dataset-discovery methods like Starmie, our custom encoder achieves competitive or superior results.
Subspace Clustering via Tangent Cones
Given samples lying on any of a number of subspaces, subspace clustering is the task of grouping the samples based on the their corresponding subspaces. Many subspace clustering methods operate by assigning a measure of affinity to each pair of points and feeding these affinities into a graph clustering algorithm. This paper proposes a new paradigm for subspace clustering that computes affinities based on the corresponding conic geometry. The proposed conic subspace clustering (CSC) approach considers the convex hull of a collection of normalized data points and the corresponding tangent cones. The union of subspaces underlying the data imposes a strong association between the tangent cone at a sample $x$ and the original subspace containing $x$. In addition to describing this novel geometric perspective, this paper provides a practical algorithm for subspace clustering that leverages this perspective, where a tangent cone membership test is used to estimate the affinities. This algorithm is accompanied with deterministic and stochastic guarantees on the properties of the learned affinity matrix, on the true and false positive rates and spread, which directly translate into the overall clustering accuracy.
On Fairness and Calibration
The machine learning community has become increasingly concerned with the potential for bias and discrimination in predictive models. This has motivated a growing line of work on what it means for a classification procedure to be fair. In this paper, we investigate the tension between minimizing error disparity across different population groups while maintaining calibrated probability estimates. We show that calibration is compatible only with a single error constraint (i.e.
A Linear-Time Kernel Goodness-of-Fit Test
We propose a novel adaptive test of goodness-of-fit, with computational cost linear in the number of samples. We learn the test features that best indicate the differences between observed samples and a reference model, by minimizing the false negative rate. These features are constructed via Stein's method, meaning that it is not necessary to compute the normalising constant of the model. We analyse the asymptotic Bahadur efficiency of the new test, and prove that under a mean-shift alternative, our test always has greater relative efficiency than a previous linear-time kernel test, regardless of the choice of parameters for that test. In experiments, the performance of our method exceeds that of the earlier linear-time test, and matches or exceeds the power of a quadratic-time kernel test. In high dimensions and where model structure may be exploited, our goodness of fit test performs far better than a quadratic-time two-sample test based on the Maximum Mean Discrepancy, with samples drawn from the model.