Performance Analysis
Cross-Lingual Summarization as a Black-Box Watermark Removal Attack
Watermarking has been proposed as a lightweight mechanism to identify AI-generated text, with schemes typically relying on perturbations to token distributions. While prior work shows that paraphrasing can weaken such signals, these attacks remain partially detectable or degrade text quality. We demonstrate that cross-lingual summarization attacks (CLSA) -- translation to a pivot language followed by summarization and optional back-translation -- constitute a qualitatively stronger attack vector. By forcing a semantic bottleneck across languages, CLSA systematically destroys token-level statistical biases while preserving semantic fidelity. In experiments across multiple watermarking schemes (KGW, SIR, XSIR, Unigram) and five languages (Amharic, Chinese, Hindi, Spanish, Swahili), we show that CLSA reduces watermark detection accuracy more effectively than monolingual paraphrase at similar quality levels. Our results highlight an underexplored vulnerability that challenges the practicality of watermarking for provenance or regulation. We argue that robust provenance solutions must move beyond distributional watermarking and incorporate cryptographic or model-attestation approaches. On 300 held-out samples per language, CLSA consistently drives detection toward chance while preserving task utility. Concretely, for XSIR (explicitly designed for cross-lingual robustness), AUROC with paraphrasing is $0.827$, with Cross-Lingual Watermark Removal Attacks (CWRA) [He et al., 2024] using Chinese as the pivot, it is $0.823$, whereas CLSA drives it down to $0.53$ (near chance). Results highlight a practical, low-cost removal pathway that crosses languages and compresses content without visible artifacts.
Point-level Uncertainty Evaluation of Mobile Laser Scanning Point Clouds
Xu, Ziyang, Wysocki, Olaf, Holst, Christoph
Y et, despite this progress, the point clouds acquired by MLS systems operating in real-world environments inevitably contain uncertainty arising from various error sources during acquisition and processing. Although MLS systems have advanced rapidly in both data collection and post-processing, research on uncertainty evaluation has received comparatively less attention and remains underdeveloped (Xu et al., 2025b). From a user's perspective, the quality of point clouds from MLS systems is a critical concern. As the foundational input for many downstream tasks, inadequate assessment of MLS point clouds' quality can easily impact high-precision applications such as navigation and change analysis. This will not only undermine reliability but also result in substantial waste of time and resources, which is unacceptable in real-world applications. There is a clear need for automated and reliable solutions for uncertainty evaluation. In MLS systems, four main categories of error sources contribute to uncertainty: instrumental errors, atmospheric errors, object-and geometry-related errors, and trajectory estimation errors (Habib et al., 2009, Schenk, 2001). Considering the characteristics of these error sources, existing uncertainty evaluation methods can be broadly divided into two categories: forward modeling and backward modeling (Shi et al., 2021). The core idea of forward modeling is grounded in variance-covariance propagation, which involves detailed theoretical analysis of MLS system errors.
NeuroPathNet: Dynamic Path Trajectory Learning for Brain Functional Connectivity Analysis
Guo, Tianqi, Chen, Liping, Peng, Ciyuan, Zhou, Jingjing, Ren, Jing
Understanding the evolution of brain functional networks over time is of great significance for the analysis of cognitive mechanisms and the diagnosis of neurological diseases. Existing methods often have difficulty in capturing the temporal evolution characteristics of connections between specific functional communities. To this end, this paper proposes a new path-level trajectory modeling framework (NeuroPathNet) to characterize the dynamic behavior of connection pathways between brain functional partitions. Based on medically supported static partitioning schemes (such as Yeo and Smith ICA), we extract the time series of connection strengths between each pair of functional partitions and model them using a temporal neural network. We validate the model performance on three public functional Magnetic Resonance Imaging (fMRI) datasets, and the results show that it outperforms existing mainstream methods in multiple indicators. This study can promote the development of dynamic graph learning methods for brain network analysis, and provide possible clinical applications for the diagnosis of neurological diseases.
Learning Beyond Experience: Generalizing to Unseen State Space with Reservoir Computing
Norton, Declan A., Zhang, Yuanzhao, Girvan, Michelle
Machine learning techniques offer an effective approach to modeling dynamical systems solely from observed data. However, without explicit structural priors -- built-in assumptions about the underlying dynamics -- these techniques typically struggle to generalize to aspects of the dynamics that are poorly represented in the training data. Here, we demonstrate that reservoir computing -- a simple, efficient, and versatile machine learning framework often used for data-driven modeling of dynamical systems -- can generalize to unexplored regions of state space without explicit structural priors. First, we describe a multiple-trajectory training scheme for reservoir computers that supports training across a collection of disjoint time series, enabling effective use of available training data. Then, applying this training scheme to multistable dynamical systems, we show that RCs trained on trajectories from a single basin of attraction can achieve out-of-domain generalization by capturing system behavior in entirely unobserved basins.
Stochastic Momentum Methods for Non-smooth Non-Convex Finite-Sum Coupled Compositional Optimization
Chen, Xingyu, Wang, Bokun, Yang, Ming, Lin, Qihang, Yang, Tianbao
Finite-sum Coupled Compositional Optimization (FCCO), characterized by its coupled compositional objective structure, emerges as an important optimization paradigm for addressing a wide range of machine learning problems. In this paper, we focus on a challenging class of non-convex non-smooth FCCO, where the outer functions are non-smooth weakly convex or convex and the inner functions are smooth or weakly convex. Existing state-of-the-art result face two key limitations: (1) a high iteration complexity of $O(1/ε^6)$ under the assumption that the stochastic inner functions are Lipschitz continuous in expectation; (2) reliance on vanilla SGD-type updates, which are not suitable for deep learning applications. Our main contributions are two fold: (i) We propose stochastic momentum methods tailored for non-smooth FCCO that come with provable convergence guarantees; (ii) We establish a new state-of-the-art iteration complexity of $O(1/ε^5)$. Moreover, we apply our algorithms to multiple inequality constrained non-convex optimization problems involving smooth or weakly convex functional inequality constraints. By optimizing a smoothed hinge penalty based formulation, we achieve a new state-of-the-art complexity of $O(1/ε^5)$ for finding an (nearly) $ε$-level KKT solution. Experiments on three tasks demonstrate the effectiveness of the proposed algorithms.
Monitoring the calibration of probability forecasts with an application to concept drift detection involving image classification
Franck, Christopher T., Driscoll, Anne R., Szajnfarber, Zoe, Woodall, William H.
Machine learning approaches for image classification have led to impressive advances in that field. For example, convolutional neural networks are able to achieve remarkable image classification accuracy across a wide range of applications in industry, defense, and other areas. While these machine learning models boast impressive accuracy, a related concern is how to assess and maintain calibration in the predictions these models make. A classification model is said to be well calibrated if its predicted probabilities correspond with the rates events actually occur. While there are many available methods to assess machine learning calibration and recalibrate faulty predictions, less effort has been spent on developing approaches that continually monitor predictive models for potential loss of calibration as time passes. We propose a cumulative sum-based approach with dynamic limits that enable detection of miscalibration in both traditional process monitoring and concept drift applications. This enables early detection of operational context changes that impact image classification performance in the field. The proposed chart can be used broadly in any situation where the user needs to monitor probability predictions over time for potential lapses in calibration. Importantly, our method operates on probability predictions and event outcomes and does not require under-the-hood access to the machine learning model.
Robust variable selection for spatial point processes observed with noise
Sturm, Dominik, Sbalzarini, Ivo F.
We propose a method for variable selection in the intensity function of spatial point processes that combines sparsity-promoting estimation with noise-robust model selection. As high-resolution spatial data becomes increasingly available through remote sensing and automated image analysis, identifying spatial covariates that influence the localization of events is crucial to understand the underlying mechanism. However, results from automated acquisition techniques are often noisy, for example due to measurement uncertainties or detection errors, which leads to spurious displacements and missed events. We study the impact of such noise on sparse point-process estimation across different models, including Poisson and Thomas processes. To improve noise robustness, we propose to use stability selection based on point-process subsampling and to incorporate a non-convex best-subset penalty to enhance model-selection performance. In extensive simulations, we demonstrate that such an approach reliably recovers true covariates under diverse noise scenarios and improves both selection accuracy and stability. We then apply the proposed method to a forestry data set, analyzing the distribution of trees in relation to elevation and soil nutrients in a tropical rain forest. This shows the practical utility of the method, which provides a systematic framework for robust variable selection in spatial point-process models under noise, without requiring additional knowledge of the process.
Beyond Normality: Reliable A/B Testing with Non-Gaussian Data
Gong, Junpeng, Wang, Chunkai, Li, Hao, Ma, Jinyong, Li, Haoxuan, He, Xu
A/B testing has become the cornerstone of decision-making in online markets, guiding how platforms launch new features, optimize pricing strategies, and improve user experience. In practice, we typically employ the pairwise $t$-test to compare outcomes between the treatment and control groups, thereby assessing the effectiveness of a given strategy. To be trustworthy, these experiments must keep Type I error (i.e., false positive rate) under control; otherwise, we may launch harmful strategies. However, in real-world applications, we find that A/B testing often fails to deliver reliable results. When the data distribution departs from normality or when the treatment and control groups differ in sample size, the commonly used pairwise $t$-test is no longer trustworthy. In this paper, we quantify how skewed, long tailed data and unequal allocation distort error rates and derive explicit formulas for the minimum sample size required for the $t$-test to remain valid. We find that many online feedback metrics require hundreds of millions samples to ensure reliable A/B testing. Thus we introduce an Edgeworth-based correction that provides more accurate $p$-values when the available sample size is limited. Offline experiments on a leading A/B testing platform corroborate the practical value of our theoretical minimum sample size thresholds and demonstrate that the corrected method substantially improves the reliability of A/B testing in real-world conditions.
Combining SHAP and Causal Analysis for Interpretable Fault Detection in Industrial Processes
Santos, Pedro Cortes dos, Rocha, Matheus Becali, Krohling, Renato A
Industrial processes generate complex data that challenge fault detection systems, often yielding opaque or underwhelming results despite advanced machine learning techniques. This study tackles such difficulties using the Tennessee Eastman Process, a well-established benchmark known for its intricate dynamics, to develop an innovative fault detection framework. Initial attempts with standard models revealed limitations in both performance and interpretability, prompting a shift toward a more tractable approach. By employing SHAP (SHapley Additive exPlanations), we transform the problem into a more manageable and transparent form, pinpointing the most critical process features driving fault predictions. This reduction in complexity unlocks the ability to apply causal analysis through Directed Acyclic Graphs, generated by multiple algorithms, to uncover the underlying mechanisms of fault propagation. The resulting causal structures align strikingly with SHAP findings, consistently highlighting key process elements-like cooling and separation systems-as pivotal to fault development. Together, these methods not only enhance detection accuracy but also provide operators with clear, actionable insights into fault origins, a synergy that, to our knowledge, has not been previously explored in this context. This dual approach bridges predictive power with causal understanding, offering a robust tool for monitoring complex manufacturing environments and paving the way for smarter, more interpretable fault detection in industrial systems.
Acoustic and Machine Learning Methods for Speech-Based Suicide Risk Assessment: A Systematic Review
Marie, Ambre, Garnier, Marine, Bertin, Thomas, Machart, Laura, Dardenne, Guillaume, Quellec, Gwenolé, Berrouiguet, Sofian
Suicide remains a public health challenge, necessitating improved detection methods to facilitate timely intervention and treatment. This systematic review evaluates the role of Artificial Intelligence (AI) and Machine Learning (ML) in assessing suicide risk through acoustic analysis of speech. Following PRISMA guidelines, we analyzed 33 articles selected from PubMed, Cochrane, Scopus, and Web of Science databases. The last search was conducted in February 2025. Risk of bias was assessed using the PROBAST tool. Studies analyzing acoustic features between individuals at risk of suicide (RS) and those not at risk (NRS) were included, while studies lacking acoustic data, a suicide-related focus, or sufficient methodological details were excluded. Sample sizes varied widely and were reported in terms of participants or speech segments, depending on the study. Results were synthesized narratively based on acoustic features and classifier performance. Findings consistently showed significant acoustic feature variations between RS and NRS populations, particularly involving jitter, fundamental frequency (F0), Mel-frequency cepstral coefficients (MFCC), and power spectral density (PSD). Classifier performance varied based on algorithms, modalities, and speech elicitation methods, with multimodal approaches integrating acoustic, linguistic, and metadata features demonstrating superior performance. Among the 29 classifier-based studies, reported AUC values ranged from 0.62 to 0.985 and accuracies from 60% to 99.85%. Most datasets were imbalanced in favor of NRS, and performance metrics were rarely reported separately by group, limiting clear identification of direction of effect.