- Europe > United Kingdom (0.04)
- Africa > Côte d'Ivoire (0.04)
- North America (0.04)
- Africa > Nigeria > Lagos State > Lagos (0.04)
Improving the stability of the covariance-controlled adaptive Langevin thermostat for large-scale Bayesian sampling
Stochastic gradient Langevin dynamics and its variants approximate the likelihood of an entire dataset via random (and typically much smaller) subsets in the setting of Bayesian sampling. Due to their (often substantial) improvement in computational efficiency, they have been widely used in large-scale machine learning applications. It has been demonstrated that the so-called covariance-controlled adaptive Langevin (CCAdL) thermostat, which incorporates an additional term involving the covariance matrix of the noisy force, outperforms popular alternative methods. A moving average is used in CCAdL to estimate the covariance matrix of the noisy force, in which case the covariance matrix converges to a constant matrix in the long-time limit. Moreover, our numerical experiments indicate that the use of a moving average can reduce the stability of the numerical integrators, thereby limiting the largest usable stepsize. In this article, we propose a modified CCAdL (i.e., mCCAdL) thermostat that uses the scaling part of the scaling and squaring method, together with a truncated Taylor series approximation to the exponential, to numerically approximate the exact solution of the subsystem involving the additional term proposed in CCAdL. We also propose a symmetric splitting method for mCCAdL, instead of the Euler-type discretisation used in the original CCAdL thermostat. Our numerical experiments demonstrate that the newly proposed mCCAdL thermostat achieves a substantial improvement in numerical stability over the original CCAdL thermostat, while significantly outperforming popular alternative stochastic gradient methods in terms of numerical accuracy for large-scale machine learning applications.
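The scaling-and-squaring-with-truncated-Taylor idea mentioned in the abstract can be sketched generically as follows (a plain NumPy illustration, not the authors' integrator; the function name, truncation order, and scaling rule are our own choices):

```python
import numpy as np

def expm_scaling_squaring(A, order=6):
    """Approximate the matrix exponential exp(A) by scaling and squaring
    with a truncated Taylor series.

    A is scaled by 2**-s so its 1-norm is at most 0.5, exp of the scaled
    matrix is approximated by the Taylor series truncated at `order`,
    and the result is repeatedly squared to undo the scaling.
    """
    norm = np.linalg.norm(A, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0 ** s
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, order + 1):   # I + B + B^2/2! + ... + B^order/order!
        term = term @ B / k
        E = E + term
    for _ in range(s):              # exp(A) = exp(A / 2**s) ** (2**s)
        E = E @ E
    return E
```

The scaling step keeps the Taylor truncation error small even when A has a large norm, which is the usual motivation for combining the two techniques.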
- Europe > United Kingdom (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Switzerland (0.04)
A Tensor Residual Circuit Neural Network Factorized with Matrix Product Operation
It is challenging to reduce the complexity of neural networks while maintaining their generalization ability and robustness, especially for practical applications. Conventional solutions to this problem incorporate quantum-inspired neural networks with Kronecker products and hybrid tensor neural networks with MPO factorization and fully-connected layers. Nonetheless, the generalization power and robustness of the fully-connected layers are not as outstanding as those of circuit models in quantum computing. In this paper, we propose a novel tensor circuit neural network (TCNN) that takes advantage of the characteristics of tensor neural networks and residual circuit models to achieve generalization ability and robustness with low complexity. The proposed activation operation and the parallelism of the circuit in the complex number field improve its non-linearity and efficiency for feature learning. Moreover, since the feature information resides in the parameters of both the real and imaginary parts of TCNN, an information fusion layer is proposed to merge the features stored in those parameters and enhance the generalization capability. Experimental results confirm that TCNN showcases stronger generalization and robustness, with average accuracies on various datasets 2\%-3\% higher than those of the compared state-of-the-art models. More significantly, while the other models fail to learn features under noise-parameter attacks, TCNN still showcases prominent learning capability owing to its ability to prevent gradient explosion. Furthermore, it is comparable to the compared models in the number of trainable parameters and CPU running time. An ablation study also indicates the advantages of the activation operation, the parallelism architecture and the information fusion layer.
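The parameter savings behind Kronecker-product and MPO-style factorizations can be illustrated with the standard vec identity (a generic NumPy sketch, unrelated to the specific TCNN architecture; the factor shapes are arbitrary):

```python
import numpy as np

# A 1024x1024 dense layer stores ~1M weights; a single Kronecker factor pair
# W = kron(A, B) with A, B of shape 32x32 stores only 2 * 32 * 32 = 2048.
def kron_matvec(A, B, x):
    """Apply kron(A, B) to x without materializing the full matrix,
    using the identity kron(A, B) @ vec(X) = vec(B @ X @ A.T),
    where vec(.) stacks columns and X has shape (cols(B), cols(A))."""
    X = x.reshape(A.shape[1], B.shape[1]).T   # column-major "unvec"
    Y = B @ X @ A.T
    return Y.T.reshape(-1)                    # column-major vec of Y
```

The same identity underlies MPO-factorized layers, which chain several such small cores instead of a single pair.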
Scalable, Explainable and Provably Robust Anomaly Detection with One-Step Flow Matching
Li, Zhong, Huang, Qi, Zhu, Yuxuan, Yang, Lincen, Amiri, Mohammad Mohammadi, van Stein, Niki, van Leeuwen, Matthijs
We introduce Time-Conditioned Contraction Matching (TCCM), a novel method for semi-supervised anomaly detection in tabular data. TCCM is inspired by flow matching, a recent generative modeling framework that learns velocity fields between probability distributions and has shown strong performance compared to diffusion models and generative adversarial networks. Instead of directly applying flow matching as originally formulated, TCCM builds on its core idea -- learning velocity fields between distributions -- but simplifies the framework by predicting a time-conditioned contraction vector toward a fixed target (the origin) at each sampled time step. This design offers three key advantages: (1) a lightweight and scalable training objective that removes the need for solving ordinary differential equations during training and inference; (2) an efficient scoring strategy called one time-step deviation, which quantifies deviation from expected contraction behavior in a single forward pass, addressing the inference bottleneck of existing continuous-time models such as DTE (a diffusion-based model with leading anomaly detection accuracy but heavy inference cost); and (3) explainability and provable robustness, as the learned velocity field operates directly in input space, making the anomaly score inherently feature-wise attributable; moreover, the score function is Lipschitz-continuous with respect to the input, providing theoretical guarantees under small perturbations. Extensive experiments on the ADBench benchmark show that TCCM strikes a favorable balance between detection accuracy and inference cost, outperforming state-of-the-art methods -- especially on high-dimensional and large-scale datasets. The source code is available at our GitHub repository.
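The one time-step deviation score can be illustrated with an idealized stand-in for the trained velocity field (a toy construction of ours, not the paper's network: we assume a linear path x_t = (1 - t) x toward the origin and training data concentrated near a mean MU, so the learned field is approximately constant):

```python
import numpy as np

MU = np.array([5.0, 5.0])  # assumed mean of the "normal" training data

def velocity(x_t, t):
    """Idealized stand-in for the trained velocity field: with training data
    concentrated near MU, the flow-matching target E[-x0 | x_t, t] under a
    linear path toward the origin is approximately -MU everywhere."""
    return -MU

def anomaly_score(x, t_eval=0.5):
    """One time-step deviation: a single forward pass at one time step,
    comparing the predicted contraction vector with the expected one (-x)."""
    x_t = (1.0 - t_eval) * x   # point on the assumed path x_t = (1 - t) x
    return np.linalg.norm(velocity(x_t, t_eval) - (-x))
```

A point matching the training distribution (x near MU) scores close to zero, while a far-away point scores roughly its distance from MU; note the score here is a Lipschitz function of the input, mirroring the robustness property described in the abstract.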
- Europe > Netherlands > South Holland > Leiden (0.04)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.92)
- Research Report > Promising Solution (0.67)
- Health & Medicine > Diagnostic Medicine (0.46)
- Information Technology > Security & Privacy (0.45)
- Europe > United Kingdom (0.04)
- Africa > Côte d'Ivoire (0.04)
- North America (0.04)
- Africa > Nigeria > Lagos State > Lagos (0.04)
Federated Learning for Surgical Vision in Appendicitis Classification: Results of the FedSurg EndoVis 2024 Challenge
Kirchner, Max, Hoffmann, Hanna, Jenke, Alexander C., Saldanha, Oliver L., Pfeiffer, Kevin, Kanjo, Weam, Alekseenko, Julia, de Boer, Claas, Kolamuri, Santhi Raj, Mazza, Lorenzo, Padoy, Nicolas, Bano, Sophia, Reinke, Annika, Maier-Hein, Lena, Stoyanov, Danail, Kather, Jakob N., Kolbinger, Fiona R., Bodenstedt, Sebastian, Speidel, Stefanie
Purpose: The FedSurg challenge was designed to benchmark the state of the art in federated learning for surgical video classification. Its goal was to assess how well current methods generalize to unseen clinical centers and adapt through local fine-tuning while enabling collaborative model development without sharing patient data. Methods: Participants developed strategies to classify inflammation stages in appendicitis using a preliminary version of the multi-center Appendix300 video dataset. The challenge evaluated two tasks: generalization to an unseen center and center-specific adaptation after fine-tuning. Submitted approaches included foundation models with linear probing, metric learning with triplet loss, and various FL aggregation schemes (FedAvg, FedMedian, FedSAM). Performance was assessed using F1-score and Expected Cost, with ranking robustness evaluated via bootstrapping and statistical testing. Results: In the generalization task, performance across centers was limited. In the adaptation task, all teams improved after fine-tuning, though ranking stability was low. The ViViT-based submission achieved the strongest overall performance. The challenge highlighted limitations in generalization, sensitivity to class imbalance, and difficulties in hyperparameter tuning in decentralized training, while spatiotemporal modeling and context-aware preprocessing emerged as promising strategies. Conclusion: The FedSurg Challenge establishes the first benchmark for evaluating FL strategies in surgical video classification. Findings highlight the trade-off between local personalization and global robustness, and underscore the importance of architecture choice, preprocessing, and loss design. This benchmarking offers a reference point for future development of imbalance-aware, adaptive, and robust FL methods in clinical surgical AI.
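FedAvg, one of the aggregation schemes evaluated in the challenge, can be sketched as follows (a generic illustration; the function name and list-of-arrays parameter layout are our own, not the challenge code):

```python
import numpy as np

def fed_avg(client_params, client_sizes):
    """FedAvg: average each parameter tensor across clients, weighted by
    the number of local training samples, without sharing raw data."""
    total = float(sum(client_sizes))
    return [
        sum((n / total) * params[i]
            for params, n in zip(client_params, client_sizes))
        for i in range(len(client_params[0]))
    ]
```

FedMedian replaces the weighted mean with a coordinate-wise median, trading some accuracy under benign conditions for robustness to outlier clients.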
- Europe > United Kingdom > England > Greater London > London (0.14)
- Europe > Germany > Saxony > Dresden (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
- (8 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.67)
- Law (1.00)
- Government (1.00)
- Information Technology > Security & Privacy (0.93)
- (4 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Privacy-Preserving Tabular Synthetic Data Generation Using TabularARGN
Sidorenko, Andrey, Tiwald, Paul
Synthetic data generation has become essential for securely sharing and analyzing sensitive data sets. Traditional anonymization techniques, however, often fail to adequately preserve privacy. We introduce the Tabular Auto-Regressive Generative Network (TabularARGN), a neural network architecture specifically designed for generating high-quality synthetic tabular data. Using a discretization-based auto-regressive approach, TabularARGN achieves high data fidelity while remaining computationally efficient. We evaluate TabularARGN against existing synthetic data generation methods, showing competitive results in statistical similarity, machine learning utility, and detection robustness. We further perform an in-depth privacy evaluation using systematic membership-inference attacks, highlighting the robustness and effective privacy-utility balance of our approach.
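The discretization-based auto-regressive idea can be illustrated with a toy two-column example in which Laplace-smoothed count tables stand in for the neural conditionals (everything here, from the bin count to the smoothing, is our own simplification, not the TabularARGN architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def discretize(col, n_bins=10):
    """Map a numeric column to integer bin ids using quantile bin edges."""
    edges = np.quantile(col, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, col)

def fit_and_sample(c1, c2, n_bins, n_samples):
    """Toy auto-regressive generator over two discretized columns:
    p(c1, c2) = p(c1) * p(c2 | c1), estimated from empirical counts."""
    p1 = np.bincount(c1, minlength=n_bins) / len(c1)
    cond = np.ones((n_bins, n_bins))            # Laplace smoothing
    for a, b in zip(c1, c2):
        cond[a, b] += 1
    cond /= cond.sum(axis=1, keepdims=True)
    s1 = rng.choice(n_bins, size=n_samples, p=p1)               # first column
    s2 = np.array([rng.choice(n_bins, p=cond[a]) for a in s1])  # then second
    return s1, s2
```

Sampling column by column from the discretized conditionals is what keeps generation cheap; the real model replaces the count tables with a network conditioned on all previously generated columns.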
- Europe > Austria > Vienna (1.00)
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
Cultural Bias in Large Language Models: Evaluating AI Agents through Moral Questionnaires
Are AI systems truly representing human values, or merely averaging across them? Our study suggests a concerning reality: Large Language Models (LLMs) fail to represent diverse cultural moral frameworks despite their linguistic capabilities. We expose significant gaps between AI-generated and human moral intuitions by applying the Moral Foundations Questionnaire across 19 cultural contexts. Comparing multiple state-of-the-art LLMs of different origins against human baseline data, we find that these models systematically homogenize moral diversity. Surprisingly, increased model size does not consistently improve cultural representation fidelity. Our findings challenge the growing use of LLMs as synthetic populations in social science research and highlight a fundamental limitation in current AI alignment approaches. Without data-driven alignment beyond prompting, these systems cannot capture nuanced, culturally specific moral intuitions. Our results call for more grounded alignment objectives and evaluation metrics to ensure AI systems represent diverse human values rather than flattening the moral landscape.
- Europe > Belgium (0.04)
- South America > Peru (0.04)
- South America > Colombia (0.04)
- (22 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.88)
Enhancing Object Detection Accuracy in Underwater Sonar Images through Deep Learning-based Denoising
Wang, Ziyu, Xue, Tao, Wang, Yanbin, Li, Jingyuan, Zhang, Haibin, Xu, Zhiqiang, Xu, Gaofei
Xidian University, China; Jiangxi University of Science and Technology, China; Institute of Deep-sea Science and Engineering, China
Corresponding authors: Tao Xue, Yanbin Wang.
Abstract -- Sonar image object detection is crucial for underwater robotics and other applications. However, various types of noise in sonar images can affect the accuracy of object detection. Denoising, as a critical preprocessing step, aims to remove noise while retaining useful information to improve detection accuracy. Although deep learning-based denoising algorithms perform well on optical images, their application to underwater sonar images remains underexplored. This paper systematically evaluates the effectiveness of several deep learning-based denoising algorithms, originally designed for optical images, in the context of underwater sonar image object detection. We apply nine trained denoising models to images from five open-source sonar datasets, each processing different types of noise. We then test the denoised images using four object detection algorithms. The results show that different denoising models have varying effects on detection performance. By combining the strengths of multiple denoising models, the detection results can be optimized, thus more effectively suppressing noise. Additionally, we adopt a multi-frame denoising technique, using the different outputs generated by multiple denoising models as multiple frames of the same scene for further processing to enhance detection accuracy. This method, originally designed for optical images, leverages complementary noise-reduction effects. Experimental results show that denoised sonar images improve the performance of object detection algorithms compared to the original sonar images.
INTRODUCTION
Underwater sonar imaging plays an indispensable role in marine exploration and various ocean industries, providing valuable insights into underwater environments. Unlike optical imaging, where light propagation is restricted, sonar systems utilize sound waves that travel farther, allowing them to cover larger underwater areas. This makes sonar images an ideal choice for applications such as seabed mapping, underwater object detection, and navigation. However, despite the advantages of sonar imaging, its image quality is often severely compromised by noise, which negatively impacts the accuracy of downstream tasks, such as object detection. In sonar images, noise can originate from various factors, including environmental interference, sensor imperfections, and the inherent characteristics of sound wave propagation in water. Common types of sonar image noise include Gaussian noise, speckle noise, and Poisson noise. Gaussian noise typically arises from random fluctuations in sensor readings or environmental changes. Speckle noise, caused by sound wave scattering, manifests as granular interference, which can obscure object boundaries.
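The multi-frame fusion step described in the abstract can be sketched as pixel-wise averaging across the outputs of several denoisers (a minimal illustration of ours; the paper's actual fusion procedure may differ):

```python
import numpy as np

def multi_frame_fuse(denoised_outputs):
    """Fuse the outputs of several denoising models by treating them as
    multiple frames of the same scene and averaging pixel-wise, so each
    model's (roughly independent) residual noise partially cancels."""
    return np.mean(np.stack(denoised_outputs, axis=0), axis=0)
```

Averaging k outputs with independent zero-mean residual noise reduces the residual standard deviation by about a factor of sqrt(k), which is the complementary noise-reduction effect the multi-frame technique exploits.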
- Asia > China (0.94)
- North America > United States (0.14)
- Europe > Italy (0.14)
Automatic Evaluation of Healthcare LLMs Beyond Question-Answering
Arias-Duart, Anna, Martin-Torres, Pablo Agustin, Hinjos, Daniel, Bernabeu-Perez, Pablo, Ganzabal, Lucia Urcelay, Mallo, Marta Gonzalez, Gururajan, Ashwin Kumar, Lopez-Cuena, Enrique, Alvarez-Napagao, Sergio, Garcia-Gasulla, Dario
Current Large Language Model (LLM) benchmarks are often based on open-ended or close-ended QA evaluations, avoiding the requirement of human labor. Close-ended measurements evaluate the factuality of responses but lack expressiveness. Open-ended evaluations capture the model's capacity to produce discourse responses but are harder to assess for correctness. These two approaches are commonly used, either independently or together, though their relationship remains poorly understood. This work focuses on the healthcare domain, where both factuality and discourse matter greatly. It introduces a comprehensive, multi-axis suite for healthcare LLM evaluation, exploring correlations between open and closed benchmarks and metrics. Findings include blind spots and overlaps in current methodologies. As an updated sanity check, we release a new medical benchmark--CareQA--with both open and closed variants. Finally, we propose a novel metric for open-ended evaluations--Relaxed Perplexity--to mitigate the identified limitations.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Montenegro (0.04)
- Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Materials > Chemicals (0.67)
- Health & Medicine > Therapeutic Area > Neurology (0.46)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)