Goto

Collaborating Authors

 standardization


Parajudica: An RDF-Based Reasoner and Metamodel for Multi-Framework Context-Dependent Data Compliance Assessments

Moreau, Luc, Rossi, Alfred, Stalla-Bourdillon, Sophie

arXiv.org Artificial Intelligence

We demonstrate the utility of this resource and accompanying metamodel through application to existing legal frameworks and industry standards, offering insights for comparative framework analysis. Applications include compliance policy enforcement, compliance monitoring, data discovery, and risk assessment.


Bridging the Modality Gap by Similarity Standardization with Pseudo-Positive Samples

Yamashita, Shuhei, Shirafuji, Daiki, Saito, Tatsuhiko

arXiv.org Artificial Intelligence

Advances in vision-language models (VLMs) have enabled effective cross-modality retrieval. However, when both text and images exist in the database, similarity scores would differ in scale by modality. This phenomenon, known as the modality gap, hinders accurate retrieval. Most existing studies address this issue with manually labeled data, e.g., by fine-tuning VLMs on them. In this work, we propose a similarity standardization approach with pseudo data construction. We first compute the mean and variance of the similarity scores between each query and its paired data in text or image modality. Using these modality-specific statistics, we standardize all similarity scores to compare on a common scale across modalities. These statistics are calculated from pseudo pairs, which are constructed by retrieving the text and image candidates with the highest cosine similarity to each query. We evaluate our method across seven VLMs using two multi-modal QA benchmarks (MMQA and WebQA), where each question requires retrieving either text or image data. Our experimental results show that our method significantly improves retrieval performance, achieving average Recall@20 gains of 64% on MMQA and 28% on WebQA when the query and the target data belong to different modalities. Compared to E5-V, which addresses the modality gap through image captioning, we confirm that our method more effectively bridges the modality gap.



Supplementary Materials for: Online Training Through Time for Spiking Neural Networks

Neural Information Processing Systems

A.3 Proof of Theorem 1 In this subsection, we prove Theorem 1 with Assumption 1. Assumption 1. l = 1,, N, t = 1,, T, diag null As described in Sections 4.1 and 4.2, for gradients of OTTT, we have Remark 2. The above conclusion mainly focuses on the gradients for connection weights Remark 3. Note that the gradients based on spike representation may also include small errors since A.4 Proof of Theorem 2 In this subsection, we prove Theorem 2. Theorem 2. If Assumption 1 holds, As described in Sections 4.1 and 4.2 and similar to the proof of Theorem 1, let Remark 4. The above conclusion considers the single-layer condition. It can be generalized to the multi-layer condition. Therefore, the conclusion can be directly generalized to these conditions as well. L} based on the gradient-based optimizer. For VGG network structures, we directly impose sWS on all weights. For more illustrations and other details, please directly refer to [4].



LUME-DBN: Full Bayesian Learning of DBNs from Incomplete data in Intensive Care

Pirola, Federico, Stella, Fabio, Grzegorczyk, Marco

arXiv.org Artificial Intelligence

Dynamic Bayesian networks (DBNs) are increasingly used in healthcare due to their ability to model complex temporal relationships in patient data while maintaining interpretability, an essential feature for clinical decision-making. However, existing approaches to handling missing data in longitudinal clinical datasets are largely derived from static Bayesian networks literature, failing to properly account for the temporal nature of the data. This gap limits the ability to quantify uncertainty over time, which is particularly critical in settings such as intensive care, where understanding the temporal dynamics is fundamental for model trustworthiness and applicability across diverse patient groups. Despite the potential of DBNs, a full Bayesian framework that integrates missing data handling remains underdeveloped. In this work, we propose a novel Gibbs sampling-based method for learning DBNs from incomplete data. Our method treats each missing value as an unknown parameter following a Gaussian distribution. At each iteration, the unobserved values are sampled from their full conditional distributions, allowing for principled imputation and uncertainty estimation. We evaluate our method on both simulated datasets and real-world intensive care data from critically ill patients. Compared to standard model-agnostic techniques such as MICE, our Bayesian approach demonstrates superior reconstruction accuracy and convergence properties. These results highlight the clinical relevance of incorporating full Bayesian inference in temporal models, providing more reliable imputations and offering deeper insight into model behavior. Our approach supports safer and more informed clinical decision-making, particularly in settings where missing data are frequent and potentially impactful.


Market-Driven Subset Selection for Budgeted Training

Jha, Ashish, Leplat, Valentin, Phan, AH

arXiv.org Artificial Intelligence

Training large language models on massive datasets is computationally expensive, yet empirical evidence suggests that substantial portions of training examples contribute minimally to final performance. Data subset selection addresses this inefficiency by identifying small, high-utility subsets under resource constraints. However, example utility is inherently multi-faceted, encompassing uncertainty, distributional rarity, and diversity signals that are heterogeneous and typically combined through ad hoc weighted sums lacking theoretical grounding. We propose a market-based framework that treats each training example as a tradeable contract and employs the Logarithmic Market Scoring Rule to aggregate multiple utility signals into coherent prices. Heterogeneous signals act as traders, a single liquidity parameter controls concentration versus smoothing, and topic-wise normalization ensures calibrated aggregation. Token budgets are handled explicitly through a price-per-token decision rule with an interpretable length-bias parameter. We establish theoretical connections to maximum-entropy aggregation and provide utility recovery guarantees under noisy but monotone signals. On GSM8K mathematical reasoning under strict 60k-token budgets, our selector achieves parity with strong single-signal baselines while exhibiting lower variance and incurring less than 0.1 GPU-hour overhead. On AGNews classification at 5-25\% retention rates, the market formulation delivers competitive accuracy with improved stability. Our framework unifies multi-signal data curation under fixed computational budgets for prompt-level reasoning and classification tasks.



Navigating the EU AI Act: Foreseeable Challenges in Qualifying Deep Learning-Based Automated Inspections of Class III Medical Devices

Diaz, Julio Zanon, Brennan, Tommy, Corcoran, Peter

arXiv.org Artificial Intelligence

As deep learning (DL) technologies advance, their application in automated visual inspection for Class III medical devices offers significant potential to enhance quality assurance and reduce human error. However, the adoption of such AI-based systems introduces new regulatory complexities-particularly under the EU Artificial Intelligence (AI) Act, which imposes high-risk system obligations that differ in scope and depth from established regulatory frameworks such as the Medical Device Regulation (MDR) and the U.S. FDA Quality System Regulation (QSR). This paper presents a high-level technical assessment of the foreseeable challenges that manufacturers are likely to encounter when qualifying DL-based automated inspections -- specifically static models -- within the existing medical device compliance landscape. It examines divergences in risk management principles, dataset governance, model validation, explainability requirements, and post-deployment monitoring obligations. The discussion also explores potential implementation strategies and highlights areas of uncertainty, including data retention burdens, global compliance implications, and the practical difficulties of achieving statistical significance in validation with limited defect data. Disclaimer: This paper presents a technical perspective and does not constitute legal or regulatory advice.


Fostering Robots: A Governance-First Conceptual Framework for Domestic, Curriculum-Based Trajectory Collection

Pablo-Marti, Federico, Fernandez, Carlos Mir

arXiv.org Artificial Intelligence

We propose a conceptual, empirically testable framework for Robot Fostering, -a curriculum-driven, governance-first approach to domestic robot deployments, emphasizing long-term, curated interaction trajectories. We formalize trajectory quality with quantifiable metrics and evaluation protocols aligned with EU-grade governance standards, delineating a low-resource empirical roadmap to enable rigorous validation through future pilot studies.