 Performance Analysis


IterMask3D: Unsupervised Anomaly Detection and Segmentation with Test-Time Iterative Mask Refinement in 3D Brain MR

arXiv.org Artificial Intelligence

Unsupervised anomaly detection and segmentation methods train a model to learn the training distribution as 'normal'. In the testing phase, they identify patterns that deviate from this normal distribution as 'anomalies'. To learn the 'normal' distribution, prevailing methods corrupt the images and train a model to reconstruct them. During testing, the model attempts to reconstruct corrupted inputs based on the learned 'normal' distribution. Deviations from this distribution lead to high reconstruction errors, which indicate potential anomalies. However, corrupting an input image inevitably causes information loss even in normal regions, leading to suboptimal reconstruction and an increased risk of false positives. To alleviate this, we propose IterMask3D, an iterative spatial mask-refining strategy designed for 3D brain MRI. We iteratively mask areas of the image as corruption and reconstruct them, then shrink the mask based on reconstruction error. This process iteratively unmasks 'normal' areas to the model, and their information guides accurate reconstruction of the 'normal' patterns still under the mask, reducing false positives. In addition, to achieve better reconstruction performance, we also propose using high-frequency image content as additional structural information to guide the reconstruction of the masked area. Extensive experiments on the detection of both synthetic and real-world imaging artifacts, as well as segmentation of various pathological lesions across multiple MRI sequences, consistently demonstrate the effectiveness of our proposed method.

Introduction

Segmenting anomalies is crucial in the field of medical image analysis as it enables applications such as early disease detection and diagnosis, guides treatment planning, and reduces clinical workload. Conventional anomaly segmentation methods are mostly supervised, relying on annotated training data, where images contain anomalies with corresponding manual labels. Trained on a limited set of data types, the model can only segment anomalies resembling those in its training data, and struggles to detect other types of unseen anomalies. In the context of brain MRI images, the focus of this study, methods of this type typically target a specific pathology (Kamnitsas et al., 2017; Isensee et al., 2021; Stollenga et al., 2015). Unsupervised anomaly segmentation, on the other hand, does not require any 'anomalous' images or their manual segmentations during training. Instead, it is trained exclusively on 'normal' images and treats this training distribution as the 'normal' reference. During testing, the method regards any deviation from this reference as an 'anomaly' and attempts to segment it.
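
The mask-then-shrink loop described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the fixed error threshold, the stopping rule, and especially `toy_reconstruct`, which stands in for the trained reconstruction network by filling masked voxels with the mean of the visible ones (or a fixed prior when nothing is visible). The high-frequency guidance mentioned in the abstract is not modeled.

```python
import numpy as np

def toy_reconstruct(volume, mask, normal_prior=1.0):
    """Hypothetical stand-in for a trained model.

    Only unmasked (visible) voxels are read; masked voxels are filled with
    the mean visible intensity, or with an assumed prior if nothing is visible.
    """
    visible = ~mask
    fill = volume[visible].mean() if visible.any() else normal_prior
    recon = volume.copy()
    recon[mask] = fill
    return recon

def iterative_mask_refinement(volume, reconstruct, threshold, max_iters=10):
    """Shrink a mask by unmasking voxels whose reconstruction error is low."""
    mask = np.ones(volume.shape, dtype=bool)  # start with everything masked
    for _ in range(max_iters):
        recon = reconstruct(volume, mask)
        error = np.abs(recon - volume)
        # Keep a voxel masked only if it is still poorly reconstructed.
        new_mask = mask & (error > threshold)
        if np.array_equal(new_mask, mask):
            break  # the mask stopped shrinking
        mask = new_mask
    return mask

# Toy 3D volume: uniform "normal" tissue with a small bright "lesion".
volume = np.ones((8, 8, 8))
volume[2:4, 2:4, 2:4] = 5.0
anomaly_mask = iterative_mask_refinement(volume, toy_reconstruct, threshold=0.5)
print(int(anomaly_mask.sum()), "voxels remain masked")
```

In this toy setting the mask converges after a single shrink step, because the stand-in reconstructor has no spatial context; a learned model would be expected to benefit progressively from each newly unmasked region.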


NoisePrints: Distortion-Free Watermarks for Authorship in Private Diffusion Models

arXiv.org Artificial Intelligence

With the rapid adoption of diffusion models for visual content generation, proving authorship and protecting copyright have become critical. This challenge is particularly important when model owners keep their models private and may be unwilling or unable to handle authorship issues, making third-party verification essential. A natural solution is to embed watermarks for later verification. However, existing methods require access to model weights and rely on computationally heavy procedures, rendering them impractical and non-scalable. To address these challenges, we propose NoisePrints, a lightweight watermarking scheme that utilizes the random seed used to initialize the diffusion process as a proof of authorship without modifying the generation process. Our key observation is that the initial noise derived from a seed is highly correlated with the generated visual content. By incorporating a hash function into the noise sampling process, we further ensure that recovering a valid seed from the content is infeasible. We also show that sampling an alternative seed that passes verification is infeasible, and demonstrate the robustness of our method under various manipulations. Finally, we show how to use cryptographic zero-knowledge proofs to prove ownership without revealing the seed. By keeping the seed secret, we increase the difficulty of watermark removal. In our experiments, we validate NoisePrints on multiple state-of-the-art diffusion models for images and videos, demonstrating efficient verification using only the seed and output, without requiring access to model weights.
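
As a rough illustration of seed-based verification, the sketch below hashes a seed before sampling noise and then checks how strongly that noise correlates with some content. The function names, the use of SHA-256, cosine similarity as the correlation measure, the 0.5 threshold, and the toy "generator" (content that is mostly the noise plus an unrelated component) are all assumptions for illustration; the abstract does not specify these details, and a real diffusion model and the zero-knowledge proof step are not modeled.

```python
import hashlib
import math
import random

def noise_from_seed(seed, n):
    """Derive Gaussian noise from a hashed seed (hash choice is an assumption)."""
    digest = hashlib.sha256(seed).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(seed, content, threshold=0.5):
    """Accept the claim if the seed's noise correlates with the content."""
    noise = noise_from_seed(seed, len(content))
    return cosine_similarity(noise, content) > threshold

# Toy "generation": content that stays correlated with its initial noise.
author_seed = b"author-secret"
initial_noise = noise_from_seed(author_seed, 4096)
content = [0.8 * z + 0.2 * math.sin(i) for i, z in enumerate(initial_noise)]

print(verify(author_seed, content))    # the true seed passes
print(verify(b"other-seed", content))  # an unrelated seed does not
```

Because the noise is sampled from the hash of the seed rather than from the seed directly, working backwards from a target noise pattern to a seed would require inverting the hash, which is the intuition behind the infeasibility claim.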


Digital Twin-based Out-of-Distribution Detection in Autonomous Vessels

arXiv.org Artificial Intelligence

An autonomous vessel (AV) is a complex cyber-physical system (CPS) with software enabling many key functionalities, e.g., navigation software enables an AV to autonomously or semi-autonomously follow a path to its destination. Digital twins of such AVs enable advanced functionalities such as running what-if scenarios, performing predictive maintenance, and enabling fault diagnosis. Due to technological improvements, real-time analyses using continuous data from vessels' real-time operations have become increasingly possible. However, the literature has rarely explored advanced analyses of real-time data in AVs using digital twins built with machine learning techniques. To this end, we present a novel digital twin-based approach (ODDIT) to detect future out-of-distribution (OOD) states of an AV before reaching them, enabling proactive intervention. Such states may indicate anomalies requiring attention (e.g., manual correction by the ship master) and assist testers in scenario-centered testing. The digital twin consists of two machine-learning models: one predicts future vessel states, and the other predicts whether the predicted state will be OOD. We evaluated ODDIT with five vessels across waypoint and zigzag maneuvering under simulated conditions, including sensor and actuator noise and environmental disturbances, i.e., ocean current. ODDIT achieved high accuracy in detecting OOD states, with AUROC and TNR@TPR95 scores reaching 99% across multiple vessels.
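
For readers unfamiliar with the two reported metrics, here is a small sketch of how AUROC and TNR@TPR95 are commonly computed from detector scores. It assumes that OOD states are the positive class and that a higher score means "more OOD"; the paper may use a different convention, and the function names and sample scores are hypothetical.

```python
import math

def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))

def tnr_at_tpr(pos_scores, neg_scores, tpr=0.95):
    """True-negative rate at the threshold that detects >= tpr of positives."""
    k = math.ceil(tpr * len(pos_scores))  # positives that must be flagged
    threshold = sorted(pos_scores, reverse=True)[k - 1]
    true_negatives = sum(1 for n in neg_scores if n < threshold)
    return true_negatives / len(neg_scores)

# Hypothetical detector scores for OOD (positive) and in-distribution states.
ood_scores = [0.9, 0.8, 0.7, 0.6]
in_dist_scores = [0.1, 0.2, 0.3, 0.65]
print(auroc(ood_scores, in_dist_scores))       # 0.9375
print(tnr_at_tpr(ood_scores, in_dist_scores))  # 0.75
```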


A Modular Object Detection System for Humanoid Robots Using YOLO

arXiv.org Artificial Intelligence

Within the field of robotics, computer vision remains a significant barrier to progress, with many tasks hindered by inefficient vision systems. This research proposes a generalized vision module leveraging YOLOv9, a state-of-the-art framework optimized for computationally constrained environments like robots. The model is trained on a dataset tailored to the FIRA robotics Hurocup. A new vision module is implemented in ROS1 using a virtual environment to enable YOLO compatibility. Performance is evaluated using metrics such as frames per second (FPS) and Mean Average Precision (mAP). Performance is then compared to the existing geometric framework in static and dynamic contexts. The YOLO model achieved comparable precision at a higher computational cost than the geometric model, while providing improved robustness.


What "Not" to Detect: Negation-Aware VLMs via Structured Reasoning and Token Merging

arXiv.org Artificial Intelligence

State-of-the-art vision-language models (VLMs) suffer from a critical failure in understanding negation, often referred to as affirmative bias. This limitation is particularly severe in described object detection (DOD) tasks. To address this, we propose two primary contributions: (1) a new dataset pipeline and (2) a novel, lightweight adaptation recipe. First, we introduce CoVAND, a dataset constructed with a systematic chain-of-thought (CoT) and VQA-based pipeline to generate high-quality, instance-grounded negation data. Second, we propose NegToMe, a novel text token merging module that directly tackles the architectural cause of affirmative bias. NegToMe fundamentally addresses the structural loss of negation cues in tokenization, grouping them with attributes into coherent semantic phrases. It maintains correct polarity at the input level, enabling robust negation understanding even with limited data. For instance, to prevent a model from treating the fragmented tokens "not" and "girl" as simply "girl", NegToMe binds them into a single token whose meaning is correctly distinguished from that of "girl" alone. This module is integrated with a parameter-efficient and strategic LoRA fine-tuning approach. Our method significantly improves performance on challenging negation benchmarks with a lowered false positive rate, boosting NMS-AP by up to +10.8 points on OVDEval and demonstrating generalization to SoTA VLMs. This work marks a crucial step forward in addressing negation understanding for real-world detection applications.
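
The "not" + "girl" example can be pictured with a deliberately simplified, text-level sketch: a negation cue is bound to the token that follows it so that downstream processing sees one unit. This is only an illustration of the grouping idea. The cue list, the function name, and the underscore-joined string representation are assumptions; the abstract does not describe how NegToMe actually represents or merges tokens inside the model.

```python
# Hypothetical cue list, for illustration only.
NEGATION_CUES = {"not", "no", "without"}

def merge_negation(tokens):
    """Bind each negation cue to the token that follows it."""
    merged = []
    i = 0
    while i < len(tokens):
        is_cue = tokens[i].lower() in NEGATION_CUES
        if is_cue and i + 1 < len(tokens):
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # consume both the cue and its attribute
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(merge_negation(["person", "who", "is", "not", "girl"]))
```

After merging, "not_girl" is a distinct unit from "girl", so the polarity of the phrase can no longer be lost by dropping or down-weighting a stand-alone "not".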


Sample-Centric Multi-Task Learning for Detection and Segmentation of Industrial Surface Defects

arXiv.org Artificial Intelligence

Industrial surface defect inspection for sample-wise quality control (QC) must simultaneously decide whether a given sample contains defects and localize those defects spatially. In real production lines, extreme foreground-background imbalance, defect sparsity with a long-tailed scale distribution, and low contrast are common. As a result, pixel-centric training and evaluation are easily dominated by large homogeneous regions, making it difficult to drive models to attend to small or low-contrast defects, which is one of the main bottlenecks for deployment. Empirically, existing models achieve strong pixel-overlap metrics (e.g., mIoU) but exhibit insufficient stability at the sample level, especially for sparse or slender defects. The root cause is a mismatch between the optimization objective and the granularity of QC decisions. To address this, we propose a sample-centric multi-task learning framework and evaluation suite. Built on a shared-encoder architecture, the method jointly learns sample-level defect classification and pixel-level mask localization. Sample-level supervision modulates the feature distribution and, at the gradient level, continually boosts recall for small and low-contrast defects, while the segmentation branch preserves boundary and shape details to enhance per-sample decision stability and reduce misses. For evaluation, we propose decision-linked metrics, Seg_mIoU and Seg_Recall, which remove the bias of classical mIoU caused by empty or true-negative samples and tightly couple localization quality with sample-level decisions. Experiments on two benchmark datasets demonstrate that our approach substantially improves the reliability of sample-level decisions and the completeness of defect localization.


LLM-Guided Synthetic Augmentation (LGSA) for Mitigating Bias in AI Systems

arXiv.org Artificial Intelligence

This is the preprint version of the article "LLM-Guided Synthetic Augmentation (LGSA) for Mitigating Bias in AI Systems." This version is made available on arXiv for early dissemination. If accepted, the final authenticated version will be published in the respective venue.

Dr. Gopichand G, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore - 632014, Tamil Nadu, India, gopichand.g@vit.ac.in

Abstract: Bias in Artificial Intelligence systems, especially those that rely on natural language data, brings up serious ethical and practical issues. When certain groups are underrepresented, it often leads to uneven performance across different demographics. While traditional fairness methods like pre-processing, in-processing, and post-processing can be helpful, they usually depend on protected-attribute labels, create a trade-off between accuracy and fairness, and struggle to adapt across various datasets. To tackle these challenges, this study presents LLM-Guided Synthetic Augmentation (LGSA), a process that leverages large language models to create counterfactual examples for underrepresented groups while keeping label integrity intact. We put LGSA to the test on a controlled dataset of short English sentences that included gendered pronouns, professions, and binary task labels. The process involved using structured prompts to a large language model to generate gender-swapped paraphrases, followed by a thorough quality control process. This included checking for semantic similarity, verifying attributes, screening for toxicity, and conducting human spot checks. The augmented dataset broadened training coverage and was utilized to train a classifier under consistent experimental conditions. The results showed that LGSA significantly lessens performance disparities without compromising accuracy. The baseline model achieved an impressive 96.7% accuracy but had a gender bias gap of 7.2%. A simple swap augmentation brought the gap down to 0.7% but also reduced accuracy to 95.6%. In contrast, LGSA achieved an overall accuracy of 99.1%, showing strong performance on female-labeled examples and a reduced gap of 1.9%. These results indicate that LGSA is a powerful and dependable strategy for mitigating bias. By generating diverse and semantically accurate counterfactuals, this method enhances the balance of subgroup performance, narrows bias gaps, and maintains high overall task accuracy and label fidelity, showcasing its potential as a practical framework for fairness-focused AI systems.
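
The abstract reports a "bias gap" without defining it. One plausible reading, assumed here purely for illustration, is the difference between the best and worst per-group accuracy. The function name and the toy labels below are hypothetical and are not the paper's data.

```python
def bias_gap(y_true, y_pred, groups):
    """Return (gap, per-group accuracy), with gap = max minus min accuracy."""
    per_group = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        correct = sum(1 for i in idx if y_true[i] == y_pred[i])
        per_group[g] = correct / len(idx)
    gap = max(per_group.values()) - min(per_group.values())
    return gap, per_group

# Toy example: the classifier is perfect on one group and weaker on the other.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
groups = ["male"] * 4 + ["female"] * 4
gap, per_group = bias_gap(y_true, y_pred, groups)
print(gap, per_group)
```

Under this reading, an augmentation method is judged on two axes at once: how much it narrows the gap and whether overall accuracy holds up, which is the trade-off the reported numbers describe.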


Balancing Performance and Reject Inclusion: A Novel Confident Inlier Extrapolation Framework for Credit Scoring

arXiv.org Artificial Intelligence

Reject Inference (RI) methods aim to address sample bias by inferring missing repayment data for rejected credit applicants. Traditional approaches often assume that the behavior of rejected clients can be extrapolated from accepted clients, despite potential distributional differences between the two populations. To mitigate this blind extrapolation, we propose a novel Confident Inlier Extrapolation framework (CI-EX). CI-EX iteratively identifies the distribution of rejected client samples using an outlier detection model and assigns labels to rejected individuals closest to the distribution of the accepted population based on probabilities derived from a supervised classification model. The effectiveness of our proposed framework is validated through experiments on two large real-world credit datasets. Performance is evaluated using the Area Under the Curve (AUC) as well as RI-specific metrics such as Kickout and a novel metric introduced in this work, denoted as Area under the Kickout. Our findings reveal that RI methods, including the proposed framework, generally involve a trade-off between AUC and RI-specific metrics. However, the proposed CI-EX framework consistently outperforms existing RI models from the credit literature in terms of RI-specific metrics while maintaining competitive performance in AUC across most experiments.


Beyond Discrete Categories: Multi-Task Valence-Arousal Modeling for Pet Vocalization Analysis

arXiv.org Artificial Intelligence

Traditional pet emotion recognition from vocalizations, based on discrete classification, struggles with ambiguity and capturing intensity variations. We propose a continuous Valence-Arousal (VA) model that represents emotions in a two-dimensional space. Our method uses an automatic VA label generation algorithm, enabling large-scale annotation of 42,553 pet vocalization samples. A multi-task learning framework jointly trains VA regression with auxiliary tasks (emotion, body size, gender) to enhance prediction by improving feature learning. Our Audio Transformer model achieves a validation Valence Pearson correlation of r = 0.9024 and an Arousal r = 0.7155, effectively resolving confusion between discrete categories like "territorial" and "happy." This work introduces the first continuous VA framework for pet vocalization analysis, offering a more expressive representation for human-pet interaction, veterinary diagnostics, and behavioral training. The approach shows strong potential for deployment in consumer products like AI pet emotion translators.
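
The reported r values are Pearson correlations between predicted and reference valence (or arousal) scores. A minimal reference implementation is sketched below; the sample values are made up for illustration and are unrelated to the paper's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical reference labels and model predictions for valence.
valence_labels = [-0.8, -0.2, 0.1, 0.5, 0.9]
valence_preds = [-0.6, -0.3, 0.2, 0.4, 0.7]
print(round(pearson_r(valence_labels, valence_preds), 4))
```

Pearson r measures linear agreement in trend rather than absolute error, so a model can score a high r while still being offset or scaled relative to the labels.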


AutoCode: LLMs as Problem Setters for Competitive Programming

arXiv.org Artificial Intelligence

Writing competitive programming problems is exacting. Authors must: set constraints, input distributions, and edge cases that rule out shortcuts; target specific algorithms (e.g., max-flow, dynamic programming, data structures); and calibrate complexity beyond the reach of most competitors. We argue that this makes for an ideal test of general large language model capabilities and study whether they can do this reliably. We introduce AutoCode, which uses multiple rounds of validation to yield competition-grade problem statements and test cases. On held-out problems, AutoCode test suites approach 99% consistency with official judgments, a significant improvement over current state-of-the-art methods like HardTests, which achieve less than 81%. Furthermore, starting with a random seed problem, AutoCode can create novel variants with reference and brute-force solutions. By cross-verifying these generated solutions against test cases, we can further filter out malformed problems. Our system ensures high correctness, as verified by human experts. AutoCode successfully produces novel problems judged by Grandmaster-level (top 0.3%) competitive programmers to be of contest quality.
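
The cross-verification step can be sketched as a simple agreement check: a problem is kept only if its fast reference solution and its brute-force solution give the same answer on every generated test. The example problem (maximum subarray sum), the function names, and the random test generator are assumptions chosen for illustration and are not taken from AutoCode itself.

```python
import random

def max_subarray_reference(a):
    """Fast reference solution (Kadane's algorithm, O(n))."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best

def max_subarray_brute(a):
    """Brute-force solution: try every non-empty contiguous subarray."""
    n = len(a)
    return max(sum(a[i:j]) for i in range(n) for j in range(i + 1, n + 1))

def cross_verify(reference, brute_force, test_inputs):
    """Keep a problem only if both solutions agree on all tests."""
    return all(reference(t) == brute_force(t) for t in test_inputs)

# Small random tests so that the brute-force solution stays cheap.
rng = random.Random(0)
tests = [[rng.randint(-10, 10) for _ in range(rng.randint(1, 8))]
         for _ in range(200)]

print(cross_verify(max_subarray_reference, max_subarray_brute, tests))
# A wrong "reference" (largest single element) is caught on a tiny input.
print(cross_verify(max, max_subarray_brute, [[1, 2]]))
```

Agreement does not prove both solutions are correct, since they could share a misunderstanding of the statement, but disagreement is a cheap and reliable signal that something in the generated problem is malformed.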