Supplementary Material

Neural Information Processing Systems

The relative performance gain for Fig. 1 c) is [...] In Tab. 6, we show FPS(F) FPS(E) of various feature fusion models with the varied set size N. Note that methods without intra-set relationships, PFE [11] and CFAN [3], are computationally very fast and require little memory. In contrast, the maximum set size N for RSA [7] is 384 because the intra-set attention with the feature map is a memory-intensive module. In other words, it is the mean of the row-wise entropy of the normalized assignment map. A lower entropy value indicates that the cluster features deviate from a simple average of all samples.



Explainable Fundus Image Curation and Lesion Detection in Diabetic Retinopathy

arXiv.org Artificial Intelligence

Diabetic Retinopathy (DR) affects individuals with long-term diabetes. Without early diagnosis, DR can lead to vision loss. Fundus photography captures the structure of the retina along with abnormalities indicative of the stage of the disease. Artificial Intelligence (AI) can support clinicians in identifying these lesions, reducing manual workload, but models require high-quality annotated datasets. Due to the complexity of retinal structures, errors can occur both in image acquisition and in the lesion interpretation of manual annotators. We propose a quality-control framework that ensures only high-standard data is used for evaluation and AI training. First, an explainable feature-based classifier is used to filter out inadequate images. The features are extracted using both image processing and contrastive learning. Then, the images are enhanced and subjected to annotation with deep-learning-based assistance. Lastly, the agreement between annotators, calculated using derived formulas, determines the usability of the annotations.


DRetNet: A Novel Deep Learning Framework for Diabetic Retinopathy Diagnosis

arXiv.org Artificial Intelligence

Diabetic retinopathy (DR) is a leading cause of blindness worldwide, necessitating early detection to prevent vision loss. Current automated DR detection systems often struggle with poor-quality images, a lack of interpretability, and insufficient integration of domain-specific knowledge. To address these challenges, we introduce a novel framework that integrates three contributions: (1) Adaptive Retinal Image Enhancement Using Physics-Informed Neural Networks (PINNs): this technique dynamically enhances retinal images by incorporating physical constraints, improving the visibility of critical features such as microaneurysms, hemorrhages, and exudates; (2) Hybrid Feature Fusion Network (HFFN): by combining deep learning embeddings with handcrafted features, HFFN leverages both learned representations and domain-specific knowledge to enhance generalization and accuracy; (3) Multi-Stage Classifier with Uncertainty Quantification: this method breaks down the classification process into logical stages, providing interpretable predictions and confidence scores, thereby improving clinical trust. The proposed framework achieves an accuracy of 92.7%, a precision of 92.5%, a recall of 92.6%, an F1-score of 92.5%, an AUC of 97.8%, an mAP of 0.96, and an MCC of 0.85. Ophthalmologists rated the framework's predictions as highly clinically relevant (4.8/5), highlighting its alignment with real-world diagnostic needs. Qualitative analyses, including Grad-CAM visualizations and uncertainty heatmaps, further enhance the interpretability and trustworthiness of the system. The framework demonstrates robust performance across diverse conditions, including low-quality images, noisy data, and unseen datasets. These features make the proposed framework a promising tool for clinical adoption, enabling more accurate and reliable DR detection in resource-limited settings.


Cluster and Aggregate: Face Recognition with Large Probe Set Supplementary Material

Neural Information Processing Systems

The number of layers L in CN is equal to 2. For recent SoTA backbone models, the performance is saturated above 98.5. The performance gain is observed in both backbones. As the probe size increases, the role of a feature fusion model also increases. The relative performance gain for Fig. 1 c) is calculated as [...] We measured the FPS with an Nvidia RTX 3090. When a few samples' contribution is larger than the others [...] A lower entropy value indicates that the cluster features deviate from a simple average of all samples.
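The entropy statistic mentioned in this excerpt can be illustrated with a minimal sketch. It assumes, since the snippet does not say so explicitly, that each row of the assignment map holds one cluster's weights over the N samples and that rows are normalized to sum to 1. The function name `mean_row_entropy` and the toy matrices are hypothetical, not taken from the paper.

```python
import math


def mean_row_entropy(assignment):
    """Mean of the row-wise entropy of an assignment map.

    Assumption: each row holds one cluster's non-negative weights over
    the N samples, with at least one positive entry. Rows are
    re-normalized here so that they sum to 1.
    """
    entropies = []
    for row in assignment:
        total = sum(row)
        probs = [w / total for w in row]
        # Zero-probability entries contribute nothing to the entropy.
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0))
    return sum(entropies) / len(entropies)


# Uniform rows: every sample contributes equally (a simple average).
uniform = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
# Peaked rows: one sample dominates each cluster.
peaked = [[0.97, 0.01, 0.01, 0.01], [0.01, 0.97, 0.01, 0.01]]

uniform_entropy = mean_row_entropy(uniform)
peaked_entropy = mean_row_entropy(peaked)
```

Under these assumptions, uniform rows reach the maximum entropy log N, which corresponds to a plain average of all samples, and rows dominated by a few samples score lower.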



Ambient Diffusion Omni: Training Good Models with Bad Data

arXiv.org Artificial Intelligence

We show how to use low-quality, synthetic, and out-of-distribution images to improve the quality of a diffusion model. Typically, diffusion models are trained on curated datasets that emerge from highly filtered data pools from the Web and other sources. We show that there is immense value in the lower-quality images that are often discarded. We present Ambient Diffusion Omni, a simple, principled framework to train diffusion models that can extract signal from all available images during training. Our framework exploits two properties of natural images -- spectral power law decay and locality. We first validate our framework by successfully training diffusion models with images synthetically corrupted by Gaussian blur, JPEG compression, and motion blur. We then use our framework to achieve state-of-the-art ImageNet FID, and we show significant improvements in both image quality and diversity for text-to-image generative modeling. The core insight is that noise dampens the initial skew between the desired high-quality distribution and the mixed distribution we actually observe. We provide rigorous theoretical justification for our approach by analyzing the trade-off between learning from biased data versus limited unbiased data across diffusion times.


Progressive Transfer Learning for Multi-Pass Fundus Image Restoration

arXiv.org Artificial Intelligence

Diabetic retinopathy is a leading cause of vision impairment, making its early diagnosis through fundus imaging critical for effective treatment planning. However, poor-quality fundus images caused by factors such as inadequate illumination, noise, blurring, and other motion artifacts pose a significant challenge for accurate DR screening. In this study, we propose progressive transfer learning (PTL) for multi-pass restoration to iteratively enhance the quality of degraded fundus images, ensuring more reliable DR screening. Unlike previous methods that often focus on single-pass restoration, multi-pass restoration via PTL can achieve superior blind restoration performance that can even improve most of the good-quality fundus images in the dataset. Initially, a CycleGAN model is trained to restore low-quality images, followed by PTL-induced restoration passes over the latest restored outputs to improve overall quality in each pass. The proposed method can learn blind restoration without requiring any paired data while surpassing its limitations by leveraging progressive learning and fine-tuning strategies to minimize distortions and preserve critical retinal features. To evaluate PTL's effectiveness on multi-pass restoration, we conducted experiments on DeepDRiD, a large-scale fundus imaging dataset specifically curated for diabetic retinopathy detection. Our results demonstrate state-of-the-art performance, showcasing PTL's potential as a superior approach to iterative image quality restoration.


High performance Lunar landing simulations

arXiv.org Artificial Intelligence

Autonomous precision navigation for landing on the Moon relies on vision sensors. Computer vision algorithms are designed, trained, and tested using synthetic simulations. High-quality terrain models have been produced by Moon orbiters developed by several nations, with resolutions ranging from tens or hundreds of meters globally down to a few meters locally. The SurRender software is a powerful simulator able to exploit the full potential of these datasets in raytracing. New interfaces include tools to fuse multi-resolution DEMs and procedural texture generation. A global model of the Moon at 20 m resolution was integrated, representing several terabytes of data, which SurRender can render continuously and in real time. This simulator will be a precious asset for the development of future missions.


Cluster and Aggregate: Face Recognition with Large Probe Set

arXiv.org Artificial Intelligence

Feature fusion plays a crucial role in unconstrained face recognition, where inputs (probes or galleries) consist of a set of N low-quality images whose individual qualities vary. Advances in attention and recurrent modules have led to feature fusion that can model the relationship among the images in the input set. However, attention mechanisms cannot scale to large N due to their quadratic complexity, and recurrent modules suffer from input order sensitivity. We propose a two-stage feature fusion paradigm, Cluster and Aggregate, that can both scale to large N and maintain the ability to perform sequential inference with order invariance. Specifically, the Cluster stage is a linear assignment of N inputs to M global cluster centers, and the Aggregation stage is a fusion over M clustered features. The clustered features play an integral role when the inputs are sequential, as they can serve as a summarization of past features. By leveraging the order invariance of the incremental averaging operation, we design an update rule that achieves batch-order invariance, which guarantees that the contributions of early images in the sequence do not diminish as time steps increase. Experiments on IJB-B and IJB-S benchmark datasets show the superiority of the proposed two-stage paradigm in unconstrained face recognition. Code and pretrained models are available in Link.
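The order-invariance property of incremental averaging that the abstract relies on can be shown with a generic sketch. This is a plain weighted running mean, not the paper's actual update rule, which the abstract does not spell out. The function name `update`, the `(mean, total_weight)` state layout, and the toy batches are all hypothetical.

```python
def update(state, batch):
    """Fold a batch of (feature, weight) pairs into a running weighted mean.

    state is (mean_vector, total_weight). Weights are assumed positive.
    This is a generic incremental average, used only to illustrate
    order invariance.
    """
    mean, total = state
    for feature, weight in batch:
        total += weight
        # Move the mean toward the new feature in proportion to its weight.
        mean = [m + (weight / total) * (x - m) for m, x in zip(mean, feature)]
    return mean, total


initial = ([0.0, 0.0], 0.0)
batch_a = [([1.0, 0.0], 1.0), ([0.0, 1.0], 3.0)]
batch_b = [([2.0, 2.0], 2.0)]

# Process the same batches in both orders.
mean_ab, total_ab = update(update(initial, batch_a), batch_b)
mean_ba, total_ba = update(update(initial, batch_b), batch_a)
```

Both orders arrive at the same weighted mean up to floating-point rounding, so an early sample keeps its share `weight / total` of the final result regardless of how many batches follow.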