 real-esrgan


Super-Resolution Generative Adversarial Networks based Video Enhancement

Çetin, Kağan, Akça, Hacer, Gerek, Ömer Nezih

arXiv.org Artificial Intelligence

This study introduces an enhanced approach to video super-resolution by extending the ordinary single-image super-resolution (SISR) structure of the Super-Resolution Generative Adversarial Network (SRGAN) to handle spatio-temporal data. While SRGAN has proven effective for single-image enhancement, its design does not account for the temporal continuity required in video processing. To address this, a modified framework that incorporates 3D Non-Local Blocks is proposed, enabling the model to capture relationships across both spatial and temporal dimensions. An experimental training pipeline is developed, based on patch-wise learning and advanced data degradation techniques, to simulate real-world video conditions and learn from both local and global structures and details. This helps the model generalize better and remain stable across varying video content, preserving the overall structure in addition to pixel-wise correctness. Two model variants, one larger and one more lightweight, are presented to explore the trade-offs between performance and efficiency. The results demonstrate improved temporal coherence, sharper textures, and fewer visual artifacts compared to traditional single-image methods. This work contributes to the development of practical, learning-based solutions for video enhancement tasks, with potential applications in streaming, gaming, and digital restoration.
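
The abstract does not describe the internals of its 3D Non-Local Blocks, so the following is only a minimal sketch of the general non-local idea, not the authors' implementation. It assumes the simplest "Gaussian" variant: similarity is a plain dot product between feature vectors, there are no learned embeddings, and the residual weight `out_scale` is a hypothetical scalar standing in for a learned output projection. Every (frame, row, column) position attends to every other position, which is what lets information flow across time as well as space.

```python
import math

def non_local_3d(video, out_scale=1.0):
    """Simplified non-local operation over a clip shaped [T][H][W][C].

    Assumptions (not from the paper): dot-product similarity, identity
    embeddings, and a scalar `out_scale` instead of a learned projection.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    # Flatten all spatio-temporal positions into one list of C-dim vectors.
    feats = [video[t][h][w] for t in range(T) for h in range(H) for w in range(W)]

    out = []
    for xi in feats:
        # Similarity of position i to every position j, across space and time.
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in feats]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Softmax-weighted average of all feature vectors.
        agg = [
            sum(wt * xj[c] for wt, xj in zip(weights, feats)) / total
            for c in range(len(xi))
        ]
        # Residual connection: the input plus the aggregated global context.
        out.append([x + out_scale * a for x, a in zip(xi, agg)])

    # Restore the original [T][H][W][C] layout.
    it = iter(out)
    return [[[next(it) for _ in range(W)] for _ in range(H)] for _ in range(T)]
```

The cost grows quadratically with the number of positions (T·H·W), which is one plausible reason to combine such a block with the patch-wise training the abstract mentions.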


Data Augmentation and Resolution Enhancement using GANs and Diffusion Models for Tree Segmentation

Ferreira, Alessandro dos Santos, Ramos, Ana Paula Marques, Junior, José Marcato, Gonçalves, Wesley Nunes

arXiv.org Artificial Intelligence

Urban forests play a key role in enhancing environmental quality and supporting biodiversity in cities. Mapping and monitoring these green spaces are crucial for urban planning and conservation, yet accurately detecting trees is challenging due to complex landscapes and the variability in image resolution caused by different satellite sensors or UAV flight altitudes. While deep learning architectures have shown promise in addressing these challenges, their effectiveness remains strongly dependent on the availability of large and manually labeled datasets, which are often expensive and difficult to obtain in sufficient quantity. In this work, we propose a novel pipeline that integrates domain adaptation with GANs and Diffusion models to enhance the quality of low-resolution aerial images. Our proposed pipeline enhances low-resolution imagery while preserving semantic content, enabling effective tree segmentation without requiring large volumes of manually annotated data. Leveraging models such as pix2pix, Real-ESRGAN, Latent Diffusion, and Stable Diffusion, we generate realistic and structurally consistent synthetic samples that expand the training dataset and unify scale across domains. This approach not only improves the robustness of segmentation models across different acquisition conditions but also provides a scalable and replicable solution for remote sensing scenarios with scarce annotation resources. Experimental results demonstrated an improvement of over 50% in IoU for low-resolution images, highlighting the effectiveness of our method compared to traditional pipelines.
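
For readers unfamiliar with the IoU metric quoted above, here is a small sketch of intersection over union for binary segmentation masks. The function name, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the abstract does not say how the authors compute or average IoU.

```python
def iou(pred, target):
    """Intersection over union for two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union
```

For example, a prediction covering the top row of a 2×2 mask and a target covering the left column overlap in one pixel out of three covered pixels, giving an IoU of 1/3.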


Multi-Stage Generative Upscaler: Reconstructing Football Broadcast Images via Diffusion Models

Martini, Luca, Zolezzi, Daniele, Iacono, Saverio, Vercelli, Gianni Viardo

arXiv.org Artificial Intelligence

Generative Artificial Intelligence (genAI) represents a groundbreaking approach to creativity and automation, empowering machines to produce novel and highly realistic data, including images, text, and music. Among the diverse generative models, Diffusion Models have emerged as a powerful technique for high-quality image synthesis. Rooted in the principles of probabilistic modeling, Diffusion Models iteratively refine noise into detailed and coherent representations, achieving remarkable performance in domains like image generation, image inpainting and style transfer. Diffusion Models have gained traction due to their versatility and robustness, allowing them to excel in challenging tasks where conventional generative approaches, such as Generative Adversarial Networks (GANs), often struggle. These models leverage a forward-backward diffusion process, where images are progressively noised during the forward phase and restored to their original form during the reverse phase.
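
The forward (noising) phase described above can be sketched as follows. This assumes the common DDPM-style formulation with a linear variance schedule, in which a noised sample at step t is drawn in closed form as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The abstract does not state which formulation or schedule the authors use, and the function names and default values here are illustrative.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (assumed schedule; needs num_steps >= 2)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """Running product of (1 - beta): the fraction of signal variance kept at each step."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_noise(x0, t, alpha_bars, noise=None, rng=random):
    """Sample x_t from x_0 in one shot; `x0` is a flat list of pixel values."""
    if noise is None:
        noise = [rng.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(alpha_bars[t])        # how much of the image survives
    sigma = math.sqrt(1.0 - alpha_bars[t])   # how much Gaussian noise is mixed in
    return [signal * x + sigma * e for x, e in zip(x0, noise)]
```

With 1000 steps and these defaults, almost none of the original signal remains at the final step, so the reverse phase effectively starts from pure noise.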


Super-Resolution for Interferometric Imaging: Model Comparisons and Performance Analysis

Abdioglu, Hasan Berkay, Gursoy, Rana, Isik, Yagmur, Balci, Ibrahim Cem, Unal, Taha, Bayer, Kerem, Inal, Mustafa Ismail, Serin, Nehir, Kosar, Muhammed Furkan, Esmer, Gokhan Bora, Uvet, Huseyin

arXiv.org Artificial Intelligence

This study investigates the application of Super-Resolution techniques in holographic microscopy to enhance quantitative phase imaging. An off-axis Mach-Zehnder interferometric setup was employed to capture interferograms. The study evaluates two Super-Resolution models, RCAN and Real-ESRGAN, for their effectiveness in reconstructing high-resolution interferograms from a microparticle-based dataset. The models were assessed using two primary approaches: image-based analysis for structural detail enhancement and morphological evaluation for maintaining sample integrity and phase map accuracy. The results demonstrate that RCAN achieves superior numerical precision, making it ideal for applications requiring highly accurate phase map reconstruction, while Real-ESRGAN enhances visual quality and structural coherence, making it suitable for visualization-focused applications. This study highlights the potential of Super-Resolution models in overcoming diffraction-imposed resolution limitations in holographic microscopy, opening the way for improved imaging techniques in biomedical diagnostics, materials science, and other high-precision fields.


Appeal prediction for AI up-scaled Images

Göring, Steve, Merten, Rasmus, Raake, Alexander

arXiv.org Artificial Intelligence

DNN- or AI-based up-scaling algorithms are gaining in popularity due to the improvements in machine learning. Various up-scaling models using CNNs, GANs or mixed approaches have been published. The majority of models are evaluated using PSNR and SSIM or only a few example images. However, a performance evaluation with a wide range of real-world images and subjective evaluation is missing, which we tackle in the following paper. For this reason, we describe our developed dataset, which uses 136 base images and five different up-scaling methods, namely Real-ESRGAN, BSRGAN, waifu2x, KXNet, and Lanczos. Overall the dataset consists of 1496 annotated images. The labeling of our dataset focused on image appeal and has been performed using crowd-sourcing employing our open-source tool AVRate Voyager. We evaluate the appeal of the different methods, and the results indicate that Real-ESRGAN and BSRGAN are the best. Furthermore, we train a DNN to detect which up-scaling method has been used; the trained models have a good overall performance in our evaluation. In addition, we evaluate state-of-the-art image appeal and quality models. None of these models showed a high prediction performance, so we also trained two approaches of our own. The first uses transfer learning and has the best performance, and the second model uses signal-based features and a random forest model with good overall performance. We share the data and implementation to allow further research in the context of open science.
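
As background for the PSNR metric mentioned above, here is a minimal sketch of its standard definition, 10·log10(MAX² / MSE). The function name, the nested-list grayscale image format, and the 8-bit default for `max_value` are assumptions for illustration, not details from the paper.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because PSNR depends only on pixel-wise error, two up-scaled images with the same score can look very different, which is consistent with the abstract's argument for subjective appeal ratings.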


Efficient Medicinal Image Transmission and Resolution Enhancement via GAN

Sharma, Rishabh Kumar, Sharma, Mukund, Sharma, Pushkar, Aparjeeta, Jeetashree

arXiv.org Artificial Intelligence

While X-ray imaging is indispensable in medical diagnostics, it inherently suffers from noise and resolution limits that mask the details necessary for diagnosis. B/W X-ray images require a careful balance between noise suppression and high-detail preservation to ensure clarity in soft-tissue structures and bone edges. While traditional methods, such as CNNs and early super-resolution models like ESRGAN, have enhanced image resolution, they often perform poorly at preserving high-frequency detail and controlling noise in B/W imaging. In this paper, we present an efficient approach that improves image quality while optimizing network transmission. X-ray images are pre-processed into low-resolution files by Real-ESRGAN, an improved version of ESRGAN, which helps reduce the server load and transmission bandwidth. The lower-resolution images are upscaled at the receiving end using Real-ESRGAN, fine-tuned for real-world image degradation. The model integrates Residual-in-Residual Dense Blocks with perceptual and adversarial loss functions to produce high-quality upscaled images with low noise. We further fine-tune Real-ESRGAN by adapting it to the specific B/W noise and contrast characteristics. This suppresses noise artifacts without compromising detail. The comparative evaluation shows that our approach achieves superior noise reduction and detail clarity compared to state-of-the-art CNN-based and ESRGAN models, in addition to reducing network bandwidth requirements. These benefits are confirmed both by quantitative metrics, including Peak Signal-to-Noise Ratio and Structural Similarity Index, and by qualitative assessments, which indicate the potential of Real-ESRGAN for diagnostic-quality X-ray imaging and for efficient medical data transmission.


Using Super-Resolution Imaging for Recognition of Low-Resolution Blurred License Plates: A Comparative Study of Real-ESRGAN, A-ESRGAN, and StarSRGAN

Wang, Ching-Hsiang

arXiv.org Artificial Intelligence

With the robust development of technology, license plate recognition technology can now be properly applied in various scenarios, such as road monitoring, tracking of stolen vehicles, detection at parking lot entrances and exits, and so on. However, the precondition for these applications to function normally is that the license plate must be 'clear' enough to be recognized by the system with the correct license plate number. If the license plate becomes blurred due to some external factors, then the accuracy of recognition will be greatly reduced. Although there are many road surveillance cameras in Taiwan, the quality of most cameras is not good, often leading to the inability to recognize license plate numbers due to low photo resolution. Therefore, this study focuses on using super-resolution technology to process blurred license plates. This study will mainly fine-tune three super-resolution models: Real-ESRGAN, A-ESRGAN, and StarSRGAN, and compare their effectiveness in enhancing the resolution of license plate photos and enabling accurate license plate recognition. By comparing different super-resolution models, it is hoped to find the most suitable model for this task, providing valuable references for future researchers.


Single MR Image Super-Resolution using Generative Adversarial Network

Rashid, Shawkh Ibne, Shakibapour, Elham, Ebrahimi, Mehran

arXiv.org Artificial Intelligence

Spatial resolution of medical images can be improved using super-resolution methods. Real Enhanced Super Resolution Generative Adversarial Network (Real-ESRGAN) is one of the recent effective approaches used to produce higher-resolution images from lower-resolution inputs. In this paper, we apply this method to enhance the spatial resolution of 2D MR images. In our proposed approach, we slightly modify the structure of Real-ESRGAN to train on 2D magnetic resonance (MR) images taken from the Brain Tumor Segmentation Challenge (BraTS) 2018 dataset. The obtained results are validated qualitatively and quantitatively by computing SSIM (Structural Similarity Index Measure), NRMSE (Normalized Root Mean Square Error), MAE (Mean Absolute Error), and VIF (Visual Information Fidelity) values.
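
Two of the simpler metrics listed above, MAE and NRMSE, can be sketched as follows. NRMSE has several normalizations in common use (reference range, mean, or Euclidean norm); this sketch assumes division by the range of the reference values, which may differ from the authors' choice. The function names and the flat-list input format are illustrative.

```python
import math

def mae(reference, predicted):
    """Mean absolute error between two equally long flat lists of pixel values."""
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)

def nrmse(reference, predicted):
    """Root mean square error divided by the reference range (assumed normalization)."""
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)
    value_range = max(reference) - min(reference)
    if value_range == 0:
        raise ValueError("reference is constant; range normalization is undefined")
    return math.sqrt(mse) / value_range
```

Both are error measures, so lower is better, in contrast to SSIM and VIF, where higher values indicate closer agreement with the reference.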