 distorted image


Looking Into the Water by Unsupervised Learning of the Surface Shape

Neural Information Processing Systems

We address the problem of looking into the water from the air, where we seek to remove image distortions caused by refractions at the water surface. Our approach is based on modeling the different water surface structures at various points in time, assuming the underlying image is constant. To this end, we propose a model that consists of two neural-field networks. The first network predicts the height of the water surface at each spatial position and time, and the second network predicts the image color at each position. Using both networks, we reconstruct the observed sequence of images and can therefore use unsupervised training.


Image Quality Assessment for Embodied AI

arXiv.org Artificial Intelligence

Embodied AI has developed rapidly in recent years, but it is still mainly deployed in laboratories, with various real-world distortions limiting its application. Traditionally, Image Quality Assessment (IQA) methods are applied to predict human preferences for distorted images; however, there is no IQA method to assess the usability of an image in embodied tasks, namely, the perceptual quality for robots. To provide accurate and reliable quality indicators for future embodied scenarios, we first propose the topic: IQA for Embodied AI. Specifically, we (1) constructed a perception-cognition-decision-execution pipeline based on the Mertonian system and meta-cognitive theory, and defined a comprehensive subjective score collection process; (2) established the Embodied-IQA database, containing over 36k reference/distorted image pairs, with more than 5M fine-grained annotations provided by Vision Language Models, Vision Language Action models, and real-world robots; (3) trained and validated the performance of mainstream IQA methods on Embodied-IQA, demonstrating the need to develop more accurate quality indicators for Embodied AI. We sincerely hope that through evaluation, we can promote the application of Embodied AI under complex real-world distortions. Project page: https://github.com/lcysyzxdxc/EmbodiedIQA


No-Reference Image Contrast Assessment with Customized EfficientNet-B0

arXiv.org Artificial Intelligence

Image contrast is a fundamental factor in visual perception and plays a vital role in overall image quality. However, most no-reference image quality assessment (NR-IQA) models struggle to accurately evaluate contrast distortions under diverse real-world conditions. In this study, we propose a deep-learning-based framework for blind contrast quality assessment by customizing and fine-tuning three pre-trained architectures, EfficientNet-B0, ResNet18, and MobileNetV2, to predict the perceptual Mean Opinion Score (MOS). We also evaluate an additional model built on a Siamese network, which shows a limited ability to capture perceptual contrast distortions. Each model is modified with a contrast-aware regression head and trained end to end using targeted data augmentations on two benchmark datasets, CID2013 and CCID2014, containing synthetic and authentic contrast distortions. Performance is evaluated using the Pearson Linear Correlation Coefficient (PLCC) and the Spearman Rank Order Correlation Coefficient (SRCC), which assess the alignment between predicted and human-rated scores. Among the three fine-tuned models, our customized EfficientNet-B0 achieves state-of-the-art performance with PLCC = 0.9286 and SRCC = 0.9178 on CCID2014 and PLCC = 0.9581 and SRCC = 0.9369 on CID2013, surpassing traditional methods and outperforming other deep baselines. These results highlight the model's robustness and effectiveness in capturing perceptual contrast distortion. Overall, the proposed method demonstrates that contrast-aware adaptation of lightweight pre-trained networks can yield a high-performing, scalable solution for no-reference contrast quality assessment suitable for real-time and resource-constrained applications.
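
As an illustration of the two evaluation metrics named in this abstract, the following is a minimal pure-Python sketch of PLCC and SRCC. The score lists are made-up placeholders, not data from CID2013 or CCID2014, and the helper names are hypothetical. Many IQA papers fit a nonlinear mapping to the predictions before computing PLCC; that step is omitted here.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Placeholder values for illustration only.
predicted = [2.1, 3.4, 1.0, 4.8, 3.9]   # hypothetical model outputs
mos = [2.0, 3.0, 1.5, 5.0, 4.5]         # hypothetical human-rated scores

plcc = pearson(predicted, mos)
srcc = spearman(predicted, mos)
print(f"PLCC = {plcc:.4f}, SRCC = {srcc:.4f}")
```

PLCC measures how linearly the predictions track the human scores, while SRCC depends only on their ordering, so a model that ranks every image correctly reaches SRCC = 1 even if its scores are on a different scale.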


Will Smith accused of using AI to create fake crowd in concert performance footage

FOX News

Will Smith is facing accusations of using artificial intelligence to create a crowd in a video shared online. Smith, 56, posted a YouTube clip allegedly featuring scenes from a tour performance, but eagle-eyed fans were quick to point out purported inaccuracies in the video. The "Gettin' Jiggy Wit It" singer appeared to be singing to a packed room while on tour, only for distorted images to materialize in the crowd.


InvZW: Invariant Feature Learning via Noise-Adversarial Training for Robust Image Zero-Watermarking

arXiv.org Artificial Intelligence

This paper introduces a novel deep learning framework for robust image zero-watermarking based on distortion-invariant feature learning. As a zero-watermarking scheme, our method leaves the original image unaltered and learns a reference signature through optimization in the feature space. The proposed framework consists of two key modules. In the first module, a feature extractor is trained via noise-adversarial learning to generate representations that are both invariant to distortions and semantically expressive. This is achieved by combining adversarial supervision against a distortion discriminator and a reconstruction constraint to retain image content. In the second module, we design a learning-based multibit zero-watermarking scheme where the trained invariant features are projected onto a set of trainable reference codes optimized to match a target binary message. Extensive experiments on diverse image datasets and a wide range of distortions show that our method achieves state-of-the-art robustness in both feature stability and watermark recovery. Comparative evaluations against existing self-supervised and deep watermarking techniques further highlight the superiority of our framework in generalization and robustness.
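
The idea of projecting fixed image features onto trainable reference codes can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' method: the feature vectors are made-up numbers standing in for the output of a trained extractor, the function names are hypothetical, and a simple perceptron-style margin update replaces whatever optimizer the paper uses.

```python
def project(features, code):
    """Inner product of a feature vector with one reference code."""
    return sum(f * c for f, c in zip(features, code))

def extract_bits(features, codes):
    """Read one bit per reference code from the sign of the projection."""
    return [1 if project(features, code) > 0 else 0 for code in codes]

def fit_codes(features, message, steps=100, lr=0.1, margin=1.0):
    """Optimize the reference codes (not the image) until the bits match the message."""
    codes = [[0.0] * len(features) for _ in message]
    for _ in range(steps):
        for code, bit in zip(codes, message):
            target = 1.0 if bit else -1.0
            # Perceptron-style update: push the projection past the margin.
            if target * project(features, code) < margin:
                for k, f in enumerate(features):
                    code[k] += lr * target * f
    return codes

# Hypothetical features of the original image and of a distorted copy.
clean_features = [0.8, -0.1, 0.5, 0.3]
distorted_features = [0.7, 0.0, 0.6, 0.2]

message = [1, 0, 1, 1, 0, 0, 1, 0]
codes = fit_codes(clean_features, message)

print(extract_bits(clean_features, codes))
print(extract_bits(distorted_features, codes))
```

Because only the codes are updated, the image itself is never modified, which is the defining property of zero-watermarking. Recovery from the distorted copy succeeds here only because its features stay close to the clean ones, which is what the invariant feature extractor is meant to ensure.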


Text-Guided Image Invariant Feature Learning for Robust Image Watermarking

arXiv.org Artificial Intelligence

Ensuring robustness in image watermarking is crucial for maintaining content integrity under diverse transformations. Recent self-supervised learning (SSL) approaches, such as DINO, have been leveraged for watermarking but primarily focus on general feature representation rather than explicitly learning invariant features. In this work, we propose a novel text-guided invariant feature learning framework for robust image watermarking. Our approach leverages CLIP's multimodal capabilities, using text embeddings as stable semantic anchors to enforce feature invariance under distortions. We evaluate the proposed method across multiple datasets, demonstrating superior robustness against various image transformations. Compared to state-of-the-art SSL methods, our model achieves higher cosine similarity in feature consistency tests and outperforms existing watermarking schemes in extraction accuracy under severe distortions. These results highlight the efficacy of our method in learning invariant representations tailored for robust deep learning-based watermarking.
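
A feature consistency test of the kind mentioned in this abstract compares the embedding of an image with the embedding of its distorted version. The sketch below shows only the cosine similarity computation; the two vectors are made-up placeholders, not CLIP or DINO outputs.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings for illustration only.
clean_features = [0.8, -0.1, 0.5, 0.3]      # original image
distorted_features = [0.7, 0.0, 0.6, 0.2]   # same image after a distortion

score = cosine_similarity(clean_features, distorted_features)
print(f"feature consistency = {score:.4f}")
```

A score near 1 means the distortion barely moved the representation, which is the behavior an invariant feature extractor aims for.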


Frequency-domain alignment of heterogeneous, multidimensional separations data through complex orthogonal Procrustes analysis

arXiv.org Artificial Intelligence

Multidimensional separations data have the capacity to reveal detailed information about complex biological samples. However, data analysis has been an ongoing challenge in the area since the peaks that represent chemical factors may drift over the course of several analytical runs along the first and second dimension retention times. This makes higher-level analyses of the data difficult, since a 1-1 comparison of samples is seldom possible without sophisticated pre-processing routines. Further complicating the issue is the fact that closely co-eluting components will need to be resolved, typically using some variant of Parallel Factor Analysis (PARAFAC), Multivariate Curve Resolution (MCR), or the recently explored Shift-Invariant Multi-linearity. These algorithms work with a user-specified number of components and regions of interest, which are then summarized as a peak table that is invariant to shift. However, identifying regions of interest across truly heterogeneous data remains an ongoing issue for automated deployment of these algorithms. This work offers a very simple solution to the alignment problem through an orthogonal Procrustes analysis of the frequency-domain representation of synthetic multidimensional separations data, for peaks that are logarithmically transformed to simulate shift while preserving the underlying topology of the data. Using this very simple method for analysis, two synthetic chromatograms can be compared under close to the worst possible scenarios for alignment.


Assessing Image Quality Using a Simple Generative Representation

arXiv.org Artificial Intelligence

Perceptual image quality assessment (IQA) is the task of predicting the visual quality of an image as perceived by a human observer. Current state-of-the-art techniques are based on deep representations trained in a discriminative manner. Such representations may ignore visually important features if they are not predictive of class labels. Recent generative models successfully learn low-dimensional representations using auto-encoding and have been argued to better preserve visual features. Here we leverage existing auto-encoders and propose VAE-QA, a simple and efficient method for predicting image quality in the presence of a full reference. We evaluate our approach on four standard benchmarks and find that it significantly improves generalization across datasets, has fewer trainable parameters, a smaller memory footprint and faster run time.


2AFC Prompting of Large Multimodal Models for Image Quality Assessment

arXiv.org Artificial Intelligence

While abundant research has been conducted on improving high-level visual understanding and reasoning capabilities of large multimodal models (LMMs), their visual quality assessment (IQA) ability has been relatively under-explored. Here we take initial steps towards this goal by employing two-alternative forced choice (2AFC) prompting, as 2AFC is widely regarded as the most reliable way of collecting human opinions of visual quality. Subsequently, the global quality score of each image estimated by a particular LMM can be efficiently aggregated using maximum a posteriori estimation. Meanwhile, we introduce three evaluation criteria: consistency, accuracy, and correlation, to provide comprehensive quantifications and deeper insights into the IQA capability of five LMMs. Extensive experiments show that existing LMMs exhibit remarkable IQA ability on coarse-grained quality comparison, but there is room for improvement on fine-grained quality discrimination. The proposed dataset sheds light on the future development of IQA models based on LMMs. The codes will be made publicly available at https://github.com/h4nwei/2AFC-LMMs.
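
To show how pairwise 2AFC outcomes can be aggregated into one global score per image, here is a minimal sketch of a maximum a posteriori (MAP) estimate. It assumes a Bradley-Terry (logistic) preference model with a zero-mean Gaussian prior and plain gradient ascent; the abstract does not specify the model, so this is a stand-in rather than the paper's procedure. The win counts, function name, and hyperparameters are hypothetical.

```python
import math

def map_quality_scores(wins, prior_var=10.0, lr=0.05, steps=2000):
    """MAP quality scores from pairwise counts.

    wins[i][j] is the number of times image i was preferred over image j.
    Assumed model: P(i preferred over j) = sigmoid(q[i] - q[j]),
    with an independent N(0, prior_var) prior on every q[i].
    """
    n = len(wins)
    q = [0.0] * n
    for _ in range(steps):
        # Gradient of the Gaussian log-prior.
        grad = [-qi / prior_var for qi in q]
        # Gradient of the Bradley-Terry log-likelihood.
        for i in range(n):
            for j in range(n):
                if i == j or wins[i][j] == 0:
                    continue
                p_ij = 1.0 / (1.0 + math.exp(q[j] - q[i]))
                grad[i] += wins[i][j] * (1.0 - p_ij)
                grad[j] -= wins[i][j] * (1.0 - p_ij)
        q = [qi + lr * g for qi, g in zip(q, grad)]
    return q

# Hypothetical outcomes of 10 comparisons per image pair.
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]

scores = map_quality_scores(wins)
print([round(s, 3) for s in scores])
```

The prior keeps the scores finite even when one image wins every comparison, and it fixes the otherwise arbitrary offset of the score scale.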


Progressive Feature Fusion Network for Enhancing Image Quality Assessment

arXiv.org Artificial Intelligence

Image compression has been applied in the fields of image storage and video broadcasting. However, it is very difficult to distinguish the subtle quality differences between distorted images generated by different algorithms. In this paper, we propose a new image quality assessment framework to decide which image is better in an image group. To capture the subtle differences, a fine-grained network is adopted to acquire multi-scale features. Subsequently, we design a cross subtract block for separating and gathering the information within positive and negative image pairs, enabling image comparison in feature space. After that, a progressive feature fusion block is designed, which fuses multi-scale features in a novel progressive way. Hierarchical spatial 2D features can thus be processed gradually. Experimental results show that, compared with current mainstream image quality assessment methods, the proposed network achieves more accurate image quality assessment and ranks second in the CLIC benchmark in the image perceptual model track.