real image
Image of Thai police in sparkly dresses with handcuffed suspect turns out to be AI fake
The real image, which the police station has since shared, shows the officers in normal clothes and no female officer in the picture at all. The real image, which the police station has since shared, shows the officers in normal clothes and no female officer in the picture at all. Picture was created by administrator in charge of station's Facebook account who wanted to create'friendlier image' It was an arresting image and an irresistible story. A group of tough Thai police officers - five men and one woman - all wearing elaborate festival-style dresses, surrounding a drug dealer they had caught while undercover. The image, released by local police, was so compelling that it found its way on to the front page of the UK's Daily Star, as well as in picture stories in the Telegraph, the Sun and the New York Post. The Sun wrote: "The burly crew of five men and one woman slipped into skin tight sequins and feathers for the covert mission in Thailand ."
GenImage: AMillion-Scale Benchmark for Detecting AI-Generated Image
The extraordinary ability of generative models to generate photographic images has intensified concerns about the spread of disinformation, thereby leading to the demand for detectors capable of distinguishing between AI-generated fake images and real images. However, the lack of large datasets containing images from the most advanced image generators poses an obstacle to the development of such detectors. In this paper, we introduce the GenImage dataset, which has the following advantages: 1) Plenty of Images, including over one million pairs of AI-generated fake images and collected real images.
Appendix - An Image is Worth More Than a Thousand Words: Towards Disentanglement in The Wild Table of Contents
We use the images at 256 256resolution. We follow [21] and use all the images for training. The images used for the qualitative visualizations contain random images from the web and samples from CelebA-HQ. AFHQ [8] 15,000high quality images categorized into three domains: cat, dog and wildlife. We use the images at 128 128 resolution, holding out 500 images from each domain for testing.
Depth Anything V2
This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. Notably, compared with V1, this version produces much finer and more robust depth predictions through three key practices: 1) replacing all labeled real images with synthetic images, 2) scaling up the capacity of our teacher model, and 3) teaching student models via the bridge of large-scale pseudo-labeled real images. Compared with the latest models built on Stable Diffusion, our models are significantly more efficient (more than 10x faster) and more accurate. We offer models of different scales (ranging from 25M to 1.3B params) to support extensive scenarios. Benefiting from their strong generalization capability, we fine-tune them with metric depth labels to obtain our metric depth models. In addition to our models, considering the limited diversity and frequent noise in current test sets, we construct a versatile evaluation benchmark with sparse depth annotations to facilitate future research.
General Articulated Objects Manipulation in Real Images via Part-Aware Diffusion Process
Articulated object manipulation in real images is a fundamental step in computer and robotic vision tasks. Recently, several image editing methods based on diffusion models have been proposed to manipulate articulated objects according to text prompts. However, these methods often generate weird artifacts or even fail in real images. To this end, we introduce the Part-Aware Diffusion Model to approach the manipulation of articulated objects in real images. First, we develop Abstract 3D Models to represent and manipulate articulated objects efficiently. Then we propose dynamic feature maps to transfer the appearance of objects from input images to edited ones, meanwhile generating the novel-appearing parts reasonably. Extensive experiments are provided to illustrate the advanced manipulation capabilities of our method concerning state-of-the-art editing works. Additionally, we verify our method on 3D articulated object understanding forembodied robot scenarios and the promising results prove that our method supports this task strongly. The project page is https://mvig-rhos.com/pa_diffusion.
MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild
This paper addresses the problem of 3D human pose estimation in the wild. A significant challenge is the lack of training data, i.e., 2D images of humans annotated with 3D poses. Such data is necessary to train state-of-the-art CNN architectures. Here, we propose a solution to generate a large set of photorealistic synthetic images of humans with 3D pose annotations. We introduce an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations using 3D Motion Capture (MoCap) data.