 eyeglass


Type-to-Track: Retrieve Any Object via Prompt-based Tracking (Supplementary Appendix 1: Dataset Taxonomy)

Neural Information Processing Systems

We introduce two new evaluation scenarios, cap and retr, that are specific at the object level rather than the category level. Defining objects only by category names, category synonyms, and category definitions is insufficient to describe them accurately and leads to ambiguous results. By focusing on the object level, the benchmarking sets can provide more accurate and meaningful evaluations of multiple-object retrieval methods. We include a comprehensive taxonomy of the prompt types used to construct our settings. However, the retr setting on MOT17 could not be constructed because test annotations for this dataset are unavailable. To construct this setting, bounding boxes are filtered to match the corresponding retrieval prompt whenever it changes. Section 2 describes how to construct this retrieval prompt.
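The filtering step described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each annotation carries a frame index and a track ID, and that each retrieval prompt comes with the frame at which it starts to apply and the set of track IDs it describes. The names `Box`, `PromptSegment`, `active_segment`, and `filter_boxes` are illustrative and are not taken from the paper or its dataset tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    frame: int        # frame index of the annotation
    track_id: int     # identity of the annotated object
    xywh: tuple       # bounding box as (x, y, width, height)


@dataclass(frozen=True)
class PromptSegment:
    start_frame: int      # first frame where this prompt applies (inclusive)
    prompt: str           # retrieval prompt text
    track_ids: frozenset  # hypothetical: tracks that the prompt describes


def active_segment(segments, frame):
    """Return the most recent segment that started at or before `frame`."""
    current = None
    for seg in sorted(segments, key=lambda s: s.start_frame):
        if seg.start_frame <= frame:
            current = seg
        else:
            break
    return current


def filter_boxes(boxes, segments):
    """Keep only boxes whose track matches the prompt active in their frame."""
    kept = []
    for box in boxes:
        seg = active_segment(segments, box.frame)
        if seg is not None and box.track_id in seg.track_ids:
            kept.append((seg.prompt, box))
    return kept


boxes = [
    Box(frame=1, track_id=1, xywh=(10, 10, 40, 80)),
    Box(frame=1, track_id=2, xywh=(60, 12, 38, 82)),
    Box(frame=5, track_id=1, xywh=(14, 10, 40, 80)),
    Box(frame=5, track_id=2, xywh=(66, 12, 38, 82)),
]
segments = [
    PromptSegment(1, "person wearing eyeglasses", frozenset({1})),
    PromptSegment(5, "person carrying a bag", frozenset({2})),
]
filtered = filter_boxes(boxes, segments)
```

In this toy input, the prompt changes at frame 5, so the kept boxes switch from track 1 to track 2 at that frame.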



Efficient Few-shot Identity Preserving Attribute Editing for 3D-aware Deep Generative Models

arXiv.org Artificial Intelligence

Identity preserving editing of faces is a generative task that enables modifying the illumination, adding/removing eyeglasses, face aging, editing hairstyles, modifying expression etc., while preserving the identity of the face. Recent progress in 2D generative models has enabled photorealistic editing of faces using simple techniques leveraging the compositionality in GANs. However, identity preserving editing for 3D faces with a given set of attributes is a challenging task as the generative model must reason about view consistency from multiple poses and render a realistic 3D face. Further, 3D portrait editing requires large-scale attribute labelled datasets and presents a trade-off between editability in low-resolution and inflexibility to editing in high resolution. In this work, we aim to alleviate some of the constraints in editing 3D faces by identifying latent space directions that correspond to photorealistic edits. To address this, we present a method that builds on recent advancements in 3D-aware deep generative models and 2D portrait editing techniques to perform efficient few-shot identity preserving attribute editing for 3D-aware generative models. We aim to show from experimental results that using just ten or fewer labelled images of an attribute is sufficient to estimate edit directions in the latent space that correspond to 3D-aware attribute editing. In this work, we leverage an existing face dataset with masks to obtain the synthetic images for the few attribute examples required for estimating the edit directions. Further, to demonstrate the linearity of edits, we investigate one-shot stylization by performing sequential editing and use the (2D) Attribute Style Manipulation (ASM) technique to investigate a continuous style manifold for 3D consistent identity preserving face aging. Code and results are available at: https://vishal-vinod.github.io/gmpi-edit/
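As a rough illustration of what estimating an edit direction from a handful of examples can look like, the sketch below uses a common simplification: the direction is the normalized difference between the mean latent code of examples with the attribute and the mean latent code of examples without it. This is an assumption made for illustration, not necessarily the estimator used by the authors, and the function names and toy 3-dimensional latents are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized latent codes."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def estimate_direction(with_attr, without_attr):
    """Unit vector from the 'without' mean to the 'with' mean (assumed estimator)."""
    mu_pos = mean_vector(with_attr)
    mu_neg = mean_vector(without_attr)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("the two groups have identical means; no direction found")
    return [x / norm for x in diff]


def apply_edit(latent, direction, strength):
    """Linear edit: move the latent code along the direction by `strength`."""
    return [w + strength * d for w, d in zip(latent, direction)]


# Toy latents: two examples with eyeglasses, two without.
with_glasses = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
without_glasses = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]

direction = estimate_direction(with_glasses, without_glasses)
edited = apply_edit([0.5, 0.5, 0.5], direction, strength=2.0)
```

Because the edit is a vector addition, applying two edits one after the other is the same as adding both scaled directions at once, which is the sense in which such edits are linear.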


Semantic Token Reweighting for Interpretable and Controllable Text Embeddings in CLIP

arXiv.org Artificial Intelligence

A text encoder within Vision-Language Models (VLMs) like CLIP plays a crucial role in translating textual input into an embedding space shared with images, thereby facilitating the interpretative analysis of vision tasks through natural language. Despite the varying significance of different textual elements within a sentence depending on the context, efforts to account for variation of importance in constructing text embeddings have been lacking. We propose a framework of Semantic Token Reweighting to build Interpretable text embeddings (SToRI), which incorporates controllability as well. SToRI refines the text encoding process in CLIP by differentially weighting semantic elements based on contextual importance, enabling finer control over emphasis responsive to data-driven insights and user preferences. The efficacy of SToRI is demonstrated through comprehensive experiments on few-shot image classification and image retrieval tailored to user preferences.


Reproducibility Study of "ITI-GEN: Inclusive Text-to-Image Generation"

arXiv.org Artificial Intelligence

Text-to-image generative models often present issues regarding fairness with respect to certain sensitive attributes, such as gender or skin tone. This study aims to reproduce the results presented in "ITI-Gen: Inclusive Text-to-Image Generation" by Zhang et al. (2023a), which introduces a model to improve inclusiveness in these kinds of models. We show that most of the claims made by the authors about ITI-Gen hold: it improves the diversity and quality of generated images, it is scalable to different domains, it has plug-and-play capabilities, and it is efficient from a computational point of view. However, ITI-Gen sometimes uses undesired attributes as proxy features and it is unable to disentangle some pairs of (correlated) attributes such as gender and baldness. In addition, when the number of considered attributes increases, the training time grows exponentially and ITI-Gen struggles to generate inclusive images for all elements in the joint distribution. To solve these issues, we propose using Hard Prompt Search with negative prompting, a method that does not require training and that handles negation better than vanilla Hard Prompt Search. Nonetheless, Hard Prompt Search (with or without negative prompting) cannot be used for continuous attributes that are hard to express in natural language, an area where ITI-Gen excels as it is guided by images during training. Finally, we propose combining ITI-Gen and Hard Prompt Search with negative prompting.
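A minimal sketch of how prompts for a joint attribute distribution might be enumerated, assuming each attribute is binary and that an absent attribute is expressed by placing its phrase in a negative prompt instead of writing a negation in the main prompt. The function `build_prompts`, the dictionary keys, and the phrase templates are hypothetical; they are not taken from the ITI-Gen or Hard Prompt Search code.

```python
from itertools import product


def build_prompts(base_prompt, attributes):
    """Enumerate every present/absent combination of binary attributes.

    `attributes` maps an attribute name to the phrase that describes it.
    Present attributes are appended to the prompt; absent ones go into
    the negative prompt (assumed convention for negative prompting).
    """
    names = list(attributes)
    combos = []
    for flags in product([True, False], repeat=len(names)):
        positives = [attributes[n] for n, f in zip(names, flags) if f]
        negatives = [attributes[n] for n, f in zip(names, flags) if not f]
        prompt = base_prompt
        if positives:
            prompt += " with " + " and ".join(positives)
        combos.append({"prompt": prompt, "negative_prompt": ", ".join(negatives)})
    return combos


attributes = {"eyeglasses": "eyeglasses", "beard": "a beard"}
prompts = build_prompts("a headshot of a person", attributes)
```

With k binary attributes this produces 2**k prompt pairs, which shows how quickly the joint distribution grows as attributes are added.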


Has Great Potential! Meet Your A.I. Realtor

The New Yorker

The spectre of artificial intelligence is worrying lots of workers, but one office is welcoming it with open arms and an apple pie in the oven. "There are many people who, at 2 a.m., are on their phones, looking at what's on the market," Fredrik Eklund, of the real-estate agency the Eklund Gomes Team, said the other day. He sat in the reception area of his Flatiron office wearing a pale-pink blazer, jeans, and thick black-framed eyeglasses. "Now they can talk to Maya. Her shop is open 24/7, and she is always there."


SC2GAN: Rethinking Entanglement by Self-correcting Correlated GAN Space

arXiv.org Artificial Intelligence

Generative Adversarial Networks (GANs) can synthesize realistic images, with the learned latent space shown to encode rich semantic information with various interpretable directions. However, due to the unstructured nature of the learned latent space, it inherits the bias from the training data where specific groups of visual attributes that are not causally related tend to appear together, a phenomenon also known as spurious correlations, e.g., age and eyeglasses or women and lipstick. Consequently, the learned distribution often lacks proper modelling of the missing examples, and interpolation along the editing direction for one attribute can result in entangled changes to other attributes. To address this problem, previous works typically adjust the learned directions to minimize the changes in other attributes, yet they still fail on strongly correlated features. In this work, we study the entanglement issue in both the training data and the learned latent space for the StyleGAN2-FFHQ model. We propose a novel framework SC$^2$GAN that achieves disentanglement by re-projecting low-density latent code samples in the original latent space and correcting the editing directions based on both the high-density and low-density regions. By leveraging the original meaningful directions and semantic region-specific layers, our framework interpolates the original latent codes to generate images with attribute combinations that appear infrequently, then inverts these samples back to the original latent space. We apply our framework to pre-existing methods that learn meaningful latent directions and showcase its strong capability to disentangle the attributes with small amounts of low-density region samples added.


ITI-GEN: Inclusive Text-to-Image Generation

arXiv.org Artificial Intelligence

Text-to-image generative models often reflect the biases of the training data, leading to unequal representations of underrepresented groups. This study investigates inclusive text-to-image generative models that generate images based on human-written prompts and ensure the resulting images are uniformly distributed across attributes of interest. Unfortunately, directly expressing the desired attributes in the prompt often leads to sub-optimal results due to linguistic ambiguity or model misrepresentation. Hence, this paper proposes a drastically different approach that adheres to the maxim that "a picture is worth a thousand words". We show that, for some attributes, images can represent concepts more expressively than text. For instance, categories of skin tones are typically hard to specify by text but can be easily represented by example images. Building upon these insights, we propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration. The key idea is learning a set of prompt embeddings to generate images that can effectively represent all desired attribute categories. More importantly, ITI-GEN requires no model fine-tuning, making it computationally efficient to augment existing text-to-image models. Extensive experiments demonstrate that ITI-GEN largely improves over state-of-the-art models to generate inclusive images from a prompt. Project page: https://czhang0528.github.io/iti-gen.


Class Attribute Inference Attacks: Inferring Sensitive Class Information by Diffusion-Based Attribute Manipulations

arXiv.org Artificial Intelligence

Neural network-based image classifiers are powerful tools for computer vision tasks, but they inadvertently reveal sensitive attribute information about their classes, raising concerns about their privacy. To investigate this privacy leakage, we introduce the first Class Attribute Inference Attack (CAIA), which leverages recent advances in text-to-image synthesis to infer sensitive attributes of individual classes in a black-box setting, while remaining competitive with related white-box attacks. Our extensive experiments in the face recognition domain show that CAIA can accurately infer undisclosed sensitive attributes, such as an individual's hair color, gender, and racial appearance, which are not part of the training labels. Interestingly, we demonstrate that adversarially robust models are even more vulnerable to such privacy leakage than standard models, indicating that a trade-off between robustness and privacy exists.


Wearing glasses makes people appear LESS intelligent, surprising study claims

Daily Mail - Science & tech

From the title character in Napoleon Dynamite to McLovin in Superbad, stereotypical 'nerds' are often depicted wearing glasses. But a new study suggests that if you want people to think you're intelligent, you should consider swapping your glasses for contact lenses. Researchers from the University of Jordan found that people are seen as less attractive, less confident, and less intelligent when wearing glasses. Children as young as five perceive thinner people as happier and more attractive than overweight people, a study has revealed. Researchers from the University of Gdańsk showed preschool boys and girls images of men and women with various body types, and asked them to rate who was the most attractive and happiest.