 Generative AI


Improving dermatology classifiers across populations using images generated by large diffusion models

arXiv.org Artificial Intelligence

Dermatological classification algorithms developed without sufficiently diverse training data may generalize poorly across populations. While intentional data collection and annotation offer the best means for improving representation, new computational approaches for generating training data may also aid in mitigating the effects of sampling bias. In this paper, we show that DALL·E 2, a large-scale text-to-image diffusion model, can produce photorealistic images of skin disease across skin types. Using the Fitzpatrick 17k dataset as a benchmark, we demonstrate that augmenting training data with DALL·E 2-generated synthetic images improves classification of skin disease overall and especially for underrepresented groups.


Schrödinger's Bat: Diffusion Models Sometimes Generate Polysemous Words in Superposition

arXiv.org Artificial Intelligence

Recent work has shown that despite their impressive capabilities, text-to-image diffusion models such as DALL-E 2 (Ramesh et al., 2022) can display strange behaviours when a prompt contains a word with multiple possible meanings, often generating images containing both senses of the word (Rassin et al., 2022). In this work we seek to put forward a possible explanation of this phenomenon. Using the similar Stable Diffusion model (Rombach et al., 2022), we first show that when given an input that is the sum of encodings of two distinct words, the model can produce an image containing both concepts represented in the sum. We then demonstrate that the CLIP encoder used to encode prompts (Radford et al., 2021) encodes polysemous words as a superposition of meanings, and that using linear algebraic techniques we can edit these representations to influence the senses represented in the generated images. Combining these two findings, we suggest that the homonym duplication phenomenon described by Rassin et al. (2022) is caused by diffusion models producing images representing both of the meanings that are present in superposition in the encoding of a polysemous word.
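
The idea of a word encoding as a superposition of senses can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three-dimensional vectors stand in for real CLIP embeddings, the sense directions are hand-picked, and orthogonal projection is only one plausible reading of the "linear algebraic techniques" the abstract mentions, not necessarily the authors' method.

```python
# Toy illustration only: hypothetical 3-d "sense" vectors, not real CLIP encodings.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_out(x, direction):
    """Remove the component of x that lies along `direction`."""
    scale = dot(x, direction) / dot(direction, direction)
    return [a - scale * b for a, b in zip(x, direction)]

# Hypothetical sense directions for the polysemous word "bat".
animal_sense = [1.0, 0.0, 0.0]
sports_sense = [0.0, 1.0, 0.0]

# Assumed superposition: the word encoding is the sum of its sense vectors.
bat = [a + s for a, s in zip(animal_sense, sports_sense)]

# Editing the encoding: suppress the sports sense, keep the animal sense.
bat_animal_only = project_out(bat, sports_sense)
```

In this toy setting the edited vector has no remaining component along the suppressed sense, which mirrors the claim that editing the representation can influence which sense appears in the generated image.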


Robustness Analysis of Deep Learning Models for Population Synthesis

arXiv.org Artificial Intelligence

Deep generative models have become useful for synthetic data generation, particularly population synthesis. The models implicitly learn the probability distribution of a dataset and can draw samples from that distribution. Several models have been proposed, but their performance has only been tested on a single cross-sectional sample. Implementing population synthesis on a single dataset is seen as a drawback, and further studies are needed to explore the robustness of the models on multiple datasets. While comparing with the real data can increase trust in and interpretability of the models, techniques to evaluate deep generative models' robustness for population synthesis remain underexplored. In this study, we present a bootstrap confidence interval approach for deep generative models, which computes efficient confidence intervals for mean prediction errors to evaluate the robustness of the models across multiple datasets. Specifically, we adopt the tabular-based Composite Travel Generative Adversarial Network (CTGAN) and Variational Autoencoder (VAE) to estimate the distribution of the population by generating agents with tabular data, using several samples over time from the same study area. We implement the models on multiple travel diaries from the Montreal Origin-Destination Surveys of 2008, 2013, and 2018 and compare their predictive performance under varying sample sizes from multiple surveys. Results show that the predictive errors of CTGAN have narrower confidence intervals than those of VAE, indicating its robustness to multiple datasets of varying sample sizes. In addition, the evaluation of model robustness against varying sample size shows a minimal decrease in model performance as sample size decreases. This study directly supports agent-based modelling by enabling finer synthetic generation of populations in a reliable environment.
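
A bootstrap confidence interval for a mean error can be sketched in a few lines. This is a minimal percentile-bootstrap sketch, not the paper's procedure: the function name `bootstrap_mean_ci`, its parameters, and the error values are all hypothetical, and the abstract does not say which bootstrap variant the authors use.

```python
import random
import statistics

def bootstrap_mean_ci(errors, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `errors`."""
    rng = random.Random(seed)
    n = len(errors)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(errors, k=n)) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]

# Hypothetical per-run prediction errors for one model (made-up numbers).
example_errors = [0.12, 0.15, 0.11, 0.20, 0.18, 0.14, 0.16, 0.13]
ci_low, ci_high = bootstrap_mean_ci(example_errors)
```

Under this reading, a model whose interval is narrower across surveys and sample sizes would be the one described as more robust.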


Can you match the car with the director who 'designed' it? AI creates vehicles reflecting filmmakers

Daily Mail - Science & tech

If the world's most famous film directors designed a car, what would it look like? Thanks to artificial intelligence, we now have the answer. A car scrap collection firm based in the UK used DALL-E 2, an AI platform that creates images based on text inputs, to imagine what cars designed by Wes Anderson, Christopher Nolan, Alfred Hitchcock and James Cameron, among others, would look like. The results show how powerful this particular artificial intelligence system, which was built by OpenAI, can be when given this type of assignment. British-born director Christopher Nolan is recognized for his work on the Batman trilogy as well as films like Interstellar, which explored many quantum physics themes, and the World War II film Dunkirk.


Artificial Intelligence in art underlines deeper implications for workers

#artificialintelligence

Recently, a man used the AI image generator Midjourney to enter a fine arts contest for the Colorado State Fair under the "Digital Arts/Digitally-Manipulated Photography" category. His piece won the top prize, sparking conversation from artists about the validity of AI in art. Many have flocked to AIs like Midjourney and DALL-E 2, which are designed to create illustrations from simple, one-sentence prompts, for their low-effort input and high-quality output. DALL-E 2 can "create realistic images and art from a description in natural language." The program, along with Midjourney and other image generation AI like Stable Diffusion, dominated internet searches for months.


The Impact of Generative AI on the Future of Visual Content Marketing

arXiv.org Artificial Intelligence

In today's world of marketing, it is necessary to have visually appealing content. Visual material has become an essential area of focus for every company as a result of the widespread availability of gadgets for mass communication and extended visual advancements. Similarly, artificial intelligence is also gaining ground and it is proving to be the most revolutionary technological advancement thus far. The integration of visual content with artificial intelligence is the key to acquiring and retaining loyal customers; its absence from the overarching marketing strategy of any production raises a red flag that could ultimately result in a smaller market share for that company.


Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark

arXiv.org Artificial Intelligence

We provide a new multi-task benchmark for evaluating text-to-image models. We perform a human evaluation comparing the most common open-source (Stable Diffusion) and commercial (DALL-E 2) models. Twenty computer science AI graduate students evaluated the two models, on three tasks, at three difficulty levels, across ten prompts each, providing 3,600 ratings. Text-to-image generation has seen rapid progress to the point that many recent models have demonstrated their ability to create realistic high-resolution images for various prompts. However, current text-to-image methods and the broader body of research in vision-language understanding still struggle with intricate text prompts that contain many objects with multiple attributes and relationships. We introduce a new text-to-image benchmark that contains a suite of thirty-two tasks over multiple applications that capture a model's ability to handle different features of a text prompt. For example, asking a model to generate a varying number of the same object to measure its ability to count or providing a text prompt with several objects that each have a different attribute to identify its ability to match objects and attributes correctly. Rather than subjectively evaluating text-to-image results on a set of prompts, our new multi-task benchmark consists of challenge tasks at three difficulty levels (easy, medium, and hard) and human ratings for each generated image.
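
The figure of 3,600 ratings is consistent with a fully crossed design. The short check below assumes that every student rated every combination of model, task, difficulty level, and prompt; the abstract does not state this explicitly, and the variable names are illustrative.

```python
# Assumed fully crossed design, using the counts given in the abstract.
raters = 20             # graduate students
models = 2              # Stable Diffusion and DALL-E 2
tasks = 3               # tasks used in the human evaluation
difficulty_levels = 3   # easy, medium, hard
prompts_per_cell = 10   # prompts per task and difficulty level

total_ratings = raters * models * tasks * difficulty_levels * prompts_per_cell
```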


La veille de la cybersécurité

#artificialintelligence

GitHub Copilot dubs itself as an "AI pair programmer" for software developers, automatically suggesting code in real time. According to GitHub, Copilot is "powered by Codex, a generative pretrained AI model created by OpenAI" and has been trained on "natural language text and source code from publicly available sources, including code in public repositories on GitHub." However, a class-action lawsuit filed against GitHub Copilot, its parent company Microsoft, and OpenAI claims open-source software piracy and violations of open-source licenses. "The spirit of open source is not just a space where people want to keep it open," says Sal Kimmich, an open-source developer advocate at Sonatype, machine learning engineer, and open source contributor and maintainer. "We have developed processes in order to keep open source secure, and that requires traceability, observability, and verification. Copilot is obscuring the original provenance of those [code] snippets."


Filling the internet with AI-created images will harm future AIs

New Scientist

The popularity of text-to-image artificial intelligences could be their own downfall – if the pictures they produce proliferate too much, they could contaminate the data sets that new models are trained on, harming performance. AI tools like DALL-E 2, Midjourney and Stable Diffusion can create pictures of whatever people request, and their images are increasingly being shared online.


[AI Week] How To Think In The Third Millennium

#artificialintelligence

AI Week at Betaworks begins Monday, 11/28/22. Betaworks is hosting a week of events & discussions to explore the new creative frontiers opened up by generative AI. We've witnessed unprecedented developments in how artists & engineers are using generative AI to add creative agency to how we work & play. Over the course of 4 days we will be showcasing insights & projects from an exciting roster of builders & researchers. About this Event: Over the last two thousand years, text has inexplicably become the most ubiquitous & the least ergonomic user interface element in the world. We've moved from hand-scribed writing to software word processing, but the underlying material of knowledge is the same: a trail of cryptic characters, lined up next to each other.