
 Generative AI


Testing OpenAI's Whisper with a Scottish accent

#artificialintelligence

OpenAI's recent release of Whisper boasts human-level robustness and accuracy in speech recognition. I'm not Scottish (although I was born pretty close), but I immediately wanted to test it with a Scottish accent and compare it to "human-level". Having bought an unexciting new iPhone, at least I could put its A16 Bionic chip with 16-core Neural Engine through its paces for my experiment. Once the boring tech stuff was out of the way, I shared the test app on TestFlight with a few colleagues, yielding much amusement with its borderline magical results. Here's a little clip from the start of Trainspotting, which is particularly challenging for machines to understand; a Scottish accent over the top of Iggy Pop isn't something you'd train for.


Artyficial intelligence: what does creative AI mean for marketers? - Raconteur

#artificialintelligence

It wasn't meant to happen like this. Yes, the robots were always going to come for everyone's jobs, but it was the menial ones that were set to go first. Freed from the need to fill out spreadsheets and perform administrative duties, we were all supposed to have extra time to indulge in more creative, fulfilling pursuits. Yet Microsoft Excel still exists while AI algorithms are producing works of art that are both commercially viable and critically respected. An AI artist, Jason Allen, recently caused outrage among old-school digital artists by winning a digital art competition. One of the writers of US publication The Atlantic, Charlie Warzel, provoked the ire of illustrators around the world by choosing to adorn an article about controversial radio host Alex Jones with an AI-generated caricature as opposed to using a stock photo or commissioning a portrait.


Ultra-Large AI Models Are Over

#artificialintelligence

I don't mean 'over' as in "you won't see a new large AI model ever again" but as in "AI companies have reasons to not pursue them as a core research goal--indefinitely." This article isn't a critique of the past years--even if I don't buy the "scale is all you need" argument, I acknowledge just how far scaling has advanced the field. A parallel can be drawn between the 2020-2022 scaling race and--keeping the distance--the 50s-70s space race. Both advanced science significantly as a byproduct of other intentions. While space exploration was innovative in nature, the quest for novelty isn't present in the "bigger is better" AI trend: To conquer space, the US and USSR had to design novel paths toward a clear goal. In contrast, AI companies have blindly followed a predefined path without knowing why or whether it'd lead us anywhere. You can't put the cart before the horse.


LAION-5B: An open large-scale dataset for training next generation image-text models

arXiv.org Artificial Intelligence

Groundbreaking language-vision architectures like CLIP and DALL-E proved the utility of training on large amounts of noisy image-text data, without relying on expensive accurate labels used in standard vision unimodal supervised learning. The resulting models showed capabilities of strong text-guided image generation and transfer to downstream tasks, while performing remarkably at zero-shot classification with noteworthy out-of-distribution robustness. Since then, large-scale language-vision models like ALIGN, BASIC, GLIDE, Flamingo and Imagen made further improvements. Studying the training and capabilities of such models requires datasets containing billions of image-text pairs. Until now, no datasets of this size have been made openly available for the broader research community. To address this problem and democratize research on large-scale multi-modal models, we present LAION-5B - a dataset consisting of 5.85 billion CLIP-filtered image-text pairs, of which 2.32B contain English language. We show successful replication and fine-tuning of foundational models like CLIP, GLIDE and Stable Diffusion using the dataset, and discuss further experiments enabled with an openly available dataset of this scale. Additionally, we provide several nearest neighbor indices, an improved web-interface for dataset exploration and subset generation, and detection scores for watermark, NSFW, and toxic content detection. Announcement page: https://laion.ai/laion-5b-a-new-era-of-open-large-scale-multi-modal-datasets/
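
To make "CLIP-filtered" concrete, here is a minimal sketch of similarity-based filtering: a pair is kept only when its image embedding and text embedding are close enough in cosine similarity. The embeddings are tiny hand-written stand-ins rather than real CLIP outputs, and the function names and the 0.3 threshold are assumptions chosen for illustration, not the values or code used to build LAION-5B.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filter_pairs(pairs, threshold=0.3):
    """Keep captions whose image and text embeddings agree.

    `pairs` holds (image_embedding, text_embedding, caption) tuples.
    The threshold is a hypothetical value for this toy example.
    """
    kept = []
    for image_emb, text_emb, caption in pairs:
        if cosine_similarity(image_emb, text_emb) >= threshold:
            kept.append(caption)
    return kept

# Toy 2-D embeddings standing in for real model outputs.
pairs = [
    ([1.0, 0.0], [0.9, 0.1], "a photo of a dog"),   # well aligned
    ([1.0, 0.0], [0.0, 1.0], "buy cheap watches"),  # unrelated text
]
kept = filter_pairs(pairs)
print(kept)
```

In a real pipeline the embeddings would come from a pretrained image-text model, and the threshold would be tuned against the noise level of the crawled data.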


AI Sentience: I asked OpenAI about Google Lamda and Blake Lemoine

#artificialintelligence

The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly. Human: Hello, who are you? AI: I am an AI created by OpenAI. How can I help you today?


Microsoft brings DALL-E 2 to the masses with Designer and Image Creator

#artificialintelligence

Microsoft is making a major investment in DALL-E 2, OpenAI's AI-powered system that generates images from text, by bringing it to first-party apps and services. During its Ignite conference this week, Microsoft announced that it's integrating DALL-E 2 with the newly announced Microsoft Designer app and Image Creator tool in Bing and Microsoft Edge. With the advent of DALL-E 2 and open source alternatives like Stable Diffusion in recent years, AI image generators have exploded in popularity. In September, OpenAI said that more than 1.5 million users were actively creating over 2 million images a day with DALL-E 2, including artists, creative directors and authors. Brands such as Stitch Fix, Nestlé and Heinz have piloted DALL-E 2 for ad campaigns and other commercial use cases, while certain architectural firms have used DALL-E 2 and tools akin to it to conceptualize new buildings.


Photographer Creates AI Girlfriend to Stave Off Nosy Relatives

#artificialintelligence

Unmesh Dinda from PiXimperfect has displayed the awesome power of artificially intelligent (AI) photo editing by creating a girlfriend that doesn't exist. Dinda's convincing selfie of a loved-up couple on a city break even has extremely realistic lighting and shadows that fit perfectly within the photo. The only catch: Dinda is the only real human in the photo and the woman was created through the power of AI. "If your relatives are more concerned about you getting married than you are, you need to send them a photo like this. This will keep them wondering for a while," Dinda says in his YouTube video. Last month, OpenAI announced that DALL-E will allow users to edit images with human faces after previously banning the practice.


Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer

arXiv.org Artificial Intelligence

Although deep generative models have gained a lot of attention, most of the existing works are designed for unimodal generation. In this paper, we explore a new method for unconditional image-text pair generation. We design Multimodal Cross-Quantization VAE (MXQ-VAE), a novel vector quantizer for joint image-text representations, with which we discover that a joint image-text representation space is effective for semantically consistent image-text pair generation. To learn a multimodal semantic correlation in a quantized space, we combine VQ-VAE with a Transformer encoder and apply an input masking strategy. Specifically, MXQ-VAE accepts a masked image-text pair as input and learns a quantized joint representation space, so that the input can be converted to a unified code sequence; we then perform unconditional image-text pair generation with the code sequence. Extensive experiments show the correlation between the quantized joint space and the multimodal generation capability on synthetic and real-world datasets. In addition, we demonstrate the superiority of our approach in these two aspects over several baselines. The source code is publicly available at: https://github.com/ttumyche/MXQ-VAE.
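
The phrase "converted to a unified code sequence" refers to vector quantization: each continuous feature vector is replaced by the index of its nearest entry in a learned codebook. The sketch below shows only that generic nearest-neighbour lookup, as used in VQ-VAE-style models. It is not the MXQ-VAE implementation, and the codebook, features and function name are made up for illustration.

```python
def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry.

    Nearness is squared Euclidean distance; ties go to the lowest index.
    """
    codes = []
    for v in vectors:
        distances = [
            sum((a - b) ** 2 for a, b in zip(v, entry))
            for entry in codebook
        ]
        codes.append(distances.index(min(distances)))
    return codes

# Hypothetical 2-D codebook with three entries (learned in a real model).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

# Hypothetical encoder outputs for one image-text pair.
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

codes = quantize(features, codebook)
print(codes)
```

A generative model can then be trained over such discrete code sequences, and a decoder maps sampled codes back to an image and a caption.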


Edge#234: Inside Meta AI's Make-A-Video

#artificialintelligence

On Thursdays, we dive deep into one of the freshest research papers or technology frameworks that is worth your attention. Our goal is to keep you up to date with new developments in AI to complement the concepts we debate in other editions of our newsletter. Text-to-Video (T2V) is considered the next frontier for generative artificial intelligence (AI) models. While the text-to-image (T2I) space is experiencing a revolution with models like DALL-E, Stable Diffusion, and Midjourney, T2V still remains a monumental challenge. Recently, researchers from Meta AI unveiled Make-A-Video, a T2V model able to create realistic short video clips from textual inputs.


AI-generated imagery is the new clip art as Microsoft adds DALL-E to its Office suite

#artificialintelligence

Microsoft doesn't say whether its Designer app can generate images of people, for example. The company says OpenAI has filtered "explicit sexual and violent content from the dataset used to train the model" and that it's also "deployed filters to limit generation of images that violate content policy" and "additional query blocking on sensitive topics." But such filters are always permeable, and the tools could still be used to generate troubling imagery -- from NSFW creations to offensive or insensitive content.