
Collaborating Authors: laion


Into the LAION's Den: Investigating Hate in Multimodal Datasets

Neural Information Processing Systems

'Scale the model, scale the data, scale the compute' is the reigning sentiment in the world of generative AI today. While the impact of model scaling has been extensively studied, we are only beginning to scratch the surface of data scaling and its consequences. This is of especially critical importance in the context of vision-language datasets such as LAION. These datasets are continually growing in size and are built from large-scale internet dumps such as Common Crawl, which is known to have numerous drawbacks spanning quality, legality, and content. The datasets then serve as the backbone for large generative models, contributing to the operationalization and perpetuation of harmful societal and historical biases and stereotypes. In this paper, we investigate the effect of dataset scaling on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
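
A minimal sketch of what such a comparative audit might look like, assuming the pysentimiento hate-speech analyzer as the scoring tool and placeholder caption lists standing in for alt-text sampled from each dataset:

    # Hypothetical audit sketch: compare the rate at which an
    # off-the-shelf hate-speech classifier flags caption samples
    # drawn from each dataset snapshot.
    from pysentimiento import create_analyzer

    analyzer = create_analyzer(task="hate_speech", lang="en")

    samples = {
        "LAION-400M": ["placeholder alt-text 1", "placeholder alt-text 2"],
        "LAION-2B": ["placeholder alt-text 1", "placeholder alt-text 2"],
    }

    for name, captions in samples.items():
        # predict() returns a multi-label result; a non-empty label
        # list means the caption was flagged (e.g. hateful, aggressive).
        flagged = sum(bool(analyzer.predict(c).output) for c in captions)
        print(f"{name}: {flagged}/{len(captions)} captions flagged")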


Topological Perspectives on Optimal Multimodal Embedding Spaces

Abdul Aziz A. B., A. B. Abdul Rahim

arXiv.org Artificial Intelligence

Recent strides in multimodal model development have ignited a paradigm shift in the realm of text-to-image generation. Among these advancements, CLIP stands out as a remarkable achievement: a sophisticated dual-encoder model adept at encoding both textual and visual information within a unified latent space. This paper delves into a comparative analysis between CLIP and its recent counterpart, CLOOB. To unravel the intricate distinctions within the embedding spaces crafted by these models, we employ topological data analysis. Our approach encompasses a comprehensive examination of the drivers of the modality gap, the clustering structures existing across both high and low dimensions, and the pivotal role that dimension collapse plays in shaping their respective embedding spaces. Empirical experiments substantiate the implications of our analyses on downstream performance across various contextual scenarios. Through this investigation, we aim to shed light on the nuanced intricacies that underlie the comparative efficacy of CLIP and CLOOB, offering insights into their respective strengths and weaknesses, and providing a foundation for further refinement and advancement in multimodal model research.
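
The modality gap mentioned above can be probed directly from a model's embeddings. Below is a minimal sketch, assuming the open_clip package and a couple of illustrative (image, caption) pairs; it uses one common formulation of the gap, the distance between the centroids of the normalized image and text embeddings:

    # Sketch: estimate the modality gap of a CLIP-style model as the
    # distance between the image- and text-embedding centroids.
    import torch
    from PIL import Image
    import open_clip

    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")

    # Illustrative pairs; substitute real files and captions.
    pairs = [("cat.jpg", "a photo of a cat"),
             ("dog.jpg", "a photo of a dog")]

    with torch.no_grad():
        images = torch.stack([preprocess(Image.open(p)) for p, _ in pairs])
        img_emb = model.encode_image(images)
        txt_emb = model.encode_text(tokenizer([c for _, c in pairs]))

    # Normalize so both modalities lie on the unit hypersphere, then
    # compare the two cluster centers.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    gap = (img_emb.mean(0) - txt_emb.mean(0)).norm().item()
    print(f"modality gap (centroid distance): {gap:.4f}")

The same measurement applies unchanged to a CLOOB checkpoint, which is what makes a side-by-side comparison of the two embedding spaces straightforward.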


Can AI image generators be policed to prevent explicit deepfakes of children?

The Guardian

Child abusers are creating AI-generated "deepfakes" of their targets in order to blackmail them into filming their own abuse, beginning a cycle of sextortion that can last for years. Creating simulated child abuse imagery is illegal in the UK, and Labour and the Conservatives have aligned on the desire to ban all explicit AI-generated images of real people. But there is little global agreement on how the technology should be policed. Worse, no matter how strongly governments take action, the creation of more images will always be a press of a button away – explicit imagery is built into the foundations of AI image generation. In December, researchers at Stanford University made a disturbing discovery: buried among the billions of images making up one of the largest training sets for AI image generators were hundreds, maybe thousands, of instances of child sexual abuse material (CSAM).


AI image generators trained on pictures of child sexual abuse, study finds

The Guardian

Hidden inside the foundation of popular artificial intelligence (AI) image generators are thousands of images of child sexual abuse, according to new research published on Wednesday. The operators of some of the largest and most-used sets of images utilized to train AI shut off access to them in response to the study. The Stanford Internet Observatory found more than 3,200 images of suspected child sexual abuse in the giant AI database LAION, an index of online images and captions that's been used to train leading AI image-makers such as Stable Diffusion. The watchdog group based at Stanford University worked with the Canadian Centre for Child Protection and other anti-abuse charities to identify the illegal material and report the original photo links to law enforcement. More than 1,000 of the suspected images were confirmed as child sexual abuse material.


See inside the stereotyping machines pushing American bias across the internet

Washington Post - Technology News

Artificial intelligence image tools have a tendency to spin up disturbing clichés, such as the notion that Asian women are hypersexual. These stereotypes don't reflect the real world; they stem from the data that trains the technology. Grabbed from the internet, these troves can be toxic -- rife with pornography, misogyny, violence and bigotry. Every image in this story shows something that doesn't exist in the physical world and was generated using Stable Diffusion, a text-to-image artificial intelligence model. Stability AI, maker of the popular image generator Stable Diffusion XL, told The Washington Post it had made a significant investment in reducing bias in its latest model, which was released in July.


EU urged to protect grassroots AI research or risk losing out to US

The Guardian

The EU has been warned that it risks handing control of artificial intelligence to US tech firms if it does not act to protect grassroots research in its forthcoming AI bill. In an open letter coordinated by the German research group Laion, or Large-scale AI Open Network, the European parliament was told that "one-size-fits-all" rules risked eliminating open research and development. "Rules that require a researcher or developer to monitor or control downstream use could make it impossible to release open-source AI in Europe," which would "entrench large firms" and "hamper efforts to improve transparency, reduce competition, limit academic freedom, and drive investment in AI overseas", the letter says. It adds: "Europe cannot afford to lose AI sovereignty. Eliminating open-source R&D will leave the European scientific community and economy critically dependent on a handful of foreign and proprietary firms for essential AI infrastructure."


The future of AI relies on a high school teacher's free database

The Japan Times

In front of a suburban house on the outskirts of the northern German city of Hamburg, a single word -- "LAION" -- is scrawled in pencil across a mailbox. It's the only indication that the home belongs to the person behind a massive data-gathering effort central to the artificial intelligence boom that has seized the world's attention. That person is high school teacher Christoph Schuhmann, and LAION, short for "Large-scale AI Open Network," is his passion project. When Schuhmann isn't teaching physics and computer science to German teens, he works with a small team of volunteers building the world's biggest free AI training dataset, which has already been used in text-to-image generators such as Google's Imagen and Stable Diffusion. Databases like LAION are central to AI text-to-image generators, which rely on them for the enormous amounts of visual material used to deconstruct and create new images.


OpenCLIP for Image Search and Automatic Captioning

#artificialintelligence

I have been using and writing about OpenAI's CLIP system since it came out in 2021 [1]. It consists of image and text encoding models that can be used for various forms of cross-modal comparison, like using a text query to find the best matching image in a library quickly. In December 2022, an independent group of researchers known as LAION released a paper called "Reproducible scaling laws for contrastive language-image learning" [2] that describes how they first reimplemented and trained a model similar to CLIP and then experimented with improving the system by training with a larger dataset and using new ML techniques. They call their new model OpenCLIP. In this article, I will provide some background info on the original CLIP, describe how LAION improved the model, and show some results from my experiments with the two systems using images from the Library of Congress's Flickr photostream.
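
As a concrete illustration of the text-to-image search workflow described above, here is a minimal sketch assuming the open_clip package and a local folder of library images (the folder path, query string, and model name are illustrative, not the article's exact setup):

    # Sketch: rank a small image library against a text query by
    # cosine similarity of OpenCLIP embeddings.
    import glob
    import torch
    from PIL import Image
    import open_clip

    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k")
    tokenizer = open_clip.get_tokenizer("ViT-B-32")

    paths = sorted(glob.glob("library/*.jpg"))  # placeholder folder

    with torch.no_grad():
        images = torch.stack([preprocess(Image.open(p)) for p in paths])
        index = model.encode_image(images)
        index = index / index.norm(dim=-1, keepdim=True)

        query = model.encode_text(tokenizer(["a lighthouse at dusk"]))
        query = query / query.norm(dim=-1, keepdim=True)

    scores = (index @ query.T).squeeze(1)  # cosine similarity per image
    best = scores.argmax().item()
    print(f"best match: {paths[best]} (score={scores[best].item():.3f})")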


A ChatGPT Alternative Is Now Available As Open Source

#artificialintelligence

What will RLHF-based PaLM apps be able to accomplish? As the model scales up, performance keeps improving across tasks, creating more opportunities. PaLM can be scaled to as many as 540 billion parameters; GPT-3, by comparison, has about 175 billion. Now the first open-source ChatGPT equivalent appears to have arrived.