 Generative AI


April 20: OpenAI's DALL-E

#artificialintelligence

Artificial intelligence research group OpenAI (co-founded by Elon Musk, among others) has created DALL-E 2, a text-to-image generation program. The system takes a description written by a user and produces an image. The demo video is worth a watch (2 min). Here's a DALL-E 2-generated image from the description "Shiba Inu dog wearing a beret and black turtleneck." Bonus: the name DALL-E combines "Salvador Dalí" with "WALL-E".



A perfect Illustration with DALL·E.

#artificialintelligence

I get daily requests (since not everybody has DALL·E access yet) to create images with specific prompts. Q. Can DALL·E create an illustration for a text if you input that text as the prompt? Things are more complicated than they seem. Sure, DALL·E is Transformer-driven, so the model attends to every single part of the prompt you put in to create a coherent image with inner logic. Still, sometimes you get some unusual pictures.



Microsoft Demos AI Development at Build, Using OpenAI Codex

#artificialintelligence

One of the more intriguing technologies demonstrated at this week's Microsoft Build conference was OpenAI Codex, a machine learning model that translates natural language into code "across more than a dozen programming languages." In a keynote presentation entitled "The Future of AI Development Tools," Microsoft chief technology officer Kevin Scott said that "Codex lets us use natural language to express our intentions, and the machine takes on the responsibility of translating those intentions into code." You heard that right: the machine does the coding for you! This could be the beginning of a paradigm shift in programming. It certainly takes the low-code trend to another level, because now you can (potentially) use AI software to talk an app into existence.


Imagen: Will AI text-to-image generators put illustrators out of a job?

#artificialintelligence

Examples of images created by Google's Imagen AI.

Tech firms are racing to create artificial intelligence algorithms that can produce high-quality images from text prompts, with the technology seeming to advance so quickly that some predict that human illustrators and stock photographers will soon be out of a job. In reality, limitations with these AI systems mean it will probably be a while before they can be used by the general public. Text-to-image generators that use neural networks have made remarkable progress in recent years. The latest, Imagen from Google, comes hot on the heels of DALL-E 2, which was announced by OpenAI in April. Both models use a neural network that is trained on a large number of examples to categorise how images relate to text descriptions. When given a new text description, the neural network repeatedly generates images, altering them until they most closely match the text based on what it has learned.
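The "generate, alter, keep what matches better" loop described in the excerpt above can be illustrated with a toy sketch. This is not how Imagen or DALL-E 2 actually work internally: here the "image" and the "text description" are just short lists of numbers, and the names `similarity`, `refine`, and `target` are hypothetical, chosen only for illustration.

```python
import random


def similarity(image, target):
    """Toy match score: negative squared distance (higher is a closer match)."""
    return -sum((a - b) ** 2 for a, b in zip(image, target))


def refine(target, steps=2000, step_size=0.1, seed=0):
    """Start from random noise and keep only the alterations that match better."""
    rng = random.Random(seed)
    # Assumption: the "image" is a plain list of floats, initialised as noise.
    image = [rng.uniform(-1.0, 1.0) for _ in target]
    score = similarity(image, target)
    for _ in range(steps):
        # Propose a slightly altered version of the current image.
        candidate = [x + rng.gauss(0.0, step_size) for x in image]
        candidate_score = similarity(candidate, target)
        # Accept the alteration only if it matches the target more closely.
        if candidate_score > score:
            image, score = candidate, candidate_score
    return image, score


# Stand-in for "what the text description means" in this toy setting.
target = [0.5, -0.25, 0.75, 0.0]
image, score = refine(target)
```

In this sketch the score can only go up over the iterations, which mirrors the excerpt's description of altering images until they most closely match the text; real systems replace both the random proposals and the distance measure with learned neural networks.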


Visual Turing Test

#artificialintelligence

On a rainy evening during the Easter holidays, I figured I could spice up the dinner conversation by talking about an exciting AI development with my family. As the younger brother, I carry the heavy burden of being "the tech guy" -- the rest of my family are not the most technical, with my sister being an artist both at heart and by profession. The algorithm I had enthusiastically talked about was the newly released DALL-E 2 model, which can produce visual art comparable to human art. The conversation went deep into what constitutes art and what makes us 'special', and a claim was made that the depth of emotion a human artist conveys would be lacking from the AI-generated designs, so they could be discerned with 100% certainty. You may guess my instant reaction to the last claim: it was time to conduct a test!


Will AI text-to-image generators put illustrators out of a job?

New Scientist

Examples of images created by Google's Imagen AI.

Tech firms are racing to create artificial intelligence algorithms that can produce high-quality images from text prompts, with the technology seeming to advance so quickly that some predict that human illustrators and stock photographers will soon be out of a job. In reality, limitations with these AI systems mean it will probably be a while before they can be used by the general public. Text-to-image generators that use neural networks have made remarkable progress in recent years. The latest, Imagen from Google, comes hot on the heels of DALL-E 2, which was announced by OpenAI in April. Both models use a neural network that is trained on a large number of examples to categorise how images relate to text descriptions. When given a new text description, the neural network repeatedly generates images, altering them until they most closely match the text based on what it has learned.


The Dark Secret Behind Those Cute AI-generated Animal Images - AI Summary

#artificialintelligence

It's no secret that large models, such as DALL-E 2 and Imagen, trained on vast numbers of documents and images taken from the web, absorb the worst aspects of that data as well as the best. Scroll down the Imagen website--past the dragon fruit wearing a karate belt and the small cactus wearing a hat and sunglasses--to the section on societal impact and you get this: "While a subset of our training data was filtered to removed noise and undesirable content, such as pornographic imagery and toxic language, we also utilized [the] LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes. Imagen relies on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models." It's the same kind of acknowledgement that OpenAI made when it revealed GPT-3 in 2020: "internet-trained models have internet-scale biases." And as Mike Cook, who researches AI creativity at Queen Mary University of London, has pointed out, it's in the ethics statements that accompanied Google's large language model PaLM and OpenAI's DALL-E 2. In short, these firms know that their models are capable of producing awful content, and they have no idea how to fix that.


Content is King; AI is Joker.

#artificialintelligence

When planning the UX/UI (user experience and user interface) for AI-assisted content generation, three aspects need to be decided: 1) interaction mode: copilot or automatic, 2) work unit (e.g. an image or a full album, a document clause or a full document, a code function or a micro-service, …), 3) starting point: updating existing content samples or inventing new content from scratch. Let's elaborate on the interaction mode options. In Copilot mode, an AI assistant can, for example, suggest, auto-complete, extend, check, test, and improve the content. This is usually done in iterations, guided by the user, and with small work units. In Automatic mode, an AI assistant can, for example, i) replicate previous human actions or preferences and apply them to new samples, or ii) create or compose new samples with certain representation properties.
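As a rough illustration, the three design decisions listed above could be recorded in a small data structure. The class and field names below (`InteractionMode`, `StartingPoint`, `ContentGenerationDesign`) are hypothetical and not taken from the article; this is only one possible way to write the choices down.

```python
from dataclasses import dataclass
from enum import Enum


class InteractionMode(Enum):
    """Aspect 1: how the user and the AI assistant share the work."""
    COPILOT = "copilot"
    AUTOMATIC = "automatic"


class StartingPoint(Enum):
    """Aspect 3: whether generation starts from existing content or from nothing."""
    UPDATE_EXISTING = "update existing content samples"
    FROM_SCRATCH = "invent new content from scratch"


@dataclass(frozen=True)
class ContentGenerationDesign:
    """One combination of the three decisions (hypothetical structure)."""
    interaction_mode: InteractionMode
    work_unit: str  # Aspect 2, e.g. "image", "document clause", "code function"
    starting_point: StartingPoint

    def summary(self) -> str:
        return (
            f"{self.interaction_mode.value} mode, "
            f"work unit: {self.work_unit}, "
            f"starting point: {self.starting_point.value}"
        )


# Example: a coding assistant that improves existing functions with the user.
code_assistant = ContentGenerationDesign(
    interaction_mode=InteractionMode.COPILOT,
    work_unit="code function",
    starting_point=StartingPoint.UPDATE_EXISTING,
)
print(code_assistant.summary())
```

The work unit is kept as free text here because the article gives an open-ended list of examples rather than a fixed set of options.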