 Generative AI


Text-to-image Diffusion Models in Generative AI: A Survey

arXiv.org Artificial Intelligence

Abstract: This survey reviews text-to-image diffusion models in the context that diffusion models have emerged to be popular for a wide range of generative tasks. As a self-contained work, this survey starts with a brief introduction of how a basic diffusion model works for image synthesis, followed by how condition or guidance improves learning. Based on that, we present a review of state-of-the-art methods on text-conditioned image synthesis, i.e. text-to-image. We further summarize applications beyond text-to-image generation: text-guided creative generation and text-guided image editing. Beyond the progress made so far, we discuss existing challenges and promising future directions. ... humans read a story in text, they can draw relevant images in their heads by imagination, which helps them understand and enjoy more. ... that generates visually realistic images from textual descriptions, i.e., the text-to-image task, is a non-trivial task and therefore can be seen as a major milestone toward human-like or general artificial intelligence [1], [2], [3], [4]. ... The volume of the relevant works makes it increasingly challenging for readers to keep abreast of the recent development of text-to-image diffusion model. However, as far as we know, there is no survey work focusing on recent progress of diffusion-based text-to-image generation yet. A branch of related surveys [19], [20], [21], [22] reviews the progress of ...


A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

arXiv.org Artificial Intelligence

Generative AI has demonstrated impressive performance in various fields, among which speech synthesis is an interesting direction. With the diffusion model as the most popular generative model, numerous works have attempted two active tasks: text to speech and speech enhancement. This work surveys audio diffusion models and is complementary to existing surveys, which either lack the recent progress of diffusion-based speech synthesis or present an overall picture of applying diffusion models in multiple fields. Specifically, this work first briefly introduces the background of audio and diffusion models. For the text-to-speech task, we divide the methods into three categories based on the stage where the diffusion model is adopted: acoustic model, vocoder, and end-to-end framework. Moreover, we categorize various speech enhancement tasks by whether certain signals are removed from or added to the input speech. Comparisons of experimental results and discussions are also covered in this survey.


Make a fun, infinitely replayable game in 5 minutes with GPT-4

#artificialintelligence

Kids don't have any problem coming up with descriptions for universes. Describe the lore for a fictional universe that incorporates medieval technology and life with elemental magic. The world is divided into two major realms locked in nearly perpetual, centuries-old conflict. The shared religion of this world tells of a young adventurer that will come to unite these two realms. The inhabitants of this land harness the forces of nature – Earth, Air, Water, and Fire – to shape their destinies.


Google starts testing generative AI features in Gmail, Docs

#artificialintelligence

After announcing upcoming generative artificial intelligence (AI) features for Workspace apps two weeks ago, Google has now started publicly testing them in Gmail and Docs. The current trusted tester programme includes consumer, enterprise and education users (over 18 years) in the US, reports 9To5Google. This "small group" is invited by the tech giant to join, and its members must sign up and opt in. The testers can also leave the programme at any time. Users in the programme can use generative AI in Gmail to draft everything from a birthday invitation to a job cover letter.


Elon Musk and Other AI Experts Want to Pause AI Progress

#artificialintelligence

Artificial intelligence (AI) has been advancing at an unprecedented pace, and its development and deployment have sparked concerns among prominent AI experts, tech entrepreneurs, and scientists. A letter written by the Future of Life Institute, an organization focused on technological risks to humanity, calls for a pause on the development and testing of AI technologies more powerful than OpenAI's language model GPT-4 so that the risks it may pose can be properly studied. The letter has been signed by hundreds of individuals, including those working on advanced AI models. The letter warns that language models like GPT-4 can already compete with humans at a growing range of tasks and could be used to automate jobs and spread misinformation. Furthermore, the letter raises the distant prospect of AI systems that could replace humans and remake civilization. Therefore, the pause should be "public and verifiable" and should involve all those working on advanced AI models like GPT-4.


What is Bard? Google's AI Chatbot Explained

#artificialintelligence

Google Bard is built on Google's Language Model for Dialogue Applications (LaMDA) technology. LaMDA was built on Transformer, Google's neural network architecture released in 2017. Because Google released Transformer as open source, it has been the framework for other generative AI tools, including the GPT-3 language model used in ChatGPT. OpenAI's ChatGPT is tuned for content generation, producing everything from synopses to creative writing. Bard, by contrast, is designed around search.


GPT-4 and the Next Frontier of Generative AI – Towards AI

#artificialintelligence

Originally published on Towards AI. This is a follow-up to my part 1 on ChatGPT. GPT-4 has burst onto the scene! OpenAI officially released the larger and more powerful successor to GPT-3 with many improvements, including the ability to process images, draft a lawsuit, and handle up to a 25,000-word input.¹ During testing, OpenAI reported that the model was smart enough to get a CAPTCHA solved by hiring a human on TaskRabbit to do it.²


Can We Enhance AI Safety By Teaching AI To Love Humans And Learning How To Love AI?

#artificialintelligence

Large language models (LLMs) based on transformer architectures have taken the world by storm, with ChatGPT quickly becoming a household name. While the concept of generative AI is not new and can be traced back to Jürgen Schmidhuber's (now at KAUST) work in the 1990s and even further into history, Ian Goodfellow's generative adversarial networks (GANs) and Google's transformers published in 2017 enabled the development and industrialization of multi-purpose AI. My teams have been working in this area since 2015 both in generative biology and generative chemistry, with AI-generated drugs in human clinical trials and the most advanced departments in pharma companies using our software, and we have utilized LLMs almost since they were first published. OpenAI's GPT has also been available to the public since 2020. However, the public release and consumerization of ChatGPT have taken the world by surprise and triggered a new cycle of hyper investment and productization of LLMs that are propagating into the search market. Although both Recurrent Neural Network (RNN) and transformer-based LLMs, as well as multimodal LLMs, are surprisingly good at language understanding and generation, I believe they are still as far from human-level consciousness as a calculator.


OpenAI Invests $23.5 Million in 1X's Humanoid Robot NEO; Direct competitor to Tesla Inc's Optimus – Evincism

#artificialintelligence

OpenAI's startup fund invested $23.5 million in a Series A2 funding round for the engineering company 1X on 23 March 2023.[1] Another key product manufactured by 1X is EVE, a high-mobility robot with wheels in place of feet. EVE's ability to move gently, manipulate objects, and interact with the world makes it well suited to real-world applications. It uses a base level of training to move about our spaces, turning corners and opening doors using shared autonomy. The robot could replace laborers in construction, manufacturing, and similar industries, and potentially ease the labor shortage crisis.


5 tech trends Intuit leaders are watching in 2023 - Intuit Blog

#artificialintelligence

As the calendar turns to 2023, Intuit's leading technologists share the trends they'll be watching closely. From the generative AI phenomenon, the rise in data protection regulations, and the implications of Web3 for the fintech landscape to how "thinking like criminals" will pay off for companies looking to stave off bad actors, Intuit leaders weigh in on what the future holds. Opportunities to catalyze innovation abound for tech companies. Generative AI is rapidly becoming more powerful and more prominent, popularized by chatbots and apps such as ChatGPT and Lensa, but it still needs to develop and mature before it can safely be used in industries where the accuracy of statements is critical, such as finance or medicine. Within the next several years, generative AI will likely play a pivotal role in helping create personalized conversational systems that provide financial or medical advice and guidance directly to customers.