 Generative AI


Microsoft in Advanced Talks to Increase Investment in OpenAI

WSJ.com: WSJD - Technology

Microsoft is in advanced talks for a new round of funding in OpenAI, according to a person familiar with the matter, as the software giant seeks to further incorporate artificial intelligence into its products. No deal has been reached between the two sides and the funding amount could vary as negotiations evolve, the person said. The companies have held talks in recent weeks, according to people familiar with the matter. Microsoft invested $1 billion in OpenAI in 2019. The new cash could help bankroll the tremendous computing power OpenAI needs to run its various artificial intelligence products on Azure, Microsoft's cloud computing service.


Microsoft's new AI art generator will spark your imagination

PCWorld

Microsoft has begun rolling out Image Creator from Microsoft Bing in a preview to select markets, preparing the AI art generator for a wider rollout to Microsoft Edge later this month. In a blog post and related video, the company showed off how Image Creator will work and explained in more detail what limitations it will place upon prompts that users generate. Microsoft said last week that it would be bringing AI art to both Bing and Edge, using the more advanced DALL-E 2 algorithm to generate the art. It appears that Image Creator will be accessible from Bing.com and a related version will be available from Edge soon after. The company showed off Image Creator working within the Edge sidebar, which can carve out a small vertical column to display search results and other information as well as some useful utilities.


How Open Source is eating AI

#artificialintelligence

By August, it had been cloned in the open by two master's students as OpenGPT-2. By November, OpenAI released their 1.5B parameter model, after a cautious staged release process. May 2020: OpenAI released GPT-3 as a paper and a closed beta API in June 2020. Mar 2021: EleutherAI released their open GPT-Neo 1.3B and 2.7B models. May 2022: Meta released OPT-175B for researchers (with logbook! and an open license). The Text-to-Image cycle took 4? months: Apr 2022: OpenAI announces DALL-E 2 with a limited "research preview". The timelines above are highly cherrypicked of course; the story is much longer if you take into account the longer development history starting from the academic papers for diffusion (2015) and transformer models (2017) and older work on GANs. But what is more interesting is what has happened since: OpenAI's audio-to-text model, Whisper, was released under MIT license in September with no API paywall. Of course, there is less scope for abuse in the audio-to-text domain, but more than a few people have speculated that the reception to Stable Diffusion's release influenced the open sourcing decision. Sufficiently advanced community is indistinguishable from magic.


Generally Intelligent secures cash from OpenAI vets to build capable AI systems

#artificialintelligence

A new AI research company is launching out of stealth today with an ambitious goal: to research the fundamentals of human intelligence that machines currently lack. Called Generally Intelligent, it plans to do this by turning these fundamentals into an array of tasks to be solved and by designing and testing different systems' ability to learn to solve them in highly complex 3D worlds built by their team. "We believe that generally intelligent computers will someday unlock extraordinary potential for human creativity and insight," CEO Kanjun Qiu told TechCrunch in an email interview. "However, today's AI models are missing several key elements of human intelligence, which inhibits the development of general-purpose AI systems that can be deployed safely ... Generally Intelligent's work aims to understand the fundamentals of human intelligence in order to engineer safe AI systems that can learn and understand the way humans do." Qiu, the former chief of staff at Dropbox and the co-founder of Ember Hardware, which designed laser displays for VR headsets, co-founded Generally Intelligent in 2021 after shutting down her previous startup, Sourceress, a recruiting company that used AI to scour the web.


OpenAI offers early look at DALL-E API, showcases text-to-image use case

#artificialintelligence

The DALL-E API won't be officially announced until later this fall, according to OpenAI, but today the company shared details about a customer already leveraging the DALL-E API for a specific enterprise use case. New York City-based Cala, a startup that bills itself as the "world's first operating system for fashion," offers a digital platform (including a mobile app launched in March) that allows creators to design and produce clothing lines, unifying the process from product ideation through order fulfillment. With the addition of DALL-E-powered text-to-image generating tools, users can generate new visual design ideas from natural text descriptions or uploaded reference images, which the company says are first-of-its-kind capabilities for the fashion industry.


Efficient Diffusion Models for Vision: A Survey

arXiv.org Artificial Intelligence

Diffusion Models (DMs) have demonstrated state-of-the-art performance in content generation without requiring adversarial training. These models are trained using a two-step process. First, a forward - diffusion - process gradually adds noise to a datum (usually an image). Then, a backward - reverse diffusion - process gradually removes the noise to turn it into a sample of the target distribution being modelled. DMs are inspired by non-equilibrium thermodynamics and have inherent high computational complexity. Due to the frequent function evaluations and gradient calculations in high-dimensional spaces, these models incur considerable computational overhead during both training and inference stages. This can not only preclude the democratization of diffusion-based modelling, but also hinder the adaption of diffusion models in real-life applications. Not to mention, the efficiency of computational models is fast becoming a significant concern due to excessive energy consumption and environmental scares. These factors have led to multiple contributions in the literature that focus on devising computationally efficient DMs. In this review, we present the most recent advances in diffusion models for vision, specifically focusing on the important design aspects that affect the computational efficiency of DMs. In particular, we emphasize the recently proposed design choices that have led to more efficient DMs. Unlike the other recent reviews, which discuss diffusion models from a broad perspective, this survey is aimed at pushing this research direction forward by highlighting the design strategies in the literature that are resulting in practicable models for the broader research community. We also provide a future outlook of diffusion models in vision from their computational efficiency viewpoint. Deep generative modelling has emerged as one of the most exciting computational tools that is even challenging human creativity [1].
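
The forward (noising) half of the two-step process described in this abstract can be sketched in a few lines. This is a minimal illustration, not the survey's own code: it assumes a DDPM-style Gaussian forward process with a linear variance schedule, and the function names and schedule values (1e-4 to 0.02) are illustrative choices. The reverse process is omitted because it needs a trained denoising network, evaluated once per step, which is where the computational cost mentioned above comes from.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Evenly spaced noise variances beta_1..beta_T (assumed linear schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for all s up to t."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Draw x_t given x_0 in one shot: sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


if __name__ == "__main__":
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    rng = random.Random(0)
    pixel_values = [0.8, -0.3, 0.1]  # toy stand-in for an image
    for t in (0, 499, 999):
        noisy, _ = forward_diffuse(pixel_values, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 5), [round(v, 3) for v in noisy])
```

As t grows, alpha_bar_t shrinks toward zero, so the sample retains less of the original datum and approaches pure Gaussian noise.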


Composing Ensembles of Pre-trained Models via Iterative Consensus

arXiv.org Artificial Intelligence

Large pre-trained models exhibit distinct and complementary capabilities dependent on the data they are trained on. Language models such as GPT-3 are capable of textual reasoning but cannot understand visual information, while vision models such as DALL-E can generate photorealistic photos but fail to understand complex language descriptions. In this work, we propose a unified framework for composing ensembles of different pre-trained models - combining the strengths of each individual model to solve various multimodal problems in a zero-shot manner. We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization. The generator constructs proposals and the scorers iteratively provide feedback to refine the generated result. Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks, e.g. We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer, by leveraging the strengths of each expert model. Results show that the proposed method can be used as a general purpose framework for a wide range of zero-shot multimodal tasks, such as image generation, video question answering, mathematical reasoning, and robotic manipulation. Large pre-trained models have shown remarkable zero-shot generalization abilities, ranging from zero-shot image generation and natural language processing to machine reasoning and action planning. Such models are trained on large datasets scoured from the internet, often consisting of billions of datapoints. Individual pre-trained models capture different aspects of knowledge on the internet, with language models (LMs) capturing textual information in news, articles, and Wikipedia pages, and visual-language models (VLMs) modeling the alignments between visual and textual information. 
While it is desirable to have a single sizable pre-trained model capturing all possible modalities of data on the internet, such a comprehensive model is challenging to obtain and maintain, requiring intensive memory, an enormous amount of energy, months of training time, and millions of dollars. A more scalable alternative approach is to compose different pre-trained models together, leveraging the knowledge from different expert models to solve complex multimodal tasks. Building a unified framework for composing multiple models is challenging.
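
The generator-and-scorers loop described in this abstract can be illustrated with a toy sketch. It is not the paper's method: the real generators and scorers are large pre-trained models and the paper's optimization procedure is not given in this excerpt. Here the generator is a hypothetical random perturbation, the two scorers are hypothetical distance functions, and consensus is assumed to be a plain sum of scores.

```python
import random


def iterative_consensus(generate, scorers, initial, num_rounds=50,
                        num_proposals=8, seed=0):
    """Keep the proposal that the scorers, taken together, rate highest."""
    rng = random.Random(seed)

    def consensus(candidate):
        # Assumed aggregation rule: sum the feedback of every scorer.
        return sum(score(candidate) for score in scorers)

    best, best_score = initial, consensus(initial)
    for _ in range(num_rounds):
        for _ in range(num_proposals):
            candidate = generate(best, rng)  # generator refines current best
            candidate_score = consensus(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical stand-ins for pre-trained models.
def propose(current, rng):
    return current + rng.uniform(-1.0, 1.0)


def near_three(x):
    return -abs(x - 3.0)


def near_five(x):
    return -abs(x - 5.0)


if __name__ == "__main__":
    result, score = iterative_consensus(propose, [near_three, near_five], 0.0)
    print(result, score)
```

With these two scorers, any value between 3 and 5 satisfies both about equally well, so the loop settles somewhere in that range rather than at either scorer's individual optimum.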


Make-A-Video: Text-to-Video Generation's Next... Generation? - NAB Amplify

#artificialintelligence

The inevitable has happened, albeit a little sooner than expected. After all the hoopla surrounding text-to-image AI generators in recent months, Meta is first out of the gate with a text-to-video version. Perhaps Meta wanted to establish some headline leadership in this space, since the results aren't ready for primetime. But as developments in text-to-image generation have shown, by the time you read this the technology will already have advanced. Meta is only giving the public a glimpse of the tech it calls Make-A-Video.


Focus on Whisper, OpenAI's automatic speech recognition system - Actu IA

#artificialintelligence

OpenAI recently released Whisper, a 1.6 billion parameter AI model capable of transcribing and translating speech audio from 97 different languages, showing robust performance on a wide range of automated speech recognition (ASR) tasks. The model, trained on 680,000 hours of audio data collected from the web, was published as open source on GitHub. Whisper uses a Transformer encoder-decoder architecture: the input audio is split into 30-second chunks, converted to a log-Mel spectrogram, and then passed through an encoder. Unlike most state-of-the-art ASR models, it has not been fine-tuned to a specific dataset, but instead has been trained using weak supervision on a large-scale noisy dataset collected from the Internet. Although it did not beat models specialized for LibriSpeech performance, in zero-shot evaluations on a diverse dataset Whisper proved to be more robust and made 50% fewer errors than those models.
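
The 30-second chunking step mentioned in this excerpt can be sketched as follows. This is an illustration, not Whisper's own preprocessing code: the function name is hypothetical, and the 16 kHz sample rate and zero-padding of the final chunk are assumptions not stated in the excerpt. The log-Mel spectrogram conversion and the encoder are left out.

```python
SAMPLE_RATE = 16_000   # samples per second (assumed, not stated in the excerpt)
CHUNK_SECONDS = 30     # chunk length given in the excerpt


def split_into_chunks(samples, sample_rate=SAMPLE_RATE,
                      chunk_seconds=CHUNK_SECONDS):
    """Split a mono waveform into fixed-length chunks, zero-padding the last."""
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # Pad a short final chunk with silence so every chunk has equal length.
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    seventy_seconds = [0.1] * (70 * SAMPLE_RATE)
    chunks = split_into_chunks(seventy_seconds)
    print(len(chunks), [len(c) for c in chunks])
```

Fixed-length chunks mean every input to the encoder has the same shape, regardless of how long the original recording was.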


David O. Houwen on LinkedIn: #AI #LLMs #OpenAI

#artificialintelligence

"Do not keep calm and carry on, girls!" Do we really care more about Van Gogh's sunflowers than real ones? George Monbiot, The Guardian. The response by the media and government to the two Just Stop Oil activists who threw soup at Vincent van Gogh's Sunflowers in the National Gallery in London speaks volumes. Decorating the glass protecting the painting with tomato soup (the painting itself was, as the protesters calculated, undamaged) appears to horrify some people more than the collapse of our planet, which these campaigners are seeking to prevent. Everywhere I see claims that the "extreme" tactics of environmental campaigners will prompt people to "stop listening". But how could we listen any less to the warnings of scientists and campaigners and eminent committees?