AITopics | Generative AI

Generative AI

Microsoft in Advanced Talks to Increase Investment in OpenAI

WSJ.com: WSJD - Technology, Oct-20-2022, 21:34:00 GMT

Microsoft is in advanced talks for a new round of funding in OpenAI, according to a person familiar with the matter, as the software giant seeks to further incorporate artificial intelligence into its products. No deal has been reached between the two sides and the funding amount could vary as negotiations evolve, the person said. The companies have held talks in recent weeks, according to people familiar with the matter. Microsoft invested $1 billion in OpenAI in 2019. The new cash could help bankroll the tremendous computing power OpenAI needs to run its various artificial intelligence products on Azure, Microsoft's cloud computing service.

increase investment, microsoft, openai, (4 more...)

WSJ.com: WSJD - Technology

Industry: Information Technology > Software (0.38)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Microsoft's new AI art generator will spark your imagination

PCWorld, Oct-20-2022, 17:44:44 GMT

Microsoft has begun rolling out Image Creator from Microsoft Bing in a preview to select markets, preparing the AI art generator for a wider rollout to Microsoft Edge later this month. In a blog post and related video, the company showed off how Image Creator will work and explained in more detail what limitations it will place upon prompts that users generate. Microsoft said last week that it would be bringing AI art to both Bing and Edge, using the more advanced DALL-E 2 algorithm to generate the art. It appears that Image Creator will be accessible from Bing.com and a related version will be available from Edge soon after. The company showed off Image Creator working within the Edge sidebar, which can carve out a small vertical column to display search results and other information as well as some useful utilities.

ai art generator, image creator, microsoft, (6 more...)

PCWorld

Industry: Law Enforcement & Public Safety (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.78)

How Open Source is eating AI

#artificialintelligence, Oct-20-2022, 16:35:18 GMT

By August, it had been cloned in the open by two master's students as OpenGPT-2. By November, OpenAI released their 1.5B parameter model, after a cautious staged release process. May 2020: OpenAI released GPT-3 as a paper and a closed beta API in June 2020. Mar 2021: EleutherAI released their open GPT-Neo 1.3B and 2.7B models. May 2022: Meta released OPT-175B for researchers (with logbook! and an open license). The Text-to-Image cycle took 4? months: Apr 2022: OpenAI announces DALL-E 2 with a limited "research preview". The timelines above are highly cherrypicked of course; the story is much longer if you take into account the longer development history starting from the academic papers for diffusion (2015) and transformer models (2017) and older work on GANs. But what is more interesting is what has happened since: OpenAI's audio-to-text model, Whisper, was released under MIT license in September with no API paywall. Of course, there is less scope for abuse in the audio-to-text domain, but more than a few people have speculated that the reception to Stable Diffusion's release influenced the open sourcing decision. Sufficiently advanced community is indistinguishable from magic.

license, open source, stable diffusion, (14 more...)

#artificialintelligence

Country:

Europe > Italy (0.04)
Asia > Singapore (0.04)
Asia > India (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Generally Intelligent secures cash from OpenAI vets to build capable AI systems

#artificialintelligence, Oct-20-2022, 16:35:09 GMT

A new AI research company is launching out of stealth today with an ambitious goal: to research the fundamentals of human intelligence that machines currently lack. Called Generally Intelligent, it plans to do this by turning these fundamentals into an array of tasks to be solved and by designing and testing different systems' ability to learn to solve them in highly complex 3D worlds built by their team. "We believe that generally intelligent computers will someday unlock extraordinary potential for human creativity and insight," CEO Kanjun Qiu told TechCrunch in an email interview. "However, today's AI models are missing several key elements of human intelligence, which inhibits the development of general-purpose AI systems that can be deployed safely … Generally Intelligent's work aims to understand the fundamentals of human intelligence in order to engineer safe AI systems that can learn and understand the way humans do." Qiu, the former chief of staff at Dropbox and the co-founder of Ember Hardware, which designed laser displays for VR headsets, co-founded Generally Intelligent in 2021 after shutting down her previous startup, Sourceress, a recruiting company that used AI to scour the web.

agent, ai system, intelligence, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Creativity & Intelligence (0.98)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)

OpenAI offers early look at DALL-E API, showcases text-to-image use case

#artificialintelligence, Oct-20-2022, 14:51:41 GMT

The DALL-E API won't be officially announced until later this fall, according to OpenAI, but today the company shared details about a customer already leveraging the DALL-E API for a specific enterprise use case. New York City-based Cala, a startup that bills itself as the "world's first operating system for fashion," offers a digital platform (including a mobile app launched in March) that allows creators to design and produce clothing lines, unifying the process from product ideation through order fulfillment. With the addition of DALL-E-powered text-to-image generating tools, users can generate new visual design ideas from natural text descriptions or uploaded reference images – which the company says are first-of-its-kind capabilities for the fashion industry.

dall-e api, showcase text-to-image use case, use case, (11 more...)

#artificialintelligence

Country: North America > United States > New York (0.25)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

Efficient Diffusion Models for Vision: A Survey

Ulhaq, Anwaar, Akhtar, Naveed, Pogrebna, Ganna

arXiv.org Artificial Intelligence, Oct-20-2022

Diffusion Models (DMs) have demonstrated state-of-the-art performance in content generation without requiring adversarial training. These models are trained using a two-step process. First, a forward - diffusion - process gradually adds noise to a datum (usually an image). Then, a backward - reverse diffusion - process gradually removes the noise to turn it into a sample of the target distribution being modelled. DMs are inspired by non-equilibrium thermodynamics and have inherent high computational complexity. Due to the frequent function evaluations and gradient calculations in high-dimensional spaces, these models incur considerable computational overhead during both training and inference stages. This can not only preclude the democratization of diffusion-based modelling, but also hinder the adaption of diffusion models in real-life applications. Not to mention, the efficiency of computational models is fast becoming a significant concern due to excessive energy consumption and environmental scares. These factors have led to multiple contributions in the literature that focus on devising computationally efficient DMs. In this review, we present the most recent advances in diffusion models for vision, specifically focusing on the important design aspects that affect the computational efficiency of DMs. In particular, we emphasize the recently proposed design choices that have led to more efficient DMs. Unlike the other recent reviews, which discuss diffusion models from a broad perspective, this survey is aimed at pushing this research direction forward by highlighting the design strategies in the literature that are resulting in practicable models for the broader research community. We also provide a future outlook of diffusion models in vision from their computational efficiency viewpoint. Deep generative modelling has emerged as one of the most exciting computational tools that is even challenging human creativity [1].
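The forward (noising) half of the two-step process described above can be illustrated with a minimal sketch. The abstract does not specify a noising rule, so this assumes a DDPM-style Gaussian update with a linear variance schedule; the function names, the schedule values, and the four-value toy "image" are all hypothetical and chosen only for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the endpoint values are illustrative."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_diffusion(x0, betas, rng):
    """Gradually add Gaussian noise to a datum (a flat list of floats).

    Assumed update rule: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps.
    Returns every intermediate state x_0, x_1, ..., x_T.
    """
    states = [list(x0)]
    x = list(x0)
    for beta in betas:
        keep = math.sqrt(1.0 - beta)
        noise_scale = math.sqrt(beta)
        x = [keep * v + noise_scale * rng.gauss(0.0, 1.0) for v in x]
        states.append(x)
    return states


rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]           # toy "image" with four pixel values
betas = linear_beta_schedule(1000)   # one noising step per schedule entry
states = forward_diffusion(x0, betas, rng)

# Factor by which the original signal has been scaled down after all steps.
signal_kept = math.prod(math.sqrt(1.0 - b) for b in betas)
```

Under these assumptions the original signal is almost entirely replaced by noise after the last step. The reverse process is not shown because it needs a trained denoising network; running such a network once per step is one plausible reading of the "frequent function evaluations" the abstract identifies as a cost.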

artificial intelligence, diffusion model, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2210.09292

Country:

North America > United States (0.28)
Oceania > Australia (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Energy (1.00)
Information Technology > Security & Privacy (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Composing Ensembles of Pre-trained Models via Iterative Consensus

Li, Shuang, Du, Yilun, Tenenbaum, Joshua B., Torralba, Antonio, Mordatch, Igor

arXiv.org Artificial Intelligence, Oct-20-2022

Large pre-trained models exhibit distinct and complementary capabilities dependent on the data they are trained on. Language models such as GPT-3 are capable of textual reasoning but cannot understand visual information, while vision models such as DALL-E can generate photorealistic photos but fail to understand complex language descriptions. In this work, we propose a unified framework for composing ensembles of different pre-trained models - combining the strengths of each individual model to solve various multimodal problems in a zero-shot manner. We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization. The generator constructs proposals and the scorers iteratively provide feedback to refine the generated result. Such closed-loop communication enables models to correct errors caused by other models, significantly boosting performance on downstream tasks. We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer, by leveraging the strengths of each expert model. Results show that the proposed method can be used as a general purpose framework for a wide range of zero-shot multimodal tasks, such as image generation, video question answering, mathematical reasoning, and robotic manipulation. Large pre-trained models have shown remarkable zero-shot generalization abilities, ranging from zero-shot image generation and natural language processing to machine reasoning and action planning. Such models are trained on large datasets scoured from the internet, often consisting of billions of datapoints. Individual pre-trained models capture different aspects of knowledge on the internet, with language models (LMs) capturing textual information in news, articles, and Wikipedia pages, and visual-language models (VLMs) modeling the alignments between visual and textual information.
While it is desirable to have a single sizable pre-trained model capturing all possible modalities of data on the internet, such a comprehensive model is challenging to obtain and maintain, requiring intensive memory, an enormous amount of energy, months of training time, and millions of dollars. A more scalable alternative approach is to compose different pre-trained models together, leveraging the knowledge from different expert models to solve complex multimodal tasks. Building a unified framework for composing multiple models is challenging.
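The generator-and-scorers loop described in the abstract can be pictured with a toy generate-and-score sketch. This is not the paper's optimization procedure: it assumes a simple hill-climbing loop over a single number, with consensus taken as the mean of the scorer outputs, and every function name and constant below is hypothetical.

```python
import random


def generator(current, rng, num_proposals=8, step=1.0):
    """Toy generator: proposes random variations of the current candidate."""
    return [current + rng.uniform(-step, step) for _ in range(num_proposals)]


def scorer_near_three(x):
    """Toy scorer 1: prefers values close to 3."""
    return -abs(x - 3.0)


def scorer_near_five(x):
    """Toy scorer 2: prefers values close to 5."""
    return -abs(x - 5.0)


def consensus_score(x, scorers):
    """Consensus is assumed here to be the mean of all scorer outputs."""
    return sum(s(x) for s in scorers) / len(scorers)


def iterative_consensus(start, scorers, rounds=50, seed=0):
    """Closed loop: generate proposals, score them, keep the best, repeat."""
    rng = random.Random(seed)
    best = start
    best_score = consensus_score(best, scorers)
    for _ in range(rounds):
        for candidate in generator(best, rng):
            score = consensus_score(candidate, scorers)
            if score > best_score:
                best, best_score = candidate, score
    return best, best_score


scorers = [scorer_near_three, scorer_near_five]
best, best_score = iterative_consensus(start=0.0, scorers=scorers)
```

In this toy setting neither scorer alone would accept the final answer as ideal, but the loop settles on a value between the two preferences, which is the sense in which feedback from several scorers shapes the result.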

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.11522

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Energy (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Make-A-Video: Text-to-Video Generation's Next... Generation? - NAB Amplify

#artificialintelligence, Oct-19-2022, 19:35:41 GMT

The inevitable has happened, albeit a little sooner than expected. After all the hoopla surrounding text-to-image AI generators in recent months, Meta is first out of the gate with a text-to-video version. Perhaps Meta wanted to establish some headline leadership in this space, since the results aren't ready for primetime. But as developments in text-to-image generation have shown, by the time you read this the technology will already have advanced. Meta is only giving the public a glimpse of the tech it calls Make-A-Video.

artificial intelligence, machine learning, make-a-video, (12 more...)

#artificialintelligence

Industry: Media > Film (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.33)

Focus on Whisper, OpenAI's automatic speech recognition system - Actu IA

#artificialintelligence, Oct-19-2022, 10:56:09 GMT

OpenAI recently released Whisper, a 1.6 billion parameter AI model capable of transcribing and translating speech audio from 97 different languages, showing robust performance on a wide range of automated speech recognition (ASR) tasks. The model, trained on 680,000 hours of audio data collected from the web, was soon published as open source on GitHub. Whisper uses a Transformer encoder-decoder architecture: the input audio is split into 30-second chunks, converted to a log-Mel spectrogram, and then passed through an encoder. Unlike most state-of-the-art ASR models, it has not been fine-tuned to a specific data set, but instead has been trained using weak supervision on a large-scale noisy data set collected from the Internet. Although it did not beat models specialized for LibriSpeech performance, in zero-shot evaluations on a diverse dataset, Whisper proved to be more robust and made 50% fewer errors than those models.
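The first preprocessing step mentioned above, splitting the input audio into 30-second chunks, can be sketched as follows. This is an illustration and not Whisper's own code: the 16 kHz sample rate, the zero-padding of the final chunk, and the function name are assumptions, and the log-Mel spectrogram conversion is left out.

```python
def split_into_chunks(samples, sample_rate=16_000, chunk_seconds=30):
    """Split a list of audio samples into fixed-length chunks.

    The last chunk is zero-padded so that every chunk has the same length
    (an assumption made for this sketch).
    """
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        chunk.extend([0.0] * (chunk_len - len(chunk)))  # pad the short tail
        chunks.append(chunk)
    return chunks


audio = [0.1] * (16_000 * 70)        # 70 seconds of dummy audio
chunks = split_into_chunks(audio)    # two full chunks plus one padded chunk
```

Each resulting chunk would then be converted to a spectrogram before being passed to the encoder, as the summary describes.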

automatic speech recognition system, openai, translation, (8 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)

David O. Houwen on LinkedIn: #AI #LLMs #OpenAI

#artificialintelligence, Oct-19-2022, 09:11:22 GMT

"Do not keep calm and carry on, girls!" Do we really care more about Van Gogh's sunflowers than real ones? George Monbiot, The Guardian. The response by the media and government to the two Just Stop Oil activists who threw soup at Vincent van Gogh's Sunflowers in the National Gallery in London speaks volumes. Decorating the glass protecting the painting with tomato soup (the painting itself was, as the protesters calculated, undamaged) appears to horrify some people more than the collapse of our planet, which these campaigners are seeking to prevent. Everywhere I see claims that the "extreme" tactics of environmental campaigners will prompt people to "stop listening". But how could we listen any less to the warnings of scientists and campaigners and eminent committees?

houwen, linkedin, vincent van gogh, (10 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (0.85)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)
