 Generative AI


I Don't Need $\mathbf{u}$: Identifiable Non-Linear ICA Without Side Information

arXiv.org Machine Learning

In this work we introduce a new approach for identifiable non-linear ICA models. Recently there has been a renaissance in identifiability results in deep generative models, not least for non-linear ICA. These prior works, however, have assumed access to a sufficiently-informative auxiliary set of observations, denoted $\mathbf{u}$. We show here how identifiability can be obtained in the absence of this side-information, rendering possible fully-unsupervised identifiable non-linear ICA. While previous theoretical results have established the impossibility of identifiable non-linear ICA in the presence of infinitely-flexible universal function approximators, here we rely on the intrinsically-finite modelling capacity of any particular chosen parameterisation of a deep generative model. In particular, we focus on generative models which perform clustering in their latent space -- a model structure which matches previous identifiable models, but with the learnt clustering providing a synthetic form of auxiliary information. We evaluate our proposals using VAEs, on synthetic and image datasets, and find that the learned clusterings function effectively: deep generative models with latent clusterings are empirically identifiable, to the same degree as models which rely on side information.


What is WuDao 2.0, China's artificial intelligence model capable of writing poems and generating recipes that surpassed Google and Musk's OpenAI - Market Research Telecast

#artificialintelligence

Specialists at the Beijing Academy of Artificial Intelligence (China) this week presented the most sophisticated natural language processing model in the world, which uses 1.75 trillion parameters to simulate conversational speech, write poems, understand images and even generate recipes, the South China Morning Post reports. WuDao 2.0, whose name in Chinese means 'understanding of natural laws', is a pre-trained artificial intelligence model that was developed with the help of more than 100 scientists. It is more powerful than the models of its main competitors: GPT-3 from the company OpenAI (co-founded by Elon Musk), which was launched with 175 billion parameters, and the Switch Transformer from Google, which uses 1.6 trillion parameters. The model acquired its skills in both Chinese and English by 'studying' 4.9 terabytes of images and text, including 1.2 terabytes of text in those two languages. WuDao 2.0 already has 22 partners, such as smartphone maker Xiaomi and short video giant Kuaishou.


Chinese AI lab challenges Google, OpenAI with a model of 1.75 trillion parameters - PingWest

#artificialintelligence

In the race to build the underlying technologies that can power the next wave of AI revolution, a Chinese lab just toppled OpenAI, the venerated US-based research lab, in terms of who can train a gigantic deep learning model with the most training parameters--as for whether or not there is a race, at least ranking members of the lab believe so. The Beijing Academy of Artificial Intelligence, styled as BAAI and known in Chinese as 北京智源人工智能研究院, launched the latest version of Wudao 悟道, a pre-trained deep learning model that the lab dubbed "China's first" and "the world's largest ever," with a whopping 1.75 trillion parameters. Unlike conventional deep learning models that are usually task-specific, Wudao is a multi-modal model trained to tackle both text and images, two dramatically different sets of problems. At BAAI's annual academic conference on Tuesday, the institution demonstrated Wudao performing tasks such as natural language processing, text generation, image recognition, and image generation. The model is capable of writing poems and couplets in the traditional Chinese styles, answering questions, writing essays, generating alt text for images, and generating corresponding images from natural language descriptions with a decent level of photorealism. It is even able to power "virtual idols", with the help of XiaoIce, a Chinese company spun off from Microsoft--so there can be voice support too, in addition to text and image.


From low code to no code: Azure GPT-3 and Microsoft's Power Platform

#artificialintelligence

Microsoft has been making major investments in very large language models, from the hardware to run them in Azure (which it talks about as an 'AI supercomputer') to the DeepSpeed library that speeds up training and running machine-learning models with billions of parameters by spreading them across multiple GPUs. In 2020, Microsoft got an exclusive licence for the powerful (and sometimes controversial) GPT-3 natural language generation model from OpenAI, which uses 175 billion parameters to produce what can look very much like something written by a person. OpenAI has a GPT-3 API that's trained and run on Azure, but it's in private beta and researchers and academics have to apply individually to join a waitlist. Similarly, Microsoft hasn't yet started even a private preview for what it calls the Open AI GPT and Azure Service and the page to sign up for notifications says there is no release date yet. But Microsoft is already using GPT-3 and other natural language generation in its products for features that are much more sophisticated than writing automatic captions for images.


What's Happening with Artificial intelligence at a Macro Level Around the World?

#artificialintelligence

Organizations that contributed to the report include representatives from arXiv, AI Ethics Lab, Black in AI, Bloomberg Government, Burning Glass Technologies, Computing Research Association, Elsevier, Intento, International Federation of Robotics, Joint Research Center, European Commission, LinkedIn, Liquidnet, McKinsey Global Institute, Microsoft Academic Graph, National Institute of Standards and Technology, Nesta, NetBase Quid, PostEra, Queer in AI, State of AI Report, Women in Machine Learning, and many individual contributors. Supporting partners to the report include McKinsey & Company, Google, OpenAI, Genpact, AI21 labs, and PricewaterhouseCoopers.


AI Weekly: China's massive multimodal model highlights AI research gap

#artificialintelligence

This week, researchers at the Beijing Academy of Artificial Intelligence (BAAI) announced the release of Wu Dao 2.0, a multimodal AI model capable of generating text indiscernible from human-crafted prose -- and more. Containing 1.75 trillion parameters, the parts of the machine learning model learned from historical training data, Wu Dao 2.0 is 10 times larger than OpenAI's 175-billion-parameter GPT-3. Wu Dao 2.0 is the latest example of what OpenAI policy director Jack Clark calls model diffusion, or multiple state and private actors developing GPT-3-style AI models. For example, Russia and France are training smaller-scale systems via Sberbank and LightOn's PAGnol, while Korea's Naver Labs is investing in the recently created HyperCLOVA. Clark notes that because these models reflect and magnify the data they're trained on, different countries care about how their own cultures are represented in the models. The Wu Dao 2.0 announcement, then, is part of a general trend of nations asserting their own AI capabilities via training frontier models like GPT-3.


DALL·E Explained in Under 5 Minutes

#artificialintelligence

It seems like every few months, someone publishes a machine learning paper or demo that makes my jaw drop. This behemoth 12-billion-parameter neural network takes a text caption (i.e. "an armchair in the shape of an avocado") and generates images to match it: I think its pictures are pretty inspiring (I'd buy one of those avocado chairs), but what's even more impressive is DALL·E's ability to understand and render concepts of space, time, and even logic (more on that in a second). In this post, I'll give you a quick overview of what DALL·E can do, how it works, how it fits in with recent trends in ML, and why it's significant. In July, DALL·E's creator, the company OpenAI, released a similarly huge model called GPT-3 that wowed the world with its ability to generate human-like text, including Op Eds, poems, sonnets, and even computer code.


On Memorization in Probabilistic Deep Generative Models

arXiv.org Machine Learning

In the last few years there have been incredible successes in generative modeling through the development of deep learning techniques such as variational autoencoders (VAEs) [1, 2], generative adversarial networks (GANs) [3], normalizing flows [4, 5], and diffusion networks [6], among others. The goal of generative modeling is to learn the data distribution of a given data set, which has numerous applications such as creating realistic synthetic data, correcting data corruption, and detecting anomalies. Novel architectures for generative modeling are typically evaluated on how well a complex, high dimensional data distribution can be learned by the model and how realistic the samples from the model are. An important question in the evaluation of generative models is to what extent observations from the training data are memorized by the learning algorithm. A common technique to assess memorization in deep generative models is to look for nearest neighbors. Typically, several samples are drawn from a trained model and compared to their nearest neighbors in the training set. There are several problems with this approach. First, it has been well established that when using the Euclidean metric this test can be easily fooled by taking an image from the training set and shifting it by a few pixels [7]. For this reason, nearest neighbors in the feature space of a secondary model are sometimes used, as well as cropping and/or downsampling before identifying nearest neighbors (e.g.
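The nearest-neighbor check described in the excerpt above, and the way a small pixel shift can fool it under the Euclidean metric, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy 4x4 "images" stored as flat lists, the helper names `euclidean`, `nearest_neighbor` and `shift_right`, and the two-item training set.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two flattened images of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(sample, training_set):
    """Return (index, distance) of the training item closest to sample."""
    distances = [euclidean(sample, t) for t in training_set]
    best = min(range(len(training_set)), key=distances.__getitem__)
    return best, distances[best]

def shift_right(image, width, pixels=1):
    """Shift every row of a flattened image to the right, padding with zeros."""
    rows = [image[i:i + width] for i in range(0, len(image), width)]
    shifted = [[0.0] * pixels + row[:-pixels] for row in rows]
    return [v for row in shifted for v in row]

WIDTH = 4
# Hypothetical training set: a vertical stripe in column 1, and a blank image.
stripe = [0.0, 1.0, 0.0, 0.0] * 4
blank = [0.0] * 16
train = [stripe, blank]

# An exact copy of a training image is flagged: distance 0 to item 0.
exact_match = nearest_neighbor(stripe, train)

# The same image shifted by one pixel is closer to the blank image than to
# its own original, so the Euclidean test no longer points at the copied item.
shifted = shift_right(stripe, WIDTH, pixels=1)
shifted_match = nearest_neighbor(shifted, train)

print("exact copy   ->", exact_match)
print("shifted copy ->", shifted_match)
```

In this toy setup the shifted stripe no longer overlaps the original at any pixel, so its distance to the original exceeds its distance to the blank image. This is one concrete way a pixel-space Euclidean comparison can miss a near-duplicate.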


Microsoft, GPT-3, and the future of OpenAI

#artificialintelligence

One of the biggest highlights of Build, Microsoft's annual software development conference, was the presentation of a tool that uses deep learning to generate source code for office applications. The tool uses GPT-3, a massive language model developed by OpenAI last year and made available to select developers, researchers, and startups in a paid application programming interface. Many have touted GPT-3 as the next-generation artificial intelligence technology that will usher in a new breed of applications and startups. Since GPT-3's release, many developers have found interesting and innovative uses for the language model. And several startups have declared that they will be using GPT-3 to build new or augment existing products. But creating a profitable and sustainable business around GPT-3 remains a challenge.

