Generative AI
Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models
Chen, Phil, Itkina, Masha, Senanayake, Ransalu, Kochenderfer, Mykel J.
Many applications of generative models rely on the marginalization of their high-dimensional output probability distributions. Normalization functions that yield sparse probability distributions can make exact marginalization more computationally tractable. However, sparse normalization functions usually require alternative loss functions for training since the log-likelihood is undefined for sparse probability distributions. Furthermore, many sparse normalization functions often collapse the multimodality of distributions. In this work, we present $\textit{ev-softmax}$, a sparse normalization function that preserves the multimodality of probability distributions. We derive its properties, including its gradient in closed-form, and introduce a continuous family of approximations to $\textit{ev-softmax}$ that have full support and can be trained with probabilistic loss functions such as negative log-likelihood and Kullback-Leibler divergence. We evaluate our method on a variety of generative models, including variational autoencoders and auto-regressive architectures. Our method outperforms existing dense and sparse normalization techniques in distributional accuracy. We demonstrate that $\textit{ev-softmax}$ successfully reduces the dimensionality of probability distributions while maintaining multimodality.
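To illustrate why the log-likelihood is undefined for sparse distributions, here is a minimal sketch comparing softmax with sparsemax. Sparsemax is a different, well-known sparse normalization function used here only as a stand-in; it is not ev-softmax, whose definition the abstract does not give. The logits and the helper name `nll` are assumptions chosen for illustration.

```python
import math

def softmax(logits):
    """Dense normalization: every class gets strictly positive probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(logits):
    """Sparse normalization: Euclidean projection of the logits onto the simplex."""
    z_sorted = sorted(logits, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumsum += z
        # The support consists of the k largest logits satisfying this condition.
        if 1 + k * z > cumsum:
            tau = (cumsum - 1) / k
    return [max(z - tau, 0.0) for z in logits]

def nll(probs, target):
    """Negative log-likelihood of the target class; infinite when its probability is zero."""
    p = probs[target]
    return -math.log(p) if p > 0 else math.inf

logits = [1.0, 0.8, -1.0]      # illustrative values, not from the paper
dense = softmax(logits)         # all three entries are positive
sparse = sparsemax(logits)      # the last entry is exactly zero

print(nll(dense, 2))            # finite
print(nll(sparse, 2))           # inf: the loss is unusable for this target
```

A target that falls outside the sparse support has zero probability, so its negative log-likelihood is infinite. That is the motivation the abstract gives for full-support approximations that can be trained with probabilistic losses.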
GPT-3 Scared You? Meet Wu Dao 2.0: A Monster of 1.75 Trillion Parameters
Jack Clark, OpenAI's policy director, calls this trend of copying GPT-3 "model diffusion." Yet, among all the copies, Wu Dao 2.0 holds the record as the largest, with a striking 1.75 trillion parameters (10x GPT-3). Coco Feng reported for South China Morning Post that Wu Dao 2.0 was trained on 4.9TB of high-quality text and image data, which makes GPT-3's training dataset (570GB) pale in comparison. Yet it's worth noting that OpenAI researchers curated 45TB of data to extract those 570GB of clean data. Wu Dao 2.0 can learn from text and images and tackle tasks that include both types of data (something GPT-3 can't do).
Gartner identifies the top strategic technology trends for 2022
Generative AI, distributed enterprise and cloud-native platforms are amongst the top strategic technology trends for 2022, Gartner has predicted. David Groombridge, research vice president at Gartner, says with CEOs and boards striving to find growth through direct digital connections with customers, the priorities of a CIO must reflect the same business imperatives, which run through each of Gartner's top strategic tech trends for 2022. "CIOs must find the IT force multipliers to enable growth and innovation, and create scalable, resilient technical foundations whose scalability will free cash for digital investments," Groombridge says. "These imperatives form the three themes of this year's trends: engineering trust, sculpting change and accelerating growth." Gartner says one of the most visible and powerful AI techniques coming to market is generative AI: machine learning methods that learn about content or objects from their data, and use it to generate brand-new, completely original, realistic artefacts.
Enterprise AI startup SambaNova releases a large language model tool
SambaNova Systems, a Palo Alto-based AI startup, announced a new language service model with a familiar description: GPT, which stands for Generative Pre-trained Transformer, and has no links to OpenAI's GPT series of language models. SambaNova markets their GPT as an everyman's alternative to OpenAI's GPT-3, writing in the press release that it will allow companies "to be up and running with a customized language model in as fast as one month as opposed to nine months or a year." What it's all about: You may have heard of SaaS (Software as a Service) or IaaS (Infrastructure as a Service), but SambaNova Systems offers DaaS: Dataflow-as-a-Service. The aim is to sell a suite of AI tools, including natural language processing ones, for startups to adopt quickly and seamlessly. GPT is the latest tool in its box: a model that can both produce and process natural language. It's built for enterprise use cases, according to the company.
Exclusive: OpenAI summarizes KDnuggets - KDnuggets
OpenAI has recently published an important work focused on the alignment problem: the problem of ensuring that general-purpose AI and machine learning systems align with human intentions. The "Paperclip Maximizer" is a famous example of alignment gone wrong. To test scalable alignment methods, OpenAI trained a model to summarize entire books, as described in their blog on KDnuggets: Scaling human oversight of AI systems for difficult tasks - OpenAI approach. The OpenAI model works by first summarizing small sections of a book, then summarizing those summaries into a higher-level summary, and so on. The results were pretty amazing, so we have asked OpenAI to summarize two top KDnuggets blogs from last year, and here are the summaries.
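The recursive procedure described above (summarize small sections, then summarize the summaries, and so on) can be sketched as follows. This is an illustrative outline only, not OpenAI's implementation: `summarize_text` is a hypothetical stand-in that simply truncates text, and `group_size` is an assumed parameter controlling how many summaries are merged at each level.

```python
def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def summarize_text(text, max_words=12):
    """Hypothetical placeholder for a learned summarizer: keeps the first words."""
    return " ".join(text.split()[:max_words])

def summarize_book(sections, group_size=3, summarize=summarize_text):
    """Summarize sections, then repeatedly summarize groups of summaries
    until a single top-level summary remains."""
    if not sections:
        raise ValueError("sections must not be empty")
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")
    level = [summarize(section) for section in sections]   # leaf summaries
    while len(level) > 1:
        # Merge neighbouring summaries and summarize each merged group.
        level = [summarize(" ".join(group)) for group in chunk(level, group_size)]
    return level[0]

sections = [f"Section {i} describes event number {i} in detail." for i in range(7)]
print(summarize_book(sections))
```

With seven sections and `group_size=3`, the summarizer runs on 7 sections, then on 3 groups, then on 1 group. In a real system the placeholder would be replaced by a model call.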
What is GPT-3 and Why Does it Matter?
The recent hype surrounding Generative Pre-trained Transformer 3 (GPT-3), the new artificial intelligence (AI) based natural language processing (NLP) model, is worth observing, particularly from the enterprise front. Whether you study it closely or take only a casual look, this latest language model, which generates human-like written content, is worth your time and effort, and it can show you that the hype is real. Like every technological innovation, GPT-3 has its shortcomings, yet it is a great leap for AI. In May 2020, OpenAI, an AI research lab co-founded by Elon Musk, launched the latest version of an AI-based Natural Language Processing system named GPT-3 that can mimic human language.
Deep Generative Models in Engineering Design: A Review
Regenwetter, Lyle, Nobari, Amin Heyrani, Ahmed, Faez
Automated design synthesis has the potential to revolutionize the modern human design process and improve access to highly optimized and customized products across countless industries. Successfully adapting generative Machine Learning to design engineering may be the key to such automated design synthesis and is a research subject of great importance. We present a review and analysis of Deep Generative Learning models in engineering design. Deep Generative Models (DGMs) typically leverage deep networks to learn from an input dataset and learn to synthesize new designs. Recently, DGMs such as Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), feedforward Neural Networks (NNs) and certain Deep Reinforcement Learning (DRL) frameworks have shown promising results in design applications like structural optimization, materials design, and shape synthesis. The prevalence of DGMs in Engineering Design has skyrocketed since 2016. Anticipating continued growth, we conduct a review of recent advances with the hope of benefitting researchers interested in DGMs for design. We structure our review as an exposition of the algorithms, datasets, representation methods, and applications commonly used in the current literature. In particular, we discuss key works that have introduced new techniques and methods in DGMs, successfully applied DGMs to a design-related domain, or directly supported development of DGMs through datasets or auxiliary methods. We further identify key challenges and limitations currently seen in DGMs across design fields, such as design creativity, handling complex constraints and objectives, and modeling both form and functional performance simultaneously. In our discussion we identify possible solution pathways as key areas on which to target future work.
Microsoft and Nvidia build largest ever AI to mimic human language
Microsoft and chip manufacturer Nvidia have created a vast artificial intelligence that can mimic human language more convincingly than ever before. But the cost and time involved in creating the neural network have called into question whether such AIs can continue to scale up. The new neural network, known as the Megatron-Turing Natural Language Generation (MT-NLG) model, has 530 billion parameters, more than tripling the scale of OpenAI's groundbreaking GPT-3 neural network, which was considered the state of the art until now.
Scaling Laws for the Few-Shot Adaptation of Pre-trained Image Classifiers
Prato, Gabriele, Guiroy, Simon, Caballero, Ethan, Rish, Irina, Chandar, Sarath
The empirical science of neural scaling laws is a rapidly growing area of significant importance to the future of machine learning, particularly in light of recent breakthroughs achieved by large-scale pre-trained models such as GPT-3, CLIP and DALL-E. Accurately predicting neural network performance with increasing resources such as data, compute and model size provides a more comprehensive evaluation of different approaches across multiple scales, as opposed to traditional point-wise comparisons of fixed-size models on fixed-size benchmarks, and, most importantly, allows for focus on the best-scaling, and thus most promising, approaches. In this work, we consider the challenging problem of few-shot learning in image classification, especially when the target data distribution in the few-shot phase differs from the source (training) data distribution, in the sense that it includes new image classes not encountered during training. Our current main goal is to investigate how the amount of pre-training data affects the few-shot generalization performance of standard image classifiers. Our key observations are that (1) such performance improvements are well-approximated by power laws (linear log-log plots) as the training set size increases, (2) this applies to both cases of target data coming from either the same or from a different domain (i.e., new classes) as the training data, and (3) few-shot performance on new classes converges at a faster rate than the standard classification performance on previously seen classes. Our findings shed new light on the relationship between scale and generalization.
Over the past decade, deep learning has made tremendous progress in multiple fields, especially in vision (Alam et al., 2020) and natural language processing (Torfi et al., 2020). However, several important issues remain unsolved, including the ability to generalize well to novel, out-of-distribution data (Arjovsky, 2021).
A particularly challenging situation involves simultaneous changes at test time in both the input distribution and the task (class) distribution, p(x) and p(y|x). For example, a self-driving car seeing an elephant for the first time should be able to recognize it as a "new object", and on seeing another elephant afterwards, it should be able to recognize it as the same "new object". Any deployment of deep networks in the real world will likely require them to deal with new situations not encountered during training.
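The power-law observation in the abstract above (performance that is linear on a log-log plot against training set size) can be checked with an ordinary least-squares fit in log space. The sketch below is a minimal illustration under stated assumptions: the data points are synthetic, generated from an assumed law error = 2.0 * n^(-0.5), and are not results from the paper; `fit_power_law` is a hypothetical helper name.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y).

    A power law is a straight line in log-log space:
    log y = log a + b * log x, so b is the slope and log a the intercept.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                        # slope = power-law exponent
    a = math.exp(mean_y - b * mean_x)    # intercept = log of the prefactor
    return a, b

# Synthetic (assumed) data: error rate versus number of pre-training examples.
train_sizes = [100, 1_000, 10_000, 100_000]
errors = [2.0 * n ** -0.5 for n in train_sizes]

a, b = fit_power_law(train_sizes, errors)
print(f"prefactor a = {a:.3f}, exponent b = {b:.3f}")

# Extrapolate the fitted law to a larger, unseen training set size.
print(a * 1_000_000 ** b)
```

On real measurements the points would not lie exactly on a line, and the quality of the linear fit in log-log space indicates how well a power law describes the trend.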