 Large Language Model


Google's new SEED RL framework reduces AI model training costs by 80% - SiliconANGLE

#artificialintelligence

Researchers at Google have open-sourced a new framework that can scale up artificial intelligence model training across thousands of machines. It's a promising development because it should enable AI algorithm training to be performed at millions of frames per second while reducing the costs of doing so by as much as 80%, Google noted in a research paper. That kind of reduction could help to level the playing field a bit for startups that previously haven't been able to compete with major players such as Google in AI. Training sophisticated machine learning models in the cloud is surprisingly expensive. One recent report by Synced found that the University of Washington racked up $25,000 in costs to train its Grover model, which is used to detect and generate fake news.


Google open-sources framework that reduces AI training costs by up to 80%

#artificialintelligence

Google researchers recently published a paper describing a framework -- SEED RL -- that scales AI model training to thousands of machines. They say that it could facilitate training at millions of frames per second on a machine while reducing costs by up to 80%, potentially leveling the playing field for startups that couldn't previously compete with large AI labs. Training sophisticated machine learning models in the cloud remains prohibitively expensive. According to a recent Synced report, the University of Washington's Grover, which is tailored for both the generation and detection of fake news, cost $25,000 to train over the course of two weeks. OpenAI racked up $256 per hour to train its GPT-2 language model, and Google spent an estimated $6,912 training BERT, a bidirectional transformer model that redefined the state of the art for 11 natural language processing tasks.


Hands-On Guide to OpenAI Gym Custom Environments - Analytics India Magazine

#artificialintelligence

OpenAI Gym is a well-known toolkit for developing and comparing reinforcement learning agents. It makes no assumptions about the structure of the agent and works with any numerical computation library, such as TensorFlow or PyTorch. Gym also provides various types of environments. In this hands-on guide, we will develop a tic-tac-toe environment from scratch using OpenAI Gym. To start with, let's create the desired folder structure with all the required files.
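A minimal sketch of what such a tic-tac-toe environment might look like. It does not import Gym; it only mirrors the classic Gym convention of `reset()` returning an observation and `step(action)` returning `(observation, reward, done, info)`. The class name `TicTacToeEnv`, the reward values, and the rule that an illegal move ends the episode are assumptions made for illustration, not the guide's actual code.

```python
class TicTacToeEnv:
    """Toy tic-tac-toe environment following the classic Gym reset/step shape."""

    # All eight winning lines on a 3x3 board indexed 0..8, row by row.
    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),
                 (0, 4, 8), (2, 4, 6)]

    def __init__(self):
        self.reset()

    def reset(self):
        # 0 = empty, 1 = player X, 2 = player O; X moves first.
        self.board = [0] * 9
        self.player = 1
        self.done = False
        return tuple(self.board)

    def step(self, action):
        """Place the current player's mark on cell `action` (0..8)."""
        if self.done:
            raise RuntimeError("Episode is over; call reset() first.")
        if self.board[action] != 0:
            # Assumption: an illegal move ends the episode with a penalty.
            self.done = True
            return tuple(self.board), -1.0, True, {"illegal": True}
        self.board[action] = self.player
        if any(all(self.board[i] == self.player for i in line)
               for line in self.WIN_LINES):
            self.done = True
            return tuple(self.board), 1.0, True, {"winner": self.player}
        if 0 not in self.board:
            # Board is full with no winner: a draw.
            self.done = True
            return tuple(self.board), 0.0, True, {"winner": None}
        self.player = 3 - self.player  # switch between 1 and 2
        return tuple(self.board), 0.0, False, {}
```

A real Gym environment would additionally subclass `gym.Env` and declare `action_space` and `observation_space`; those parts are left out here so the sketch runs without any dependencies.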


How to generate text: using different decoding methods for language generation with Transformers

#artificialintelligence

In recent years, there has been an increasing interest in open-ended language generation thanks to the rise of large transformer-based language models trained on millions of webpages, such as OpenAI's famous GPT2 model. The results on conditioned open-ended language generation are impressive. Besides the improved transformer architecture and massive unsupervised training data, better decoding methods have also played an important role. This blog post gives a brief overview of different decoding strategies and, more importantly, shows how you can implement them with very little effort using the popular transformers library! All of the following functionalities can be used for auto-regressive language generation.
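To make the idea of a decoding strategy concrete, here is a small library-free sketch of two common ones, greedy selection and top-k sampling, applied to a single vector of next-token scores. The excerpt above does not say which strategies the post covers, and this is not the transformers API; the function names and the toy logits are assumptions for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding step: always choose the highest-scoring token index."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, rng):
    """Top-k sampling step: keep the k best tokens, renormalize, then sample."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])
    return rng.choices(top, weights=probs, k=1)[0]


# Toy next-token scores over a four-token vocabulary (hypothetical values).
logits = [0.1, 2.0, 1.5, -1.0]
rng = random.Random(0)
greedy_token = greedy_pick(logits)
sampled_token = top_k_sample(logits, k=2, rng=rng)
```

Greedy selection is deterministic, while top-k sampling trades some likelihood for variety by drawing only from the k most probable tokens. With `k=1` the two coincide.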


What happens when a machine can write as well as an academic? - University Affairs

#artificialintelligence

Recently one morning, I asked my computer a relatively simple question: can artificial intelligence (AI) write? We're not too certain on what artificial intelligence will be able to write, but there are some scenarios in which computers could be responsible for a huge number of word documents … The biggest potential scenarios would involve machines analyzing what has already been written and determining what pieces need to be edited to make the content seem fresh. The above sentences were composed by a machine in a matter of seconds. The tool used is a freely accessible interface based on the GPT-2 text generator released by OpenAI – a company founded by technology industry leaders, including Elon Musk and Sam Altman. Only a limited version of the tool was made available, as it was dubbed "too dangerous" by the company to release fully into the world.


Transductive Zero-Shot Learning with Visual Structure Constraint

Neural Information Processing Systems

To recognize objects of the unseen classes, most existing Zero-Shot Learning (ZSL) methods first learn a compatible projection function between the common semantic space and the visual space based on the data of source seen classes, then directly apply it to the target unseen classes. However, in real scenarios, the data distribution between the source and target domain might not match well, thus causing the well-known domain shift problem. Based on the observation that visual features of test instances can be separated into different clusters, we propose a new visual structure constraint on class centers for transductive ZSL, to improve the generality of the projection function (i.e., alleviate the above domain shift problem). Specifically, three different strategies (symmetric Chamfer distance, bipartite matching distance, and Wasserstein distance) are adopted to align the projected unseen semantic centers and visual cluster centers of test instances. We also propose a new training strategy to handle the real cases where many unrelated images exist in the test dataset, which is not considered in previous methods.
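As an illustration of the first alignment strategy named above, the following sketch computes a symmetric Chamfer distance between two small sets of points, standing in for projected semantic centers and visual cluster centers. It assumes plain Euclidean distance and unnormalized sums; the paper's exact formulation (squared distances, averaging, weighting) may differ, and the function names are hypothetical.

```python
import math


def chamfer_one_way(src, dst):
    """Sum, over each point in src, of the distance to its nearest point in dst."""
    return sum(min(math.dist(s, d) for d in dst) for s in src)


def symmetric_chamfer(a, b):
    """Symmetric Chamfer distance: nearest-neighbor cost in both directions."""
    return chamfer_one_way(a, b) + chamfer_one_way(b, a)


# Hypothetical 2-D centers: two projected semantic centers, two visual clusters.
semantic_centers = [(0.0, 0.0), (1.0, 0.0)]
visual_centers = [(0.0, 0.0), (1.0, 0.0)]
distance = symmetric_chamfer(semantic_centers, visual_centers)  # identical sets
```

Because each direction only looks at nearest neighbors, the measure does not enforce a one-to-one pairing between centers, which is presumably why the abstract also lists bipartite matching and Wasserstein distances as alternatives.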


6 Pretrained Models to Master Text Classification

#artificialintelligence

Though ERNIE 1.0 (released in March 2019) has been a popular model for text classification, it was ERNIE 2.0 which became the talk of the town in the latter half of 2019. Developed by tech giant Baidu, ERNIE outperformed Google XLNet and BERT on the GLUE benchmark for English. ERNIE stands for Enhanced Representation through kNowledge IntEgration, and ERNIE 2.0 is an upgraded version of ERNIE 1.0. ERNIE 1.0 was pathbreaking in its own way – it was one of the first models to leverage Knowledge Graphs. This incorporation further enhanced training the model for advanced tasks like Relation Classification and Named Entity Recognition (NER). Like its predecessor, ERNIE 2.0 brings another innovation to the table in the form of Continual Incremental Multi-task Learning.


Semantic-Guided Multi-Attention Localization for Zero-Shot Learning

Neural Information Processing Systems

Zero-shot learning extends the conventional object classification to the unseen class recognition by introducing semantic representations of classes. Existing approaches predominantly focus on learning the proper mapping function for visual-semantic embedding, while neglecting the effect of learning discriminative visual features. In this paper, we study the significance of the discriminative region localization. We propose a semantic-guided multi-attention localization model, which automatically discovers the most discriminative parts of objects for zero-shot learning without any human annotations. Our model jointly learns cooperative global and local features from the whole object as well as the detected parts to categorize objects based on semantic descriptions.


Zero-shot Knowledge Transfer via Adversarial Belief Matching

Neural Information Processing Systems

Performing knowledge transfer from a large teacher network to a smaller student is a popular task in modern deep learning applications. However, due to growing dataset sizes and stricter privacy regulations, it is increasingly common not to have access to the data that was used to train the teacher. We propose a novel method which trains a student to match the predictions of its teacher without using any data or metadata. We achieve this by training an adversarial generator to search for images on which the student poorly matches the teacher, and then using them to train the student. Our resulting student closely approximates its teacher for simple datasets like SVHN, and on CIFAR10 we improve on the state-of-the-art for few-shot distillation (with 100 images per class), despite using no data.
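The notion of a student "poorly matching" its teacher can be illustrated with a small sketch. It assumes the mismatch is measured by the KL divergence between the teacher's and the student's predicted class probabilities, which the abstract above does not state, and it replaces the trained adversarial generator with a simple search over a fixed list of candidate inputs. All names and numbers are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def hardest_example(teacher_probs, student_probs):
    """Index of the candidate input where the student disagrees most."""
    gaps = [kl_divergence(t, s) for t, s in zip(teacher_probs, student_probs)]
    return max(range(len(gaps)), key=lambda i: gaps[i])


# Hypothetical predictions on two candidate inputs (two classes each).
teacher = [[0.9, 0.1], [0.5, 0.5]]
student = [[0.9, 0.1], [0.9, 0.1]]
worst = hardest_example(teacher, student)  # the student is off on input 1
```

In the adversarial setting described above, the generator would be trained to produce inputs that increase this kind of mismatch, and the student would then be trained on those inputs to reduce it.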


Visualizing and Measuring the Geometry of BERT

Neural Information Processing Systems

Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces.