 Large Language Model


Are Foundation Models going to lead to general #artificialintelligence - Pinaki Laskar on LinkedIn

#artificialintelligence

Are Foundation Models going to lead to general #artificialintelligence? Foundation models are about deep AI: a narrow, weak, human-centered, big-tech AI built on statistical, data-driven ML/DL/ANNs. The logic here is rather simple and naive: train one model on a huge amount of data and adapt it to many applications. The genuine, human-augmenting AI is Causal Machine Intelligence and Learning, emerging as Trans-AI or Meta-AI, which combines as its integrated modules ANI, ML, DL, ANNs, LLMs, Machine Perception, and Computer Vision; Foundation Transformer Models, AGI, and ASI; and Contextual, Composite, and Causal AI. Though foundation models are based on standard #deeplearning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization.



Pseudo-Labels Are All You Need

arXiv.org Artificial Intelligence

Automatically estimating the complexity of texts for readers has a variety of applications, such as recommending texts with an appropriate complexity level to language learners or supporting the evaluation of text simplification approaches. In this paper, we present our submission to the Text Complexity DE Challenge 2022, a regression task where the goal is to predict the complexity of a German sentence for German learners at level B. Our approach relies on more than 220,000 pseudo-labels created from the German Wikipedia and other corpora to train Transformer-based models, and refrains from any feature engineering or any additional, labeled data. We find that the pseudo-label-based approach gives impressive results yet requires little to no adjustment to the specific task and therefore could be easily adapted to other domains and tasks.
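The general idea of training on pseudo-labels can be illustrated with a toy example. This is only a minimal sketch under stated assumptions: the abstract does not say how the pseudo-labels were produced, so the teacher/student setup, the word-count feature, the example sentences, and the scores below are all hypothetical. The paper itself uses Transformer-based models and no feature engineering.

```python
from statistics import mean

def feature(sentence):
    # Toy stand-in for a learned representation: the number of words.
    return len(sentence.split())

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + bias on a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return slope, my - slope * mx

# Small labeled seed set (hypothetical complexity scores).
seed = [
    ("Das ist gut.", 1.0),
    ("Der Hund schläft im Garten.", 2.0),
    ("Die Regierung plant eine umfassende Reform des Steuersystems.", 4.0),
]

# Unlabeled sentences, standing in for a large corpus such as Wikipedia.
unlabeled = [
    "Ich lese.",
    "Sie trinkt Kaffee.",
    "Wir fahren morgen mit dem Zug nach Berlin.",
]

# 1. Fit a teacher on the labeled seed data.
teacher_slope, teacher_bias = fit_line(
    [feature(s) for s, _ in seed], [y for _, y in seed]
)

# 2. Let the teacher label the unlabeled sentences (the pseudo-labels).
pseudo_labeled = [
    (s, teacher_slope * feature(s) + teacher_bias) for s in unlabeled
]

# 3. Train a student on the pseudo-labels only.
student_slope, student_bias = fit_line(
    [feature(s) for s, _ in pseudo_labeled], [y for _, y in pseudo_labeled]
)
```

Because the toy student has the same capacity as the teacher, it simply recovers the teacher's line; in practice the student would be a larger model that benefits from the much bigger pseudo-labeled set.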


Writing Lyrics with AI

#artificialintelligence

To build our lyric-generating pipeline, we'll be using:
- Genius.com: a service for grabbing existing lyrics through its API.
- GPT-2: a transformer-based NLP model, trained on 8 million web pages.

We'll need to set up our environment and install two pip packages. GPT-2 has an excellent knowledge of English out of the box, but we want our lyrics to be stylized and poetic. To do this, we need to show it some lyrics for inspiration.


The rise of the Demigod designer

#artificialintelligence

I'm talking about language models like OpenAI's GPT-3, which enabled Jordan Singer to build a Figma plug-in that creates websites just from a description of their functionality. He asked for "An app that has a navigation bar with a camera icon, "Photos" title, and a message icon. A feed of photos with each photo having a user icon, a photo, a heart icon and a chat bubble icon". Or Sharif Shameem, who built a "layout generator" that immediately exports the layout as code. All you have to do is say what you want and the plug-in will execute it.


Real-world AI assistant: Google combines a large language model with an everyday robot

#artificialintelligence

In the PaLM-SayCan project, Google is combining current robotics technology with advances in large language models. Advances in large-scale AI language models have so far mainly arrived in our digital lives, in text translation, text and image generation, or behind the scenes, when tech platforms use language AI to moderate content. In the PaLM-SayCan project, various Google divisions are now combining the company's most advanced large-scale language model to date with an everyday robot that could one day help in the home: an assistant for the real world. But that will take a while yet. Google unveiled the giant AI language model PaLM in early April, crediting the model with "breakthrough capabilities" in language understanding and, specifically, reasoning. PaLM stands for "Pathways Language Model", making it a building block in Google's grand Pathways AI strategy for next-generation AI that can efficiently handle thousands or millions of tasks.


MulZDG: Multilingual Code-Switching Framework for Zero-shot Dialogue Generation

arXiv.org Artificial Intelligence

Building dialogue generation systems in a zero-shot scenario remains a huge challenge, since the typical zero-shot approaches in dialogue generation rely heavily on large-scale pre-trained language generation models such as GPT-3 and T5. Research on zero-shot dialogue generation without cumbersome language models is limited by the lack of corresponding parallel dialogue corpora. In this paper, we propose a simple but effective Multilingual learning framework for Zero-shot Dialogue Generation (dubbed MulZDG) that can effectively transfer knowledge from an English corpus with large-scale training samples to a non-English corpus with zero samples. In addition, MulZDG can be viewed as a multilingual data augmentation method that improves the performance of the resource-rich language. First, we construct multilingual code-switching dialogue datasets by translating utterances randomly selected from monolingual English datasets. Then we employ MulZDG to train a unified multilingual dialogue model on the code-switching datasets. MulZDG can conduct implicit semantic alignment between different languages. Experiments on the DailyDialog and DSTC7 datasets demonstrate that MulZDG not only achieves competitive performance in the zero-shot case compared to training with sufficient examples but also greatly improves the performance of the source language.
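The dataset construction step, translating randomly selected utterances of a monolingual English dialogue, might look roughly like the sketch below. The `translate` stub, the toy translation table, and the `ratio` parameter are hypothetical and only serve the illustration; the abstract does not specify the translation system or the selection rate.

```python
import random

# Hypothetical stand-in for a machine-translation system (English to German).
TOY_TRANSLATIONS = {
    "Hello!": "Hallo!",
    "How are you?": "Wie geht es dir?",
    "Fine, thanks.": "Gut, danke.",
}

def translate(utterance):
    # Fall back to the original text when no translation is known.
    return TOY_TRANSLATIONS.get(utterance, utterance)

def code_switch(dialogue, ratio, rng):
    """Return a copy of the dialogue in which each utterance is replaced
    by its translation with probability `ratio`."""
    return [translate(u) if rng.random() < ratio else u for u in dialogue]

dialogue = ["Hello!", "How are you?", "Fine, thanks."]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

mixed = code_switch(dialogue, ratio=0.5, rng=rng)
untouched = code_switch(dialogue, ratio=0.0, rng=rng)
fully_translated = code_switch(dialogue, ratio=1.0, rng=rng)
```

Selecting whole utterances rather than individual words keeps each turn monolingual, so the resulting dialogue switches language between turns.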


Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems

arXiv.org Artificial Intelligence

Large-scale language models (LLMs) such as GPT-2, BERT and RoBERTa have been successfully applied to ASR N-best rescoring. However, whether or how they can benefit competitive, near state-of-the-art ASR systems remains unexplored. In this study, we incorporate LLM rescoring into one of the most competitive ASR baselines: the Conformer-Transducer model. We demonstrate that consistent improvement is achieved by the LLM's bidirectionality, pretraining, in-domain finetuning and context augmentation. Furthermore, our lexical analysis sheds light on how each of these components may be contributing to the ASR performance.
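N-best rescoring in general can be sketched as follows: the ASR system proposes several hypotheses with scores, a language model scores each hypothesis text, and the two scores are combined to pick the final output. The log-linear combination, the `lm_weight` value, and all scores below are assumptions for illustration; the abstract does not state how the paper combines the scores.

```python
def rescore(nbest, lm_score, lm_weight=0.5):
    """Pick the hypothesis with the highest combined score.

    nbest     -- list of (text, asr_log_score) pairs
    lm_score  -- function mapping text to a language-model log score
    lm_weight -- interpolation weight for the language-model score
    """
    best_text, _ = max(
        nbest, key=lambda hyp: hyp[1] + lm_weight * lm_score(hyp[0])
    )
    return best_text

# Hypothetical ASR N-best list with log scores (higher is better).
nbest = [
    ("wreck a nice beach", -4.0),
    ("recognize speech", -4.2),
]

# Hypothetical language-model log scores; a real system would query an LLM.
TOY_LM_SCORES = {
    "wreck a nice beach": -6.0,
    "recognize speech": -2.0,
}

first_pass = rescore(nbest, TOY_LM_SCORES.get, lm_weight=0.0)
rescored = rescore(nbest, TOY_LM_SCORES.get, lm_weight=0.5)
```

With the weight set to zero the ASR ranking is kept unchanged; with a positive weight the language model can promote a hypothesis that the acoustic model ranked lower.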


Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

arXiv.org Artificial Intelligence

Multi-modal learning from video data has seen increased attention recently, as it allows training semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and classification. In this work, we present a multi-modal, modality-agnostic fusion transformer approach that learns to exchange information between multiple modalities, such as video, audio, and text, and to integrate them into a joint multi-modal representation, yielding an embedding that aggregates multi-modal temporal information. We propose to train the system with a combinatorial loss on everything at once, single modalities as well as pairs of modalities, explicitly leaving out any add-ons such as position or modality encoding. At test time, the resulting model can process and fuse any number of input modalities. Moreover, the implicit properties of the transformer allow it to process inputs of different lengths. To evaluate the proposed approach, we train the model on the large-scale HowTo100M dataset and evaluate the resulting embedding space on four challenging benchmark datasets, obtaining state-of-the-art results in zero-shot video retrieval and zero-shot video action localization.
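One way to read "a combinatorial loss on single modalities as well as pairs of modalities" is to enumerate every single modality and every pair, then sum a loss over all combinations of non-overlapping inputs. The sketch below is an assumption-laden illustration, not the paper's implementation: averaging stands in for the fusion transformer, cosine distance stands in for the actual training loss, and the embeddings are made up.

```python
import math
from itertools import combinations

def fuse(vectors):
    # Hypothetical stand-in for the fusion transformer: element-wise average.
    return [sum(component) / len(vectors) for component in zip(*vectors)]

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def modality_subsets(modalities):
    # All single modalities and all pairs of modalities.
    return [c for size in (1, 2) for c in combinations(modalities, size)]

def combinatorial_loss(embeddings):
    """Sum a placeholder loss over every pair of disjoint modality subsets."""
    subsets = modality_subsets(sorted(embeddings))
    total = 0.0
    for left, right in combinations(subsets, 2):
        if set(left) & set(right):
            continue  # skip pairs that share a modality
        total += cosine_distance(
            fuse([embeddings[m] for m in left]),
            fuse([embeddings[m] for m in right]),
        )
    return total

# Made-up 2-dimensional embeddings for one video clip.
aligned = {"video": [1.0, 0.0], "audio": [1.0, 0.0], "text": [1.0, 0.0]}
misaligned = {"video": [1.0, 0.0], "audio": [0.0, 1.0], "text": [1.0, 1.0]}

loss_aligned = combinatorial_loss(aligned)
loss_misaligned = combinatorial_loss(misaligned)
```

For three modalities this yields six terms: three comparisons between single modalities and three between a single modality and the fused pair of the other two.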


NVIDIA AI Platform Delivers Big Gains for Large Language Models

#artificialintelligence

As the size and complexity of large language models (LLMs) continue to grow, NVIDIA is today announcing updates to the NeMo Megatron framework that provide training speed-ups of up to 30%. These updates–which include two trailblazing techniques and a hyperparameter tool to optimize and scale training of LLMs on any number of GPUs–offer new capabilities to train and deploy models using the NVIDIA AI platform. BLOOM, the world's largest open-science, open-access multilingual language model, with 176 billion parameters, was recently trained on the NVIDIA AI platform, enabling text generation in 46 languages and 13 programming languages. The NVIDIA AI platform has also powered one of the most powerful transformer language models, with 530 billion parameters, Megatron-Turing NLG model (MT-NLG). LLMs are one of today's most important advanced technologies, involving up to trillions of parameters that learn from text.