Large Language Model
What Matters In The Structured Pruning of Generative Language Models?
Santacroce, Michael, Wen, Zixin, Shen, Yelong, Li, Yuanzhi
Auto-regressive large language models such as GPT-3 require enormous computational resources to use. Traditionally, structured pruning methods are employed to reduce resource usage. However, their application to and efficacy for generative language models is heavily under-explored. In this paper we conduct a comprehensive evaluation of common structured pruning methods, including magnitude, random, and movement pruning on the feed-forward layers in GPT-type models. Unexpectedly, random pruning results in performance that is comparable to the best established methods, across multiple natural language generation tasks. To understand these results, we provide a framework for measuring neuron-level redundancy of models pruned by different methods, and discover that established structured pruning methods do not take into account the distinctiveness of neurons, leaving behind excess redundancies. In view of this, we introduce Globally Unique Movement (GUM) to improve the uniqueness of neurons in pruned models. We then discuss the effects of our techniques on different redundancy metrics to explain the improved performance.
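To make the comparison between magnitude and random pruning concrete, here is a minimal sketch of structured pruning on a single feed-forward layer. It is an illustration only: the L2-norm importance score, the function names, and the list-of-rows weight layout are assumptions made for this example, not the scoring or code used in the paper.

```python
import math
import random


def neuron_scores(w_in, w_out):
    """Score each hidden neuron by the L2 norm of its weights.

    w_in[j] holds the incoming weights of neuron j and w_out[j] its
    outgoing weights (assumed layout, one row per hidden neuron).
    """
    return [
        math.sqrt(sum(x * x for x in a) + sum(x * x for x in b))
        for a, b in zip(w_in, w_out)
    ]


def magnitude_keep(w_in, w_out, keep):
    """Indices of the `keep` neurons with the largest scores."""
    scores = neuron_scores(w_in, w_out)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def random_keep(num_neurons, keep, seed=0):
    """Indices of `keep` neurons chosen uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_neurons), keep))


def prune(w_in, w_out, kept):
    """Drop whole neurons: remove the same rows from both matrices."""
    return [w_in[j] for j in kept], [w_out[j] for j in kept]
```

Because entire neurons are removed, the pruned layer is simply a smaller dense layer, which is what distinguishes structured pruning from zeroing individual weights.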
Controlling Personality Style in Dialogue with Zero-Shot Prompt-Based Learning
Ramirez, Angela, Alsalihy, Mamon, Aggarwal, Kartik, Li, Cecilia, Wu, Liren, Walker, Marilyn
Prompt-based or in-context learning has achieved high zero-shot performance on many natural language generation (NLG) tasks. Here we explore the performance of prompt-based learning for simultaneously controlling the personality and the semantic accuracy of an NLG for task-oriented dialogue. We experiment with prompt-based learning on the PERSONAGE restaurant recommendation corpus to generate semantically and stylistically controlled text for 5 different Big-5 personality types: agreeable, disagreeable, conscientious, unconscientious, and extravert. We test two different classes of discrete prompts to generate utterances for a particular personality style: (1) prompts that demonstrate generating directly from a meaning representation that includes a personality specification; and (2) prompts that rely on first converting the meaning representation to a textual pseudo-reference, and then using the pseudo-reference in a textual style transfer (TST) prompt. In each case, we show that we can vastly improve performance by over-generating outputs and ranking them, testing several ranking functions based on automatic metrics for semantic accuracy, personality-match, and fluency. We also test whether NLG personality demonstrations from the restaurant domain can be used with meaning representations for the video game domain to generate personality-stylized utterances about video games. Our findings show that the TST prompts produce the highest semantic accuracy (78.46% for restaurants and 87.6% for video games) and personality accuracy (100% for restaurants and 97% for video games). Our results on transferring personality style to video game utterances are surprisingly good. To our knowledge, there is no previous work testing the application of prompt-based learning to simultaneously controlling both style and semantic accuracy in NLG.
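The over-generate-and-rank step can be sketched as follows. This is a hedged illustration: `slot_coverage` is a toy stand-in for a semantic-accuracy metric, and the weighted-sum ranking function, the names, and the example utterances are all assumptions for this sketch rather than the ranking functions tested in the paper.

```python
def slot_coverage(text, slot_values):
    """Toy semantic-accuracy proxy: fraction of slot values found in text."""
    if not slot_values:
        return 1.0
    hits = sum(1 for value in slot_values if value.lower() in text.lower())
    return hits / len(slot_values)


def rank_candidates(candidates, scorers, weights):
    """Sort candidates by a weighted sum of scorer outputs, best first.

    scorers maps a metric name to a function text -> float;
    weights maps the same names to their relative importance.
    """
    def total(text):
        return sum(weights[name] * fn(text) for name, fn in scorers.items())

    return sorted(candidates, key=total, reverse=True)


# Hypothetical over-generated outputs for one meaning representation.
slots = ["Loch Fyne", "seafood"]
candidates = [
    "Nice place.",
    "Loch Fyne serves seafood.",
    "Loch Fyne is a restaurant.",
]
scorers = {"semantic": lambda text: slot_coverage(text, slots)}
best = rank_candidates(candidates, scorers, {"semantic": 1.0})[0]
```

Personality-match and fluency scorers would be added to `scorers` in the same way, each with its own weight.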
Scaling Back-Translation with Domain Text Generation for Sign Language Gloss Translation
Ye, Jinhui, Jiao, Wenxiang, Wang, Xing, Tu, Zhaopeng
Sign language gloss translation aims to translate sign glosses into spoken language texts, which is challenging due to the scarcity of labeled gloss-text parallel data. Back translation (BT), which generates pseudo-parallel data by translating in-domain spoken language texts into sign glosses, has been applied to alleviate the data scarcity problem. However, the lack of large-scale, high-quality in-domain spoken language text data limits the effect of BT. In this paper, to overcome this limitation, we propose a Prompt based domain text Generation (PGEN) approach to produce large-scale in-domain spoken language text data. Specifically, PGEN randomly concatenates sentences from the original in-domain spoken language text data as prompts to induce a pre-trained language model (i.e., GPT-2) to generate spoken language texts in a similar style. Experimental results on three benchmarks of sign language gloss translation in varied languages demonstrate that BT with spoken language texts generated by PGEN significantly outperforms the compared methods. In addition, as the scale of spoken language texts generated by PGEN increases, the BT technique can achieve further improvements, demonstrating the effectiveness of our approach. We release the code and data to facilitate future research in this field.
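The prompt-construction step described for PGEN can be sketched as below. This is a minimal illustration under stated assumptions: the function name, the number of sentences per prompt, and the space separator are choices made for this example, and the language-model call that would consume each prompt is deliberately left out.

```python
import random


def build_prompts(sentences, per_prompt, num_prompts, seed=0):
    """Build prompts by randomly concatenating in-domain sentences.

    Each prompt joins `per_prompt` distinct sentences sampled from
    `sentences`; the separator and sample size are assumptions.
    """
    rng = random.Random(seed)
    return [
        " ".join(rng.sample(sentences, per_prompt))
        for _ in range(num_prompts)
    ]


# Hypothetical in-domain sentences (weather-forecast style).
corpus = [
    "Rain is expected in the north tonight.",
    "Temperatures will drop below zero.",
    "The wind stays weak in the south.",
    "Tomorrow brings sunshine to the coast.",
]
prompts = build_prompts(corpus, per_prompt=2, num_prompts=3)
```

Each prompt would then be passed to a pre-trained language model to continue in a similar style, and the generated text would be back-translated into glosses.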
Zero-shot Active Visual Search (ZAVIS): Intelligent Object Search for Robotic Assistants
Park, Jeongeun, Yoon, Taerim, Hong, Jejoon, Yu, Youngjae, Pan, Matthew, Choi, Sungjoon
In this paper, we focus on the problem of efficiently locating a target object described with free-form language using a mobile robot equipped with vision sensors (e.g., an RGBD camera). Conventional active visual search predefines a set of objects to search for, rendering these techniques restrictive in practice. To provide added flexibility in active visual searching, we propose a system where a user can enter target commands using free-form language; we call this system Active Visual Search in the Wild (AVSW). AVSW detects and plans to search for a target object inputted by a user through a semantic grid map represented by static landmarks (e.g., desk or bed). For efficient planning of object search patterns, AVSW considers commonsense knowledge-based co-occurrence and predictive uncertainty while deciding which landmarks to visit first. We validate the proposed method with respect to SR (success rate) and SPL (success weighted by path length) in both simulated and real-world environments. The proposed method outperforms previous methods in terms of SPL in simulated scenarios with an average gap of 0.283. We further demonstrate AVSW with a Pioneer-3AT robot in real-world studies.
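For reference, the two evaluation metrics can be computed as in the sketch below. It follows the commonly used definition of SPL, where each episode contributes its success indicator times the ratio of the shortest path length to the larger of the shortest and actual path lengths; the tuple layout and function names are assumptions for this example, and shortest path lengths are assumed to be positive.

```python
def success_rate(episodes):
    """Fraction of episodes in which the target was found."""
    return sum(1 for success, _, _ in episodes if success) / len(episodes)


def spl(episodes):
    """Success weighted by path length, averaged over episodes.

    Each episode is (success, shortest_length, taken_length).
    Failed episodes contribute 0; successful ones contribute
    shortest / max(taken, shortest).
    """
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)


# Hypothetical episodes: an optimal success, a detour, and a failure.
episodes = [(True, 10.0, 10.0), (True, 10.0, 20.0), (False, 5.0, 8.0)]
```

With these three episodes, SR counts two successes out of three, while SPL additionally penalizes the second episode for taking twice the shortest path.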
Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling
Neural Processes (NPs) are a popular class of approaches for meta-learning. Similar to Gaussian Processes (GPs), NPs define distributions over functions and can estimate uncertainty in their predictions. However, unlike GPs, NPs and their variants suffer from underfitting and often have intractable likelihoods, which limit their applications in sequential decision making. We propose Transformer Neural Processes (TNPs), a new member of the NP family that casts uncertainty-aware meta learning as a sequence modeling problem. We learn TNPs via an autoregressive likelihood-based objective and instantiate it with a novel transformer-based architecture. The model architecture respects the inductive biases inherent to the problem structure, such as invariance to the observed data points and equivariance to the unobserved points. We further investigate knobs within the TNP framework that trade off expressivity of the decoding distribution against extra computation. Empirically, we show that TNPs achieve state-of-the-art performance on various benchmark problems, outperforming all previous NP variants on meta regression, image completion, contextual multi-armed bandits, and Bayesian optimization.
Google's conversational AI service 'Bard' to challenge ChatGPT
Conversational AI services like ChatGPT are absolutely dominating the tech news these days, and for good reason. Whether you want answers to a medical question or a serviceable 800-word essay on existentialism, the new breed of AI services will suck in information from the internet at large, and spit back content that's remarkably human-ish. Google has been lagging behind the upstart AI platforms--if only in terms of recent press coverage. But today Google made a big effort to catch up with the larger AI conversation, officially announcing a new AI platform called Bard in a blog post by company CEO, Sundar Pichai. Much like ChatGPT, Bard is designed to produce detailed answers to questions both large and small.
Listen to AI-generated Donald Trump read 'The Three Little Pigs'
Sound clips of Donald Trump reading the 'Three Little Pigs' nursery rhyme aloud and Tom Hanks reciting Pulp Fiction's 'Ezekiel 25:17' may sound realistic, but they were generated by artificial intelligence. A developer created a tool, dubbed Tortoise TTS (Text-to-Speech), capable of replicating a person's voice after analyzing 20 seconds of an audio clip with them speaking. Shashank Jain, the creator of Tortoise TTS, said his main idea was to create a tool that allows us to generate podcasts based on text. 'With the arrival of ChatGPT, we can generate conversations in the format we want, provide the feed to the tool I created and out comes a podcast between two speakers of our choice,' he told DailyMail.com.
Microsoft announces surprise event for tomorrow with Bing ChatGPT expected - The Verge
The invite says Microsoft CEO Satya Nadella will "share some progress on a few exciting projects," so expect a number of important announcements. The invite comes just days after Microsoft extended its OpenAI partnership in a $10 billion deal that will see it become the exclusive cloud partner for OpenAI. Microsoft's cloud services will power all OpenAI workloads across products, API services, and research.
Google Opens ChatGPT Rival Bard for Testing, as AI War Heats Up
Google is rolling out a new conversational artificial-intelligence service to a select set of testers, and plans a broader public launch in coming weeks, part of the company's effort to play catch-up with challengers such as OpenAI, creator of the popular chatbot ChatGPT. The new experimental service, called Bard, generates textual responses to questions posed by users, based on information drawn from the web, Sundar Pichai, chief executive of Google parent Alphabet Inc., said in a blog post published Monday.
Google answers ChatGPT with its own chatbot
Google has been at the forefront of AI research for years, scooping up many of the field's brightest scientists and using the tech to improve the quality of language translation, search results and a host of other technologies the company uses. But over the last six months, smaller companies like OpenAI have captured more attention -- and venture capital investment -- by making tools like AI image- and text-generators directly available to the public. That's at odds with the Big Tech companies' generally more cautious approaches, which have been shaped by earlier public relations disasters, such as chatbots that spouted racism and hate speech, or a Google project to build image recognition software for the military that spurred an employee revolt.