Large Language Model
Could ChatGPT herald the next stage for CX AI adoption? - UK News Group
Over the last few weeks, you will have heard a lot of noise about ChatGPT, the new conversational AI model launched at the end of November by OpenAI, the AI research and deployment company. What is particularly striking about ChatGPT is that it took just five days to reach one million signed-up users, and it is estimated that the figure may already be over two million. In comparison, Instagram took three months to reach one million users, Spotify five months, and Twitter two years. Why is it capturing so much attention? And what is going on in the AI market, when just last month some commentators were questioning the chatbot sector's momentum, particularly after Amazon cut costs and staff from its Alexa team?
"It's Not Possible for Me to Feel or Be Creepy": An Interview with ChatGPT
Between Christmas and New Year's, my family took a six-hour drive to Vermont. I drove; my wife and two children sat in the back seat. Our children are five and two: too old to be hypnotized by a rattle or a fidget spinner, too young to entertain themselves. So a six-hour drive amounted to an hour of napping, an hour of free association and sing-alongs, and four hours of desperation. We offered the kids an episode of their favorite storytelling podcast, but they weren't in the mood for something prerecorded. They wanted us to invent a new story, on the spot, tailored to their interests.
Billionaire Bill Gates Touts AI Advancements In Office Efficiency - AI Summary
In an interview with the German-language business paper Handelsblatt, Microsoft co-founder-turned-philanthropist Bill Gates said that improvements in artificial intelligence are the "most important" innovation at the moment, and that AI software like ChatGPT could change health care and education for good. Gates said that applications of generative AI such as OpenAI's ChatGPT could also improve office efficiency, for example by drafting invoices and letters. Microsoft announced this week that its Bing search engine will be powered in part by ChatGPT technology. Google also recently announced Bard, its ChatGPT competitor.
ChatGPT: implications for the legal world - Internet for Lawyers Newsletter
Chatbots have been around since the 1960s, and coders have been trying to pass the Turing test ever since, creating increasingly sophisticated iterations of natural language processing (NLP) software. A recent episode, in which a Google engineer was sacked for claiming that the company's chatbot generator software, known as LaMDA, was sentient, perhaps demonstrates the leaps and bounds NLP has made over the past few years. However, it is only with the public release of a new chatbot called ChatGPT that the wider public has begun to take the potential of NLP seriously. ChatGPT is a conversational piece of software released by OpenAI, designed to answer questions posed in natural language and even hold a dialogue with users. It has been trained on a wide range of online data, from Wikipedia to Reddit, although its knowledge only extends up to 2021. As well as answering general queries, and therefore being a potential threat to Google, it can write bespoke articles on any topic, an ability that is sparking existential debates amongst academics and professional writers.
ChatGPT frenzy prompts China firms to seek home-grown alternatives
HONG KONG: Microsoft-backed OpenAI has kept its hit ChatGPT app off-limits to users in China, but the app is attracting huge interest in the country, with firms rushing to integrate the technology into their products and launch rival solutions. While residents in China are unable to create OpenAI accounts to access the artificial intelligence (AI)-powered chatbot, virtual private networks and foreign phone numbers are helping some bypass those restrictions. At the same time, the OpenAI models behind the ChatGPT program, which can write essays, recipes and complex computer code, are relatively accessible in China and are increasingly being incorporated into Chinese consumer technology applications, from social networks to online shopping.
Implications of the Convergence of Language and Vision Model Geometries
Li, Jiaang, Kementchedjhieva, Yova, Søgaard, Anders
Large-scale pretrained language models (LMs) are said to "lack the ability to connect [their] utterances to the world" (Bender and Koller, 2020). If so, we would expect LM representations to be unrelated to representations in computer vision models. To investigate this, we present an empirical evaluation across three different LMs (BERT, GPT2, and OPT) and three computer vision models (VMs: ResNet, SegFormer, and MAE). Our experiments show that LMs converge towards representations that are partially isomorphic to those of VMs, with dispersion and polysemy both factoring into the alignability of vision and language spaces. We discuss the implications of this finding.
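The abstract does not spell out how alignability between the two spaces is measured. As a generic illustration of comparing representation geometries, and not necessarily the authors' method, the sketch below uses representational similarity analysis (RSA): embed the same list of concepts in both spaces, compute pairwise cosine similarities within each space, and correlate the two similarity lists. Because only within-space similarities are compared, the LM and VM embeddings may have different dimensionalities. All function names and the toy vectors are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pairwise_similarities(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Upper-triangular cosine similarities: the 'geometry' of one space."""
    n = len(vectors)
    return [cosine(vectors[i], vectors[j]) for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("similarity lists must not be constant")
    return cov / (sx * sy)


def rsa_score(lm_vectors: Sequence[Sequence[float]],
              vm_vectors: Sequence[Sequence[float]]) -> float:
    """Correlate the similarity structure of two spaces over the same concepts.

    Row i of each argument must embed the same concept; dimensions may differ.
    """
    if len(lm_vectors) != len(vm_vectors):
        raise ValueError("both spaces must embed the same list of concepts")
    return pearson(pairwise_similarities(lm_vectors), pairwise_similarities(vm_vectors))


# Toy example: B is A rotated by 90 degrees and padded with a zero dimension.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [-1.0, 1.0, 0.0]]
score = rsa_score(A, B)
```

A score near 1 means the two spaces arrange the concepts similarly. In the toy example the score is 1 (up to floating-point error) because rotation preserves cosine similarities; real LM and VM spaces would score lower, and "partial" isomorphism would show up as a positive but imperfect correlation.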
Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge
Al-Kaswan, Ali, Izadi, Maliheh, van Deursen, Arie
Previous work has shown that Large Language Models are susceptible to so-called data extraction attacks, which allow an attacker to extract a sample that was contained in the training data and therefore have massive privacy implications. Constructing data extraction attacks is challenging: current attacks are quite inefficient, and there is a significant gap between the extraction capabilities of untargeted attacks and the extent of memorization. Thus, targeted attacks have been proposed, which identify whether a given sample from the training data is extractable from a model. In this work, we apply a targeted data extraction attack to the SATML2023 Language Model Training Data Extraction Challenge, using a two-step approach. In the first step, we maximise the recall of the model and are able to extract the suffix for 69% of the samples. In the second step, we use a classifier-based Membership Inference Attack on the generations; our AutoSklearn classifier achieves a precision of 0.841. The full approach reaches a recall of 0.405 at a 10% false positive rate, an improvement of 34% over the baseline of 0.301.
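The headline metric, recall at a 10% false positive rate, can be computed by sweeping a decision threshold over the membership-inference classifier's scores and keeping the best recall whose false positive rate stays within the budget. The sketch below assumes each candidate generation has a score (higher meaning "more likely a true training suffix") and a binary label; it is a generic illustration, not the challenge's official scoring script, and the function name is hypothetical.

```python
from typing import Sequence


def recall_at_fpr(scores: Sequence[float], labels: Sequence[int],
                  max_fpr: float = 0.1) -> float:
    """Highest recall (TPR) reachable by a score threshold with FPR <= max_fpr.

    labels: 1 for a positive (e.g. a correctly extracted suffix), 0 otherwise.
    """
    pos = sum(1 for label in labels if label == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sort by descending score and lower the threshold step by step.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    best = 0.0
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Items with equal scores cross the threshold together.
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / neg <= max_fpr:
            best = max(best, tp / pos)
    return best
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.5, 0.4]` and labels `[1, 1, 0, 1, 0, 0]`, the strictest useful threshold admits two of the three positives before any negative, so recall at 10% FPR is 2/3. Grouping tied scores prevents the sweep from splitting a tie and reporting a recall no real threshold can achieve.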
GPTScore: Evaluate as You Desire
Fu, Jinlan, Ng, See-Kiong, Jiang, Zhengbao, Liu, Pengfei
Generative Artificial Intelligence (AI) has enabled the development of sophisticated models that are capable of producing high-caliber text, images, and other outputs through the use of large pre-trained models. Nevertheless, assessing the quality of generated output is an even more arduous task than generation itself, and this issue has not received adequate attention recently. This paper proposes a novel evaluation framework, GPTScore, which uses the emergent abilities (e.g., zero-shot instruction) of generative pre-trained models to score generated texts. We explore 19 pre-trained models, ranging in size from 80M (e.g., FLAN-T5-small) to 175B (e.g., GPT3) parameters. Experimental results on four text generation tasks, 22 evaluation aspects, and 37 corresponding datasets demonstrate that this approach lets us evaluate texts on whichever aspects we desire, simply through natural language instructions. This property helps overcome several long-standing challenges in text evaluation, such as how to achieve customized, multi-faceted evaluation without the need for annotated samples. We make our code publicly available at https://github.com/jinlanfu/GPTScore.
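GPTScore rates a generated text by how probable a pre-trained model finds it, given a natural language instruction that describes the evaluation aspect. The sketch below assumes the simplest form of that idea: a weighted (by default uniform) sum of the log-probabilities of the hypothesis tokens, each conditioned on the instruction prompt and the preceding tokens. `lm_logprob` is a hypothetical callable standing in for a real model, and the bigram toy model exists only to make the example runnable; neither is the released code at the repository above.

```python
import math
from typing import Callable, Optional, Sequence

# Hypothetical interface: log p(token | context) from some language model.
LogProbFn = Callable[[Sequence[str], str], float]


def gpt_score(lm_logprob: LogProbFn, prompt_tokens: Sequence[str],
              hypothesis_tokens: Sequence[str],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of log p(h_t | prompt, h_<t); uniform weights give the mean."""
    if not hypothesis_tokens:
        raise ValueError("hypothesis must contain at least one token")
    if weights is None:
        weights = [1.0 / len(hypothesis_tokens)] * len(hypothesis_tokens)
    if len(weights) != len(hypothesis_tokens):
        raise ValueError("need one weight per hypothesis token")
    context = list(prompt_tokens)  # the instruction/aspect prompt comes first
    score = 0.0
    for w, token in zip(weights, hypothesis_tokens):
        score += w * lm_logprob(context, token)
        context.append(token)  # condition later tokens on this one
    return score


# Toy bigram "model" so the example runs; unseen pairs get probability 0.01.
BIGRAMS = {("<aspect:fluency>", "the"): 0.6, ("the", "cat"): 0.5, ("cat", "sat"): 0.4}


def toy_lm(context: Sequence[str], token: str) -> float:
    return math.log(BIGRAMS.get((context[-1], token), 0.01))


fluent = gpt_score(toy_lm, ["<aspect:fluency>"], ["the", "cat", "sat"])
scrambled = gpt_score(toy_lm, ["<aspect:fluency>"], ["sat", "cat", "the"])
```

Higher (less negative) scores mean the model finds the text more probable under the stated aspect, so here `fluent` exceeds `scrambled`. Changing the aspect only requires changing the prompt tokens, which is what makes this style of evaluation customizable without annotated samples.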
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark)
Valmeekam, Karthik, Sreedharan, Sarath, Marquez, Matthew, Olmo, Alberto, Kambhampati, Subbarao
Intrigued by claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper we set out to investigate their planning capabilities. We aim to evaluate (1) how good LLMs are, by themselves, at generating and validating simple plans in commonsense planning tasks (of the type that humans are generally quite good at) and (2) how good LLMs are as a source of heuristic guidance for other agents, either AI planners or human planners, in their planning tasks. To investigate these questions systematically rather than anecdotally, we start by developing a benchmark suite based on the kinds of domains employed in the International Planning Competition. On this benchmark, we evaluate LLMs in three modes: autonomous, heuristic and human-in-the-loop. Our results show that LLMs' ability to autonomously generate executable plans is quite meager, with an average success rate of only about 3%. The heuristic and human-in-the-loop modes show slightly more promise. We also make our benchmark and evaluation tools available to support investigations by the research community.
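What counts as an "executable" plan can be made concrete with a plan validator: simulate each action from the initial state, check its preconditions, apply its effects, and finally check the goal. The sketch below is a generic STRIPS-style checker with a hypothetical Blocksworld encoding (Blocksworld being a classic planning-benchmark domain); it is not the authors' benchmark tooling, and all names in it are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple

Fact = Tuple[str, ...]  # e.g. ("on", "A", "B")


@dataclass(frozen=True)
class Action:
    """A grounded STRIPS action: preconditions, add effects, delete effects."""
    name: str
    pre: FrozenSet[Fact]
    add: FrozenSet[Fact]
    delete: FrozenSet[Fact]


def validate_plan(init: Iterable[Fact], goal: Iterable[Fact],
                  plan: List[Action]) -> Tuple[bool, str]:
    """Simulate the plan; fail on the first unmet precondition or an unmet goal."""
    state = set(init)
    for step, act in enumerate(plan):
        missing = act.pre - state
        if missing:
            return False, f"step {step} ({act.name}): unmet preconditions {sorted(missing)}"
        state = (state - act.delete) | act.add
    unmet = set(goal) - state
    if unmet:
        return False, f"goal not reached: missing {sorted(unmet)}"
    return True, "valid"


# Hypothetical single-hand Blocksworld encoding.
def pickup(x: str) -> Action:
    facts = frozenset({("ontable", x), ("clear", x), ("handempty",)})
    return Action(f"pickup({x})", facts, frozenset({("holding", x)}), facts)


def putdown(x: str) -> Action:
    return Action(f"putdown({x})", frozenset({("holding", x)}),
                  frozenset({("ontable", x), ("clear", x), ("handempty",)}),
                  frozenset({("holding", x)}))


def stack(x: str, y: str) -> Action:
    return Action(f"stack({x},{y})", frozenset({("holding", x), ("clear", y)}),
                  frozenset({("on", x, y), ("clear", x), ("handempty",)}),
                  frozenset({("holding", x), ("clear", y)}))


def unstack(x: str, y: str) -> Action:
    facts = frozenset({("on", x, y), ("clear", x), ("handempty",)})
    return Action(f"unstack({x},{y})", facts,
                  frozenset({("holding", x), ("clear", y)}), facts)


# A sits on B, B is on the table; the goal is to have B on A.
INIT = {("on", "A", "B"), ("ontable", "B"), ("clear", "A"), ("handempty",)}
GOAL = {("on", "B", "A")}
GOOD_PLAN = [unstack("A", "B"), putdown("A"), pickup("B"), stack("B", "A")]
```

`validate_plan(INIT, GOAL, GOOD_PLAN)` returns `(True, "valid")`, while a plan that starts with `pickup("B")` fails at step 0 because B is not clear. A plan can therefore be rejected either for being non-executable or for executing without reaching the goal.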
Gradient-Based Automated Iterative Recovery for Parameter-Efficient Tuning
Mozes, Maximilian, Bolukbasi, Tolga, Yuan, Ann, Liu, Frederick, Thain, Nithum, Dixon, Lucas
Pretrained large language models (LLMs) are able to solve a wide variety of tasks through transfer learning, and various explainability methods have been developed to investigate their decision-making process. TracIn (Pruthi et al., 2020) is one such gradient-based method, which explains model inferences based on the influence of training examples. In this paper, we explore the use of TracIn to improve model performance in the parameter-efficient tuning (PET) setting. We develop conversational safety classifiers via the prompt-tuning PET method and show how the unique characteristics of the PET regime enable TracIn to identify the cause of certain misclassifications by LLMs. We then develop G-BAIR (gradient-based automated iterative recovery), a new methodology for using gradient-based explainability techniques to improve model performance, and show that G-BAIR can recover LLM performance on benchmarks after training labels have been manually corrupted. This suggests that influence methods like TracIn can be used to perform data cleaning automatically, and introduces the potential for interactive debugging and relabeling for PET-based transfer learning methods.
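TracIn (Pruthi et al., 2020) scores the influence of a training example on a test example by summing, over saved training checkpoints, the learning rate times the dot product of the two examples' loss gradients. The sketch below applies that definition to a tiny logistic-regression model with hand-derived gradients; in the prompt-tuning setting described above, the gradients would presumably be taken with respect to the soft-prompt parameters instead. Self-influence (an example's influence on itself) is a common way to surface likely mislabeled examples. G-BAIR's recovery loop is not reproduced here, and the model, checkpoints, and function names are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]     # (features x, label y in {0, 1})
Checkpoint = Tuple[Sequence[float], float]  # (weights w_i, learning rate eta_i)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logistic_grad(w: Sequence[float], example: Example) -> List[float]:
    """Gradient of binary cross-entropy w.r.t. w for p = sigmoid(w . x): (p - y) * x."""
    x, y = example
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def tracin(checkpoints: Sequence[Checkpoint], train: Example, test: Example) -> float:
    """Sum over checkpoints of eta_i * <grad loss(train), grad loss(test)>."""
    total = 0.0
    for w, eta in checkpoints:
        g_train = logistic_grad(w, train)
        g_test = logistic_grad(w, test)
        total += eta * sum(a * b for a, b in zip(g_train, g_test))
    return total


def self_influence(checkpoints: Sequence[Checkpoint], example: Example) -> float:
    """Influence of an example on itself; unusually high values flag suspect labels."""
    return tracin(checkpoints, example, example)


# Toy usage: at a checkpoint that has learned "x[0] > 0 means label 1",
# a flipped label has much higher self-influence than the correct one.
CKPTS = [([2.0, 0.0], 1.0)]
clean = self_influence(CKPTS, ([1.0, 0.0], 1.0))
flipped = self_influence(CKPTS, ([1.0, 0.0], 0.0))
```

Ranking training examples by self-influence and inspecting or relabeling the top of the list is the kind of data-cleaning step the abstract alludes to; a negative `tracin` value means the training example pushes the test example's loss up rather than down.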