Large Language Model
Breaking Down AutoGPT: What It Is, Its Features, Limitations, Artificial General Intelligence (AGI) And Impact of Autonomous Agents on Generative AI - MarkTechPost
Introduction Generative AI is evolving rapidly and growing in popularity. Since its introduction, new models and research papers have been released almost every other day. The main driver of this rapid growth is the development of Large Language Models (LLMs), Artificial Intelligence models designed to process natural language and generate human-like responses. The best-known example is OpenAI's ChatGPT, a chatbot that handles everything from content generation and code completion to question answering, much as a human would. OpenAI's DALL-E and Google's BERT have also contributed significant advances in recent times. What is AutoGPT? Recently,
EU lawmakers call for summit to control 'very powerful' AI
April 17 (Reuters) - EU lawmakers urged world leaders on Monday to hold a summit to find ways to control the development of advanced artificial intelligence (AI) systems such as ChatGPT, saying they were developing faster than expected. The 12 MEPs, all working on EU legislation on the technology, called on U.S. President Joe Biden and European Commission President Ursula von der Leyen to convene the meeting, and said AI firms should be more responsible. The statement came weeks after Twitter owner Elon Musk and more than 1,000 technology figures demanded a six-month pause in the development of systems more powerful than Microsoft-backed (MSFT.O) OpenAI's latest iteration of ChatGPT, which can mimic humans and create text and images based on prompts. That open letter, published in March by the Future of Life Institute (FLI), had warned that AI could spread misinformation at an unprecedented rate, and that machines could "outnumber, outsmart, obsolete and replace" humans, if left unchecked. The MEPs said they disagreed with some of the FLI message's "more alarmist statements".
Exploring the Trade-Offs: Unified Large Language Models vs Local Fine-Tuned Models for Highly-Specific Radiology NLI Task
Wu, Zihao, Zhang, Lu, Cao, Chao, Yu, Xiaowei, Dai, Haixing, Ma, Chong, Liu, Zhengliang, Zhao, Lin, Li, Gang, Liu, Wei, Li, Quanzheng, Shen, Dinggang, Li, Xiang, Zhu, Dajiang, Liu, Tianming
Recently, ChatGPT and GPT-4 have emerged and gained immense global attention due to their unparalleled performance in language processing. Despite demonstrating impressive capability in various open-domain tasks, their adequacy in highly specific fields like radiology remains untested. Radiology presents unique linguistic phenomena distinct from open-domain data due to its specificity and complexity. Assessing the performance of large language models (LLMs) in such specific domains is crucial not only for a thorough evaluation of their overall performance but also for providing valuable insights into future model design directions: whether model design should be generic or domain-specific. To this end, in this study, we evaluate the performance of ChatGPT/GPT-4 on a radiology NLI task and compare it to other models fine-tuned specifically on task-related data samples. We also conduct a comprehensive investigation on ChatGPT/GPT-4's reasoning ability by introducing varying levels of inference difficulty. Our results show that 1) GPT-4 outperforms ChatGPT in the radiology NLI task; 2) other specifically fine-tuned models require significant amounts of data samples to achieve comparable performance to ChatGPT/GPT-4. These findings demonstrate that constructing a generic model that is capable of solving various tasks across different domains is feasible.
The Unintended Consequences of Censoring Digital Technology -- Evidence from Italy's ChatGPT Ban
Kreitmeir, David H., Raschky, Paul A.
We first compile data on the hourly coding output of over 8,000 professional GitHub users in Italy and other European countries to analyse the impact of the ban on individual productivity. Combining the high-frequency data with the sudden announcement of the ban in a difference-in-differences framework, we find that the output of Italian developers decreased by around 50% in the first two business days after the ban and recovered after that. Applying a synthetic control approach to daily Google search and Tor usage data shows that the ban led to a significant increase in the use of censorship-bypassing tools. Our findings show that users swiftly implement strategies to bypass Internet restrictions, but this adaptation activity creates short-term disruptions and hampers productivity.
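For readers unfamiliar with the difference-in-differences idea mentioned in this abstract, the canonical two-group, two-period estimator can be sketched as follows. This is only an illustration of the general technique, not the authors' actual specification, which works with high-frequency hourly data. The function name `did_estimate` and all numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def did_estimate(
    treated_pre: Sequence[float],
    treated_post: Sequence[float],
    control_pre: Sequence[float],
    control_post: Sequence[float],
) -> float:
    """Two-group, two-period difference-in-differences estimate.

    Returns (change in treated group) minus (change in control group).
    The control group's change stands in for what would have happened
    to the treated group without the intervention (parallel trends).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


if __name__ == "__main__":
    # Hypothetical output levels, invented purely for illustration.
    effect = did_estimate(
        treated_pre=[10.0, 12.0],
        treated_post=[5.0, 7.0],
        control_pre=[8.0, 10.0],
        control_post=[9.0, 9.0],
    )
    print(effect)
```

The estimate is only meaningful under the parallel-trends assumption, that is, if both groups would have moved together in the absence of the intervention.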
Promptify: Text-to-Image Generation through Interactive Prompt Exploration with Large Language Models
Brade, Stephen, Wang, Bryan, Sousa, Mauricio, Oore, Sageev, Grossman, Tovi
Text-to-image generative models have demonstrated remarkable capabilities in generating high-quality images based on textual prompts. However, crafting prompts that accurately capture the user's creative intent remains challenging. It often involves laborious trial-and-error procedures to ensure that the model interprets the prompts in alignment with the user's intention. To address the challenges, we present Promptify, an interactive system that supports prompt exploration and refinement for text-to-image generative models. Promptify utilizes a suggestion engine powered by large language models to help users quickly explore and craft diverse prompts. Our interface allows users to organize the generated images flexibly, and based on their preferences, Promptify suggests potential changes to the original prompt. This feedback loop enables users to iteratively refine their prompts and enhance desired features while avoiding unwanted ones. Our user study shows that Promptify effectively facilitates the text-to-image workflow and outperforms an existing baseline tool widely used for text-to-image generation.
Safer Conversational AI as a Source of User Delight
Lu, Xiaoding, Korshuk, Aleksey, Liu, Zongyi, Beauchamp, William, Research, Chai
This work explores the impact of moderation on users' enjoyment of conversational AI systems. While recent advancements in Large Language Models (LLMs) have led to highly capable conversational AIs that are increasingly deployed in real-world settings, there is a growing concern over AI safety and the need to moderate systems to encourage safe language and prevent harm. However, some users argue that current approaches to moderation limit the technology, compromise free expression, and reduce the value it delivers. This study takes an unbiased stance and shows that moderation does not necessarily detract from user enjoyment. Heavy-handed moderation does seem to have a nefarious effect, but models that are moderated to be safer can lead to a better user experience. By deploying various conversational AIs in the Chai platform, the study finds that user retention can increase with a level of moderation and safe system design. These results demonstrate the importance of appropriately defining safety in models in a way that is both responsible and focused on serving users.
Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification
Clavié, Benjamin, Ciceu, Alexandru, Naylor, Frederick, Soulié, Guillaume, Brightwell, Thomas
This case study investigates the task of job classification in a real-world setting, where the goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position. We explore multiple approaches to text classification, including supervised approaches such as traditional models like Support Vector Machines (SVMs) and state-of-the-art deep learning methods such as DeBERTa. We compare them with Large Language Models (LLMs) used in both few-shot and zero-shot classification settings. To accomplish this task, we employ prompt engineering, a technique that involves designing prompts to guide the LLMs towards the desired output. Specifically, we evaluate the performance of two commercially available state-of-the-art GPT-3.5-based language models, text-davinci-003 and gpt-3.5-turbo. We also conduct a detailed analysis of the impact of different aspects of prompt engineering on the model's performance. Our results show that, with a well-designed prompt, a zero-shot gpt-3.5-turbo classifier outperforms all other models, achieving a 6% increase in Precision@95% Recall compared to the best supervised approach. Furthermore, we observe that the wording of the prompt is a critical factor in eliciting the appropriate "reasoning" in the model, and that seemingly minor aspects of the prompt significantly affect the model's performance.
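The metric named in this abstract, Precision@95% Recall, is the best precision a classifier reaches among all decision thresholds that keep recall at or above 95%. A minimal sketch of one common way to compute it is shown below. The function name `precision_at_recall` and the toy data are hypothetical, and the paper's exact evaluation procedure may differ.

```python
from typing import Sequence


def precision_at_recall(
    labels: Sequence[int],
    scores: Sequence[float],
    min_recall: float = 0.95,
) -> float:
    """Best precision over all score thresholds with recall >= min_recall.

    labels: 1 for the positive class, 0 otherwise.
    scores: classifier confidence for the positive class (higher = more positive).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Sort by descending score; each prefix of this ranking is the set of
    # items predicted positive at some threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    best = 0.0
    true_pos = 0
    for i, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # Tied scores cannot be separated by a threshold, so only
        # evaluate once the whole run of ties has been included.
        if i < len(ranked) and ranked[i][0] == score:
            continue
        recall = true_pos / total_pos
        if recall >= min_recall:
            best = max(best, true_pos / i)
    return best


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_labels = [1, 1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
    print(precision_at_recall(toy_labels, toy_scores))  # prints 0.75
```

In the toy data, all three positives are only recovered once the top four items are predicted positive, so precision at 95% recall is 3/4.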
Contrastive language and vision learning of general fashion concepts
Chia, Patrick John, Attanasio, Giuseppe, Bianchi, Federico, Terragni, Silvia, Magalhães, Ana Rita, Goncalves, Diogo, Greco, Ciro, Tagliabue, Jacopo
The extraordinary growth of online retail - as of 2020, 4 trillion dollars per year (Cramer-Flood, 2020) - had a profound impact on the fashion industry, with 1 out of 4 transactions now happening online (McKinsey, 2019). The combination of large amounts of data and variety of use cases supported by growing investments has made e-commerce fertile for the application of cutting-edge machine learning models, with NLP involved in recommendations (de Souza Pereira Moreira et al., 2019; Guo et al., 2020; Goncalves et al., 2021), information retrieval (IR) (Ai and Narayanan.R, 2021), product
The model is trained on over 700k < image, text > pairs from the inventory of Farfetch, one of the largest fashion luxury retailers in the world, and is applied to use cases known to be crucial in a vast global market; 2. we evaluate FashionCLIP in a variety of tasks, showing that fine-tuning helps capture domain-specific concepts and generalize them in zero-shot scenarios; we supplement quantitative tests with qualitative analyses, and offer preliminary insights of how concepts grounded in a visual space unlock linguistic
Towards Designing a ChatGPT Conversational Companion for Elderly People
Alessa, Abeer, Al-Khalifa, Hend
Loneliness and social isolation are serious and widespread problems among older people, affecting their physical and mental health, quality of life, and longevity. In this paper, we propose a ChatGPT-based conversational companion system for elderly people. The system is designed to provide companionship and help reduce feelings of loneliness and social isolation. The system was evaluated with a preliminary study. The results showed that the system was able to generate responses that were relevant to the created elderly personas. However, it is essential to acknowledge the limitations of ChatGPT, such as potential biases and misinformation, and to consider the ethical implications of using AI-based companionship for the elderly, including privacy concerns.
Towards Zero-Shot Personalized Table-to-Text Generation with Contrastive Persona Distillation
Zhan, Haolan, Lin, Xuming, Cui, Shaobo, Zhao, Zhongzhou, Zhou, Wei, Chen, Haiqing
Existing neural methods have shown great potential for generating informative text from structured tabular data while maintaining high content fidelity. However, few of them address the generation of personalized expressions, which often requires well-aligned persona-table-text datasets that are difficult to obtain. To overcome these obstacles, we explore personalized table-to-text generation under a zero-shot setting, in which no well-aligned persona-table-text triples are required during training. To this end, we first collect a set of unpaired persona information and then propose a semi-supervised approach with contrastive persona distillation (S2P-CPD) to generate personalized context. Specifically, tabular data and persona information are first represented as separate latent variables. Then, we devise a latent space fusion technique to distill persona information into the table representation. In addition, a contrastive-based discriminator is employed to guarantee style consistency between the generated context and its corresponding persona. Experimental results on two benchmarks demonstrate S2P-CPD's ability to preserve both content fidelity and personalized expressions.