 Large Language Model


Full Parameter Fine-tuning for Large Language Models with Limited Resources

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) but demand massive GPU resources for training. Lowering the threshold for LLM training would encourage greater participation from researchers, benefiting both academia and society. While existing approaches have focused on parameter-efficient fine-tuning, which tunes or adds a small number of parameters, few have addressed the challenge of tuning the full parameters of LLMs with limited resources. In this work, we propose a new optimizer, LOw-Memory Optimization (LOMO), which fuses the gradient computation and the parameter update in one step to reduce memory usage. By integrating LOMO with existing memory saving techniques, we reduce memory usage to 10.8% compared to the standard approach (DeepSpeed solution). Consequently, our approach enables the full parameter fine-tuning of a 65B model on a single machine with 8 RTX 3090 GPUs, each with 24GB memory.
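The idea of fusing the gradient computation and the parameter update can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a chain of scalar "layers", a squared-error loss, and plain SGD, and the function names (`forward`, `standard_step`, `fused_step`) are hypothetical. It only shows that updating each weight as soon as its gradient is available gives the same result while holding one weight gradient at a time instead of one per layer.

```python
def forward(weights, x):
    """Run the chain h_i = w_i * h_{i-1} and keep each layer's input for backward."""
    inputs = []
    h = x
    for w in weights:
        inputs.append(h)
        h = w * h
    return h, inputs


def standard_step(weights, x, target, lr):
    """Usual two-phase step: compute and store ALL gradients, then update."""
    out, inputs = forward(weights, x)
    grad_out = 2.0 * (out - target)  # derivative of (out - target)**2 w.r.t. out
    grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        grads[i] = grad_out * inputs[i]   # gradient w.r.t. w_i, kept until the end
        grad_out = grad_out * weights[i]  # gradient passed to the previous layer
    updated = [w - lr * g for w, g in zip(weights, grads)]
    return updated, len(grads)  # second value: weight gradients held at once


def fused_step(weights, x, target, lr):
    """Fused step: update each weight during backward, then drop its gradient."""
    out, inputs = forward(weights, x)
    grad_out = 2.0 * (out - target)
    updated = list(weights)
    peak_live_grads = 0
    for i in reversed(range(len(updated))):
        grad_w = grad_out * inputs[i]     # the only weight gradient alive right now
        peak_live_grads = max(peak_live_grads, 1)
        grad_out = grad_out * updated[i]  # must use the weight BEFORE it is updated
        updated[i] = updated[i] - lr * grad_w
        # grad_w is overwritten on the next iteration, so it is never stored
    return updated, peak_live_grads


if __name__ == "__main__":
    w0 = [0.5, -1.2, 2.0]
    std_w, std_peak = standard_step(w0, x=1.5, target=1.0, lr=0.01)
    fus_w, fus_peak = fused_step(w0, x=1.5, target=1.0, lr=0.01)
    print("same weights:", std_w == fus_w)
    print("gradients held at once:", std_peak, "vs", fus_peak)
```

Both variants still keep the layer inputs (activations) for the backward pass; the sketch only counts weight gradients, which is the part the fused update avoids storing.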


Inspire creativity with ORIBA: Transform Artists' Original Characters into Chatbots through Large Language Model

arXiv.org Artificial Intelligence

This research delves into the intersection of illustration art and artificial intelligence (AI), focusing on how illustrators engage with AI agents that embody their original characters (OCs). We introduce 'ORIBA', a customizable AI chatbot that enables illustrators to converse with their OCs. This approach allows artists to not only receive responses from their OCs but also to observe their inner monologues and behavior. Despite the existing tension between artists and AI, our study explores innovative collaboration methods that are inspiring to illustrators. By examining the impact of AI on the creative process and the boundaries of authorship, we aim to enhance human-AI interactions in creative fields, with potential applications extending beyond illustration to interactive storytelling and more.


h2oGPT: Democratizing Large Language Models

arXiv.org Artificial Intelligence

Applications built on top of Large Language Models (LLMs) such as GPT-4 represent a revolution in AI due to their human-level capabilities in natural language processing. However, they also pose many significant risks such as the presence of biased, private, or harmful text, and the unauthorized inclusion of copyrighted material. We introduce h2oGPT, a suite of open-source code repositories for the creation and use of LLMs based on Generative Pretrained Transformers (GPTs). The goal of this project is to create the world's best truly open-source alternative to closed-source approaches. In collaboration with and as part of the incredible and unstoppable open-source community, we open-source several fine-tuned h2oGPT models from 7 to 40 Billion parameters, ready for commercial use under fully permissive Apache 2.0 licenses. Included in our release is 100% private document search using natural language. Open-source language models help boost AI development and make it more accessible and trustworthy. They lower entry hurdles, allowing people and groups to tailor these models to their needs. This openness increases innovation, transparency, and fairness. An open-source strategy is needed to share AI benefits fairly, and H2O.ai will continue to democratize AI and LLMs.


Large Language Models Sometimes Generate Purely Negatively-Reinforced Text

arXiv.org Artificial Intelligence

When using adversarial training, it is common practice to train against the most egregious failures. However, this might imply using examples with sensitive information (such as leaked passwords or security vulnerabilities) as training data. One might assume that language models trained with gradient descent never generate text snippets which were only present in examples associated with the lowest possible reward. In this paper, we show that this assumption is wrong: in some situations, large language models do learn from such negatively-reinforced examples. We present a specific training setup that enables Pythia-160M to guess passwords 13% more often than it would by guessing randomly, despite only showing it these passwords on examples where the model is incentivized to not output these passwords. Our code is available at www.github.com/FabienRoger/Learning-From-Negative-Examples


Exploring the Viability of Synthetic Query Generation for Relevance Prediction

arXiv.org Artificial Intelligence

Query-document relevance prediction is a critical problem in Information Retrieval systems. This problem has increasingly been tackled using (pretrained) transformer-based models which are finetuned using large collections of labeled data. However, in specialized domains such as e-commerce and healthcare, the viability of this approach is limited by the dearth of large in-domain data. To address this paucity, recent methods leverage these powerful models to generate high-quality task and domain-specific synthetic data. Prior work has largely explored synthetic data generation or query generation (QGen) for Question-Answering (QA) and binary (yes/no) relevance prediction, where, for instance, the QGen models are given a document and trained to generate a query relevant to that document. However, in many problems, we have a more fine-grained notion of relevance than a simple yes/no label. Thus, in this work, we conduct a detailed study into how QGen approaches can be leveraged for nuanced relevance prediction. We demonstrate that -- contrary to claims from prior works -- current QGen approaches fall short of the more conventional cross-domain transfer-learning approaches. Via empirical studies spanning 3 public e-commerce benchmarks, we identify new shortcomings of existing QGen approaches -- including their inability to distinguish between different grades of relevance. To address this, we introduce label-conditioned QGen models which incorporate knowledge about the different relevance grades. While our experiments demonstrate that these modifications help improve the performance of QGen techniques, we also find that QGen approaches struggle to capture the full nuance of the relevance label space and, as a result, the generated queries are not faithful to the desired relevance label.


"When Words Fail, Emojis Prevail": Generating Sarcastic Utterances with Emoji Using Valence Reversal and Semantic Incongruity

arXiv.org Artificial Intelligence

Sarcasm is a form of figurative language that serves as a humorous tool for mockery and ridicule. We present a novel architecture for sarcasm generation with emoji from a non-sarcastic input sentence in English. We divide the generation task into two subtasks: one for generating textual sarcasm and another for collecting emojis associated with those sarcastic sentences. Two key elements of sarcasm are incorporated into the textual sarcasm generation task: valence reversal and semantic incongruity with context, where the context may involve shared commonsense or general knowledge between the speaker and their audience. The majority of existing sarcasm generation works have focused on this textual form. However, in the real world, when written texts fall short of effectively capturing the emotional cues of spoken and face-to-face communication, people often opt for emojis to accurately express their emotions. Due to the wide range of applications of emojis, incorporating appropriate emojis to generate textual sarcastic sentences helps advance sarcasm generation. We conclude our study by evaluating the generated sarcastic sentences using human judgement. All the code and data used in this study have been made publicly available.


Enhancing Activity Prediction Models in Drug Discovery with the Ability to Understand Human Language

arXiv.org Artificial Intelligence

Activity and property prediction models are the central workhorses in drug discovery and materials sciences, but currently they have to be trained or fine-tuned for new tasks. Without training or fine-tuning, scientific language models could be used for such low-data tasks through their announced zero- and few-shot capabilities. However, their predictive quality at activity prediction is lacking. In this work, we envision a novel type of activity prediction model that is able to adapt to new prediction tasks at inference time, via understanding textual information describing the task. To this end, we propose a new architecture with separate modules for chemical and natural language inputs, and a contrastive pre-training objective on data from large biochemical databases. In extensive experiments, we show that our method CLAMP yields improved predictive performance on few-shot learning benchmarks and zero-shot problems in drug discovery. We attribute the advances of our method to the modularized architecture and to our pre-training objective.
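One way to picture "separate modules for chemical and natural language inputs" is a dual-encoder setup in which a molecule embedding and a task-description embedding are compared at inference time. The sketch below is an assumption-laden illustration, not the CLAMP implementation: the embeddings are hard-coded placeholder vectors standing in for the outputs of two hypothetical encoders, and cosine similarity is used here as a generic scoring choice that may differ from what the paper uses.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def rank_molecules(task_embedding, molecule_embeddings):
    """Rank molecules by similarity to the embedded task description.

    task_embedding: vector from a (hypothetical) text encoder.
    molecule_embeddings: dict of name -> vector from a (hypothetical) molecule encoder.
    Returns (name, score) pairs, highest score first.
    """
    scored = [
        (name, cosine(task_embedding, emb))
        for name, emb in molecule_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder vectors; in a real system these would come from trained encoders.
    task = [1.0, 0.0]
    molecules = {
        "mol_a": [0.9, 0.1],
        "mol_b": [-0.5, 0.5],
        "mol_c": [0.2, 0.8],
    }
    for name, score in rank_molecules(task, molecules):
        print(name, round(score, 3))
```

Because a new task is expressed only through its text embedding, a setup like this can score molecules for an unseen task without retraining, which is the kind of inference-time adaptation the abstract describes.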


Mercedes-Benz is adding ChatGPT to its cars... right now

FOX News

Mercedes-Benz vehicles are known for their quiet cabins, but things are going to get a little louder in them soon. The luxury automaker has announced that it is launching a software update that will bring ChatGPT into its vehicles through a collaboration with the Microsoft Azure OpenAI Service, starting on June 16. The feature will be integrated into the MBUX infotainment system, which already offers a wide array of voice commands through the "Hey, Mercedes" voice assistant feature. ChatGPT will allow occupants to have "conversations with natural dialogues and follow-up questions" with the generative artificial intelligence platform. A beta version of Mercedes-Benz's ChatGPT voice assistant is launching on June 16.


Mercedes tries putting ChatGPT in your car

Engadget

Mercedes-Benz is putting ChatGPT on the road. The automaker is using Microsoft's Azure OpenAI Service to bring the viral natural-language model to its in-car voice assistant. It will initially be available in a three-month beta program for US customers in select vehicles, but Mercedes says it will consider a broader and more permanent rollout in the future. ChatGPT integration could put the automaker's "Hey Mercedes" voice assistant on steroids. Rather than merely answering simple and pre-programmed commands like "Turn up the heat" or "What's the forecast," it can carry natural conversations about virtually any topic, including contextual follow-up questions.


Good News! China and the US Are Talking About AI Dangers

WIRED

Sam Altman, the CEO of OpenAI, recently said that China should play a key role in shaping the guardrails that are placed around the technology. "China has some of the best AI talent in the world," Altman said during a talk at the Beijing Academy of Artificial Intelligence (BAAI) last week. "Solving alignment for advanced AI systems requires some of the best minds from around the world--and so I really hope that Chinese AI researchers will make great contributions here." Altman is in a good position to opine on these issues. His company is behind ChatGPT, the chatbot that's shown the world how rapidly AI capabilities are progressing.