Large Language Model
Making Language Models Better Tool Learners with Execution Feedback
Qiao, Shuofei, Gui, Honghao, Chen, Huajun, Zhang, Ningyu
Tools serve as pivotal interfaces that enable humans to understand and reshape the world. With the advent of foundational models, AI systems can utilize tools to expand their capabilities and interact with the world. Existing tool learning methodologies, encompassing supervised fine-tuning and prompt engineering approaches, often induce language models to utilize tools indiscriminately, as complex problems often exceed their own competencies. However, introducing tools for simple tasks, which the models themselves can readily resolve, can inadvertently propagate errors rather than enhance performance. This leads to the research question: can we teach language models when and how to use tools? To meet this need, we propose Tool leaRning wIth exeCution fEedback (TRICE), a two-stage end-to-end framework that enables the model to continually learn through feedback derived from tool execution, thereby learning when and how to use tools effectively. Experimental results, backed by further analysis, show that TRICE can make the language model use tools selectively, decreasing the model's dependency on tools while enhancing performance. Code and datasets will be available at https://github.com/zjunlp/trice.
A Study of Generative Large Language Model for Medical Research and Healthcare
Peng, Cheng, Yang, Xi, Chen, Aokun, Smith, Kaleb E, PourNejatian, Nima, Costa, Anthony B, Martin, Cheryl, Flores, Mona G, Zhang, Ying, Magoc, Tanja, Lipori, Gloria, Mitchell, Duane A, Ospina, Naykky S, Ahmed, Mustafa M, Hogan, William R, Shenkman, Elizabeth A, Guo, Yi, Bian, Jiang, Wu, Yonghui
There is enormous enthusiasm for, and concern about, using large language models (LLMs) in healthcare, yet current assumptions are all based on general-purpose LLMs such as ChatGPT. This study develops a clinical generative LLM, GatorTronGPT, using 277 billion words of mixed clinical and English text with a GPT-3 architecture of 20 billion parameters. GatorTronGPT improves biomedical natural language processing for medical research. Synthetic NLP models trained using GatorTronGPT-generated text outperform NLP models trained using real-world clinical text. A physicians' Turing test using a 1 (worst) to 9 (best) scale shows that there is no significant difference in linguistic readability (p = 0.22; 6.57 of GatorTronGPT compared with 6.93 of human) and clinical relevance (p = 0.91; 7.0 of GatorTronGPT compared with 6.97 of human) and that physicians cannot differentiate them (p < 0.001). This study provides insights on the opportunities and challenges of LLMs for medical research and healthcare.
Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning?
Chan, Aaron, Kharkar, Anant, Moghaddam, Roshanak Zilouchian, Mohylevskyy, Yevhen, Helyar, Alec, Kamal, Eslam, Elkamhawy, Mohamed, Sundaresan, Neel
Software vulnerabilities impose significant costs on enterprises. Despite extensive efforts in research and development of software vulnerability detection methods, uncaught vulnerabilities continue to put software owners and users at risk. Many current vulnerability detection methods require that code snippets can compile and build before attempting detection. This, unfortunately, introduces a long latency between the time a vulnerability is injected and the time it is removed, which can substantially increase the cost of fixing a vulnerability. We recognize that the current advances in machine learning can be used to detect vulnerable code patterns on syntactically incomplete code snippets as the developer is writing the code at EditTime. In this paper we present a practical system that leverages deep learning on a large-scale data set of vulnerable code patterns to learn complex manifestations of more than 250 vulnerability types and detect vulnerable code patterns at EditTime. We discuss zero-shot, few-shot, and fine-tuning approaches on state-of-the-art pre-trained Large Language Models (LLMs). We show that, in comparison with state-of-the-art vulnerability detection models, our approach improves the state of the art by 10%. We also evaluate our approach on detecting vulnerabilities in code auto-generated by code LLMs. Evaluation on a benchmark of high-risk code scenarios shows a vulnerability reduction of up to 90%.
Preconditioned Visual Language Inference with Weak Supervision
Qasemi, Ehsan, Maina-Kilaas, Amani R., Dash, Devadutta, Alsaggaf, Khalid, Chen, Muhao
Humans can infer the affordance of objects by extracting related contextual preconditions for each scenario. For example, upon seeing an image of a broken cup, we can infer that this precondition prevents the cup from being used for drinking. Reasoning with preconditions of commonsense is studied in NLP where the model explicitly gets the contextual precondition. However, it is unclear if SOTA visual language models (VLMs) can extract such preconditions and infer the affordance of objects with them. In this work, we introduce the task of preconditioned visual language inference and rationalization (PVLIR). We propose a learning resource based on three strategies to retrieve weak supervision signals for the task and develop a human-verified test set for evaluation. Our results reveal the shortcomings of SOTA VLMs in the task and draw a road map to address the challenges ahead in improving them.
Fairness of ChatGPT
Understanding and addressing unfairness in LLMs are crucial for responsible AI deployment. However, there is a limited availability of quantitative analyses and in-depth studies regarding fairness evaluations in LLMs, especially when applying LLMs to high-stakes fields. This work aims to fill this gap by providing a systematic evaluation of the effectiveness and fairness of LLMs using ChatGPT as a study case. We focus on assessing ChatGPT's performance in high-stakes fields including education, criminology, finance and healthcare. To make a thorough evaluation, we consider both group fairness and individual fairness, and we also observe the disparities in ChatGPT's outputs under a set of biased or unbiased prompts. This work contributes to a deeper understanding of LLMs' fairness performance, facilitates bias mitigation and fosters the development of responsible artificial intelligence systems.
Humanoid robot funded by ChatGPT is already working as a security guard
A robot which could work as a nurse or barman, and which can pick up objects with its human-like arms, is already at work in the U.S., the CEO of a company funded by OpenAI, maker of ChatGPT, has revealed. Bernt Bornich, CEO and founder of 1X, says that his company's humanoid EVE robot has been working since April this year - and that it is going 'better than we thought.' It's the first truly humanoid android to find a place in the workplace in human history - outpacing Elon Musk's hyped Tesla robot. At present, the robot is working as a security guard at two industrial sites: unlike other security robots, it has a head, a face, two arms, and can navigate autonomously. Security guards control a fleet of patrolling EVE androids, which are made at two sites in Norway and Dallas, and if anything happens to one of the units, they can 'step into' the android's body through virtual reality. 'You're there in a second as if you were there,' Bornich says.
Supercharge Your ChatGPT Prompts With Auto-GPT
The capabilities of AI tools are progressing rapidly, with Google, Microsoft, OpenAI, and many others racing to stay ahead of the competition. It feels like advances and apps are arriving on a weekly basis, with the bar constantly being raised in terms of what AI can do for us. Auto-GPT is the latest evidence for this: It leverages the power of ChatGPT to create an autonomous AI assistant, capable of taking on tasks and projects on its own and working through multiple steps in a job without you having to prompt it every time. In other words, it does a lot of the hard work for you, without you having to come up with your own follow-up responses or ideas. Auto-GPT can be run locally on your computer. Think about everything you can do with ChatGPT, then imagine rolling that into a system that can supply its own feedback and make its own choices.
The Horrific Content a Kenyan Worker Had to See While Training ChatGPT
This article is from Big Technology, a newsletter by Alex Kantrowitz. Richard Mathenge felt he'd landed the perfect role when he started training OpenAI's GPT model in 2021. After years of working in customer service in Nairobi, Kenya, he was finally involved in something that felt meaningful and held a future for him. But the position left him scarred. For nine hours per day, five days a week, Mathenge led a team that taught the A.I. model about explicit content.
PRODIGY: Enabling In-context Learning Over Graphs
Huang, Qian, Ren, Hongyu, Chen, Peng, Kržmanc, Gregor, Zeng, Daniel, Liang, Percy, Leskovec, Jure
In-context learning is the ability of a pretrained model to adapt to novel and diverse downstream tasks by conditioning on prompt examples, without optimizing any parameters. While large language models have demonstrated this ability, how in-context learning could be performed over graphs is unexplored. In this paper, we develop Pretraining Over Diverse In-Context Graph Systems (PRODIGY), the first pretraining framework that enables in-context learning over graphs. The key idea of our framework is to formulate in-context learning over graphs with a novel prompt graph representation, which connects prompt examples and queries. We then propose a graph neural network architecture over the prompt graph and a corresponding family of in-context pretraining objectives. With PRODIGY, the pretrained model can directly perform novel downstream classification tasks on unseen graphs via in-context learning. We provide empirical evidence of the effectiveness of our framework by showcasing its strong in-context learning performance on tasks involving citation networks and knowledge graphs. Our approach outperforms the in-context learning accuracy of contrastive pretraining baselines with hard-coded adaptation by 18% on average across all setups. Moreover, it also outperforms standard finetuning with limited data by 33% on average with in-context learning.
Generative Pre-trained Transformer: A Comprehensive Review on Enabling Technologies, Potential Applications, Emerging Challenges, and Future Directions
Yenduri, Gokul, M, Ramalingam, G, Chemmalar Selvi, Y, Supriya, Srivastava, Gautam, Maddikunta, Praveen Kumar Reddy, G, Deepti Raj, Jhaveri, Rutvij H, B, Prabadevi, Wang, Weizheng, Vasilakos, Athanasios V., Gadekallu, Thippa Reddy
The Generative Pre-trained Transformer (GPT) represents a notable breakthrough in the domain of natural language processing, which is propelling us toward the development of machines that can understand and communicate using language in a manner that closely resembles that of humans. GPT is based on the transformer architecture, a deep neural network designed for natural language processing tasks. Due to their impressive performance on natural language processing tasks and ability to converse effectively, GPT models have gained significant popularity among researchers and industrial communities, making them one of the most widely used and effective models in natural language processing and related fields, which motivated us to conduct this review. This review provides a detailed overview of GPT, including its architecture, working process, training procedures, enabling technologies, and its impact on various applications. In this review, we also explore the potential challenges and limitations of GPT. Furthermore, we discuss potential solutions and future directions. Overall, this paper aims to provide a comprehensive understanding of GPT, its enabling technologies, their impact on various applications, emerging challenges, and potential solutions.