Large Language Model
Learn how to get the most out of ChatGPT
ChatGPT has revolutionized business and academia, in ways both positive and negative. While it has been somewhat controversial, it's hard to argue that it's not useful. As such, it's worth knowing how to get the most out of this AI tool. While there are many AI writing assistants out there, ChatGPT is one of the most well-known and readily accessible. With Introduction to ChatGPT, you can learn how to master this tool and earn a certification that will demonstrate your expertise to employers.
The Age of Chat
Earlier this spring, I took the bus to the Moscone Center, in downtown San Francisco, where almost thirty thousand people had gathered for the annual Game Developers Conference (G.D.C.), which I was attending as a journalist. I had spent the previous few months out on maternity leave, and I was glad to return to work, to have meetings, to temporarily exit the domestic sphere. Participating in public life felt incredible, almost psychedelic. I loved making small talk with the bus driver, and eavesdropping on strangers. "Conferences are back," I heard one man say, sombrely, to another.
MISMATCH: Fine-grained Evaluation of Machine-generated Text with Mismatch Error Types
Murugesan, Keerthiram, Swaminathan, Sarathkrishna, Dan, Soham, Chaudhury, Subhajit, Gunasekara, Chulaka, Crouse, Maxwell, Mahajan, Diwakar, Abdelaziz, Ibrahim, Fokoue, Achille, Kapanipathi, Pavan, Roukos, Salim, Gray, Alexander
With the growing interest in large language models, the need to evaluate the quality of machine text against reference (typically human-generated) text has become a focus of attention. Most recent works either focus on task-specific evaluation metrics or study the properties of machine-generated text captured by the existing metrics. In this work, we propose a new evaluation scheme to model human judgments in 7 NLP tasks, based on the fine-grained mismatches between a pair of texts. Inspired by recent efforts toward fine-grained evaluation in several NLP tasks, we introduce a set of 13 mismatch error types, such as spatial/geographic errors and entity errors, to guide the model toward better prediction of human judgments. We propose a neural framework for evaluating machine texts that uses these mismatch error types as auxiliary tasks and re-purposes the existing single-number evaluation metrics as additional scalar features, alongside textual features extracted from the machine and reference texts. Our experiments reveal key insights about the existing metrics via the mismatch errors. We show that the mismatch errors between the sentence pairs on the held-out datasets from 7 NLP tasks align well with the human evaluation.
Chain of Thought Prompt Tuning in Vision Language Models
Ge, Jiaxin, Luo, Hongyin, Qian, Siyuan, Gan, Yulu, Fu, Jie, Zhang, Shanghang
Language-Image Pre-training has demonstrated promising results on zero-shot and few-shot downstream tasks by prompting visual models with natural language prompts. However, most recent studies only use a single prompt for tuning, neglecting the inherent step-by-step cognitive reasoning process that humans conduct in complex task settings, for example, when processing images from unfamiliar domains. Chain of Thought is a simple and effective approximation of the human reasoning process and has been proven useful for natural language processing (NLP) tasks. Based on this cognitive intuition, we believe that conducting effective reasoning is also an important problem in visual tasks, and a chain of thought could be a solution to this problem. In this work, we propose a novel chain of thought prompt tuning for vision-language modeling. Extensive experiments show that our method not only generalizes better in image classification tasks, has greater transferability beyond a single dataset, and has stronger domain generalization performance, but also performs much better in image-text retrieval and visual question answering, which require more reasoning capabilities. We are the first to successfully adapt chain-of-thought prompting that combines visual and textual embeddings. We will release our codes
Generation of Radiology Findings in Chest X-Ray by Leveraging Collaborative Knowledge
Danu, Manuela Daniela, Marica, George, Karn, Sanjeev Kumar, Georgescu, Bogdan, Mansoor, Awais, Ghesu, Florin, Itu, Lucian Mihai, Suciu, Constantin, Grbic, Sasa, Farri, Oladimeji, Comaniciu, Dorin
Among all the sub-sections in a typical radiology report, the Clinical Indications, Findings, and Impression often reflect important details about the health status of a patient. The information included in Impression is also often covered in Findings. While Findings and Impression can be deduced by inspecting the image, Clinical Indications often require additional context. The cognitive task of interpreting medical images remains the most critical and often time-consuming step in the radiology workflow. Instead of generating an end-to-end radiology report, in this paper, we focus on generating the Findings from automated interpretation of medical images, specifically chest X-rays (CXRs). Thus, this work focuses on reducing the workload of radiologists who spend most of their time either writing or narrating the Findings. Unlike past research, which addresses radiology report generation as a single-step image captioning task, we have further taken into consideration the complexity of interpreting CXR images and propose a two-step approach: (a) detecting the regions with abnormalities in the image, and (b) generating relevant text for regions with abnormalities by employing a generative large language model (LLM). This two-step approach introduces a layer of interpretability and aligns the framework with the systematic reasoning that radiologists use when reviewing a CXR.
LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning
Tang, Yunlong, Zhang, Jinrui, Wang, Xiangchen, Wang, Teng, Zheng, Feng
Our winning entry for the CVPR 2023 Generic Event Boundary Captioning (GEBC) competition is detailed in this paper. Unlike conventional video captioning tasks, GEBC demands that the captioning model possess an understanding of immediate changes in status around the designated video boundary, making it a difficult task. This paper proposes an effective model, LLMVA-GEBC (Large Language Model with Video Adapter for Generic Event Boundary Captioning): (1) We utilize a pretrained LLM to generate high-quality, human-like captions. (2) To adapt the model to the GEBC task, we take the video Q-former as an adapter and train it with the frozen visual feature extractors and LLM. Our proposed method achieved a score of 76.14 on the test set and won first place in the challenge. Our code is available at https://github.com/zjr2000/LLMVA-GEBC .
Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity
Large Language Models (LLMs) have demonstrated impressive capabilities in generating fluent text, as well as tendencies to reproduce undesirable social biases. This study investigates whether LLMs reproduce the moral biases associated with political groups in the United States, an instance of a broader capability herein termed moral mimicry. This hypothesis is explored in the GPT-3/3.5 and OPT families of Transformer-based LLMs. Using tools from Moral Foundations Theory, it is shown that these LLMs are indeed moral mimics. When prompted with a liberal or conservative political identity, the models generate text reflecting corresponding moral biases. This study also explores the relationship between moral mimicry and model size, and similarity between human and LLM moral word use.
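As a rough illustration of what measuring "moral word use" can look like, the sketch below scores a text against the five foundations of Moral Foundations Theory by counting dictionary hits. The tiny `LEXICON` and the `foundation_profile` helper are hypothetical stand-ins written for this example; they are not the study's dictionary or code, and a real analysis would use a full, validated lexicon.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; real studies rely on a
# much larger, validated Moral Foundations dictionary.
LEXICON = {
    "care": {"harm", "protect", "suffer", "compassion"},
    "fairness": {"fair", "equal", "justice", "rights"},
    "loyalty": {"loyal", "betray", "nation", "family"},
    "authority": {"obey", "law", "tradition", "respect"},
    "sanctity": {"pure", "sacred", "disgust", "holy"},
}


def foundation_profile(text):
    """Return the share of lexicon hits that falls on each foundation."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for foundation, vocab in LEXICON.items():
            if word in vocab:
                counts[foundation] += 1
    total = sum(counts.values())
    # With no hits at all, every share is reported as 0.0.
    return {f: (counts[f] / total if total else 0.0) for f in LEXICON}


profile = foundation_profile(
    "We must protect the weak from harm and ensure equal rights."
)
print(profile)
```

Comparing such profiles for text generated under a liberal prompt and under a conservative prompt would be one simple way to look for the kind of shift the abstract describes.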
AIs will become useless if they keep learning from other AIs
Artificial intelligences that are trained using text and images from other AIs, which have themselves been trained on AI outputs, could eventually become functionally useless. AIs such as ChatGPT, known as large language models (LLMs), use vast repositories of human-written text from the internet to create a statistical model of human language, so that they can predict which words are most likely to come next in a sentence. Since they have been available, the internet has become awash with AI-generated text, but the effect this will have on future AIs is unclear. Now, Ilia Shumailov at the University of Oxford and his colleagues have found that AI models trained using the outputs of other AIs become heavily biased, overly simple and disconnected from reality – a problem they call model collapse. This failure happens because of the way that AI models statistically represent text.
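A toy simulation can make one possible mechanism concrete. The sketch below is not the researchers' experiment: it assumes a deliberately crude "language model" (a unigram word-frequency table) and a hypothetical 50-word vocabulary, and it retrains that model, generation after generation, on text sampled from its own predecessor. Any word that happens not to be sampled gets probability zero and can never return, so the vocabulary can only shrink.

```python
import random
from collections import Counter


def fit(tokens):
    """Estimate a unigram distribution (token -> probability) from a corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def generate(model, n, rng):
    """Sample n tokens from a fitted unigram distribution."""
    tokens = list(model)
    weights = [model[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)


def simulate(generations=30, corpus_size=200, seed=0):
    """Train each generation on the previous generation's output.

    Returns the number of distinct words each generation's model knows.
    """
    rng = random.Random(seed)
    # Hypothetical "human" corpus: a few common words plus a long tail of rare ones.
    vocab = [f"w{i}" for i in range(50)]
    weights = [1.0 / (i + 1) for i in range(50)]
    corpus = rng.choices(vocab, weights=weights, k=corpus_size)

    support_sizes = []
    for _ in range(generations):
        model = fit(corpus)
        support_sizes.append(len(model))
        # The next generation only ever sees this model's output.
        corpus = generate(model, corpus_size, rng)
    return support_sizes


sizes = simulate()
print(sizes[0], "->", sizes[-1])
```

The rare words in the tail are the first to disappear, which is one simple way a model trained on model output can become "overly simple" relative to the original data.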
AI is already causing unintended harm. What happens when it falls into the wrong hands?
David Evan Harris
A researcher was granted access earlier this year by Facebook's parent company, Meta, to incredibly potent artificial intelligence software – and leaked it to the world. As a former researcher on Meta's civic integrity and responsible AI teams, I am terrified by what could happen next. Though Meta was violated by the leak, it came out as the winner: researchers and independent coders are now racing to improve on or build on the back of LLaMA (Large Language Model Meta AI – Meta's branded version of a large language model or LLM, the type of software underlying ChatGPT), with many sharing their work openly with the world. This could position Meta as owner of the centrepiece of the dominant AI platform, much in the same way that Google controls the open-source Android operating system that is built on and adapted by device manufacturers globally. If Meta were to secure this central position in the AI ecosystem, it would have leverage to shape the direction of AI at a fundamental level, controlling both the experiences of individual users and setting limits on what other companies could and couldn't do.
Language to Rewards for Robotic Skill Synthesis
Yu, Wenhao, Gileadi, Nimrod, Fu, Chuyuan, Kirmani, Sean, Lee, Kuang-Huei, Arenas, Montse Gonzalez, Chiang, Hao-Tien Lewis, Erez, Tom, Hasenclever, Leonard, Humplik, Jan, Ichter, Brian, Xiao, Ted, Xu, Peng, Zeng, Andy, Zhang, Tingnan, Heess, Nicolas, Sadigh, Dorsa, Tan, Jie, Tassa, Yuval, Xia, Fei
Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. On the other hand, reward functions are shown to be flexible representations that can be optimized for control policies to achieve diverse tasks, while their semantic richness makes them suitable to be specified by LLMs. In this work, we introduce a new paradigm that harnesses this realization by utilizing LLMs to define reward parameters that can be optimized to accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions. Meanwhile, combining this with a real-time optimizer, MuJoCo MPC, empowers an interactive behavior creation experience where users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline using primitive skills as the interface with Code-as-policies achieves 50% of the tasks. We further validated our method on a real robot arm where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.
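The idea of using reward parameters as the interface between language and control can be sketched in a few lines. Everything below is an illustrative assumption rather than the authors' system: `llm_to_reward_params` is a hard-coded lookup table standing in for an LLM call, the "robot" is a one-dimensional point, the parameter names `target_x` and `effort_weight` are made up, and a simple random-search hill climber takes the place of MuJoCo MPC.

```python
import random


def llm_to_reward_params(instruction):
    """Stand-in for the LLM: map an instruction to reward parameters.

    The table and the parameter names are hypothetical.
    """
    table = {
        "move to the right": {"target_x": 1.0, "effort_weight": 0.01},
        "move to the left": {"target_x": -1.0, "effort_weight": 0.01},
    }
    return table[instruction]


def rollout(actions, x0=0.0):
    """Toy 1-D dynamics: each action is a displacement added to the position."""
    x = x0
    for a in actions:
        x += a
    return x


def reward(actions, params):
    """Reward = closeness of the final position to the target, minus effort."""
    final_x = rollout(actions)
    tracking = -(final_x - params["target_x"]) ** 2
    effort = -params["effort_weight"] * sum(a * a for a in actions)
    return tracking + effort


def optimize(params, horizon=5, iters=2000, seed=0):
    """Random-search hill climbing, standing in for a real-time optimizer."""
    rng = random.Random(seed)
    best = [0.0] * horizon
    best_r = reward(best, params)
    for _ in range(iters):
        # Perturb the current best plan and keep the change if reward improves.
        cand = [a + rng.gauss(0.0, 0.1) for a in best]
        r = reward(cand, params)
        if r > best_r:
            best, best_r = cand, r
    return best


params = llm_to_reward_params("move to the right")
plan = optimize(params)
print("final position:", round(rollout(plan), 2))
```

The point of the design is that the language side never emits low-level actions: it only chooses the reward parameters, and the optimizer is responsible for finding actions that score well under them.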