 Large Language Model


The Morning After: OpenAI and Microsoft aren't happy

Engadget

Microsoft may own almost half of OpenAI, but a recent exposé hints the pair aren't the happiest of bedfellows. The Wall Street Journal claims the AI company warned Microsoft not to incorporate GPT-4 into Bing search without further training, but Microsoft did so anyway. The result was several high-profile examples of odd behavior, including bots arguing with users, and at least one instance of a user being urged to dissolve their marriage and elope with Bing instead. There's resentment on Microsoft's side, too: it finds its own internal AI projects overlooked in favor of OpenAI, which, despite the close financial ties, is very much free to work with Microsoft's rivals in plenty of fields.


ChatGPT Is Unoriginal--and Exactly What Humans Need

WIRED

Consider a teenager, Jorge, who is caught possessing a large amount of marijuana by a school administrator and will be expelled if he's reported to his parole officer. If the administrator does not report him, they're breaking the law; if they do, they're condemning him to one of the worst schools in the city and likely recidivism. This is a case study we presented to a class of 60 students at the Harvard Graduate School of Education. We asked them to pretend to be a teacher or administrator at the school and design a course of action. One hour into their conversation, we presented them with ChatGPT's analysis of the study.


Tokyo Metropolitan Government to start using ChatGPT from August

The Japan Times

The Tokyo Metropolitan Government will begin using the artificial intelligence chatbot ChatGPT for writing texts and carrying out other clerical work in all of its offices from August, Gov. Yuriko Koike said Tuesday. ChatGPT "has the potential to greatly transform the way public administration is conducted," Koike said during a metropolitan assembly session. She added that "better city governance" can be achieved by assessing the positive and negative aspects of the AI service. Koike also said the metropolitan government will use ChatGPT for tasks including preparing documents in question-and-answer format, and seek input from its employees about other practical uses for the generative AI tool.


Knowledge Distillation of Large Language Models

arXiv.org Artificial Intelligence

Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-box classification models or to training small models to imitate black-box model APIs like ChatGPT. How to effectively distill the knowledge from white-box generative LLMs is still under-explored, and it becomes more and more important as LLMs proliferate. In this work, we propose MiniLLM, which distills smaller language models from larger generative language models. We first replace the forward Kullback-Leibler divergence (KLD) objective in the standard KD approaches with reverse KLD, which is more suitable for KD on generative language models, to prevent the student model from overestimating the low-probability regions of the teacher distribution. Then, we derive an effective optimization approach to learn this objective. Extensive experiments in the instruction-following setting show that the MiniLLM models generate more precise responses with higher overall quality, lower exposure bias, better calibration, and higher long-text generation performance. Our method is also scalable for different model families with 120M to 13B parameters. We will release our code and model checkpoints at https://aka.ms/MiniLLM.
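
The contrast between forward and reverse KLD in the abstract above can be illustrated with a toy example. This is a minimal sketch, not the MiniLLM training procedure: the four-outcome `teacher` distribution and the two candidate students (`covering`, `seeking`) are made-up numbers chosen only to show which student each objective prefers.

```python
import math

def kl(a, b):
    """KL(a || b) for two discrete distributions given as equal-length lists."""
    return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

# Hypothetical teacher with two high-probability modes and two low-probability outcomes.
teacher = [0.48, 0.02, 0.02, 0.48]

# Two hypothetical students: one spreads mass everywhere, one commits to a single mode.
students = {
    "covering": [0.25, 0.25, 0.25, 0.25],
    "seeking": [0.94, 0.02, 0.02, 0.02],
}

# Forward KLD, KL(teacher || student): heavily penalizes missing any teacher mode.
forward = {name: kl(teacher, q) for name, q in students.items()}

# Reverse KLD, KL(student || teacher): heavily penalizes student mass placed
# where the teacher assigns low probability.
reverse = {name: kl(q, teacher) for name, q in students.items()}

for name in students:
    print(f"{name}: forward={forward[name]:.3f} reverse={reverse[name]:.3f}")
```

In this toy setting, forward KLD favors the `covering` student even though it puts half of its mass on outcomes the teacher considers unlikely, while reverse KLD favors the `seeking` student, which is the behavior the abstract describes as avoiding overestimation of low-probability regions.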


PRISMA-DFLLM: An Extension of PRISMA for Systematic Literature Reviews using Domain-specific Finetuned Large Language Models

arXiv.org Artificial Intelligence

With the proliferation of open-sourced Large Language Models (LLMs) and efficient finetuning techniques, we are on the cusp of the emergence of numerous domain-specific LLMs that have been finetuned for expertise across specialized fields and applications for which the current general-purpose LLMs are unsuitable. In academia, this technology has the potential to revolutionize the way we conduct systematic literature reviews (SLRs), access knowledge and generate new insights. This paper proposes an AI-enabled methodological framework that combines the power of LLMs with the rigorous reporting guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). By finetuning LLMs on domain-specific academic papers that have been selected as a result of a rigorous SLR process, the proposed PRISMA-DFLLM (for Domain-specific Finetuned LLMs) reporting guidelines offer the potential to achieve greater efficiency, reusability and scalability, while also opening the potential for conducting incremental living systematic reviews with the aid of LLMs. Additionally, the proposed approach for leveraging LLMs for SLRs enables the dissemination of finetuned models, empowering researchers to accelerate advancements and democratize cutting-edge research. This paper presents the case for the feasibility of finetuned LLMs to support rigorous SLRs and the technical requirements for realizing this. This work then proposes the extended PRISMA-DFLLM checklist of reporting guidelines as well as the advantages, challenges, and potential implications of implementing PRISMA-DFLLM. Finally, a future research roadmap to develop this line of AI-enabled SLRs is presented, paving the way for a new era of evidence synthesis and knowledge discovery.


Revealing the structure of language model capabilities

arXiv.org Artificial Intelligence

Building a theoretical understanding of the capabilities of large language models (LLMs) is vital for our ability to predict and explain the behavior of these systems. Here, we investigate the structure of LLM capabilities by extracting latent capabilities from patterns of individual differences across a varied population of LLMs. Using a combination of Bayesian and frequentist factor analysis, we analyzed data from 29 different LLMs across 27 cognitive tasks. We found evidence that LLM capabilities are not monolithic. Instead, they are better explained by three well-delineated factors that represent reasoning, comprehension and core language modeling. Moreover, we found that these three factors can explain a high proportion of the variance in model performance. These results reveal a consistent structure in the capabilities of different LLMs and demonstrate the multifaceted nature of these capabilities. We also found that the three abilities show different relationships to model properties such as model size and instruction tuning. These patterns help refine our understanding of scaling laws and indicate that changes to a model that improve one ability might simultaneously impair others. Based on these findings, we suggest that benchmarks could be streamlined by focusing on tasks that tap into each broad model ability.


Radiology-GPT: A Large Language Model for Radiology

arXiv.org Artificial Intelligence

We introduce Radiology-GPT, a large language model for radiology. Using an instruction tuning approach on an extensive dataset of radiology domain knowledge, Radiology-GPT demonstrates superior performance compared to general language models such as StableLM, Dolly and LLaMA. It exhibits significant versatility in radiological diagnosis, research, and communication. This work serves as a catalyst for future developments in clinical NLP. The successful implementation of Radiology-GPT is indicative of the potential of localizing generative large language models, specifically tailored for distinctive medical specialties, while ensuring adherence to privacy standards such as HIPAA. The prospect of developing individualized, large-scale language models that cater to specific needs of various hospitals presents a promising direction. The fusion of conversational competence and domain-specific knowledge in these models is set to foster future development in healthcare AI. A demo of Radiology-GPT is available at https://huggingface.co/spaces/allen-eric/radiology-gpt.


Toward Grounded Social Reasoning

arXiv.org Artificial Intelligence

Consider a robot tasked with tidying a desk with a meticulously constructed Lego sports car. A human may recognize that it is not socially appropriate to disassemble the sports car and put it away as part of the "tidying". How can a robot reach that conclusion? Although large language models (LLMs) have recently been used to enable social reasoning, grounding this reasoning in the real world has been challenging. To reason in the real world, robots must go beyond passively querying LLMs and *actively gather information from the environment* that is required to make the right decision. For instance, after detecting that there is an occluded car, the robot may need to actively perceive the car to know whether it is an advanced model car made out of Legos or a toy car built by a toddler. We propose an approach that leverages an LLM and vision language model (VLM) to help a robot actively perceive its environment to perform grounded social reasoning. To evaluate our framework at scale, we release the MessySurfaces dataset which contains images of 70 real-world surfaces that need to be cleaned. We additionally illustrate our approach with a robot on 2 carefully designed surfaces. We find an average 12.9% improvement on the MessySurfaces benchmark and an average 15% improvement on the robot experiments over baselines that do not use active perception. The dataset, code, and videos of our approach can be found at https://minaek.github.io/groundedsocialreasoning.


Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models

arXiv.org Artificial Intelligence

The AI community has been pursuing algorithms known as artificial general intelligence (AGI) that apply to any kind of real-world problem. Recently, chat systems powered by large language models (LLMs) emerge and rapidly become a promising direction to achieve AGI in natural language processing (NLP), but the path towards AGI in computer vision (CV) remains unclear. One may owe the dilemma to the fact that visual signals are more complex than language signals, yet we are interested in finding concrete reasons, as well as absorbing experiences from GPT and LLMs to solve the problem. In this paper, we start with a conceptual definition of AGI and briefly review how NLP solves a wide range of tasks via a chat system. The analysis inspires us that unification is the next important goal of CV. But, despite various efforts in this direction, CV is still far from a system like GPT that naturally integrates all tasks. We point out that the essential weakness of CV lies in lacking a paradigm to learn from environments, yet NLP has accomplished the task in the text world. We then imagine a pipeline that puts a CV algorithm (i.e., an agent) in world-scale, interactable environments, pre-trains it to predict future frames with respect to its action, and then fine-tunes it with instruction to accomplish various tasks. We expect substantial research and engineering efforts to push the idea forward and scale it up, for which we share our perspectives on future research directions.
Some researchers believed that such systems can be seen as early sparks of AGI [2]. These systems were enhanced by instruct tuning [4]. Equipped with an external knowledge base and specifically designed modules, they can accomplish complex tasks such as solving mathematical questions, generating visual contents, etc., reflecting its strong ability to understand users' intentions and perform preliminary chain-of-thoughts [5]. Despite known weaknesses in some aspects (e.g., telling scientific facts and relationships between named people), these pioneering studies ...
... designs do not generally transfer to other problems such as image captioning [11] or visual content generation [12]. In recent years, there are many efforts in this direction, and we roughly categorize them into five research topics, namely, (i) open-world visual recognition based on vision-language alignment [13], (ii) the Segment Anything task [14] for generic visual recognition, (iii) generalized visual encoding to unify vision tasks [15], [16], [17], (iv) LLM-guided visual understanding to enhance the logic in CV [18], [19], and (v) multimodal dialog to facilitate vision-language interaction [11], [20].


Anticipatory Music Transformer

arXiv.org Artificial Intelligence

We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, whereby the controls are a subset of the events themselves, and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that an anticipatory model produces accompaniments with similar musicality to even music composed by humans over a 20-second clip.
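
The interleaving of events and controls described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: the abstract does not give the placement rule, so the sketch assumes each control is inserted right after the first event whose time is at least the control's time minus a fixed interval `delta`. The function name `interleave`, the `(time, label)` tuples, and the sample data are all hypothetical.

```python
def interleave(events, controls, delta):
    """Merge two time-sorted lists of (time, label) pairs into one sequence.

    Assumed rule: a control with time s is emitted immediately after the
    first event whose time t satisfies t >= s - delta, so the model sees
    the control roughly `delta` time units before the moment it refers to.
    """
    out = []
    k = 0  # index of the next control that has not been emitted yet
    for t, label in events:
        out.append(("event", t, label))
        while k < len(controls) and controls[k][0] - delta <= t:
            out.append(("control", controls[k][0], controls[k][1]))
            k += 1
    # Controls that no event triggered are appended at the end.
    for s, label in controls[k:]:
        out.append(("control", s, label))
    return out

# Hypothetical data: note events at times 0, 1, 2, 3, 6 and two control notes.
events = [(0.0, "C4"), (1.0, "E4"), (2.0, "G4"), (3.0, "E4"), (6.0, "C4")]
controls = [(4.0, "melody-A4"), (10.0, "melody-B4")]

sequence = interleave(events, controls, delta=2.0)
for item in sequence:
    print(item)
```

With `delta=2.0`, the control at time 4.0 lands right after the event at time 2.0, and the control at time 10.0 is appended after the last event because no event reaches time 8.0.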