gorilla



Gorilla: Large Language Model Connected with Massive APIs

Neural Information Processing Systems

Large Language Models (LLMs) have seen an impressive wave of advances, with models now excelling in a variety of tasks, such as mathematical reasoning and program synthesis. However, their potential to effectively use tools via API calls remains unfulfilled. This is a challenging task even for today's state-of-the-art LLMs such as GPT-4, largely due to their unawareness of what APIs are available and how to use them in a frequently updated tool set. We develop Gorilla, a finetuned LLaMA model that surpasses the performance of GPT-4 on writing API calls. Trained with the novel Retriever Aware Training (RAT), when combined with a document retriever, Gorilla demonstrates a strong capability to adapt to test-time document changes, allowing flexible user updates or version changes. It also substantially mitigates the issue of hallucination, commonly encountered when prompting LLMs directly. To evaluate the model's ability, we introduce APIBench, a comprehensive dataset consisting of HuggingFace, TorchHub, and TensorHub APIs. The successful integration of the retrieval system with Gorilla demonstrates the potential for LLMs to use tools more accurately, keep up with frequently updated documentation, and consequently increase the reliability and applicability of their outputs. Gorilla's code, model, data, and demo are available at: https://gorilla.cs.berkeley.edu
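The abstract's core recipe is to pair the model with a document retriever so that up-to-date API documentation is placed in the prompt at inference time. A minimal sketch of that pattern is below; the token-overlap scoring and prompt template are illustrative assumptions (Gorilla itself uses trained retrievers such as BM25 or dense retrieval), not the paper's implementation.

```python
# Hypothetical sketch of retriever-aware prompting: fetch the API document
# most relevant to the user query and prepend it to the instruction, so a
# generator can adapt when documentation is updated.

def retrieve_api_doc(query: str, docs: list[str]) -> str:
    """Return the doc sharing the most whitespace tokens with the query."""
    q_tokens = set(query.lower().split())
    return max(docs, key=lambda d: len(q_tokens & set(d.lower().split())))

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend the retrieved documentation to the user instruction."""
    doc = retrieve_api_doc(query, docs)
    return f"Use this API documentation:\n{doc}\n\nTask: {query}"

docs = [
    "torch.hub.load(repo, model): loads a model from TorchHub",
    "transformers.pipeline(task): builds a HuggingFace inference pipeline",
]
prompt = build_prompt("load a pretrained model from TorchHub", docs)
```

Because the documentation travels with the prompt rather than living in the model's weights, swapping in a new doc string is all that is needed when an API changes.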



Female mountain gorillas wield a lot of power

Popular Science

Whether it's King Kong climbing the Empire State Building or Donkey Kong throwing barrels at unsuspecting Italian plumbers, gorillas in popular culture are symbols of male power. This interpretation by filmmakers and video game creators has some truth to it. Silverback males rule gorilla troops, and occupy a place of power they only vacate after combat or death. The first studies on gorilla behavior began in the 1950s, through the pioneering fieldwork of George Schaller and Dian Fossey.




RouteNator: A Router-Based Multi-Modal Architecture for Generating Synthetic Training Data for Function Calling LLMs

Belavadi, Vibha, Vatsa, Tushar, Sultania, Dewang, Suresha, Suhas, Verma, Ishita, Chen, Cheng, King, Tracy Holloway, Friedrich, Michael

arXiv.org Artificial Intelligence

This paper addresses fine-tuning Large Language Models (LLMs) for function calling tasks when real user interaction data is unavailable. In digital content creation tools, where users express their needs through natural language queries that must be mapped to API calls, the lack of real-world task-specific data and privacy constraints for training on it necessitate synthetic data generation. Existing approaches to synthetic data generation fall short in diversity and complexity, failing to replicate real-world data distributions and leading to suboptimal performance after LLM fine-tuning. We present a novel router-based architecture that leverages domain resources like content metadata and structured knowledge graphs, along with text-to-text and vision-to-text language models to generate high-quality synthetic training data. Our architecture's flexible routing mechanism enables synthetic data generation that matches observed real-world distributions, addressing a fundamental limitation of traditional approaches. Evaluation on a comprehensive set of real user queries demonstrates significant improvements in both function classification accuracy and API parameter selection. Models fine-tuned with our synthetic data consistently outperform traditional approaches, establishing new benchmarks for function calling tasks.
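The key mechanism described is a router that dispatches each seed item to a text-to-text or vision-to-text generator while keeping the output mix close to an observed real-world distribution. The sketch below is a simplified, assumed version of that idea: the routing rule (presence of an image field) and the target-ratio subsampling are illustration choices, not the paper's architecture.

```python
# Toy router for synthetic-data generation: route each seed by modality,
# then subsample so roughly target_ratio of outputs come from vision seeds,
# mimicking a desired real-world distribution.

def route(seed: dict) -> str:
    """Pick a generator based on the seed's available modality."""
    return "vision_to_text" if "image" in seed else "text_to_text"

def generate_batch(seeds: list[dict], target_ratio: float = 0.3) -> list[dict]:
    """Keep all text seeds and cap vision seeds at target_ratio of the batch."""
    routed = [(route(s), s) for s in seeds]
    vision = [s for r, s in routed if r == "vision_to_text"]
    text = [s for r, s in routed if r == "text_to_text"]
    n_vision = int(target_ratio * len(seeds))
    return vision[:n_vision] + text
```

In a real pipeline each routed seed would then be expanded by the corresponding generator model; here the routing and distribution matching are the point.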


A Benchmark of French ASR Systems Based on Error Severity

Tholly, Antoine, Wottawa, Jane, Rouvier, Mickael, Dufour, Richard

arXiv.org Artificial Intelligence

Automatic Speech Recognition (ASR) transcription errors are commonly assessed using metrics that compare them with a reference transcription, such as Word Error Rate (WER), which measures spelling deviations from the reference, or semantic score-based metrics. However, these approaches often overlook what is understandable to humans when interpreting transcription errors. To address this limitation, a new evaluation is proposed that categorizes errors into four levels of severity, further divided into subtypes, based on objective linguistic criteria, contextual patterns, and the use of content words as the unit of analysis. This metric is applied to a benchmark of 10 state-of-the-art ASR systems on the French language, encompassing both HMM-based and end-to-end models. Our findings reveal the strengths and weaknesses of each system, identifying those that provide the most comfortable reading experience for users.
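For reference, the WER baseline that the severity metric is contrasted with is just the word-level edit distance between hypothesis and reference, normalized by the reference length. A minimal sketch:

```python
# Standard Word Error Rate: Levenshtein distance over words, divided by
# the number of reference words.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

Note how WER treats every substitution identically, e.g. `wer("le chat dort", "le chien dort")` scores one substitution regardless of whether "chien" confuses a reader; that uniformity is exactly what the paper's severity levels refine.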


Controlling Language and Diffusion Models by Transporting Activations

Rodriguez, Pau, Blaas, Arno, Klein, Michal, Zappella, Luca, Apostoloff, Nicholas, Cuturi, Marco, Suau, Xavier

arXiv.org Artificial Intelligence

The increasing capabilities of large generative models and their ever more widespread deployment have raised concerns about their reliability, safety, and potential misuse. To address these issues, recent works have proposed to control model generation by steering model activations in order to effectively induce or prevent the emergence of concepts or behaviors in the generated output. In this paper we introduce Activation Transport (AcT), a general framework to steer activations guided by optimal transport theory that generalizes many previous activation-steering works. AcT is modality-agnostic and provides fine-grained control over the model behavior with negligible computational overhead, while minimally impacting model abilities. We experimentally show the effectiveness and versatility of our approach by addressing key challenges in large language models (LLMs) and text-to-image diffusion models (T2Is). For LLMs, we show that AcT can effectively mitigate toxicity, induce arbitrary concepts, and increase their truthfulness. In T2Is, we show how AcT enables fine-grained style control and concept negation.
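AcT is presented as a generalization of earlier activation-steering work. The simplest member of that family, which the sketch below implements, is mean-shift steering: add the difference between mean activations on target and source examples to a hidden state. This is a baseline AcT subsumes, not AcT's optimal-transport map itself, and the arrays here merely stand in for a model layer's hidden states.

```python
# Mean-shift activation steering: shift a hidden state along the direction
# from the source-concept mean to the target-concept mean.
import numpy as np

def mean_shift_vector(src_acts: np.ndarray, tgt_acts: np.ndarray) -> np.ndarray:
    """Steering direction: target mean minus source mean (per feature)."""
    return tgt_acts.mean(axis=0) - src_acts.mean(axis=0)

def steer(h: np.ndarray, direction: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Apply the steering direction to a single hidden state."""
    return h + strength * direction
```

The `strength` knob is what gives fine-grained control: 0 leaves the model untouched, 1 fully applies the shift, and intermediate values interpolate.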


Redefining in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation

Feng, Fu, Xie, Yucheng, Yang, Xu, Wang, Jing, Geng, Xin

arXiv.org Artificial Intelligence

Given the challenge that diffusion models face in directly generating creative content, existing methods typically rely on synthesizing reference prompts or images to achieve creative effects. For instance, to combine "Lettuce" and "Mantis" creatively, ConceptLab [43] merges tokens representing these concepts into a new composite token, while BASS [22] uses predefined sampling rules to search for creative outcomes from a large pool of candidate images; this search at each generation leads to high computational costs and limited practicality for online applications. In contrast, "a blue banana" can be generated directly without additional training, due to its clear and concrete semantics. Inspired by this, we ask: can we awaken the creativity of diffusion models by enhancing their semantic understanding of "creative"? To achieve this, we propose CreTok, which redefines "creative" as a new specialized token, enabling direct concept combinations without requiring additional training. Specifically, CreTok builds on the definition of "creativity" from the TP2O task [22] for combinatorial object generation: we redefine the abstract term "creative" within our proposed CangJie dataset, paired with an adaptive prompt (e.g., "A photo of a mixture"). This meta-creativity significantly reduces both time and computational complexity compared to state-of-the-art (SOTA) creative generation methods such as ConceptLab [43] (4s vs. 120s per image, a 30x speedup) and BASS [22] (4s vs. 2400s per image, a 600x speedup). We compare against text-to-image (T2I) models and creative generation methods in terms of computational complexity, human preference ratings, text-image alignment, and other key metrics. Further evaluations using GPT-4o [1] and user studies indicate superior performance of CreTok in terms of integration, originality, and aesthetics, underscoring its effectiveness in fostering combinatorial and human-like creativity, a critical yet underexplored aspect of AI research [28, 29]. Our contributions are as follows: (1) We propose CreTok, a method designed to enhance models' meta-ability by enabling an enhanced understanding of abstract and ambiguous adjectives (e.g., "creative" or "beautiful") through their redefinition as new tokens.
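The mechanism underneath "redefining a word as a new token" is adding a learnable embedding to the vocabulary and optimizing it. The toy sketch below only illustrates that shape of solution: the midpoint initialization and the interpolation-style refinement are invented for illustration and are not CreTok's actual training objective.

```python
# Toy illustration of learning a new token embedding: start between two
# concept embeddings, then iteratively pull the embedding toward a target
# combination embedding.
import numpy as np

def init_new_token(concept_a: np.ndarray, concept_b: np.ndarray) -> np.ndarray:
    """Initialize the new token at the midpoint of two concept embeddings."""
    return (concept_a + concept_b) / 2.0

def refine_token(token: np.ndarray, target: np.ndarray,
                 lr: float = 0.1, steps: int = 50) -> np.ndarray:
    """Move the token embedding toward a target embedding step by step."""
    for _ in range(steps):
        token = token + lr * (target - token)
    return token
```

Once such an embedding is learned, it can be dropped into any prompt like an ordinary word, which is why no extra training is needed per generation.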


Applying RLAIF for Code Generation with API-usage in Lightweight LLMs

Dutta, Sujan, Mahinder, Sayantan, Anantha, Raviteja, Bandyopadhyay, Bortik

arXiv.org Artificial Intelligence

Reinforcement Learning from AI Feedback (RLAIF) has demonstrated significant potential across various domains, including mitigating harm in LLM outputs, enhancing text summarization, and mathematical reasoning. This paper introduces an RLAIF framework for improving the code generation abilities of lightweight (<1B parameters) LLMs. We specifically focus on code generation tasks that require writing appropriate API calls, which is challenging due to the well-known issue of hallucination in LLMs. Our framework extracts AI feedback from a larger LLM (e.g., GPT-3.5) through a specialized prompting strategy and uses this data to train a reward model that better aligns smaller LLMs. We run our experiments on the Gorilla dataset and meticulously assess the quality of the model-generated code across various metrics, including AST, ROUGE, and Code-BLEU, and develop a pipeline to compute its executability rate accurately. Our approach significantly enhances the fine-tuned LLM baseline's performance, achieving a 4.5% improvement in executability rate. Notably, a smaller LLM (780M parameters) trained with RLAIF surpasses a much larger fine-tuned baseline with 7B parameters, achieving a 1.0% higher code executability rate.
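The headline metric here, executability rate, is the fraction of generated snippets that actually run. A deliberately simplified sketch of such a pipeline is below; a production version would sandbox execution and install the APIs under test, whereas this version only handles self-contained snippets.

```python
# Minimal executability-rate check: attempt to compile and execute each
# generated snippet in an empty namespace and count the ones that succeed.

def executability_rate(snippets: list[str]) -> float:
    ok = 0
    for code in snippets:
        try:
            exec(compile(code, "<generated>", "exec"), {})
            ok += 1
        except Exception:
            # Syntax errors, missing imports, and runtime errors all count
            # as non-executable.
            pass
    return ok / len(snippets)
```

This is also why hallucinated API calls hurt so directly: a snippet importing a nonexistent module fails at execution time and drags the rate down, which is the behavior the reward model is trained to penalize.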