AITopics

Country: Asia > China > Zhejiang Province (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Dec-26-2025, 03:57:29 GMT

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence. While there are numerous AI models available for various domains and modalities, they cannot handle complicated AI tasks autonomously. Considering that large language models (LLMs) have exhibited exceptional abilities in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks, with language serving as a generic interface to empower this. Based on this philosophy, we present HuggingGPT, an LLM-powered agent that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in Hugging Face, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and the abundant AI models in Hugging Face, HuggingGPT can tackle a wide range of sophisticated AI tasks spanning different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards the realization of artificial general intelligence.
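The four stages named in the abstract (task planning, model selection, task execution, response generation) can be pictured as a simple controller loop. The Python sketch below is not HuggingGPT's code. The keyword-based planner, the `ModelCard` registry, the download-count selection rule, and all model names are hypothetical stand-ins for the LLM prompts and Hugging Face model descriptions the system actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelCard:
    """Hypothetical stand-in for a model hub entry (not a real Hugging Face API)."""
    model_id: str
    task: str
    description: str
    downloads: int
    run: Callable[[str], str]


def plan_tasks(request: str) -> list[str]:
    """Stage 1, task planning. A real controller would prompt an LLM;
    this stub maps keywords in the request to task names, in a fixed order."""
    keyword_to_task = {"describe": "image-to-text", "read aloud": "text-to-speech"}
    return [task for kw, task in keyword_to_task.items() if kw in request.lower()]


def select_model(task: str, registry: list[ModelCard]) -> ModelCard:
    """Stage 2, model selection: filter by task, then take the most downloaded
    candidate (a popularity heuristic standing in for LLM-based selection)."""
    candidates = [m for m in registry if m.task == task]
    if not candidates:
        raise LookupError(f"no model registered for task {task!r}")
    return max(candidates, key=lambda m: m.downloads)


def summarize(request: str, trace: list[str]) -> str:
    """Stage 4, response generation. A real controller would ask the LLM to
    write this; here we just concatenate the execution trace."""
    return f"Request: {request}\n" + "\n".join(trace)


def run_pipeline(request: str, user_input: str, registry: list[ModelCard]) -> str:
    trace = []
    current = user_input
    for task in plan_tasks(request):
        model = select_model(task, registry)
        # Stage 3, task execution: each output feeds the next subtask.
        current = model.run(current)
        trace.append(f"{task} via {model.model_id}: {current}")
    return summarize(request, trace)


registry = [
    ModelCard("captioner-small", "image-to-text", "Generates image captions", 10,
              lambda path: "a dog on grass"),
    ModelCard("captioner-large", "image-to-text", "Generates image captions", 500,
              lambda path: "a brown dog running on grass"),
    ModelCard("tts-basic", "text-to-speech", "Converts text to speech", 50,
              lambda text: f"<audio: {text}>"),
]

print(run_pipeline("Describe this photo and read aloud the result", "dog.jpg", registry))
```

Running it routes the caption request to `captioner-large` (the higher download count) and feeds that caption into `tts-basic`. A real system would also need to handle requests whose subtasks do not form a simple chain.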

ai model, hugging face, hugginggpt

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

Neural Information Processing Systems, Oct-8-2025, 22:40:10 GMT

77c33e6a367922d003ff102ffb92b658-Paper-Conference.pdf

large language model, machine learning, natural language

Country: Asia > China > Zhejiang Province (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Neural Information Processing Systems, Jan-19-2025, 09:01:47 GMT

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence. While there are numerous AI models available for various domains and modalities, they cannot handle complicated AI tasks autonomously. Considering that large language models (LLMs) have exhibited exceptional abilities in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks, with language serving as a generic interface to empower this. Based on this philosophy, we present HuggingGPT, an LLM-powered agent that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in Hugging Face, execute each subtask with the selected AI model, and summarize the response according to the execution results.

large language model, machine learning, natural language

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Artificial Intelligence, Dec-17-2024

From An LLM Swarm To A PDDL-Empowered HIVE: Planning Self-Executed Instructions In A Multi-Modal Jungle

Vyas, Kaustubh, Graux, Damien, Yang, Yijun, Montella, Sébastien, Diao, Chenxin, Zhou, Wendi, Vougiouklis, Pavlos, Lai, Ruofei, Ren, Yang, Li, Keshuang, Pan, Jeff Z.

In response to the call for agent-based solutions that leverage the ever-increasing capabilities of the deep models' ecosystem, we introduce Hive, a comprehensive solution for selecting appropriate models and subsequently planning a set of atomic actions to satisfy the end-users' instructions. Hive operates over sets of models and, upon receiving natural language instructions (i.e., user queries), schedules and executes explainable plans of atomic actions. These actions can involve one or more of the available models to achieve the overall task, while respecting end-users' specific constraints. Notably, Hive supports tasks that involve multi-modal inputs and outputs, enabling it to handle complex, real-world queries. Our system is capable of planning complex chains of actions while guaranteeing explainability, using an LLM-based formal logic backbone empowered by PDDL operations. We introduce the MuSE benchmark to offer a comprehensive evaluation of the multi-modal capabilities of agent systems. Our findings show that our framework redefines the state of the art for task selection, outperforming competing systems that plan operations across multiple models, while offering transparency guarantees and fully adhering to user constraints.
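Hive's abstract describes selecting models and planning atomic actions through PDDL operations. The core idea of such planning can be illustrated with a toy forward search over the STRIPS-style fragment at the core of PDDL. In that framing, each model is an action whose preconditions are the modalities it consumes and whose effects are the modalities it produces. The Python sketch below is an illustration under assumptions, not Hive's planner. The action names, the modality facts, and the `banned` parameter (a stand-in for a user constraint) are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Action:
    """STRIPS-style operator: a model that needs some facts (available
    modalities) before it runs and adds new facts when it does."""
    name: str
    pre: frozenset[str]
    add: frozenset[str]


def plan(initial: set[str], goal: set[str], actions: list[Action],
         banned: frozenset[str] = frozenset()) -> Optional[list[str]]:
    """Breadth-first forward search from the initial facts to a state that
    satisfies the goal. Returns the shortest action sequence, or None if the
    goal is unreachable. `banned` forbids certain models (a user constraint)."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for act in actions:
            if act.name in banned or not act.pre <= state:
                continue
            nxt = state | act.add
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, steps + [act.name]))
    return None


# Hypothetical model inventory: English speech in, French speech out.
ACTIONS = [
    Action("asr", frozenset({"audio"}), frozenset({"text"})),
    Action("translate", frozenset({"text"}), frozenset({"text_fr"})),
    Action("tts", frozenset({"text_fr"}), frozenset({"audio_fr"})),
    Action("speech_translate", frozenset({"audio"}), frozenset({"text_fr"})),
]

print(plan({"audio"}, {"audio_fr"}, ACTIONS))
print(plan({"audio"}, {"audio_fr"}, ACTIONS, banned=frozenset({"speech_translate"})))
```

Because the search is breadth-first, the first plan found uses the fewest actions. Banning a model forces a longer but still valid chain. This sketch omits delete effects, typed objects, and action costs, all of which PDDL can express.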

large language model, machine learning, natural language

2412.12839

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Dec-13-2024

Olympus: A Universal Task Router for Computer Vision Tasks

Lin, Yuanze, Li, Yunsheng, Chen, Dongdong, Xu, Weijian, Clark, Ronald, Torr, Philip H. S.

We introduce Olympus, a new approach that transforms Multimodal Large Language Models (MLLMs) into a unified framework capable of handling a wide array of computer vision tasks. Utilizing a controller MLLM, Olympus delegates over 20 specialized tasks across images, videos, and 3D objects to dedicated modules. This instruction-based routing enables complex workflows through chained actions without the need for training heavy generative models. Olympus easily integrates with existing MLLMs, expanding their capabilities with comparable performance. Experimental results demonstrate that Olympus achieves an average routing accuracy of 94.75% across 20 tasks and precision of 91.82% in chained action scenarios, showcasing its effectiveness as a universal task router that can solve a diverse range of computer vision tasks. Project page: http://yuanze-lin.me/Olympus_page/
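Olympus routes instructions from a controller MLLM to dedicated modules and can chain several actions. Below is a minimal Python sketch of that dispatch step. It assumes a hypothetical routing-token format, `<task>argument</task>`, because the abstract does not specify Olympus's actual token syntax. Stub functions stand in for the real vision modules. The sketch also computes a plain fraction-correct metric, which is one plausible reading of the reported "routing accuracy."

```python
import re
from typing import Callable

# Hypothetical routing-token format: <task>argument</task>, one span per step.
ROUTE_PATTERN = re.compile(r"<(?P<task>[a-z_]+)>(?P<arg>.*?)</(?P=task)>", re.DOTALL)


def parse_routes(controller_output: str) -> list[tuple[str, str]]:
    """Extract (task, argument) pairs in the order the controller emitted them."""
    return [(m.group("task"), m.group("arg").strip())
            for m in ROUTE_PATTERN.finditer(controller_output)]


def dispatch(controller_output: str,
             modules: dict[str, Callable[[str, str], str]],
             image: str) -> str:
    """Run each routed step with its dedicated module. The output of one step
    becomes the input of the next, which models a chained action."""
    current = image
    for task, arg in parse_routes(controller_output):
        if task not in modules:
            raise KeyError(f"no module for routed task {task!r}")
        current = modules[task](current, arg)
    return current


def routing_accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of instructions whose predicted task matches the reference task."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


# Stub modules (hypothetical names) that just record what they were asked to do.
MODULES = {
    "image_edit": lambda img, arg: f"{img}+edit({arg})",
    "depth_estimation": lambda img, arg: f"depth({img})",
}

output = "<image_edit>make the sky purple</image_edit> then <depth_estimation></depth_estimation>"
print(parse_routes(output))
print(dispatch(output, MODULES, "photo.png"))
```

Keeping routing as plain text that is parsed afterwards is what lets modules be swapped without retraining the controller. That property matches the abstract's point about not training heavy generative models.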

large language model, machine learning, natural language

2412.09612

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence, Oct-3-2024

Rethinking VLMs and LLMs for Image Classification

Cooper, Avi, Kato, Keizo, Shih, Chia-Hsien, Yamane, Hiroaki, Vinken, Kasper, Takemoto, Kentaro, Sunagawa, Taro, Yeh, Hao-Wei, Yamanaka, Jin, Mason, Ian, Boix, Xavier

Visual Language Models (VLMs) are increasingly being merged with Large Language Models (LLMs) to enable new capabilities, particularly improved interactivity and open-ended responsiveness. While these capabilities are remarkable, the contribution of LLMs to the longstanding key problem of classifying an image among a set of choices remains unclear. Through extensive experiments involving seven models, ten visual understanding datasets, and multiple prompt variations per dataset, we find that, for object and scene recognition, VLMs that do not leverage LLMs can achieve better performance than VLMs that do. At the same time, leveraging LLMs can improve performance on tasks requiring reasoning and outside knowledge. In response to these challenges, we propose a pragmatic solution: a lightweight fix involving a relatively small LLM that efficiently routes visual tasks to the most suitable model for the task. The LLM router is trained on a dataset of more than 2.5 million examples, each pairing a visual task with a model's accuracy. Our results reveal that this lightweight fix surpasses or matches the accuracy of state-of-the-art alternatives, including GPT-4V and HuggingGPT, while improving cost-effectiveness.
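The router in this abstract is a small LLM trained on pairs of visual tasks and model accuracies. The Python sketch below uses a deliberately simpler stand-in, not the paper's method. It estimates each model's mean accuracy per task category from `(task, model, accuracy)` records and routes to the best-scoring model. Ties are broken in favor of the cheaper model, to mirror the cost-effectiveness goal. Category names, model names, accuracies, and costs are made up for illustration.

```python
from collections import defaultdict


def fit_router(records):
    """records: iterable of (task_category, model_name, accuracy).
    Returns {task_category: {model_name: mean accuracy}}."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for task, model, acc in records:
        sums[task][model] += acc
        counts[task][model] += 1
    return {task: {m: sums[task][m] / counts[task][m] for m in sums[task]}
            for task in sums}


def route(table, task, cost):
    """Pick the model with the highest estimated accuracy for the task.
    On equal accuracy, prefer the lower-cost model."""
    scores = table[task]
    return max(scores, key=lambda m: (scores[m], -cost[m]))


# Hypothetical training records and per-query costs.
RECORDS = [
    ("object_recognition", "vlm_without_llm", 0.9),
    ("object_recognition", "vlm_without_llm", 0.8),
    ("object_recognition", "vlm_with_llm", 0.7),
    ("knowledge_reasoning", "vlm_without_llm", 0.4),
    ("knowledge_reasoning", "vlm_with_llm", 0.75),
    ("scene_recognition", "vlm_without_llm", 0.8),
    ("scene_recognition", "vlm_with_llm", 0.8),
]
COST = {"vlm_without_llm": 1.0, "vlm_with_llm": 5.0}

TABLE = fit_router(RECORDS)
for task in ("object_recognition", "knowledge_reasoning", "scene_recognition"):
    print(task, "->", route(TABLE, task, COST))
```

An unseen task category raises `KeyError` here. Generalizing to task descriptions that were never seen in training is exactly what a learned LLM router adds over a lookup table like this one.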

large language model, machine learning, natural language

2410.1469

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > California > Santa Clara County > Santa Clara (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.47)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

arXiv.org Artificial Intelligence, Dec-3-2023

HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

Shen, Yongliang, Song, Kaitao, Tan, Xu, Li, Dongsheng, Lu, Weiming, Zhuang, Yueting

Solving complicated AI tasks with different domains and modalities is a key step toward artificial general intelligence. While there are numerous AI models available for various domains and modalities, they cannot handle complicated AI tasks autonomously. Considering that large language models (LLMs) have exhibited exceptional abilities in language understanding, generation, interaction, and reasoning, we advocate that LLMs could act as a controller to manage existing AI models to solve complicated AI tasks, with language serving as a generic interface to empower this. Based on this philosophy, we present HuggingGPT, an LLM-powered agent that leverages LLMs (e.g., ChatGPT) to connect various AI models in machine learning communities (e.g., Hugging Face) to solve AI tasks. Specifically, we use ChatGPT to conduct task planning when receiving a user request, select models according to their function descriptions available in Hugging Face, execute each subtask with the selected AI model, and summarize the response according to the execution results. By leveraging the strong language capability of ChatGPT and the abundant AI models in Hugging Face, HuggingGPT can tackle a wide range of sophisticated AI tasks spanning different modalities and domains and achieve impressive results in language, vision, speech, and other challenging tasks, which paves a new way towards the realization of artificial general intelligence.
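Planned subtasks can depend on one another, for example when a text-generation step needs both a caption and a detection result. One way to execute such a plan is in topological order over its dependency graph, as in the Python sketch below. The plan schema is an assumption for illustration. It uses `id`, `task`, `dep` (with `-1` meaning no dependency), and `<resource>-k` placeholders that refer to task k's output. The stub models stand in for real Hugging Face models.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical placeholder syntax: "<resource>-k" means "the output of task k".
RESOURCE_REF = re.compile(r"<resource>-(\d+)")


def execute_plan(tasks, models):
    """tasks: list of {"id", "task", "dep", "args"} dicts (hypothetical plan format).
    models: {task_name: callable(args_dict) -> str}. Returns {task_id: output}."""
    by_id = {t["id"]: t for t in tasks}
    # Predecessor graph for graphlib; -1 marks "no dependency" and is dropped.
    graph = {t["id"]: [d for d in t["dep"] if d != -1] for t in tasks}
    outputs = {}
    for task_id in TopologicalSorter(graph).static_order():
        task = by_id[task_id]
        # Substitute "<resource>-k" placeholders with the output of task k.
        args = {k: RESOURCE_REF.sub(lambda m: outputs[int(m.group(1))], v)
                for k, v in task["args"].items()}
        outputs[task_id] = models[task["task"]](args)
    return outputs


EXAMPLE_TASKS = [
    {"id": 0, "task": "image-to-text", "dep": [-1], "args": {"image": "street.jpg"}},
    {"id": 1, "task": "object-detection", "dep": [-1], "args": {"image": "street.jpg"}},
    {"id": 2, "task": "text-generation", "dep": [0, 1],
     "args": {"text": "Caption: <resource>-0; objects: <resource>-1"}},
]
EXAMPLE_MODELS = {
    "image-to-text": lambda args: "a busy street",
    "object-detection": lambda args: "2 cars, 1 bicycle",
    "text-generation": lambda args: args["text"].upper(),
}

print(execute_plan(EXAMPLE_TASKS, EXAMPLE_MODELS)[2])
```

`graphlib.TopologicalSorter` raises `CycleError` (a `ValueError` subclass) if the planner emits circular dependencies. That makes it a cheap sanity check on LLM-generated plans before any model is invoked.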

hugginggpt, llm, task planning

2303.1758

Country: Asia > China > Zhejiang Province (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence, Apr-15-2023, 08:25:14 GMT

HuggingGPT: Can This Collaborative Language Model Truly Do It All?

AI researchers continue to work toward an artificial general intelligence capable of performing tasks across all domains. While numerous domain-specific models can address particular problems, no single model can solve all of them. Traditional large language models (LLMs) excel at handling text and responding to users but face limitations when processing other modalities, such as vision and speech. Instead of training LLMs to handle domain-specific problems, an intriguing approach is to employ LLMs alongside domain-specific models to address real-world use cases. Consider several examples: identifying toppings on a pizza from an advertisement, determining the amount to pay from an invoice, checking whether it's raining outside, counting dogs in a park, assessing whether the road is clear of vehicles so it can be crossed safely, or recognizing a location from a picture.

artificial intelligence, large language model, natural language

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)