
GTA: A Benchmark for General Tool Agents

Wang, Jize, Ma, Zerun, Li, Yining, Zhang, Songyang, Chen, Cailian, Chen, Kai, Le, Xinyi

arXiv.org Artificial Intelligence

Significant focus has been placed on integrating large language models (LLMs) with various tools in developing general-purpose agents, which places high demands on LLMs' tool-use capabilities. However, there are evident gaps between existing tool-use evaluations and real-world scenarios. Current evaluations often use AI-generated queries, single-step tasks, dummy tools, and text-only interactions, failing to reveal the agents' real-world problem-solving abilities effectively. To address this, we propose GTA, a benchmark for General Tool Agents, featuring three main aspects: (i) Real user queries: human-written queries with simple real-world objectives but implicit tool use, requiring the LLM to reason about the suitable tools and plan the solution steps. (ii) Real deployed tools: an evaluation platform equipped with tools across perception, operation, logic, and creativity categories to evaluate the agents' actual task execution performance. (iii) Real multimodal inputs: authentic image files, such as spatial scenes, web page screenshots, tables, code snippets, and printed/handwritten materials, used as the query contexts to align with real-world scenarios closely. We design 229 real-world tasks and executable tool chains to evaluate mainstream LLMs. Our findings show that real-world user queries are challenging for existing LLMs, with GPT-4 completing less than 50% of the tasks and most LLMs achieving below 25%. This evaluation reveals the bottlenecks in the tool-use capabilities of current LLMs in real-world scenarios, which provides future direction for advancing general-purpose tool agents. The code and dataset are available at https://github.com/open-compass/GTA.
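To make the benchmark's structure concrete, here is a minimal sketch of what a GTA-style task record could look like, with a query, a multimodal input file, and an executable tool chain. All field and tool names below are illustrative assumptions, not the benchmark's actual schema; consult the linked repository for the real data format.

```python
# Hypothetical GTA-style task: a human-written query with implicit tool use.
# Field names and tool names are illustrative assumptions, not the real schema.
task = {
    "query": "How much would it cost to buy one of each fruit on this table?",
    "files": ["table_photo.jpg"],          # real multimodal input context
    "tool_chain": [                        # steps the agent must plan itself
        {"tool": "OCR", "category": "perception"},
        {"tool": "Calculator", "category": "logic"},
    ],
}

def categories_used(task):
    """Return the distinct tool categories a task exercises, in order."""
    seen = []
    for step in task["tool_chain"]:
        if step["category"] not in seen:
            seen.append(step["category"])
    return seen

print(categories_used(task))  # -> ['perception', 'logic']
```

The point of the implicit tool chain is that the query never names the tools: the agent must infer from the image and the objective that perception and logic tools are both required.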


Aligning Models with Their Realization through Model-based Systems Engineering

Zenz, Lovis Justin Immanuel, Heiland, Erik, Hillmann, Peter, Karcher, Andreas

arXiv.org Artificial Intelligence

In this paper, we propose a method for aligning models with their realization through the application of model-based systems engineering. Our approach is divided into three steps. (1) Firstly, we leverage domain expertise and the Unified Architecture Framework to establish a reference model that fundamentally describes a given domain. (2) Subsequently, we instantiate the reference model as specific models tailored to different scenarios within the domain. (3) Finally, we incorporate corresponding run logic directly into both the reference model and the specific models. Together, these steps provide a practical means to ensure that every implementation result is justified by business demand. We demonstrate our approach using the example of maritime object detection as a specific application (specific model / implementation element) of automatic target recognition as a service recurring in various forms (reference model element). Our approach facilitates a more seamless integration of models and implementation, fostering enhanced Business-IT alignment.
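The three-step method above can be sketched as a small class hierarchy: a reference model describing the domain capability, a specific model instantiating it for a scenario, and run logic attached directly to the model. This is a conceptual sketch only; the class and field names are assumptions, and the paper itself works in UAF models rather than code.

```python
# Illustrative sketch of the three-step alignment method; names are
# assumptions for exposition, not taken from the paper.

class ReferenceModel:
    """Step 1: domain-level description (e.g. automatic target recognition)."""
    def __init__(self, capability):
        self.capability = capability

class SpecificModel(ReferenceModel):
    """Step 2: instantiation of the reference model for a concrete scenario."""
    def __init__(self, capability, scenario, run_logic):
        super().__init__(capability)
        self.scenario = scenario
        self.run_logic = run_logic   # Step 3: executable logic lives in the model

    def execute(self, data):
        return self.run_logic(data)

# Maritime object detection as one scenario of automatic target recognition.
atr = SpecificModel(
    capability="automatic target recognition",
    scenario="maritime object detection",
    run_logic=lambda frames: [f for f in frames if f.get("is_vessel")],
)

detections = atr.execute([{"id": 1, "is_vessel": True},
                          {"id": 2, "is_vessel": False}])
print(len(detections))  # -> 1
```

Because the run logic hangs off the model element rather than living in a separate codebase, every implementation artifact traces back to a reference-model element and, through it, to a business demand.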


TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems

David, Robert, Duke, Jared, Jain, Advait, Reddi, Vijay Janapa, Jeffries, Nat, Li, Jian, Kreeger, Nick, Nappier, Ian, Natraj, Meghna, Regev, Shlomi, Rhodes, Rocky, Wang, Tiezhen, Warden, Pete

arXiv.org Artificial Intelligence

Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors are severely resource constrained. Their nearest mobile counterparts exhibit at least a 100x to 1,000x difference in compute capability, memory availability, and power consumption. As a result, the machine-learning (ML) models and associated ML inference framework must not only execute efficiently but also operate in a few kilobytes of memory. Also, the embedded devices' ecosystem is heavily fragmented. To maximize efficiency, system vendors often omit many features that commonly appear in mainstream systems, including dynamic memory allocation and virtual memory, that allow for cross-platform interoperability. The hardware comes in many flavors (e.g., instruction-set architecture and FPU support, or lack thereof). We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems. TF Micro tackles the efficiency requirements imposed by embedded-system resource constraints and the fragmentation challenges that make cross-platform interoperability nearly impossible. The framework adopts a unique interpreter-based approach that provides flexibility while overcoming these challenges. This paper explains the design decisions behind TF Micro and describes its implementation details. Also, we present an evaluation to demonstrate its low resource requirement and minimal run-time performance overhead.
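One consequence of targeting platforms without dynamic memory allocation is that all working memory must come from a single preallocated buffer (in TF Micro, the "tensor arena"). The sketch below illustrates the bump-pointer arena pattern in Python for exposition only; TF Micro itself is C++, and the class here is a conceptual stand-in, not its API.

```python
# Conceptual sketch of a static arena allocator: one fixed buffer sized at
# build time, bump-pointer allocation, no malloc/free at inference time.
# Illustrative only; this is not TF Micro's actual implementation.

class Arena:
    def __init__(self, size):
        self.buffer = bytearray(size)  # single fixed block, sized up front
        self.offset = 0

    def alloc(self, nbytes, align=16):
        # Round the current offset up to the alignment boundary, then bump.
        start = (self.offset + align - 1) // align * align
        if start + nbytes > len(self.buffer):
            raise MemoryError("arena exhausted: enlarge the static buffer")
        self.offset = start + nbytes
        return memoryview(self.buffer)[start:start + nbytes]

arena = Arena(4 * 1024)            # a few kilobytes, microcontroller-scale
weights = arena.alloc(2048)        # e.g. model parameters
activations = arena.alloc(1024)    # e.g. intermediate tensors
print(arena.offset)  # -> 3072
```

Sizing the arena is a build-time decision: if it is too small, allocation fails deterministically at startup rather than fragmenting or faulting at runtime, which is exactly the failure mode embedded systems prefer.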


Microsoft CEO Satya Nadella On The Extraordinary Potential Of AI

Forbes - Tech

With AI, "It's not just having the technology" but also "the deployed solution," says Microsoft CEO Satya Nadella. CLOUD WARS -- As the AI Age takes hold, Microsoft has unleashed an AI strategy that's as complete and ambitious as any you'll find from any company in the world. It's already in place in fast-food restaurants and in manufacturing plants, and Microsoft is ahead of everyone in extending and unifying AI's capabilities from the cloud to the edge. But that enormous potential won't be realized unless AI takes its proper place within the ever-widening set of Azure-centered technologies and services within the Microsoft portfolio. "AI is going to be one of the trends that is going to be the next big shift in technology," Nadella said at a recent investor conference.