required parameter
When Agents go Astray: Course-Correcting SWE Agents with PRMs
Gandhi, Shubham, Tsay, Jason, Ganhotra, Jatin, Kate, Kiran, Rizk, Yara
Large Language Model (LLM) agents are increasingly deployed for complex, multi-step software engineering (SWE) tasks. However, their trajectories often contain costly inefficiencies, such as redundant exploration, looping, and failure to terminate once a solution is reached. Prior work has largely treated these errors in a post-hoc manner, diagnosing failures only after execution. In this paper, we introduce SWE-PRM, an inference-time Process Reward Model (PRM) that intervenes during execution to detect and course-correct trajectory-level errors. Our PRM design leverages a taxonomy of common inefficiencies and delivers lightweight, interpretable feedback without modifying the underlying policy. On SWE-bench Verified, closed-source PRMs improve resolution from 40.0% to 50.6% (+10.6 p.p.), with the largest gains on medium and hard tasks. Among feedback strategies, taxonomy-guided PRMs outperform unguided or explicit action-prescriptive variants, increasing success rate while reducing trajectory length. These benefits come at an acceptable added inference cost of as low as $0.2, making PRMs a practical and scalable mechanism for improving SWE agents' reliability and efficiency.
- Europe > Austria > Vienna (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Multi Agent based Medical Assistant for Edge Devices
Gawade, Sakharam, Akhouri, Shivam, Kulkarni, Chinmay, Samant, Jagdish, Sahu, Pragya, Aastik, null, Pahal, Jai, Meher, Saswat
Large Action Models (LAMs) have revolutionized intelligent automation, but their application in healthcare faces challenges due to privacy concerns, latency, and dependency on internet access. This report introduces an ondevice, multi-agent healthcare assistant that overcomes these limitations. The system utilizes smaller, task-specific agents to optimize resources, ensure scalability and high performance. Our proposed system acts as a one-stop solution for health care needs with features like appointment booking, health monitoring, medication reminders, and daily health reporting. Powered by the Qwen Code Instruct 2.5 7B model, the Planner and Caller Agents achieve an average RougeL score of 85.5 for planning and 96.5 for calling for our tasks while being lightweight for on-device deployment. This innovative approach combines the benefits of ondevice systems with multi-agent architectures, paving the way for user-centric healthcare solutions.
- North America > United States > Oklahoma > Payne County > Cushing (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
GenTool: Enhancing Tool Generalization in Language Models through Zero-to-One and Weak-to-Strong Simulation
He, Jie, Neville, Jennifer, Wan, Mengting, Yang, Longqi, Liu, Hui, Xu, Xiaofeng, Song, Xia, Pan, Jeff Z., Zhou, Pei
Large Language Models (LLMs) can enhance their capabilities as AI assistants by integrating external tools, allowing them to access a wider range of information. While recent LLMs are typically fine-tuned with tool usage examples during supervised fine-tuning (SFT), questions remain about their ability to develop robust tool-usage skills and can effectively generalize to unseen queries and tools. In this work, we present GenTool, a novel training framework that prepares LLMs for diverse generalization challenges in tool utilization. Our approach addresses two fundamental dimensions critical for real-world applications: Zero-to-One Generalization, enabling the model to address queries initially lacking a suitable tool by adopting and utilizing one when it becomes available, and Weak-to-Strong Generalization, allowing models to leverage enhanced versions of existing tools to solve queries. To achieve this, we develop synthetic training data simulating these two dimensions of tool usage and introduce a two-stage fine-tuning approach: optimizing tool ranking, then refining tool selection. Through extensive experiments across four generalization scenarios, we demonstrate that our method significantly enhances the tool-usage capabilities of LLMs ranging from 1B to 8B parameters, achieving performance that surpasses GPT-4o. Furthermore, our analysis also provides valuable insights into the challenges LLMs encounter in tool generalization.
- North America > United States (0.14)
- Asia > Thailand (0.14)
- North America > Mexico > Mexico City (0.14)
- North America > Canada (0.14)
- Banking & Finance (0.68)
- Transportation > Passenger (0.67)
- Consumer Products & Services > Travel (0.46)
- Transportation > Ground (0.46)
Creating a Chatbot for Healthcare in React Native using Dialogflow
A chatbot is a piece of software that helps in conducting a conversation through voice based or textual methods. Chatbots offer companies new opportunities to improve the customer engagement process and operational efficiency by reducing the typical cost of customer service. Dialogflow (previously known as API.AI) is a Natural Language Processing (NLP) platform which can be greatly helpful to build conversational applications for a company's customers in various languages and also across multiple platforms. Dialogflow enables developers to create text-based and voice conversation interfaces for responding to customer queries in different languages. Navigate to console in the official website.
- Health & Medicine (0.40)
- Information Technology (0.30)