 Pathak, Divya


ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks

arXiv.org Artificial Intelligence

Realizing the vision of using AI agents to automate critical IT tasks depends on the ability to measure and understand the effectiveness of proposed solutions. We introduce ITBench, a framework that offers a systematic methodology for benchmarking AI agents to address real-world IT automation tasks. Our initial release targets three key areas: Site Reliability Engineering (SRE), Compliance and Security Operations (CISO), and Financial Operations (FinOps). The design enables AI researchers to understand the challenges and opportunities of AI agents for IT automation with push-button workflows and interpretable metrics. ITBench includes an initial set of 94 real-world scenarios, which can be easily extended by community contributions. Our results show that agents powered by state-of-the-art models resolve only 13.8% of SRE scenarios, 25.2% of CISO scenarios, and 0% of FinOps scenarios. We expect ITBench to be a key enabler of AI-driven IT automation that is correct, safe, and fast.


A Novel Methodology For Crowdsourcing AI Models in an Enterprise

arXiv.org Artificial Intelligence

The evolution of AI is advancing rapidly, creating both challenges and opportunities for industry-community collaboration. In this work, we present a novel methodology aiming to facilitate this collaboration through crowdsourcing of AI models. Concretely, we have implemented a system and a process that any organization can easily adopt to host AI competitions. The system allows them to automatically harvest and evaluate the submitted models against in-house proprietary data and also to incorporate them as reusable services in a product.


A Canonical Architecture For Predictive Analytics on Longitudinal Patient Records

arXiv.org Artificial Intelligence

Many institutions within the healthcare ecosystem are making significant investments in AI technologies to optimize their business operations at lower cost with improved patient outcomes. Despite the hype with AI, the full realization of this potential is seriously hindered by several systemic problems, including data privacy, security, bias, fairness, and explainability. In this paper, we propose a novel canonical architecture for the development of AI models. The architecture is designed to accommodate trust and reproducibility as an inherent part of the AI life cycle and support the needs for a deployed AI system in healthcare. In what follows, we start with a crisp articulation of challenges that we have identified to derive the requirements for this architecture. We then follow with a description of this architecture before providing qualitative evidence of its capabilities in real world settings.