finop
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks
Jha, Saurabh; Arora, Rohan; Watanabe, Yuji; Yanagawa, Takumi; Chen, Yinfang; Clark, Jackson; Bhavya, Bhavya; Verma, Mudit; Kumar, Harshit; Kitahara, Hirokuni; Zheutlin, Noah; Takano, Saki; Pathak, Divya; George, Felix; Wu, Xinbo; Turkkan, Bekir O.; Vanloo, Gerard; Nidd, Michael; Dai, Ting; Chatterjee, Oishik; Gupta, Pranjal; Samanta, Suranjana; Aggarwal, Pooja; Lee, Rong; Murali, Pavankumar; Ahn, Jae-wook; Kar, Debanjana; Rahane, Ameet; Fonseca, Carlos; Paradkar, Amit; Deng, Yu; Moogi, Pratibha; Mohapatra, Prateeti; Abe, Naoki; Narayanaswami, Chandrasekhar; Xu, Tianyin; Varshney, Lav R.; Mahindru, Ruchi; Sailer, Anca; Shwartz, Laura; Sow, Daby; Fuller, Nicholas C. M.; Puri, Ruchir
Realizing the vision of using AI agents to automate critical IT tasks depends on the ability to measure and understand the effectiveness of proposed solutions. We introduce ITBench, a framework that offers a systematic methodology for benchmarking AI agents to address real-world IT automation tasks. Our initial release targets three key areas: Site Reliability Engineering (SRE), Compliance and Security Operations (CISO), and Financial Operations (FinOps). The design enables AI researchers to understand the challenges and opportunities of AI agents for IT automation with push-button workflows and interpretable metrics. ITBench includes an initial set of 94 real-world scenarios, which can be easily extended by community contributions. Our results show that agents powered by state-of-the-art models resolve only 13.8% of SRE scenarios, 25.2% of CISO scenarios, and 0% of FinOps scenarios. We expect ITBench to be a key enabler of AI-driven IT automation that is correct, safe, and fast.
Creative AI, FinOps among hot developer trends of 2023
A handful of important trends will transform the software developer experience in 2023, as enterprises consider more self-hosting, observe more SaaS consolidations and see an upswing of interest in creative AI. Also, as AI enters the creativity realm, it threatens to upend the future of app dev. And OpenAI's ChatGPT, released in November, takes code completion beyond line suggestions -- in addition to writing complete web pages and simple applications, it can generate new programming languages. For developers, the 2022 job market started strong, but by December, they saw storm clouds as layoffs hit the tech sector. Experts felt vibes of the early 2000s recession and the pandemic's early days.
Staff Cloud Analytics Data Engineer, FinOps
We have the vision of a world where each day is safer and more secure than the one before. These aren't easy goals to accomplish – but we're not here for easy. We are a company built on the foundation of challenging and disrupting the way things are done, and we're looking for innovators who are as committed to shaping the future of cybersecurity as we are. Palo Alto Networks is evolving to meet the needs of our employees now and in the future through FLEXWORK, our approach to how we work. And because it FLEXes around each individual employee based on their individual choices, employees are empowered to push boundaries and help us all evolve, together.