Watanabe, Yuji
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks
Jha, Saurabh, Arora, Rohan, Watanabe, Yuji, Yanagawa, Takumi, Chen, Yinfang, Clark, Jackson, Bhavya, Bhavya, Verma, Mudit, Kumar, Harshit, Kitahara, Hirokuni, Zheutlin, Noah, Takano, Saki, Pathak, Divya, George, Felix, Wu, Xinbo, Turkkan, Bekir O., Vanloo, Gerard, Nidd, Michael, Dai, Ting, Chatterjee, Oishik, Gupta, Pranjal, Samanta, Suranjana, Aggarwal, Pooja, Lee, Rong, Murali, Pavankumar, Ahn, Jae-wook, Kar, Debanjana, Rahane, Ameet, Fonseca, Carlos, Paradkar, Amit, Deng, Yu, Moogi, Pratibha, Mohapatra, Prateeti, Abe, Naoki, Narayanaswami, Chandrasekhar, Xu, Tianyin, Varshney, Lav R., Mahindru, Ruchi, Sailer, Anca, Shwartz, Laura, Sow, Daby, Fuller, Nicholas C. M., Puri, Ruchir
Realizing the vision of using AI agents to automate critical IT tasks depends on the ability to measure and understand effectiveness of proposed solutions. We introduce ITBench, a framework that offers a systematic methodology for benchmarking AI agents to address real-world IT automation tasks. Our initial release targets three key areas: Site Reliability Engineering (SRE), Compliance and Security Operations (CISO), and Financial Operations (FinOps). The design enables AI researchers to understand the challenges and opportunities of AI agents for IT automation with push-button workflows and interpretable metrics. ITBench includes an initial set of 94 real-world scenarios, which can be easily extended by community contributions. Our results show that agents powered by state-of-the-art models resolve only 13.8% of SRE scenarios, 25.2% of CISO scenarios, and 0% of FinOps scenarios. We expect ITBench to be a key enabler of AI-driven IT automation that is correct, safe, and fast.
NP4G : Network Programming for Generalization
Hara, Shoichiro, Watanabe, Yuji
Automatic programming has been actively studied for a long time by various approaches including genetic programming. In recent years, automatic programming using neural networks such as GPT-3 has been actively studied and is attracting a lot of attention. However, these methods are illogical inference based on experience by enormous learning, and their thinking process is unclear. Even using the method by logical inference with a clear thinking process, the system that automatically generates any programs has not yet been realized. Especially, the inductive inference generalized by logical inference from one example is an important issue that the artificial intelligence can acquire knowledge by itself. In this study, we propose NP4G: Network Programming for Generalization, which can automatically generate programs by inductive inference. Because the proposed method can realize "sequence", "selection", and "iteration" in programming and can satisfy the conditions of the structured program theorem, it is expected that NP4G is a method automatically acquire any programs by inductive inference. As an example, we automatically construct a bitwise NOT operation program from several training data by generalization using NP4G. Although NP4G only randomly selects and connects nodes, by adjusting the number of nodes and the number of phase of "Phased Learning", we show the bitwise NOT operation programs are acquired in a comparatively short time and at a rate of about 7 in 10 running. The source code of NP4G is available on GitHub as a public repository.