Kumar, Harshit
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks
Jha, Saurabh, Arora, Rohan, Watanabe, Yuji, Yanagawa, Takumi, Chen, Yinfang, Clark, Jackson, Bhavya, Bhavya, Verma, Mudit, Kumar, Harshit, Kitahara, Hirokuni, Zheutlin, Noah, Takano, Saki, Pathak, Divya, George, Felix, Wu, Xinbo, Turkkan, Bekir O., Vanloo, Gerard, Nidd, Michael, Dai, Ting, Chatterjee, Oishik, Gupta, Pranjal, Samanta, Suranjana, Aggarwal, Pooja, Lee, Rong, Murali, Pavankumar, Ahn, Jae-wook, Kar, Debanjana, Rahane, Ameet, Fonseca, Carlos, Paradkar, Amit, Deng, Yu, Moogi, Pratibha, Mohapatra, Prateeti, Abe, Naoki, Narayanaswami, Chandrasekhar, Xu, Tianyin, Varshney, Lav R., Mahindru, Ruchi, Sailer, Anca, Shwartz, Laura, Sow, Daby, Fuller, Nicholas C. M., Puri, Ruchir
Realizing the vision of using AI agents to automate critical IT tasks depends on the ability to measure and understand effectiveness of proposed solutions. We introduce ITBench, a framework that offers a systematic methodology for benchmarking AI agents to address real-world IT automation tasks. Our initial release targets three key areas: Site Reliability Engineering (SRE), Compliance and Security Operations (CISO), and Financial Operations (FinOps). The design enables AI researchers to understand the challenges and opportunities of AI agents for IT automation with push-button workflows and interpretable metrics. ITBench includes an initial set of 94 real-world scenarios, which can be easily extended by community contributions. Our results show that agents powered by state-of-the-art models resolve only 13.8% of SRE scenarios, 25.2% of CISO scenarios, and 0% of FinOps scenarios. We expect ITBench to be a key enabler of AI-driven IT automation that is correct, safe, and fast.
A Dynamical Systems-Inspired Pruning Strategy for Addressing Oversmoothing in Graph Neural Networks
Chakraborty, Biswadeep, Kumar, Harshit, Mukhopadhyay, Saibal
Graph Neural Networks (GNNs) Wu et al. [2020] have emerged as an important component in contemporary machine learning, excelling in tasks that require the analysis of graph-structured data. Their capacity to model complex relationships between nodes and edges has driven their widespread application in fields ranging from molecular property prediction Gilmer et al. [2017], Reiser et al. [2022], Gasteiger et al. [2021] to social network analysis Kipf and Welling [2017], Fan et al. [2019] and recommendation systems Ying et al. [2018]. However, one significant challenge that GNNs face is the phenomenon known as oversmoothing. As the depth of the GNN increases, node representations tend to homogenize, leading to a decline in the network's ability to differentiate between nodes, ultimately impairing performance Li et al. [2018]. Oversmoothing in GNNs has been extensively studied, with early works such as Li et al. [2018] identifying it as a critical issue in deep architectures like Graph Convolutional Networks (GCNs). Subsequent theoretical analyses Oono and Suzuki [2020], Cai and Wang [2020], Keriven [2022], Chen et al. [2020], Xu et al. [2019] have confirmed that oversmoothing is a fundamental problem in message-passing architectures, where repeated aggregation leads to the homogenization of node features. To counteract this, various strategies have been proposed, such as residual connections and skip connections Li et al. [2019], Xu et al. [2018], normalization methods Ba et al. [2016], Ioffe and Szegedy [2015], Zhou et al. [2020], and attention mechanisms Velickovic et al. [2018]. However, these approaches primarily involve architectural modifications that do not fundamentally address the propagation dynamics responsible for oversmoothing.
Learning Locally Interacting Discrete Dynamical Systems: Towards Data-Efficient and Scalable Prediction
Kang, Beomseok, Kumar, Harshit, Lee, Minah, Chakraborty, Biswadeep, Mukhopadhyay, Saibal
Locally interacting dynamical systems, such as epidemic spread, rumor propagation through crowd, and forest fire, exhibit complex global dynamics originated from local, relatively simple, and often stochastic interactions between dynamic elements. Their temporal evolution is often driven by transitions between a finite number of discrete states. Despite significant advancements in predictive modeling through deep learning, such interactions among many elements have rarely explored as a specific domain for predictive modeling. We present Attentive Recurrent Neural Cellular Automata (AR-NCA), to effectively discover unknown local state transition rules by associating the temporal information between neighboring cells in a permutation-invariant manner. AR-NCA exhibits the superior generalizability across various system configurations (i.e., spatial distribution of states), data efficiency and robustness in extremely data-limited scenarios even in the presence of stochastic interactions, and scalability through spatial dimension-independent prediction. Our code and supplementary material are available in https://github.com/beomseokg/ARNCA. Keywords: Local Interaction, Discrete Dynamical System, Neural Cellular Automata
Has the Deep Neural Network learned the Stochastic Process? A Wildfire Perspective
Kumar, Harshit, Kang, Beomseok, Chakraborty, Biswadeep, Mukhopadhyay, Saibal
This paper presents the first systematic study of evalution of Deep Neural Network (DNN) designed and trained to predict the evolution of a stochastic dynamical system, using wildfire prediction as a case study. We show that traditional evaluation methods based on threshold based classification metrics and error-based scoring rules assess a DNN's ability to replicate the observed ground truth (GT), but do not measure the fidelity of the DNN's learning of the underlying stochastic process. To address this gap, we propose a new system property: Statistic-GT, representing the GT of the stochastic process, and an evaluation metric that exclusively assesses fidelity to Statistic-GT. Utilizing a synthetic dataset, we introduce a stochastic framework to characterize this property and establish criteria for a metric to be a valid measure of the proposed property. We formally show that Expected Calibration Error (ECE) tests the necessary condition for fidelity to Statistic-GT. We perform empirical experiments, differentiating ECE's behavior from conventional metrics and demonstrate that ECE exclusively measures fidelity to the stochastic process. Extending our analysis to real-world wildfire data, we highlight the limitations of traditional evaluation methods and discuss the utility of evaluating fidelity to the stochastic process alongside existing metrics.
Towards Robust Real-Time Hardware-based Mobile Malware Detection using Multiple Instance Learning Formulation
Kumar, Harshit, Sharma, Sudarshan, Chakraborty, Biswadeep, Mukhopadhyay, Saibal
This study introduces RT-HMD, a Hardware-based Malware Detector (HMD) for mobile devices, that refines malware representation in segmented time-series through a Multiple Instance Learning (MIL) approach. We address the mislabeling issue in real-time HMDs, where benign segments in malware time-series incorrectly inherit malware labels, leading to increased false positives. Utilizing the proposed Malicious Discriminative Score within the MIL framework, RT-HMD effectively identifies localized malware behaviors, thereby improving the predictive accuracy. Empirical analysis, using a hardware telemetry dataset collected from a mobile platform across 723 benign and 1033 malware samples, shows a 5% precision boost while maintaining recall, outperforming baselines affected by mislabeled benign segments.
Sparse Spiking Neural Network: Exploiting Heterogeneity in Timescales for Pruning Recurrent SNN
Chakraborty, Biswadeep, Kang, Beomseok, Kumar, Harshit, Mukhopadhyay, Saibal
Recurrent Spiking Neural Networks (RSNNs) have emerged as a computationally efficient and brain-inspired learning model. The design of sparse RSNNs with fewer neurons and synapses helps reduce the computational complexity of RSNNs. Traditionally, sparse SNNs are obtained by first training a dense and complex SNN for a target task, and, then, pruning neurons with low activity (activity-based pruning) while maintaining task performance. In contrast, this paper presents a task-agnostic methodology for designing sparse RSNNs by pruning a large randomly initialized model. We introduce a novel Lyapunov Noise Pruning (LNP) algorithm that uses graph sparsification methods and utilizes Lyapunov exponents to design a stable sparse RSNN from a randomly initialized RSNN. We show that the LNP can leverage diversity in neuronal timescales to design a sparse Heterogeneous RSNN (HRSNN). Further, we show that the same sparse HRSNN model can be trained for different tasks, such as image classification and temporal prediction. We experimentally show that, in spite of being task-agnostic, LNP increases computational efficiency (fewer neurons and synapses) and prediction performance of RSNNs compared to traditional activity-based pruning of trained dense models.
AutoMixer for Improved Multivariate Time-Series Forecasting on Business and IT Observability Data
Palaskar, Santosh, Ekambaram, Vijay, Jati, Arindam, Gantayat, Neelamadhav, Saha, Avirup, Nagar, Seema, Nguyen, Nam H., Dayama, Pankaj, Sindhgatta, Renuka, Mohapatra, Prateeti, Kumar, Harshit, Kalagnanam, Jayant, Hemachandra, Nandyala, Rangaraj, Narayan
The efficiency of business processes relies on business key performance indicators (Biz-KPIs), that can be negatively impacted by IT failures. Business and IT Observability (BizITObs) data fuses both Biz-KPIs and IT event channels together as multivariate time series data. Forecasting Biz-KPIs in advance can enhance efficiency and revenue through proactive corrective measures. However, BizITObs data generally exhibit both useful and noisy inter-channel interactions between Biz-KPIs and IT events that need to be effectively decoupled. This leads to suboptimal forecasting performance when existing multivariate forecasting models are employed. To address this, we introduce AutoMixer, a time-series Foundation Model (FM) approach, grounded on the novel technique of channel-compressed pretrain and finetune workflows. AutoMixer leverages an AutoEncoder for channel-compressed pretraining and integrates it with the advanced TSMixer model for multivariate time series forecasting. This fusion greatly enhances the potency of TSMixer for accurate forecasts and also generalizes well across several downstream tasks. Through detailed experiments and dashboard analytics, we show AutoMixer's capability to consistently improve the Biz-KPI's forecasting accuracy (by 11-15\%) which directly translates to actionable business insights.
Learning Representations on Logs for AIOps
Gupta, Pranjal, Kumar, Harshit, Kar, Debanjana, Bhukar, Karan, Aggarwal, Pooja, Mohapatra, Prateeti
AI for IT Operations (AIOps) is a powerful platform that Site Reliability Engineers (SREs) use to automate and streamline operational workflows with minimal human intervention. Automated log analysis is a critical task in AIOps as it provides key insights for SREs to identify and address ongoing faults. Tasks such as log format detection, log classification, and log parsing are key components of automated log analysis. Most of these tasks require supervised learning; however, there are multiple challenges due to limited labelled log data and the diverse nature of log data. Large Language Models (LLMs) such as BERT and GPT3 are trained using self-supervision on a vast amount of unlabeled data. These models provide generalized representations that can be effectively used for various downstream tasks with limited labelled data. Motivated by the success of LLMs in specific domains like science and biology, this paper introduces a LLM for log data which is trained on public and proprietary log data. The results of our experiments demonstrate that the proposed LLM outperforms existing models on multiple downstream tasks. In summary, AIOps powered by LLMs offers an efficient and effective solution for automating log analysis tasks and enabling SREs to focus on higher-level tasks. Our proposed LLM, trained on public and proprietary log data, offers superior performance on multiple downstream tasks, making it a valuable addition to the AIOps platform.
Forecasting Local Behavior of Self-organizing Many-agent System without Reconstruction
Kang, Beomseok, Lee, Minah, Kumar, Harshit, Mukhopadhyay, Saibal
Large multi-agent systems are often driven by locally defined agent interactions, which is referred to as self-organization. Our primary objective is to determine when the propagation of such local interactions will reach a specific agent of interest. Although conventional approaches that reconstruct all agent states can be used, they may entail unnecessary computational costs. In this paper, we investigate a CNN-LSTM model to forecast the state of a particular agent in a large self-organizing multi-agent system without the reconstruction. The proposed model comprises a CNN encoder to represent the system in a low-dimensional vector, a LSTM module to learn agent dynamics in the vector space, and a MLP decoder to predict the future state of an agent. As an example, we consider a forest fire model where we aim to predict when a particular tree agent will start burning. We compare the proposed model with reconstruction-based approaches such as CNN-LSTM and ConvLSTM. The proposed model exhibits similar or slightly worse AUC but significantly reduces computational costs such as activation than ConvLSTM. Moreover, it achieves higher AUC with less computation than the recontruction-based CNN-LSTM.
Picking Pearl From Seabed: Extracting Artefacts from Noisy Issue Triaging Collaborative Conversations for Hybrid Cloud Services
Azad, Amar Prakash, Ghosh, Supriyo, Gupta, Ajay, Kumar, Harshit, Mohapatra, Prateeti
Site Reliability Engineers (SREs) play a key role in issue identification and resolution. After an issue is reported, SREs come together in a virtual room (collaboration platform) to triage the issue. While doing so, they leave behind a wealth of information which can be used later for triaging similar issues. However, usability of the conversations offer challenges due to them being i) noisy and ii) unlabelled. This paper presents a novel approach for issue artefact extraction from the noisy conversations with minimal labelled data. We propose a combination of unsupervised and supervised model with minimum human intervention that leverages domain knowledge to predict artefacts for a small amount of conversation data and use that for fine-tuning an already pretrained language model for artefact prediction on a large amount of conversation data. Experimental results on our dataset show that the proposed ensemble of unsupervised and supervised model is better than using either one of them individually.