nodeid
Translating Federated Learning Algorithms in Python into CSP Processes Using ChatGPT
Popovic, Miroslav, Popovic, Marko, Djukic, Miodrag, Basicevic, Ilija
The Python Testbed for Federated Learning Algorithms is a simple Python FL framework that is easy to use by ML&AI developers who do not need to be professional programmers and is also amenable to LLMs. In the previous research, generic federated learning algorithms provided by this framework were manually translated into the CSP processes and algorithms' safety and liveness properties were automatically verified by the model checker PAT. In this paper, a simple translation process is introduced wherein the ChatGPT is used to automate the translation of the mentioned federated learning algorithms in Python into the corresponding CSP processes. Within the process, the minimality of the used context is estimated based on the feedback from ChatGPT. The proposed translation process was experimentally validated by successful translation (verified by the model checker PAT) of both generic centralized and decentralized federated learning algorithms.
Learning Attributed Graphlets: Predictive Graph Mining by Graphlets with Trainable Attribute
Shinji, Tajima, Sugihara, Ren, Kitahara, Ryota, Karasuyama, Masayuki
The graph classification problem has been widely studied; however, achieving an interpretable model with high predictive performance remains a challenging issue. This paper proposes an interpretable classification algorithm for attributed graph data, called LAGRA (Learning Attributed GRAphlets). LAGRA learns importance weights for small attributed subgraphs, called attributed graphlets (AGs), while simultaneously optimizing their attribute vectors. This enables us to obtain a combination of subgraph structures and their attribute vectors that strongly contribute to discriminating different classes. A significant characteristics of LAGRA is that all the subgraph structures in the training dataset can be considered as a candidate structures of AGs. This approach can explore all the potentially important subgraphs exhaustively, but obviously, a naive implementation can require a large amount of computations. To mitigate this issue, we propose an efficient pruning strategy by combining the proximal gradient descent and a graph mining tree search. Our pruning strategy can ensure that the quality of the solution is maintained compared to the result without pruning. We empirically demonstrate that LAGRA has superior or comparable prediction performance to the standard existing algorithms including graph neural networks, while using only a small number of AGs in an interpretable manner.
Self-Supervised Learning to Prove Equivalence Between Straight-Line Programs via Rewrite Rules
Kommrusch, Steve, Monperrus, Martin, Pouchet, Louis-Noël
We target the problem of automatically synthesizing proofs of semantic equivalence between two programs made of sequences of statements. We represent programs using abstract syntax trees (AST), where a given set of semantics-preserving rewrite rules can be applied on a specific AST pattern to generate a transformed and semantically equivalent program. In our system, two programs are equivalent if there exists a sequence of application of these rewrite rules that leads to rewriting one program into the other. We propose a neural network architecture based on a transformer model to generate proofs of equivalence between program pairs. The system outputs a sequence of rewrites, and the validity of the sequence is simply checked by verifying it can be applied. If no valid sequence is produced by the neural network, the system reports the programs as non-equivalent, ensuring by design no programs may be incorrectly reported as equivalent. Our system is fully implemented for one single grammar which can represent straight-line programs with function calls and multiple types. To efficiently train the system to generate such sequences, we develop an original incremental training technique, named self-supervised sample selection. We extensively study the effectiveness of this novel training approach on proofs of increasing complexity and length. Our system, S4Eq, achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent programs.
DeepTaskAPT: Insider APT detection using Task-tree based Deep Learning
APT, known as Advanced Persistent Threat, is a difficult challenge for cyber defence. These threats make many traditional defences ineffective as the vulnerabilities exploited by these threats are insiders who have access to and are within the network. This paper proposes DeepTaskAPT, a heterogeneous task-tree based deep learning method to construct a baseline model based on sequences of tasks using a Long Short-Term Memory (LSTM) neural network that can be applied across different users to identify anomalous behaviour. Rather than applying the model to sequential log entries directly, as most current approaches do, DeepTaskAPT applies a process tree based task generation method to generate sequential log entries for the deep learning model. To assess the performance of DeepTaskAPT, we use a recently released synthetic dataset, DARPA Operationally Transparent Computing (OpTC) dataset and a real-world dataset, Los Alamos National Laboratory (LANL) dataset. Both of them are composed of host-based data collected from sensors. Our results show that DeepTaskAPT outperforms similar approaches e.g. DeepLog and the DeepTaskAPT baseline model demonstrate its capability to detect malicious traces in various attack scenarios while having high accuracy and low false-positive rates. To the best of knowledge this is the very first attempt of using recently introduced OpTC dataset for cyber threat detection.