Goto

Collaborating Authors

 enc


Coded Computing for Resilient Distributed Computing: A Learning-Theoretic Framework

Neural Information Processing Systems

Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a combination of the data, rather than the raw data itself. The final result then is decoded from the collective outputs of the worker nodes. However, there is a significant gap between current coded computing approaches and the broader landscape of general distributed computing, particularly when it comes to machine learning workloads. To bridge this gap, we propose a novel foundation for coded computing, integrating the principles of learning theory, and developing a framework that seamlessly adapts with machine learning applications. In this framework, the objective is to find the encoder and decoder functions that minimize the loss function, defined as the mean squared error between the estimated and true values. Facilitating the search for the optimum decoding and functions, we show that the loss function can be upper-bounded by the summation of two terms: the generalization error of the decoding function and the training error of the encoding function. Focusing on the second-order Sobolev space, we then derive the optimal encoder and decoder.


87f476af4053961667c2c08e9f4b850e-Supplemental-Conference.pdf

Neural Information Processing Systems

We performed all experiments on 4 V100 GPUs. Apart from the root mean square error (RMSE), we also report the energy spectrum error (ESE) forocean current prediction which quantifies the physical consistency. Our model achieves this by learning on multiple tasks simultaneously and then adapting to new tasks with domain transfer. As these are upper bounds, they do not necessarily imply better generalization, although empirically, this is what we observe. Suppose we haveK tasks{ck}Kk=1 C, each of which is sampled from acontinuous parameter spaceC.



Coded Computing for Resilient Distributed Computing: A Learning-Theoretic Framework

Neural Information Processing Systems

Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a combination of the data, rather than the raw data itself. The final result then is decoded from the collective outputs of the worker nodes. However, there is a significant gap between current coded computing approaches and the broader landscape of general distributed computing, particularly when it comes to machine learning workloads. To bridge this gap, we propose a novel foundation for coded computing, integrating the principles of learning theory, and developing a framework that seamlessly adapts with machine learning applications. In this framework, the objective is to find the encoder and decoder functions that minimize the loss function, defined as the mean squared error between the estimated and true values. Facilitating the search for the optimum decoding and functions, we show that the loss function can be upper-bounded by the summation of two terms: the generalization error of the decoding function and the training error of the encoding function. Focusing on the second-order Sobolev space, we then derive the optimal encoder and decoder.


Relative Entropy Regularized Reinforcement Learning for Efficient Encrypted Policy Synthesis

Suh, Jihoon, Jang, Yeongjun, Teranishi, Kaoru, Tanaka, Takashi

arXiv.org Artificial Intelligence

We propose an efficient encrypted policy synthesis to develop privacy-preserving model-based reinforcement learning. We first demonstrate that the relative-entropy-regularized reinforcement learning framework offers a computationally convenient linear and ``min-free'' structure for value iteration, enabling a direct and efficient integration of fully homomorphic encryption with bootstrapping into policy synthesis. Convergence and error bounds are analyzed as encrypted policy synthesis propagates errors under the presence of encryption-induced errors including quantization and bootstrapping. Theoretical analysis is validated by numerical simulations. Results demonstrate the effectiveness of the RERL framework in integrating FHE for encrypted policy synthesis.


Directly Follows Graphs Go Predictive Process Monitoring With Graph Neural Networks

Lischka, Attila, Rauch, Simon, Stritzel, Oliver

arXiv.org Artificial Intelligence

In the past years, predictive process monitoring (PPM) techniques based on artificial neural networks have evolved as a method to monitor the future behavior of business processes. Existing approaches mostly focus on interpreting the processes as sequences, so-called traces, and feeding them to neural architectures designed to operate on sequential data such as recurrent neural networks (RNNs) or transformers. In this study, we investigate an alternative way to perform PPM: by transforming each process in its directly-follows-graph (DFG) representation we are able to apply graph neural networks (GNNs) for the prediction tasks. By this, we aim to develop models that are more suitable for complex processes that are long and contain an abundance of loops. In particular, we present different ways to create DFG representations depending on the particular GNN we use. The tested GNNs range from classical node-based to novel edge-based architectures. Further, we investigate the possibility of using multi-graphs. By these steps, we aim to design graph representations that minimize the information loss when transforming traces into graphs.


General Coded Computing in a Probabilistic Straggler Regime

Moradi, Parsa, Maddah-Ali, Mohammad Ali

arXiv.org Artificial Intelligence

Coded computing has demonstrated promising results in addressing straggler resiliency in distributed computing systems. However, most coded computing schemes are designed for exact computation, requiring the number of responding servers to exceed a certain recovery threshold. Additionally, these schemes are tailored for highly structured functions. Recently, new coded computing schemes for general computing functions, where exact computation is replaced with approximate computation, have emerged. In these schemes, the availability of additional results corresponds to more accurate estimation of computational tasks. This flexibility introduces new questions that need to be addressed. This paper addresses the practically important scenario in the context of general coded computing, where each server may become a straggler with a probability $p$, independently from others. We theoretically analyze the approximation error of two existing general coded computing schemes: Berrut Approximate Coded Computing (BACC) and Learning Theoretic Coded Computing (LeTCC). Under the probabilistic straggler configuration, we demonstrate that the average approximation error for BACC and LeTCC converge to zero with the rate of at least $\mathcal{O}(\log^3_{\frac{1}{p}}(N)\cdot{N^{-3}})$ and $\mathcal{O}(\log^4_{\frac{1}{p}}(N)\cdot{N^{-2}})$, respectively. This is perhaps surprising, as earlier results does not indicate a convergence when the number of stragglers scales with the total number of servers $N$. However, in this case, despite the average number of stragglers being $Np$, the independence of servers in becoming stragglers allows the approximation error to converge to zero. These theoretical results are validated through experiments on various computing functions, including deep neural networks.


Coded Computing: A Learning-Theoretic Framework

Moradi, Parsa, Tahmasebi, Behrooz, Maddah-Ali, Mohammad Ali

arXiv.org Artificial Intelligence

Coded computing has emerged as a promising framework for tackling significant challenges in large-scale distributed computing, including the presence of slow, faulty, or compromised servers. In this approach, each worker node processes a combination of the data, rather than the raw data itself. The final result then is decoded from the collective outputs of the worker nodes. However, there is a significant gap between current coded computing approaches and the broader landscape of general distributed computing, particularly when it comes to machine learning workloads. To bridge this gap, we propose a novel foundation for coded computing, integrating the principles of learning theory, and developing a new framework that seamlessly adapts with machine learning applications. In this framework, the objective is to find the encoder and decoder functions that minimize the loss function, defined as the mean squared error between the estimated and true values. Facilitating the search for the optimum decoding and functions, we show that the loss function can be upper-bounded by the summation of two terms: the generalization error of the decoding function and the training error of the encoding function. Focusing on the second-order Sobolev space, we then derive the optimal encoder and decoder. We show that in the proposed solution, the mean squared error of the estimation decays with the rate of $O(S^4 N^{-3})$ and $O(S^{\frac{8}{5}}N^{\frac{-3}{5}})$ in noiseless and noisy computation settings, respectively, where $N$ is the number of worker nodes with at most $S$ slow servers (stragglers). Finally, we evaluate the proposed scheme on inference tasks for various machine learning models and demonstrate that the proposed framework outperforms the state-of-the-art in terms of accuracy and rate of convergence.


Linking In-context Learning in Transformers to Human Episodic Memory

Ji-An, Li, Zhou, Corey Y., Benna, Marcus K., Mattar, Marcelo G.

arXiv.org Artificial Intelligence

Understanding the connections between artificial and biological intelligent systems can reveal fundamental principles underlying general intelligence. While many artificial intelligence (AI) models have a neuroscience counterpart, such connections are largely missing in Transformer models and the self-attention mechanism. Here, we examine the relationship between attention heads and human episodic memory. We focus on the induction heads, which contribute to the in-context learning capabilities of Transformer-based large language models (LLMs). We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval (CMR) model of human episodic memory. Our analyses of LLMs pre-trained on extensive text data show that CMR-like heads often emerge in the intermediate model layers and that their behavior qualitatively mirrors the memory biases seen in humans. Our findings uncover a parallel between the computational mechanisms of LLMs and human memory, offering valuable insights into both research fields.


Natural Language Counterfactuals through Representation Surgery

Avitan, Matan, Cotterell, Ryan, Goldberg, Yoav, Ravfogel, Shauli

arXiv.org Artificial Intelligence

Interventions targeting the representation space of language models (LMs) have emerged as an effective means to influence model behavior. Such methods are employed, for example, to eliminate or alter the encoding of demographic information such as gender within the model's representations and, in so doing, create a counterfactual representation. However, because the intervention operates within the representation space, understanding precisely what aspects of the text it modifies poses a challenge. In this paper, we give a method to convert representation counterfactuals into string counterfactuals. We demonstrate that this approach enables us to analyze the linguistic alterations corresponding to a given representation space intervention and to interpret the features utilized to encode a specific concept. Moreover, the resulting counterfactuals can be used to mitigate bias in classification through data augmentation.