Goto

Collaborating Authors

 Atia, George


Explainable Adversarial Attacks on Coarse-to-Fine Classifiers

arXiv.org Artificial Intelligence

Traditional adversarial attacks typically aim to alter the predicted labels of input images by generating perturbations that are imperceptible to the human eye. However, these approaches often lack explainability. Moreover, most existing work on adversarial attacks focuses on single-stage classifiers, but multi-stage classifiers are largely unexplored. In this paper, we introduce instance-based adversarial attacks for multi-stage classifiers, leveraging Layer-wise Relevance Propagation (LRP), which assigns relevance scores to pixels based on their influence on classification outcomes. Our approach generates explainable adversarial perturbations by utilizing LRP to identify and target key features critical for both coarse and fine-grained classifications. Unlike conventional attacks, our method not only induces misclassification but also enhances the interpretability of the model's behavior across classification stages, as demonstrated by experimental results.


Bayesian Inverse Reinforcement Learning for Non-Markovian Rewards

arXiv.org Artificial Intelligence

Inverse reinforcement learning (IRL) is the problem of inferring a reward function from expert behavior. There are several approaches to IRL, but most are designed to learn a Markovian reward. However, a reward function might be non-Markovian, depending on more than just the current state, such as a reward machine (RM). Although there has been recent work on inferring RMs, it assumes access to the reward signal, absent in IRL. We propose a Bayesian IRL (BIRL) framework for inferring RMs directly from expert behavior, requiring significant changes to the standard framework. We define a new reward space, adapt the expert demonstration to include history, show how to compute the reward posterior, and propose a novel modification to simulated annealing to maximize this posterior. We demonstrate that our method performs well when optimizing according to its inferred reward and compares favorably to an existing method that learns exclusively binary non-Markovian rewards.


Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Reinforcement learning (RL) is a powerful tool for finding optimal policies in sequential decision processes. However, deep RL methods suffer from two weaknesses: collecting the amount of agent experience required for practical RL problems is prohibitively expensive, and the learned policies exhibit poor generalization on tasks outside of the training distribution. To mitigate these issues, we introduce automaton distillation, a form of neuro-symbolic transfer learning in which Q-value estimates from a teacher are distilled into a low-dimensional representation in the form of an automaton. We then propose two methods for generating Q-value estimates: static transfer, which reasons over an abstract Markov Decision Process constructed based on prior knowledge, and dynamic transfer, where symbolic information is extracted from a teacher Deep Q-Network (DQN). The resulting Q-value estimates from either method are used to bootstrap learning in the target environment via a modified DQN loss function. We list several failure modes of existing automaton-based transfer methods and demonstrate that both static and dynamic automaton distillation decrease the time required to find optimal policies for various decision tasks.


Model-Free Robust Average-Reward Reinforcement Learning

arXiv.org Artificial Intelligence

Two performance criteria are commonly used for infinitehorizon MDPs: 1) the discounted-reward setting, where Robust Markov decision processes (MDPs) address the reward is discounted exponentially with time; and 2) the challenge of model uncertainty by optimizing the average-reward setting, where the long-term averagereward the worst-case performance over an uncertainty over time is of interest. For systems that operate for set of MDPs. In this paper, we focus on the an extended period of time, e.g., queue control, inventory robust average-reward MDPs under the modelfree management in supply chains, or communication networks, setting. We first theoretically characterize it is more important to optimize the average-reward since the structure of solutions to the robust averagereward policies obtained from the discounted-reward setting may Bellman equation, which is essential for be myopic and have poor long-term performance (Kazemi our later convergence analysis.


Robust Average-Reward Markov Decision Processes

arXiv.org Artificial Intelligence

In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average-reward as the discount factor $\gamma$ goes to $1$, and moreover, when $\gamma$ is large, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward. We further design a robust dynamic programming approach, and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.


On the Robustness of AlphaFold: A COVID-19 Case Study

arXiv.org Artificial Intelligence

Protein folding neural networks (PFNNs) such as AlphaFold predict remarkably accurate structures of proteins compared to other approaches. However, the robustness of such networks has heretofore not been explored. This is particularly relevant given the broad social implications of such technologies and the fact that biologically small perturbations in the protein sequence do not generally lead to drastic changes in the protein structure. In this paper, we demonstrate that AlphaFold does not exhibit such robustness despite its high accuracy. This raises the challenge of detecting and quantifying the extent to which these predicted protein structures can be trusted. To measure the robustness of the predicted structures, we utilize (i) the root-mean-square deviation (RMSD) and (ii) the Global Distance Test (GDT) similarity measure between the predicted structure of the original sequence and the structure of its adversarially perturbed version. We prove that the problem of minimally perturbing protein sequences to fool protein folding neural networks is NP-complete. Based on the well-established BLOSUM62 sequence alignment scoring matrix, we generate adversarial protein sequences and show that the RMSD between the predicted protein structure and the structure of the original sequence are very large when the adversarial changes are bounded by (i) 20 units in the BLOSUM62 distance, and (ii) five residues (out of hundreds or thousands of residues) in the given protein sequence. In our experimental evaluation, we consider 111 COVID-19 proteins in the Universal Protein resource (UniProt), a central resource for protein data managed by the European Bioinformatics Institute, Swiss Institute of Bioinformatics, and the US Protein Information Resource. These result in an overall GDT similarity test score average of around 34%, demonstrating a substantial drop in the performance of AlphaFold.


Learning Probabilistic Reward Machines from Non-Markovian Stochastic Reward Processes

arXiv.org Machine Learning

The success of reinforcement learning in typical settings is, in part, predicated on underlying Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, these reward machines cannot capture the semantics of stochastic reward signals. In this paper, we make progress on this front by introducing probabilistic reward machines (PRMs) as a representation of non-Markovian stochastic rewards. We present an algorithm to learn PRMs from the underlying decision process as well as to learn the PRM representation of a given decision-making policy.


Controller Synthesis for Omega-Regular and Steady-State Specifications

arXiv.org Artificial Intelligence

Given a Markov decision process (MDP) and a linear-time ($\omega$-regular or LTL) specification, the controller synthesis problem aims to compute the optimal policy that satisfies the specification. More recently, problems that reason over the asymptotic behavior of systems have been proposed through the lens of steady-state planning. This entails finding a control policy for an MDP such that the Markov chain induced by the solution policy satisfies a given set of constraints on its steady-state distribution. This paper studies a generalization of the controller synthesis problem for a linear-time specification under steady-state constraints on the asymptotic behavior. We present an algorithm to find a deterministic policy satisfying $\omega$-regular and steady-state constraints by characterizing the solutions as an integer linear program, and experimentally evaluate our approach.


Rediscovering Deep Neural Networks in Finite-State Distributions

arXiv.org Machine Learning

We propose a new way of thinking about deep neural networks, in which the linear and non-linear components of the network are naturally derived and justified in terms of principles in probability theory. In particular, the models constructed in our framework assign probabilities to uncertain realizations, leading to Kullback-Leibler Divergence (KLD) as the linear layer. In our model construction, we also arrive at a structure similar to ReLU activation supported with Bayes' theorem. The non-linearities in our framework are normalization layers with ReLU and Sigmoid as element-wise approximations. Additionally, the pooling function is derived as a marginalization of spatial random variables according to the mechanics of the framework. As such, Max Pooling is an approximation to the aforementioned marginalization process. Since our models are comprised of finite state distributions (FSD) as variables and parameters, exact computation of information-theoretic quantities such as entropy and KLD is possible, thereby providing more objective measures to analyze networks. Unlike existing designs that rely on heuristics, the proposed framework restricts subjective interpretations of CNNs and sheds light on the functionality of neural networks from a completely new perspective.


Randomized Robust Matrix Completion for the Community Detection Problem

arXiv.org Machine Learning

This paper focuses on the unsupervised clustering of large partially observed graphs. We propose a provable randomized framework in which a clustering algorithm is applied to a graphs adjacency matrix generated from a stochastic block model. A sub-matrix is constructed using random sampling, and the low rank component is found using a convex-optimization based matrix completion algorithm. The clusters are then identified based on this low rank component using a correlation based retrieval step. Additionally, a new random node sampling algorithm is presented which significantly improves upon the performance of the clustering algorithm with unbalanced data. Given a partially observed graph with adjacency matrix A \in R^{N \times N} , the proposed approach can reduce the computational complexity from O(N^2) to O(N).