Chowdhary, Girish
Energy Shaping Control of a CyberOctopus Soft Arm
Chang, Heng-Sheng, Halder, Udit, Shih, Chia-Hsien, Tekinalp, Arman, Parthasarathy, Tejaswin, Gribkova, Ekaterina, Chowdhary, Girish, Gillette, Rhanor, Gazzola, Mattia, Mehta, Prashant G.
This paper presents an application of the energy shaping methodology to control a flexible, elastic Cosserat rod model. Recent interest in such continuum models stems from applications in soft robotics, and from the growing recognition of the role of mechanics and embodiment in biological control strategies: octopuses are often regarded as iconic examples of this interplay. Here, the dynamics of the Cosserat rod, modeling a single octopus arm, are treated as a Hamiltonian system and the internal muscle actuators are modeled as distributed forces and couples. The proposed energy shaping control design procedure involves two steps: (1) a potential energy is designed such that its minimizer is the desired equilibrium configuration; (2) an energy shaping control law is implemented to reach the desired equilibrium. By interpreting the controlled Hamiltonian as a Lyapunov function, asymptotic stability of the equilibrium configuration is deduced. The energy shaping control law is shown to require only the deformations of the equilibrium configuration. A forward-backward algorithm is proposed to compute these deformations in an online iterative manner. The overall control design methodology is implemented and demonstrated in a dynamic simulation environment. Results of several bio-inspired numerical experiments involving the control of octopus arms are reported.
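The two-step procedure described in this abstract can be illustrated on a much simpler Hamiltonian system than the Cosserat rod. The sketch below is a toy single-pendulum example (all gains and the desired angle `q_star` are illustrative choices, not from the paper): the control cancels the gravitational potential gradient, substitutes a shaped potential minimized at the desired equilibrium, and injects damping so the controlled Hamiltonian decreases along trajectories.

```python
import math

# Toy energy shaping illustration (a pendulum, not the paper's Cosserat rod).
m, l, g = 1.0, 1.0, 9.81      # pendulum mass, length, gravity
k, c = 5.0, 2.0               # shaped-potential stiffness, damping gain
q_star = 1.0                  # desired equilibrium angle (rad)

def control(q, qdot):
    # u = (gravity cancellation) - (shaped potential gradient) - (damping)
    return m * g * l * math.sin(q) - k * (q - q_star) - c * qdot

q, qdot, dt = 0.0, 0.0, 1e-3
for _ in range(20000):        # simulate 20 s with explicit Euler
    qddot = -(g / l) * math.sin(q) + control(q, qdot) / (m * l ** 2)
    qdot += dt * qddot
    q += dt * qdot
```

The closed loop reduces to a damped linear oscillator around `q_star`, so the state converges to the minimizer of the shaped potential, mirroring the Lyapunov argument sketched in the abstract.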
Learning to Cope with Adversarial Attacks
Lee, Xian Yeow, Havens, Aaron, Chowdhary, Girish, Sarkar, Soumik
The security of Deep Reinforcement Learning (Deep RL) algorithms deployed in real-life applications is of primary concern. In particular, the robustness of RL agents in cyber-physical systems against adversarial attacks is especially vital, since the cost of a malevolent intrusion can be extremely high. Studies have shown that Deep Neural Networks (DNNs), which form the core decision-making unit in most modern RL algorithms, are easily subjected to adversarial attacks. Hence, it is imperative that RL agents deployed in real-life applications have the capability to detect and mitigate adversarial attacks in an online fashion. An example of such a framework is the Meta-Learned Advantage Hierarchy (MLAH) agent, which utilizes a meta-learning framework to learn policies robustly online. Since the mechanisms of this framework are still not fully explored, we conducted multiple experiments to better understand its capabilities and limitations. Our results show that the MLAH agent exhibits interesting coping behaviors to maintain a nominal reward when subjected to different adversarial attacks. Additionally, the framework exhibits a hierarchical coping capability, based on the adaptability of the master policy and the sub-policies themselves. From empirical results, we also observed that as the interval between adversarial attacks increases, the MLAH agent can maintain a higher distribution of rewards, though at the cost of higher instability.
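The hierarchical coping idea can be sketched in miniature. The toy below is only illustrative (the reward threshold, sub-policy names, and switching rule are assumptions, not MLAH's actual advantage-based mechanism): a master policy monitors how far observed rewards fall below the nominal expectation and hands control to a defensive sub-policy when the gap suggests an ongoing attack.

```python
# Hypothetical sketch of a master policy switching sub-policies; the
# threshold and policies below are illustrative stand-ins, not MLAH itself.
NOMINAL_REWARD = 1.0
THRESHOLD = 0.5                     # hypothetical switching threshold

def nominal_policy(obs):            # sub-policy tuned for clean observations
    return obs

def defensive_policy(obs):          # sub-policy that discounts suspect inputs
    return 0.0

def master(recent_rewards):
    # Switch sub-policies based on a running estimate of the reward gap,
    # a stand-in for the advantage-based switching of the real framework.
    gap = NOMINAL_REWARD - sum(recent_rewards) / len(recent_rewards)
    return defensive_policy if gap > THRESHOLD else nominal_policy

clean_choice = master([1.0, 0.9, 1.1])      # no attack detected
attacked_choice = master([0.1, -0.2, 0.0])  # large reward gap
```

With clean rewards the gap stays below the threshold and the nominal sub-policy is kept; a sustained reward drop triggers the defensive sub-policy, which is the coping behavior the experiments above probe.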
Cross-Domain Transfer in Reinforcement Learning using Target Apprentice
Joshi, Girish, Chowdhary, Girish
In this paper, we present a new approach to Transfer Learning (TL) in Reinforcement Learning (RL) for cross-domain tasks. Many of the available techniques approach the transfer architecture as a method of speeding up the target task learning. We propose to adapt and reuse the mapped source task optimal policy directly in related domains. We show that the optimal policy from a related source task can be near-optimal in the target domain, provided an adaptive policy accounts for the model error between the target and the source. The main benefit of this policy augmentation is generalizing policies across multiple related domains without having to re-learn the new tasks. Our results show that this architecture leads to better sample efficiency in the transfer, reducing the sample complexity of target task learning to that of target apprentice learning.
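The core idea, reusing a source policy plus an adaptive correction for model error, can be shown on scalar linear systems. This is a hedged toy (the dynamics, gains, and the exact correction term are illustrative, not the paper's method): the correction cancels the dynamics mismatch so the target closed loop behaves like the source closed loop.

```python
# Toy transfer example on scalar linear systems x' = a*x + b*u.
a_src, a_tgt, b = 0.9, 1.2, 1.0   # source/target dynamics, shared input gain
k = 0.5                           # policy gain learned in the source domain

def source_policy(x):
    return -k * x

def adapted_policy(x):
    # Augment the source policy with a term that cancels the model error
    # (a_tgt - a_src), so the target closed loop matches the source one.
    return source_policy(x) + (a_src - a_tgt) * x / b

x_src = x_tgt = 1.0
for _ in range(50):
    x_src = a_src * x_src + b * source_policy(x_src)
    x_tgt = a_tgt * x_tgt + b * adapted_policy(x_tgt)
```

Both closed loops contract by the same factor `a_src - b*k` per step, so the transferred policy stabilizes the target without re-learning it from scratch.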
Kernel Observers: Systems-Theoretic Modeling and Inference of Spatiotemporally Evolving Processes
Kingravi, Hassan A., Maske, Harshal R., Chowdhary, Girish
We consider the problem of estimating the latent state of a spatiotemporally evolving continuous function using very few sensor measurements. We show that layering a dynamical systems prior over the temporal evolution of the weights of a kernel model is a valid approach to spatiotemporal modeling that does not necessarily require the design of complex nonstationary kernels. Furthermore, we show that such a predictive model can be utilized to determine sensing locations that guarantee that the hidden state of the phenomenon can be recovered with very few measurements. We provide sufficient conditions on the number and spatial location of samples required to guarantee state recovery, and provide a lower bound on the minimum number of samples required to robustly infer the hidden states. Our approach outperforms existing methods in numerical experiments.
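The mechanism can be sketched concretely. In the toy below (centers, weight dynamics, and kernel bandwidth are illustrative choices, not the paper's construction), the function is a weighted sum of RBFs, the weights evolve under a known linear map `A`, and a single sensor yields scalar measurements `y_t = C w_t`; because the pair `(A, C)` is observable here, the hidden weights are recoverable from a short measurement sequence.

```python
import numpy as np

# Hedged sketch of a dynamical prior over kernel weights with one sensor.
rng = np.random.default_rng(0)
centers = np.array([0.0, 0.33, 0.66, 1.0])        # RBF centers
A = np.diag([0.9, 0.8, 0.7, 0.6])                 # weight-dynamics prior

def kernel(x, c):
    return np.exp(-(x - c) ** 2 / 0.1)            # Gaussian RBF

x_sensor = 0.5                                    # one sensing location
C = kernel(x_sensor, centers)[None, :]            # 1 x 4 measurement map

w0 = rng.standard_normal(4)                       # hidden initial weights
ys, w = [], w0.copy()
for _ in range(4):                                # 4 scalar measurements
    ys.append(C @ w)
    w = A @ w

# Stack y_t = C A^t w0 and invert the observability map: one well-placed
# sensor sampled over time recovers the full hidden weight vector.
O = np.vstack([C @ np.linalg.matrix_power(A, t) for t in range(4)])
w0_hat = np.linalg.lstsq(O, np.concatenate(ys), rcond=None)[0]
```

Distinct eigenvalues in `A` and nonzero kernel weights in `C` make the stacked observability matrix full rank, which is the flavor of sufficient condition the abstract refers to.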
Trusting Learning Based Adaptive Flight Control Algorithms
Mühlegg, Maximilian (Technische Universität München) | Holzapfel, Florian (Technische Universität München) | Chowdhary, Girish (Oklahoma State University)
Autonomous unmanned aerial systems (UAS) are envisioned to become increasingly utilized in commercial airspace. In order to be attractive for commercial applications, UAS are required to undergo a quick development cycle, ensure cost effectiveness and work reliably in changing environments. Learning based adaptive control systems have been proposed to meet these demands. These techniques promise more flexibility when compared with traditional linear control techniques. However, no consistent verification and validation (V&V) framework exists for adaptive controllers. The underlying purpose of the V&V processes in certifying control algorithms for aircraft is to build trust in a safety critical system. In the past, most adaptive control algorithms were solely designed to ensure stability of a model system and meet robustness requirements against selective uncertainties and disturbances. However, these assessments do not guarantee the reliable performance of the real system required by the V&V process. The question arises as to how trust can be defined for learning based adaptive control algorithms. From our perspective, self-confidence of an adaptive flight controller will be an integral part of building trust in the system. The notion of self-confidence in the adaptive control context relates to the adaptive controller's estimate of its own capability to operate reliably, and its ability to foresee the need for taking action before undesired behaviors lead to a loss of the system. In this paper we present a pathway to a possible answer to the question of how self-confidence for adaptive controllers can be achieved. In particular, we elaborate how algorithms for diagnosis and prognosis can be integrated to help in this process.
Uninformed-to-Informed Exploration in Unstructured Real-World Environments
Axelrod, Allan (Oklahoma State University) | Chowdhary, Girish (Oklahoma State University)
Conventionally, the process of learning the model (exploration) is initialized as either an uninformed or an informed policy, where the latter leverages observations to guide future exploration. Informed exploration is ideal as it may allow a model to be learned in fewer samples. However, informed exploration cannot be implemented from the onset when a priori knowledge of the sensing domain statistics is not available; such policies would only sample the first set of locations, repeatedly. Hence, we present a theoretically-derived bound for transitioning from uninformed exploration to informed exploration in unstructured real-world environments which may be partially-observable and time-varying. This bound is used in tandem with a sparsified Bayesian nonparametric Poisson Exposure Process, which learns to predict the value of information in partially-observable and time-varying domains. The result is an uninformed-to-informed exploration policy which outperforms baseline algorithms on real-world datasets.
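The transition can be sketched with a toy sampling loop. In the sketch below, the switch count `n0` is a hypothetical stand-in for the paper's theoretical bound, and the rates and noise model are illustrative: the policy samples locations round-robin (uninformed) until every location has `n0` observations, then switches to an informed choice that targets the location with the highest estimated value.

```python
import random

# Hedged sketch of an uninformed-to-informed exploration switch.
random.seed(1)
true_rates = [0.2, 1.5, 0.7]          # per-location event rates (unknown)
counts = [0.0, 0.0, 0.0]
visits = [0, 0, 0]
n0 = 50                               # hypothetical transition bound

def sample(loc):                      # noisy observation of the true rate
    return true_rates[loc] + random.gauss(0.0, 0.1)

step = 0
while min(visits) < n0:               # uninformed phase: round-robin
    loc = step % 3
    counts[loc] += sample(loc)
    visits[loc] += 1
    step += 1

# Informed phase: exploit the learned estimates of value of information.
estimates = [counts[i] / visits[i] for i in range(3)]
informed_choice = max(range(3), key=lambda i: estimates[i])
```

Switching too early would lock exploration onto whichever location looked best in the first few noisy samples, which is the repeated-sampling failure mode the abstract describes; the bound governs when the estimates are trustworthy enough to switch.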