Goto

Collaborating Authors

 Liu, Xuan


Integrating Contact-aware Feedback CPG System for Learning-based Soft Snake Robot Locomotion Controllers

arXiv.org Artificial Intelligence

This paper aims to solve the contact-aware locomotion problem of a soft snake robot by developing bio-inspired contact-aware locomotion controllers. To provide effective contact information for the controllers, we develop a scale-covered sensor structure mimicking natural snakes' scale sensilla. In the design of the control framework, our core contribution is the development of a novel sensory feedback mechanism for the Matsuoka central pattern generator (CPG) network. This mechanism allows the Matsuoka CPG system to work like a "spinal cord" in the whole contact-aware control scheme, which simultaneously takes the stimuli including tonic input signals from the "brain" (a goal-tracking locomotion controller) and sensory feedback signals from the "reflex arc" (the contact reactive controller), and generates rhythmic signals to actuate the soft snake robot to slither through densely allocated obstacles. In the "reflex arc" design, we develop two distinctive types of reactive controllers -- 1) a reinforcement learning (RL) sensor regulator that learns to manipulate the sensory feedback inputs of the CPG system, and 2) a local reflexive sensor-CPG network that directly connects sensor readings and the CPG's feedback inputs in a specific topology. Combining with the locomotion controller and the Matsuoka CPG system, these two reactive controllers facilitate two different contact-aware locomotion control schemes. The two control schemes are tested and evaluated in both simulated and real soft snake robots, showing promising performance in the contact-aware locomotion tasks. The experimental results also validate the advantages of the modified Matsuoka CPG system with a new sensory feedback mechanism for bio-inspired robot controller design.


Hierarchical Optimization-Derived Learning

arXiv.org Artificial Intelligence

In recent years, by utilizing optimization techniques to formulate the propagation of deep model, a variety of so-called Optimization-Derived Learning (ODL) approaches have been proposed to address diverse learning and vision tasks. Although having achieved relatively satisfying practical performance, there still exist fundamental issues in existing ODL methods. In particular, current ODL methods tend to consider model construction and learning as two separate phases, and thus fail to formulate their underlying coupling and depending relationship. In this work, we first establish a new framework, named Hierarchical ODL (HODL), to simultaneously investigate the intrinsic behaviors of optimization-derived model construction and its corresponding learning process. Then we rigorously prove the joint convergence of these two sub-tasks, from the perspectives of both approximation quality and stationary analysis. To our best knowledge, this is the first theoretical guarantee for these two coupled ODL components: optimization and learning. We further demonstrate the flexibility of our framework by applying HODL to challenging learning tasks, which have not been properly addressed by existing ODL methods. Finally, we conduct extensive experiments on both synthetic data and real applications in vision and other learning tasks to verify the theoretical properties and practical performance of HODL in various application scenarios.


Reinforcement Learning of CPG-regulated Locomotion Controller for a Soft Snake Robot

arXiv.org Artificial Intelligence

Intelligent control of soft robots is challenging due to the nonlinear and difficult-to-model dynamics. One promising model-free approach for soft robot control is reinforcement learning (RL). However, model-free RL methods tend to be computationally expensive and data-inefficient and may not yield natural and smooth locomotion patterns for soft robots. In this work, we develop a bio-inspired design of a learning-based goal-tracking controller for a soft snake robot. The controller is composed of two modules: An RL module for learning goal-tracking behaviors given the unmodeled and stochastic dynamics of the robot, and a central pattern generator (CPG) with the Matsuoka oscillators for generating stable and diverse locomotion patterns. We theoretically investigate the maneuverability of Matsuoka CPG's oscillation bias, frequency, and amplitude for steering control, velocity control, and sim-to-real adaptation of the soft snake robot. Based on this analysis, we proposed a composition of RL and CPG modules such that the RL module regulates the tonic inputs to the CPG system given state feedback from the robot, and the output of the CPG module is then transformed into pressure inputs to pneumatic actuators of the soft snake robot. This design allows the RL agent to naturally learn to entrain the desired locomotion patterns determined by the CPG maneuverability. We validated the optimality and robustness of the control design in both simulation and real experiments, and performed extensive comparisons with state-of-art RL methods to demonstrate the benefit of our bio-inspired control design.


Value-Function-based Sequential Minimization for Bi-level Optimization

arXiv.org Artificial Intelligence

Gradient-based Bi-Level Optimization (BLO) methods have been widely applied to handle modern learning tasks. However, most existing strategies are theoretically designed based on restrictive assumptions (e.g., convexity of the lower-level sub-problem), and computationally not applicable for high-dimensional tasks. Moreover, there are almost no gradient-based methods able to solve BLO in those challenging scenarios, such as BLO with functional constraints and pessimistic BLO. In this work, by reformulating BLO into approximated single-level problems, we provide a new algorithm, named Bi-level Value-Function-based Sequential Minimization (BVFSM), to address the above issues. Specifically, BVFSM constructs a series of value-function-based approximations, and thus avoids repeated calculations of recurrent gradient and Hessian inverse required by existing approaches, time-consuming especially for high-dimensional tasks. We also extend BVFSM to address BLO with additional functional constraints. More importantly, BVFSM can be used for the challenging pessimistic BLO, which has never been properly solved before. In theory, we prove the asymptotic convergence of BVFSM on these types of BLO, in which the restrictive lower-level convexity assumption is discarded. To our best knowledge, this is the first gradient-based algorithm that can solve different kinds of BLO (e.g., optimistic, pessimistic, and with constraints) with solid convergence guarantees. Extensive experiments verify the theoretical investigations and demonstrate our superiority on various real-world applications.


Learning to shortcut and shortlist order fulfillment deciding

arXiv.org Artificial Intelligence

With the increase of order fulfillment options and business objectives taken into consideration in the deciding process, order fulfillment deciding is becoming more and more complex. For example, with the advent of ship from store retailers now have many more fulfillment nodes to consider, and it is now common to take into account many and varied business goals in making fulfillment decisions. With increasing complexity, efficiency of the deciding process can become a real concern. Finding the optimal fulfillment assignments among all possible ones may be too costly to do for every order especially during peak times. In this work, we explore the possibility of exploiting regularity in the fulfillment decision process to reduce the burden on the deciding system. By using data mining we aim to find patterns in past fulfillment decisions that can be used to efficiently predict most likely assignments for future decisions. Essentially, those assignments that can be predicted with high confidence can be used to shortcut, or bypass, the expensive deciding process, or else a set of most likely assignments can be used for shortlisting -- sending a much smaller set of candidates for consideration by the fulfillment deciding system.


Improving the Interpretability of Deep Neural Networks with Knowledge Distillation

arXiv.org Machine Learning

Abstract--Deep Neural Networks have achieved huge success at a wide spectrum of applications from language modeling, computer vision to speech recognition. However, nowadays, good performance alone is not sufficient to satisfy the needs of practical deployment where interpretability is demanded for cases involving ethicsand mission critical applications. The complex models of Deep Neural Networks make it hard to understand and reason the predictions, which hinders its further progress. To tackle this problem, we apply the Knowledge Distillation technique to distill Deep Neural Networks into decision trees in order to attain good performance and interpretability simultaneously. We formulate the problem at hand as a multi-output regression problem and the experiments demonstrate that the student model achieves significantly better accuracy performance (about 1% to 5%) than vanilla decision trees at the same level of tree depth. The experiments are implemented on the TensorFlow platform to make it scalable to big datasets. To the best of our knowledge, we are the first to distill Deep Neural Networks into vanilla decision trees on multi-class datasets. I. INTRODUCTION Despite Deep Neural Networks' (DNN) superior discrimination powerin many fields, the logics of each hidden feature representations before the output layer still remain to be blackbox. Understandingwhy a specific prediction is made is of utmost importance for the end-users to trust and adopt the model, and for the system designers to refine the model by performing feature engineering and parameter tuning.


Compositional planning in Markov decision processes: Temporal abstraction meets generalized logic composition

arXiv.org Artificial Intelligence

Abstract-- In hierarchical planning for Markov decision processes (MDPs), temporal abstraction allows planning with macro-actions that take place at different time scale in form of sequential composition. In this paper, we propose a novel approach to compositional reasoning and hierarchical planning for MDPs under temporal logic constraints. In addition to sequential composition, we introduce a composition of policies based on generalized logic composition: Given sub-policies for sub-tasks and a new task expressed as logic compositions of subtasks, a semi-optimal policy, which is optimal in planning with only sub-policies, can be obtained by simply composing sub-polices. Thus, a synthesis algorithm is developed to compute optimal policies efficiently by planning with primitive actions, policies for sub-tasks, and the compositions of sub-policies, for maximizing the probability of satisfying temporal logic specifications. We demonstrate the correctness and efficiency of the proposed method in stochastic planning examples with a single agent and multiple task specifications. I. INTRODUCTION Temporal logic is an expressive language to describe desired system properties: safety, reachability, obligation, stability, and liveness [18]. The algorithms for planning and probabilistic verification with temporal logic constraints have developed, with both centralized [2], [7], [17] and distributed methods [10]. Yet, there are two main barriers to practical applications: 1) The issue of scalability: In temporal logic constrained control problems, it is often necessary to introduce additional memory states for keeping track of the evolution of state variables with respect to these temporal logic constraints. The additional memory states grow exponentially (or double exponentially depending on the class of temporal logic) in the length of a specification [11] and make synthesis computational extensive.


Interpretable Deep Convolutional Neural Networks via Meta-learning

arXiv.org Machine Learning

Model interpretability is a requirement in many applications in which crucial decisions are made by users relying on a model's outputs. The recent movement for "algorithmic fairness" also stipulates explainability, and therefore interpretability of learning models. And yet the most successful contemporary Machine Learning approaches, the Deep Neural Networks, produce models that are highly non-interpretable. We attempt to address this challenge by proposing a technique called CNN-INTE to interpret deep Convolutional Neural Networks (CNN) via meta-learning. In this work, we interpret a specific hidden layer of the deep CNN model on the MNIST image dataset. We use a clustering algorithm in a two-level structure to find the meta-level training data and Random Forest as base learning algorithms to generate the meta-level test data. The interpretation results are displayed visually via diagrams, which clearly indicates how a specific test instance is classified. Our method achieves global interpretation for all the test instances without sacrificing the accuracy obtained by the original deep CNN model. This means our model is faithful to the deep CNN model, which leads to reliable interpretations.