Undirected Networks
Reinforcement Learning for Strategic Recommendations
Theocharous, Georgios, Chandak, Yash, Thomas, Philip S., de Nijs, Frits
Strategic recommendations (SR) refer to the problem where an intelligent agent observes the sequential behaviors and activities of users and decides when and how to interact with them to optimize some long-term objectives, both for the user and the business. These systems are in their infancy in the industry and in need of practical solutions to some fundamental research challenges. At Adobe research, we have been implementing such systems for various use-cases, including points of interest recommendations, tutorial recommendations, next step guidance in multi-media editing software, and ad recommendation for optimizing lifetime value. There are many research challenges when building these systems, such as modeling the sequential behavior of users, deciding when to intervene and offer recommendations without annoying the user, evaluating policies offline with high confidence, safe deployment, non-stationarity, building systems from passive data that do not contain past recommendations, resource constraint optimization in multi-user systems, scaling to large and dynamic actions spaces, and handling and incorporating human cognitive biases. In this paper we cover various use-cases and research challenges we solved to make these systems practical.
Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL
Chen, Xiaoyu, Hu, Jiachen, Li, Lihong, Wang, Liwei
Reinforcement learning (RL) in episodic, factored Markov decision processes (FMDPs) is studied. We propose an algorithm called FMDP-BF, which leverages the factorization structure of FMDP. The regret of FMDP-BF is shown to be exponentially smaller than that of optimal algorithms designed for non-factored MDPs, and improves on the best previous result for FMDPs~\citep{osband2014near} by a factored of $\sqrt{H|\mathcal{S}_i|}$, where $|\mathcal{S}_i|$ is the cardinality of the factored state subspace and $H$ is the planning horizon. To show the optimality of our bounds, we also provide a lower bound for FMDP, which indicates that our algorithm is near-optimal w.r.t. timestep $T$, horizon $H$ and factored state-action subspace cardinality. Finally, as an application, we study a new formulation of constrained RL, known as RL with knapsack constraints (RLwK), and provides the first sample-efficient algorithm based on FMDP-BF.
Risk-Sensitive Reinforcement Learning: a Martingale Approach to Reward Uncertainty
Vadori, Nelson, Ganesh, Sumitra, Reddy, Prashant, Veloso, Manuela
We introduce a novel framework to account for sensitivity to rewards uncertainty in sequential decision-making problems. While risk-sensitive formulations for Markov decision processes studied so far focus on the distribution of the cumulative reward as a whole, we aim at learning policies sensitive to the uncertain/stochastic nature of the rewards, which has the advantage of being conceptually more meaningful in some cases. To this end, we present a new decomposition of the randomness contained in the cumulative reward based on the Doob decomposition of a stochastic process, and introduce a new conceptual tool - the \textit{chaotic variation} - which can rigorously be interpreted as the risk measure of the martingale component associated to the cumulative reward process. We innovate on the reinforcement learning side by incorporating this new risk-sensitive approach into model-free algorithms, both policy gradient and value function based, and illustrate its relevance on grid world and portfolio optimization problems.
A Survey of FPGA-Based Robotic Computing
Wan, Zishen, Yu, Bo, Li, Thomas Yuang, Tang, Jie, Zhu, Yuhao, Wang, Yu, Raychowdhury, Arijit, Liu, Shaoshan
Recent researches on robotics have shown significant improvement, spanning from algorithms, mechanics to hardware architectures. Robotics, including manipulators, legged robots, drones, and autonomous vehicles, are now widely applied in diverse scenarios. However, the high computation and data complexity of robotic algorithms pose great challenges to its applications. On the one hand, CPU platform is flexible to handle multiple robotic tasks. GPU platform has higher computational capacities and easy-touse development frameworks, so they have been widely adopted in several applications. On the other hand, FPGA-based robotic accelerators are becoming increasingly competitive alternatives, especially in latency-critical and power-limited scenarios. With specialized designed hardware logic and algorithm kernels, FPGA-based accelerators can surpass CPU and GPU in performance and energy efficiency. In this paper, we give an overview of previous work on FPGA-based robotic accelerators covering different stages of the robotic system pipeline. An analysis of software and hardware optimization techniques and main technical issues is presented, along with some commercial and space applications, to serve as a guide for future work. Therefore, the computation and storage complexity, as well as real-time and power constraints of the robotic system, Over the last decade, we have seen significant progress hinders its wide application in latency-critical or power-limited in the development of robotics, spanning from algorithms, scenarios [13]. Various robotic systems, like Therefore, it is essential to choose a proper compute platform manipulators, legged robots, unmanned aerial vehicles, selfdriving for the robotic system. CPU and GPU are two widely cars have been designed for search and rescue [1], [2], used commercial compute platforms. CPU is designed to exploration [3], [4], package delivery [5], entertainment [6], handle a wide range of tasks quickly and is often used to [7] and more applications and scenarios. These robots are develop novel algorithms. A typical CPU can achieve 10-on the rise of demonstrating their full potential. Take drones, 100 GFLOPS with below 1GOP/J power efficiency [14]. In a type of aerial robots, for example, the number of drones contrast, GPU is designed with thousands of processor cores has grown by 2.83x between 2015 and 2019 based on the running simultaneously, which enable massive parallelism. The typical GPU can perform up to 10 TOPS performance and registered number has reached 1.32 million in 2019, and the become a good candidate for high-performance scenarios. Recently, FFA expects this number will come to 1.59 billion by 2024.
Toward the Fundamental Limits of Imitation Learning
Rajaraman, Nived, Yang, Lin F., Jiao, Jiantao, Ramachandran, Kannan
Imitation learning (IL) aims to mimic the behavior of an expert policy in a sequential decision-making problem given only demonstrations. In this paper, we focus on understanding the minimax statistical limits of IL in episodic Markov Decision Processes (MDPs). We first consider the setting where the learner is provided a dataset of $N$ expert trajectories ahead of time, and cannot interact with the MDP. Here, we show that the policy which mimics the expert whenever possible is in expectation $\lesssim \frac{|\mathcal{S}| H^2 \log (N)}{N}$ suboptimal compared to the value of the expert, even when the expert follows an arbitrary stochastic policy. Here $\mathcal{S}$ is the state space, and $H$ is the length of the episode. Furthermore, we establish a suboptimality lower bound of $\gtrsim |\mathcal{S}| H^2 / N$ which applies even if the expert is constrained to be deterministic, or if the learner is allowed to actively query the expert at visited states while interacting with the MDP for $N$ episodes. To our knowledge, this is the first algorithm with suboptimality having no dependence on the number of actions, under no additional assumptions. We then propose a novel algorithm based on minimum-distance functionals in the setting where the transition model is given and the expert is deterministic. The algorithm is suboptimal by $\lesssim \min \{ H \sqrt{|\mathcal{S}| / N} ,\ |\mathcal{S}| H^{3/2} / N \}$, showing that knowledge of transition improves the minimax rate by at least a $\sqrt{H}$ factor.
Is there a role for statistics in artificial intelligence?
Friedrich, Sarah, Antes, Gerd, Behr, Sigrid, Binder, Harald, Brannath, Werner, Dumpert, Florian, Ickstadt, Katja, Kestler, Hans, Lederer, Johannes, Leitgöb, Heinz, Pauly, Markus, Steland, Ansgar, Wilhelm, Adalbert, Friede, Tim
The research on and application of artificial intelligence (AI) has triggered a comprehensive scientific, economic, social and political discussion. Here we argue that statistics, as an interdisciplinary scientific field, plays a substantial role both for the theoretical and practical understanding of AI and for its future development. Statistics might even be considered a core element of AI. With its specialist knowledge of data evaluation, starting with the precise formulation of the research question and passing through a study design stage on to analysis and interpretation of the results, statistics is a natural partner for other disciplines in teaching, research and practice. This paper aims at contributing to the current discussion by highlighting the relevance of statistical methodology in the context of AI development. In particular, we discuss contributions of statistics to the field of artificial intelligence concerning methodological development, planning and design of studies, assessment of data quality and data collection, differentiation of causality and associations and assessment of uncertainty in results. Moreover, the paper also deals with the equally necessary and meaningful extension of curricula in schools and universities.
Supervised Learning with Projected Entangled Pair States
Cheng, Song, Wang, Lei, Zhang, Pan
Tensor networks, a model that originated from quantum physics, has been gradually generalized as efficient models in machine learning in recent years. However, in order to achieve exact contraction, only tree-like tensor networks such as the matrix product states and tree tensor networks have been considered, even for modeling two-dimensional data such as images. In this work, we construct supervised learning models for images using the projected entangled pair states (PEPS), a two-dimensional tensor network having a similar structure prior to natural images. Our approach first performs a feature map, which transforms the image data to a product state on a grid, then contracts the product state to a PEPS with trainable parameters to predict image labels. The tensor elements of PEPS are trained by minimizing differences between training labels and predicted labels. The proposed model is evaluated on image classifications using the MNIST and the Fashion-MNIST datasets. We show that our model is significantly superior to existing models using tree-like tensor networks. Moreover, using the same input features, our method performs as well as the multilayer perceptron classifier, but with much fewer parameters and is more stable. Our results shed light on potential applications of two-dimensional tensor network models in machine learning.
Embodied Visual Navigation with Automatic Curriculum Learning in Real Environments
Morad, Steven D., Mecca, Roberto, Poudel, Rudra P. K., Liwicki, Stephan, Cipolla, Roberto
We present NavACL, a method of automatic curriculum learning tailored to the navigation task. NavACL is simple to train and efficiently selects relevant tasks using geometric features. In our experiments, deep reinforcement learning agents trained using NavACL in collision-free environments significantly outperform state-of-the-art agents trained with uniform sampling -- the current standard. Furthermore, our agents are able to navigate through unknown cluttered indoor environments to semantically-specified targets using only RGB images. Collision avoidance policies and frozen feature networks support transfer to unseen real-world environments, without any modification or retraining requirements. We evaluate our policies in simulation, and in the real world on a ground robot and a quadrotor drone. Videos of real-world results are available in the supplementary material
Aggregating Long-Term Context for Learning Surgical Workflows
Ban, Yutong, Rosman, Guy, Ward, Thomas, Hashimoto, Daniel, Kondo, Taisei, Iwaki, Hidekazu, Meireles, Ozanan, Rus, Daniela
Analyzing surgical workflow is crucial for computers to understand surgeries. Deep learning techniques have recently been widely applied to recognize surgical workflows. Many of the existing temporal neural network models are limited in their capability to handle long-term dependencies in the data, instead of relying upon strong performance of the underlying per-frame visual models. We propose a new temporal network structure that leverages task-specific network representation to collect long-term sufficient statistics that are propagated by a sufficient statistics model (SSM). We leverage our approach within an LSTM back-bone for the task of surgical phase recognition and explore several choices for propagated statistics. We demonstrate superior results over existing state-of-the-art segmentation and novel segmentation techniques, on two laparoscopic cholecystectomy datasets: the already published Cholec80dataset and MGH100, a novel dataset with more challenging, yet clinically meaningful, segment labels.
A kernel function for Signal Temporal Logic formulae
Bortolussi, Luca, Gallo, Giuseppe Maria, Nenzi, Laura
Signal Temporal Logic (STL) [14] is gaining momentum as a requirement specification language for complex systems and, in particular, Cyber-Physical Systems [4]. STL has been applied in several flavours, from Runtime-monitoring [4] to control synthesis [10] and falsification problems [9], and recently also within learning algorithms, trying to find a maximally discriminating formula between sets of trajectories [5, 3, 16]. In these applications, a central role is played by the real-valued quantitative semantics [8], measuring robustness of satisfaction. Most of the applications of STL have been applied to deterministic (hybrid) systems, with less emphasis on non-deterministic or stochastic ones [2]. Another area in which formal methods are providing interesting tools is in logic-based distances between models, like bisimulation metrics for Markov models [1], which are typically based on a branching logic. In fact, extending these ideas to linear time logic is hard [7], and typically requires statistical approximations.