Instructional Material
Fundamental Tradeoffs in Learning with Prior Information
We seek to understand fundamental tradeoffs between the accuracy of prior information that a learner has on a given problem and its learning performance. We introduce the notion of prioritized risk, which differs from traditional notions of minimax and Bayes risk by allowing us to study such fundamental tradeoffs in settings where reality does not necessarily conform to the learner's prior. We present a general reduction-based approach for extending classical minimax lower-bound techniques in order to lower bound the prioritized risk for statistical estimation problems. We also introduce a novel generalization of Fano's inequality (which may be of independent interest) for lower bounding the prioritized risk in more general settings involving unbounded losses. We illustrate the ability of our framework to provide insights into tradeoffs between prior information and learning performance for problems in estimation, regression, and reinforcement learning.
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
Haarnoja, Tuomas, Moran, Ben, Lever, Guy, Huang, Sandy H., Tirumala, Dhruva, Wulfmeier, Markus, Humplik, Jan, Tunyasuvunakool, Saran, Siegel, Noah Y., Hafner, Roland, Bloesch, Michael, Hartikainen, Kristian, Byravan, Arunkumar, Hasenclever, Leonard, Tassa, Yuval, Sadeghi, Fereshteh, Batchelor, Nathan, Casarini, Federico, Saliceti, Stefano, Game, Charles, Sreendra, Neil, Patel, Kushal, Gwira, Marlon, Huber, Andrea, Hurley, Nicole, Nori, Francesco, Hadsell, Raia, Heess, Nicolas
We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. We first trained individual skills in isolation and then composed those skills end-to-end in a self-play setting. The resulting policy exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and transitions between them in a smooth, stable, and efficient manner - well beyond what is intuitively expected from the robot. The agents also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. The full range of behaviors emerged from a small set of simple rewards. Our agents were trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer, despite significant unmodeled effects and variations across robot instances. Although the robots are inherently fragile, minor hardware modifications together with basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way. Indeed, even though the agents were optimized for scoring, in experiments they walked 156% faster, took 63% less time to get up, and kicked 24% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives. Examples of the emergent behaviors and full 1v1 matches are available on the supplementary website.
SEQUENT: Towards Traceable Quantum Machine Learning using Sequential Quantum Enhanced Training
Altmann, Philipp, Sünkel, Leo, Stein, Jonas, Müller, Tobias, Roch, Christoph, Linnhoff-Popien, Claudia
With classical computation evolving towards performance saturation, new computing paradigms like quantum computing arise, promising superior performance in complex problem domains. However, current architectures merely reach numbers of 100 quantum bits (qubits), prone to noise, and classical computers run out of resources simulating similar-sized systems (Preskill, 2018). Thus, most real-world applications are not yet feasible solely relying on quantum compute. Especially in the field of machine learning, where parameter spaces sized upwards of 50 million are required for tasks like image classification, the resources of current quantum hardware or simulators [...] Therefore, hybrid approaches have been proposed, where the power of both classical and quantum computation are united for improved results (Bergholm et al., 2018). By this, it is possible to leverage the advantages of quantum computing for tasks with parameter spaces that cannot be computed solely by quantum computers due to hardware and simulation limitations. Within those hybrid algorithms, the quantum part is, analogous to classical deep neural networks (DNNs), represented by so-called variational quantum circuits (VQCs), which are parameterized and can be trained in a supervised manner using labeled data (Cerezo et al., 2021). For hybrid machine learning, we will from here on refer to VQCs as quantum parts and to DNNs as classical parts.
Yelp's latest update includes AI suggestions, new review options and more
Yelp has announced a bunch of updates across its site and apps, including a light lean into the AI trend. New features include providing a consumer guarantee, expanded review options and password-free logins. Yelp is utilizing AI and natural language models to further improve its search features. When you search for a specific place, like a tennis court, Yelp will suggest options and add a review with helpful information about going there -- such as being able to book in advance. Further updates include showing you relevant businesses across the country and clickable tags like "Breakfast and Brunch."
Proximal Curriculum for Reinforcement Learning Agents
Tzannetos, Georgios, Ribeiro, Bárbara Gomes, Kamalaruban, Parameswaran, Singla, Adish
We consider the problem of curriculum design for reinforcement learning (RL) agents in contextual multi-task settings. Existing techniques on automatic curriculum design typically require domain-specific hyperparameter tuning or have limited theoretical underpinnings. To tackle these limitations, we design our curriculum strategy, ProCuRL, inspired by the pedagogical concept of Zone of Proximal Development (ZPD). ProCuRL captures the intuition that learning progress is maximized when picking tasks that are neither too hard nor too easy for the learner. We mathematically derive ProCuRL by analyzing two simple learning settings. We also present a practical variant of ProCuRL that can be directly integrated with deep RL frameworks with minimal hyperparameter tuning. Experimental results on a variety of domains demonstrate the effectiveness of our curriculum strategy over state-of-the-art baselines in accelerating the training process of deep RL agents.
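The "neither too hard nor too easy" intuition can be illustrated with a small sketch. It assumes each task comes with an estimated probability of success p for the current learner and scores tasks by p(1 - p), which peaks at p = 0.5. The scoring rule, the softmax temperature `beta`, and the function names are illustrative assumptions, not necessarily the strategy derived in the paper.

```python
import numpy as np

def zpd_scores(success_prob):
    """Score tasks by p * (1 - p): low for tasks that are too easy or too hard."""
    p = np.clip(np.asarray(success_prob, dtype=float), 0.0, 1.0)
    return p * (1.0 - p)

def sample_task(success_prob, beta=10.0, rng=None):
    """Sample a task index from a softmax over the scores (beta is an assumed temperature)."""
    rng = np.random.default_rng() if rng is None else rng
    logits = beta * zpd_scores(success_prob)
    probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs)), probs

if __name__ == "__main__":
    # Hypothetical success estimates for three tasks: too hard, intermediate, too easy.
    task, probs = sample_task([0.05, 0.5, 0.95], rng=np.random.default_rng(0))
    print(task, probs)
```

Sampling from a softmax rather than always taking the highest-scoring task keeps some exploration over tasks; that choice is also an assumption of this sketch.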
Dynamic Datasets and Market Environments for Financial Reinforcement Learning
Liu, Xiao-Yang, Xia, Ziyi, Yang, Hongyang, Gao, Jiechao, Zha, Daochen, Zhu, Ming, Wang, Christina Dan, Wang, Zhaoran, Guo, Jian
The financial market is a particularly challenging playground for deep reinforcement learning due to its unique feature of dynamic datasets. Building high-quality market environments for training financial reinforcement learning (FinRL) agents is difficult due to major factors such as the low signal-to-noise ratio of financial data, survivorship bias of historical data, and model overfitting. In this paper, we present FinRL-Meta, a data-centric and openly accessible library that processes dynamic datasets from real-world markets into gym-style market environments and has been actively maintained by the AI4Finance community. First, following a DataOps paradigm, we provide hundreds of market environments through an automatic data curation pipeline. Second, we provide homegrown examples and reproduce popular research papers as stepping stones for users to design new trading strategies. We also deploy the library on cloud platforms so that users can visualize their own results and assess the relative performance via community-wise competitions. Third, we provide dozens of Jupyter/Python demos organized into a curriculum and a documentation website to serve the rapidly growing community. The open-source codes for the data curation pipeline are available at https://github.com/AI4Finance-Foundation/FinRL-Meta
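To make "gym-style market environment" concrete, here is a minimal sketch of the reset/step interface such an environment typically exposes. The class `ToyMarketEnv`, its single-asset setup, and its reward definition are hypothetical and written for illustration only; this is not FinRL-Meta's actual API.

```python
import numpy as np

class ToyMarketEnv:
    """Hypothetical single-asset trading environment with a gym-style interface."""

    def __init__(self, prices, cash=1000.0):
        self.prices = np.asarray(prices, dtype=float)
        self.initial_cash = cash

    def reset(self):
        self.t = 0
        self.cash = self.initial_cash
        self.shares = 0.0
        return self._obs()

    def _obs(self):
        # Observation: current price, cash balance, shares held.
        return np.array([self.prices[self.t], self.cash, self.shares])

    def _value(self):
        return self.cash + self.shares * self.prices[self.t]

    def step(self, action):
        """action: shares to buy (> 0) or sell (< 0) at the current price."""
        price = self.prices[self.t]
        # Clip so that the trade is affordable and no short selling occurs.
        action = float(np.clip(action, -self.shares, self.cash / price))
        before = self._value()
        self.cash -= action * price
        self.shares += action
        self.t += 1
        reward = self._value() - before  # change in portfolio value
        done = self.t == len(self.prices) - 1
        return self._obs(), reward, done, {}

if __name__ == "__main__":
    env = ToyMarketEnv([10.0, 12.0, 11.0], cash=100.0)
    obs = env.reset()
    obs, reward, done, info = env.step(5)
    print(obs, reward, done)
```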
Discriminative and Generative Learning for Linear Estimation of Random Signals [Lecture Notes]
Shlezinger, Nir, Routtenberg, Tirza
Inference tasks in signal processing are often characterized by the availability of reliable statistical modeling with some missing instance-specific parameters. One conventional approach uses data to estimate these missing parameters and then infers based on the estimated model. Alternatively, data can also be leveraged to directly learn the inference mapping end-to-end. These approaches for combining partially-known statistical models and data in inference are related to the notions of generative and discriminative models used in the machine learning literature, typically considered in the context of classifiers. The goal of this lecture note is to introduce the concepts of generative and discriminative learning for inference with a partially-known statistical model. While machine learning systems often lack the interpretability of traditional signal processing methods, we focus on a simple setting where one can interpret and compare the approaches in a tractable manner that is accessible and relevant to signal processing readers. In particular, we exemplify the approaches for the task of Bayesian signal estimation in a jointly Gaussian setting with the mean-squared error (MSE) objective, i.e., a linear estimation setting.
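The two approaches can be contrasted in a short sketch. It assumes a zero-mean linear observation model y = Hx + w with known signal covariance and known noise variance but an unknown matrix H; this is one hypothetical instance of a partially-known model, and the function names are illustrative. The generative route first estimates H from data and then plugs it into the linear MMSE formula, while the discriminative route fits the linear estimator directly by minimizing the empirical squared error.

```python
import numpy as np

def generative_estimator(x, y, c_xx, noise_var):
    """Estimate H from (x, y) pairs, then plug it into the LMMSE formula.

    x: (n, dx) signals, y: (n, dy) observations. Returns A with x_hat = A @ y.
    """
    # Least-squares fit of y ~ x @ H.T, so the solution is H transposed.
    h_hat = np.linalg.lstsq(x, y, rcond=None)[0].T  # shape (dy, dx)
    c_yy = h_hat @ c_xx @ h_hat.T + noise_var * np.eye(h_hat.shape[0])
    return c_xx @ h_hat.T @ np.linalg.inv(c_yy)     # shape (dx, dy)

def discriminative_estimator(x, y):
    """Fit the linear estimator directly by minimizing sum ||x_i - A y_i||^2."""
    return np.linalg.lstsq(y, x, rcond=None)[0].T   # shape (dx, dy)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h_true = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # assumed ground truth
    noise_var, n = 0.1, 5000
    x = rng.normal(size=(n, 2))                               # C_xx = I
    y = x @ h_true.T + np.sqrt(noise_var) * rng.normal(size=(n, 3))
    print(generative_estimator(x, y, np.eye(2), noise_var))
    print(discriminative_estimator(x, y))
```

With enough data and a correct model, both matrices should approach the true LMMSE estimator; the generative route depends on the assumed covariance and noise variance being right, whereas the discriminative route does not use them at all.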
Stochastic Cell Transmission Models of Traffic Networks
Feinstein, Zachary, Kleiber, Marcel, Weber, Stefan
Cell transmission models enable the quantification of the motion of traffic participants on a high level of aggregation. This provides computational advantages in comparison to microscopic traffic models that capture the motion of traffic participants in great detail. This gain in computational efficiency is sometimes disadvantageously associated with lower granularity, which complicates the representation of complex traffic modules and interactions of traffic participants. In this paper, we propose a rigorous framework for cell transmission models that incorporates three important features: a) The cells are identified with the nodes of a graph. We introduce a precise notation for the directions of the traffic participants within each cell. This allows the construction of cell transmission models for general traffic networks.
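For readers unfamiliar with cell transmission models, the following sketch shows the classical deterministic update on a single road split into cells: the flow into a cell is limited by the vehicles waiting upstream, a maximum flow, and the free space downstream. It is not the stochastic, graph-based framework proposed in the paper, and the function name and parameters (`capacity`, `max_flow`, `delta`) are assumptions chosen for illustration.

```python
def ctm_step(n, capacity, max_flow, inflow, delta=1.0):
    """One time step of a deterministic cell transmission model on a single road.

    n: vehicle counts per cell, ordered upstream to downstream.
    capacity: maximum number of vehicles a cell can hold.
    max_flow: maximum number of vehicles crossing a cell boundary per step.
    inflow: vehicles wanting to enter the first cell this step.
    delta: ratio of backward wave speed to free-flow speed (assumed 1.0).
    """
    k = len(n)
    # flow[i] is the number of vehicles entering cell i; flow[k] leaves the road.
    flow = [0.0] * (k + 1)
    flow[0] = min(inflow, max_flow, delta * (capacity - n[0]))
    for i in range(1, k):
        flow[i] = min(n[i - 1], max_flow, delta * (capacity - n[i]))
    flow[k] = min(n[k - 1], max_flow)  # unrestricted exit at the downstream end
    return [n[i] + flow[i] - flow[i + 1] for i in range(k)]

if __name__ == "__main__":
    state = [10.0, 0.0, 0.0]
    for _ in range(3):
        state = ctm_step(state, capacity=20.0, max_flow=5.0, inflow=0.0)
        print(state)
```

Each update only moves vehicles between neighboring cells, so the total count changes only through the inflow and the exit flow.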
Awesome-META+: Meta-Learning Research and Learning Platform
Wang, Jingyao, Zhang, Chuyuan, Ding, Ye, Yang, Yuxuan
Artificial intelligence technology has already had a profound impact in various fields such as the economy, industry, and education, but that impact is still limited. Meta-learning, also known as "learning to learn", provides an opportunity for general artificial intelligence, which can break through the current AI bottleneck. However, meta-learning started late, and there are fewer projects than in fields such as CV and NLP. Each deployment requires a lot of experience to configure the environment, debug the code, or even rewrite it, and the frameworks are isolated from one another. Moreover, few platforms currently focus exclusively on meta-learning or provide learning materials for novices, for whom the barrier to entry is relatively high. To address this, Awesome-META+, a meta-learning framework integration and learning platform, is proposed to solve the above problems and provide a complete and reliable platform for applying and learning meta-learning frameworks. The project aims to promote the development of meta-learning and the expansion of the community, including but not limited to the following functions: 1) A complete and reliable meta-learning framework, which can adapt to multi-field tasks such as target detection, image classification, and reinforcement learning. 2) A convenient and simple model deployment scheme, which provides convenient meta-learning transfer and usage methods to lower the barrier to meta-learning and improve efficiency. 3) Comprehensive research materials for learning. 4) Objective and credible performance analysis and reflection.
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware
Zhao, Tony Z., Kumar, Vikash, Levine, Sergey, Finn, Chelsea
Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: errors in the policy can compound over time, and human demonstrations can be non-stationary. To address these challenges, we develop a simple yet novel algorithm, Action Chunking with Transformers (ACT), which learns a generative model over action sequences. ACT allows the robot to learn 6 difficult tasks in the real world, such as opening a translucent condiment cup and slotting a battery with 80-90% success, with only 10 minutes worth of demonstrations. Project website: https://tonyzhaozh.github.io/aloha/