Instructional Material
Bridging the Gap Between Offline and Online Reinforcement Learning Evaluation Methodologies
Sujit, Shivakanth, Braga, Pedro H. M., Bornschein, Jorg, Kahou, Samira Ebrahimi
Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces purely from scalar reward signals. A crucial challenge for current deep RL algorithms is that they require a tremendous amount of environment interactions for learning. This can be infeasible in situations where such interactions are expensive; such as in robotics. Offline RL algorithms try to address this issue by bootstrapping the learning process from existing logged data without needing to interact with the environment from the very beginning. While online RL algorithms are typically evaluated as a function of the number of environment interactions, there exists no single established protocol for evaluating offline RL methods.In this paper, we propose a sequential approach to evaluate offline RL algorithms as a function of the training set size and thus by their data efficiency. Sequential evaluation provides valuable insights into the data efficiency of the learning process and the robustness of algorithms to distribution changes in the dataset while also harmonizing the visualization of the offline and online learning phases. Our approach is generally applicable and easy to implement. We compare several existing offline RL algorithms using this approach and present insights from a variety of tasks and offline datasets.
MathGloss: Building mathematical glossaries from text
Horowitz, Lucy, de Paiva, Valeria
MathGloss is a project to create a knowledge graph (KG) for undergraduate mathematics from text, automatically, using modern natural language processing (NLP) tools and resources already available on the web. MathGloss is a linked database of undergraduate concepts in mathematics. So far, it combines five resources: (i) Wikidata, a collaboratively edited, multilingual knowledge graph hosted by the Wikimedia Foundation, (ii) terms covered in mathematics courses at the University of Chicago, (iii) the syllabus of the French undergraduate mathematics curriculum which includes hyperlinks to the automated theorem prover Lean 4, (iv) MuLiMa, a multilingual dictionary of mathematics curated by mathematicians, and (v) the nLab, a wiki for category theory also curated by mathematicians. MathGloss's goal is to bring together resources for learning mathematics and to allow every mathematician to tailor their learning to their own preferences. Moreover, by organizing different resources for learning undergraduate mathematics alongside those for learning formal mathematics, we hope to make it easier for mathematicians and formal tools (theorem provers, computer algebra systems, etc) experts to "understand" each other and break down some of the barriers to formal math.
Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey
Wörmann, Julian, Bogdoll, Daniel, Brunner, Christian, Bührle, Etienne, Chen, Han, Chuo, Evaristus Fuh, Cvejoski, Kostadin, van Elst, Ludger, Gottschall, Philip, Griesche, Stefan, Hellert, Christian, Hesels, Christian, Houben, Sebastian, Joseph, Tim, Keil, Niklas, Kelsch, Johann, Keser, Mert, Königshof, Hendrik, Kraft, Erwin, Kreuser, Leonie, Krone, Kevin, Latka, Tobias, Mattern, Denny, Matthes, Stefan, Motzkus, Franz, Munir, Mohsin, Nekolla, Moritz, Paschke, Adrian, von Pilchau, Stefan Pilar, Pintz, Maximilian Alexander, Qiu, Tianming, Qureishi, Faraz, Rizvi, Syed Tahseen Raza, Reichardt, Jörg, von Rueden, Laura, Sagel, Alexander, Sasdelli, Diogo, Scholl, Tobias, Schunk, Gerhard, Schwalbe, Gesina, Shen, Hao, Shoeb, Youssef, Stapelbroek, Hendrik, Stehr, Vera, Srinivas, Gurucharan, Tran, Anh Tuan, Vivekanandan, Abhishek, Wang, Ya, Wasserrab, Florian, Werner, Tino, Wirth, Christian, Zwicklbauer, Stefan
The availability of representative datasets is an essential prerequisite for many successful artificial intelligence and machine learning models. However, in real life applications these models often encounter scenarios that are inadequately represented in the data used for training. There are various reasons for the absence of sufficient data, ranging from time and cost constraints to ethical considerations. As a consequence, the reliable usage of these models, especially in safety-critical applications, is still a tremendous challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches. Knowledge augmented machine learning approaches offer the possibility of compensating for deficiencies, errors, or ambiguities in the data, thus increasing the generalization capability of the applied models. Even more, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-driven models with existing knowledge. The identified approaches are structured according to the categories knowledge integration, extraction and conformity. In particular, we address the application of the presented methods in the field of autonomous driving.
Learning material synthesis-process-structure-property relationship by data fusion: Bayesian Coregionalization N-Dimensional Piecewise Function Learning
Kusne, A. Gilad, McDannald, Austin, DeCost, Brian
Autonomous materials research labs require the ability to combine and learn from diverse data streams. This is especially true for learning material synthesis-process-structure-property relationships, key to accelerating materials optimization and discovery as well as accelerating mechanistic understanding. We present the Synthesis-process-structure-property relAtionship coreGionalized lEarner (SAGE) algorithm. A fully Bayesian algorithm that uses multimodal coregionalization to merge knowledge across data sources to learn synthesis-process-structure-property relationships. SAGE outputs a probabilistic posterior for the relationships including the most likely relationships given the data.
ERUDITE: Human-in-the-Loop IoT for an Adaptive Personalized Learning System
Taherisadr, Mojtaba, Faruque, Mohammad Abdullah Al, Elmalaki, Salma
Thanks to the rapid growth in wearable technologies and recent advancement in machine learning and signal processing, monitoring complex human contexts becomes feasible, paving the way to develop human-in-the-loop IoT systems that naturally evolve to adapt to the human and environment state autonomously. Nevertheless, a central challenge in designing many of these IoT systems arises from the requirement to infer the human mental state, such as intention, stress, cognition load, or learning ability. While different human contexts can be inferred from the fusion of different sensor modalities that can correlate to a particular mental state, the human brain provides a richer sensor modality that gives us more insights into the required human context. This paper proposes ERUDITE, a human-in-the-loop IoT system for the learning environment that exploits recent wearable neurotechnology to decode brain signals. Through insights from concept learning theory, ERUDITE can infer the human state of learning and understand when human learning increases or declines. By quantifying human learning as an input sensory signal, ERUDITE can provide adequate personalized feedback to humans in a learning environment to enhance their learning experience. ERUDITE is evaluated across $15$ participants and showed that by using the brain signals as a sensor modality to infer the human learning state and providing personalized adaptation to the learning environment, the participants' learning performance increased on average by $26\%$. Furthermore, we showed that ERUDITE can be deployed on an edge-based prototype to evaluate its practicality and scalability.
UniMOS: A Universal Framework For Multi-Organ Segmentation Over Label-Constrained Datasets
Li, Can, Shao, Sheng, Qu, Junyi, Pang, Shuchao, Orgun, Mehmet A.
Machine learning models for medical images can help physicians diagnose and manage diseases. However, due to the fact that medical image annotation requires a great deal of manpower and expertise, as well as the fact that clinical departments perform image annotation based on task orientation, there is the problem of having fewer medical image annotation data with more unlabeled data and having many datasets that annotate only a single organ. In this paper, we present UniMOS, the first universal framework for achieving the utilization of fully and partially labeled images as well as unlabeled images. Specifically, we construct a Multi-Organ Segmentation (MOS) module over fully/partially labeled data as the basenet and designed a new target adaptive loss. Furthermore, we incorporate a semi-supervised training module that combines consistent regularization and pseudolabeling techniques on unlabeled data, which significantly improves the segmentation of unlabeled data. Experiments show that the framework exhibits excellent performance in several medical image segmentation tasks compared to other advanced methods, and also significantly improves data utilization and reduces annotation cost. Code and models are available at: https://github.com/lw8807001/UniMOS.
Diffusion Model-Augmented Behavioral Cloning
Wang, Hsiang-Chun, Chen, Shang-Fu, Hsu, Ming-Hao, Lai, Chun-Mao, Sun, Shao-Hua
Imitation learning addresses the challenge of learning by observing an expert's demonstrations without access to reward signals from environments. Most existing imitation learning methods that do not require interacting with environments either model the expert distribution as the conditional probability p(a|s) (e.g., behavioral cloning, BC) or the joint probability p(s, a). Despite its simplicity, modeling the conditional probability with BC usually struggles with generalization. While modeling the joint probability can improve generalization performance, the inference procedure is often time-consuming, and the model can suffer from manifold overfitting. This work proposes an imitation learning framework that benefits from modeling both the conditional and joint probability of the expert distribution. Our proposed diffusion model-augmented behavioral cloning (DBC) employs a diffusion model trained to model expert behaviors and learns a policy to optimize both the BC loss (conditional) and our proposed diffusion model loss (joint). DBC outperforms baselines in various continuous control tasks in navigation, robot arm manipulation, dexterous manipulation, and locomotion. We design additional experiments to verify the limitations of modeling either the conditional probability or the joint probability of the expert distribution, as well as compare different generative models. Ablation studies justify the effectiveness of our design choices. Recently, the success of deep reinforcement learning (DRL) (Mnih et al., 2015; Lillicrap et al., 2016; Arulkumaran et al., 2017) has inspired the research community to develop DRL frameworks to control robots, aiming to automate the process of designing sensing, planning, and control algorithms by letting the robot learn in an end-to-end fashion. Yet, acquiring complex skills through trial and error can still lead to undesired behaviors even with sophisticated reward design (Christiano et al., 2017; Leike et al., 2018; Lee et al., 2019). Moreover, the exploring process could damage expensive robotic platforms or even be dangerous to humans (Garcıa and Fernández, 2015; Levine et al., 2020). To overcome this issue, imitation learning (i.e., learning from demonstration) (Schaal, 1997; Osa et al., 2018) has received growing attention, whose aim is to learn a policy from expert demonstrations, which are often more accessible than appropriate reward functions for reinforcement learning. Among various imitation learning directions, adversarial imitation learning (Ho and Ermon, 2016; Zolna et al., 2021; Kostrikov et al., 2019) and inverse reinforcement learning (Ng and Russell, 2000; Abbeel and Ng, 2004) have achieved encouraging results in a variety of domains. Yet, these methods require interacting with environments, which can still be expensive or even dangerous. On the other hand, behavioral cloning (BC) (Pomerleau, 1989; Bain and Sammut, 1995) does not require interacting with environments.
Visual AI and Linguistic Intelligence Through Steerability and Composability
Noever, David, Noever, Samantha Elizabeth Miller
This study explores the capabilities of multimodal large language models (LLMs) in handling challenging multistep tasks that integrate language and vision, focusing on model steerability, composability, and the application of long-term memory and context understanding. The problem addressed is the LLM's ability (Nov 2023 GPT-4 Vision Preview) to manage tasks that require synthesizing visual and textual information, especially where stepwise instructions and sequential logic are paramount. The research presents a series of 14 creatively and constructively diverse tasks, ranging from AI Lego Designing to AI Satellite Image Analysis, designed to test the limits of current LLMs in contexts that previously proved difficult without extensive memory and contextual understanding. Key findings from evaluating 800 guided dialogs include notable disparities in task completion difficulty. For instance, 'Image to Ingredient AI Bartender' (Low difficulty) contrasted sharply with 'AI Game Self-Player' (High difficulty), highlighting the LLM's varying proficiency in processing complex visual data and generating coherent instructions. Tasks such as 'AI Genetic Programmer' and 'AI Negotiator' showed high completion difficulty, emphasizing challenges in maintaining context over multiple steps. The results underscore the importance of developing LLMs that combine long-term memory and contextual awareness to mimic human-like thought processes in complex problem-solving scenarios.
Community-Aware Efficient Graph Contrastive Learning via Personalized Self-Training
Li, Yuecheng, Hu, Yanming, Fu, Lele, Chen, Chuan, Yang, Lei, Zheng, Zibin
In recent years, graph contrastive learning (GCL) has emerged as one of the optimal solutions for various supervised tasks at the node level. However, for unsupervised and structure-related tasks such as community detection, current GCL algorithms face difficulties in acquiring the necessary community-level information, resulting in poor performance. In addition, general contrastive learning algorithms improve the performance of downstream tasks by increasing the number of negative samples, which leads to severe class collision and unfairness of community detection. To address above issues, we propose a novel Community-aware Efficient Graph Contrastive Learning Framework (CEGCL) to jointly learn community partition and node representations in an end-to-end manner. Specifically, we first design a personalized self-training (PeST) strategy for unsupervised scenarios, which enables our model to capture precise community-level personalized information in a graph. With the benefit of the PeST, we alleviate class collision and unfairness without sacrificing the overall model performance. Furthermore, the aligned graph clustering (AlGC) is employed to obtain the community partition. In this module, we align the clustering space of our downstream task with that in PeST to achieve more consistent node embeddings. Finally, we demonstrate the effectiveness of our model for community detection both theoretically and experimentally. Extensive experimental results also show that our CEGCL exhibits state-of-the-art performance on three benchmark datasets with different scales.
Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach
Catulo, João Simões, Soares, Cláudia, Guimarães, Marta
Space is becoming more crowded in Low Earth Orbit due to increased space activity. Such a dense space environment increases the risk of collisions between space objects endangering the whole space population. Therefore, the need to consider collision avoidance as part of routine operations is evident to satellite operators. Current procedures rely on the analysis of multiple collision warnings by human analysts. However, with the continuous growth of the space population, this manual approach may become unfeasible, highlighting the importance of automation in risk assessment. In 2019, ESA launched a competition to study the feasibility of applying machine learning in collision risk estimation and released a dataset that contained sequences of Conjunction Data Messages (CDMs) in support of real close encounters. The competition results showed that the naive forecast and its variants are strong predictors for this problem, which suggests that the CDMs may follow the Markov property. The proposed work investigates this theory by benchmarking Hidden Markov Models (HMM) in predicting the risk of collision between two resident space objects by using one feature of the entire dataset: the sequence of the probability in the CDMs. In addition, Bayesian statistics are used to infer a joint distribution for the parameters of the models, which allows the development of robust and reliable probabilistic predictive models that can incorporate physical or prior knowledge about the problem within a rigorous theoretical framework and provides prediction uncertainties that nicely reflect the accuracy of the predicted risk. This work shows that the implemented HMM outperforms the naive solution in some metrics, which further adds to the idea that the collision warnings may be Markovian and suggests that this is a powerful method to be further explored.