Instructional Material
Cross-Episodic Curriculum for Transformer Agents
We present a new algorithm, Cross-Episodic Curriculum (CEC), to boost the learning efficiency and generalization of Transformer agents. Central to CEC is the placement of cross-episodic experiences into a Transformer's context, which forms the basis of a curriculum. By sequentially structuring online learning trials and mixed-quality demonstrations, CEC constructs curricula that encapsulate learning progression and proficiency increase across episodes. Such synergy combined with the potent pattern recognition capabilities of Transformer models delivers a powerful cross-episodic attention mechanism. The effectiveness of CEC is demonstrated under two representative scenarios: one involving multi-task reinforcement learning with discrete control, such as in DeepMind Lab, where the curriculum captures the learning progression in both individual and progressively complex settings; and the other involving imitation learning with mixed-quality data for continuous control, as seen in RoboMimic, where the curriculum captures the improvement in demonstrators' expertise.
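The idea of placing cross-episodic experience into the context in order of increasing proficiency can be illustrated with a minimal sketch. Everything named here (the `Episode` container, the `total_return` score used as a proficiency proxy, and `build_curriculum_context`) is hypothetical and is not taken from the paper; the abstract only states that episodes are structured sequentially so that the context reflects improvement.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Episode:
    # Hypothetical container: (observation, action) pairs plus a scalar score.
    steps: List[Tuple[str, str]]
    total_return: float  # assumed proxy for demonstrator or learner proficiency


def build_curriculum_context(episodes: List[Episode], max_steps: int) -> List[Tuple[str, str]]:
    """Concatenate episodes from least to most proficient into one context."""
    if max_steps <= 0:
        return []
    # Sort ascending so later parts of the context show better behaviour.
    ordered = sorted(episodes, key=lambda ep: ep.total_return)
    context: List[Tuple[str, str]] = []
    for ep in ordered:
        context.extend(ep.steps)
    # If the context is too long, keep the most recent (most proficient) steps.
    return context[-max_steps:]


episodes = [
    Episode(steps=[("o3", "a3")], total_return=3.0),
    Episode(steps=[("o1", "a1")], total_return=1.0),
    Episode(steps=[("o2", "a2")], total_return=2.0),
]
context = build_curriculum_context(episodes, max_steps=10)
```

A Transformer policy conditioned on such a context can attend across episodes; how the actual method selects, orders, and truncates episodes may differ from this sketch.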
Improving the portability of predicting students performance models by using ontologies
Zambrano, Javier Lopez, Lara, Juan A., Romero, Cristobal
One of the main current challenges in Educational Data Mining and Learning Analytics is the portability or transferability of predictive models obtained for a particular course so that they can be applied to other different courses. To handle this challenge, one of the foremost problems is the models' excessive dependence on the low-level attributes used to train them, which reduces the models' portability. To solve this issue, the use of high-level attributes with more semantic meaning, such as ontologies, may be very useful. Along this line, we propose the utilization of an ontology that uses a taxonomy of actions that summarises students' interactions with the Moodle learning management system. We compare the results of this proposed approach against our previous results when we used low-level raw attributes obtained directly from Moodle logs. The results indicate that the use of the proposed ontology improves the portability of the models in terms of predictive accuracy. The main contribution of this paper is to show that the ontological models obtained in one source course can be applied to other different target courses with similar usage levels without losing prediction accuracy.
Herald: A Natural Language Annotated Lean 4 Dataset
Gao, Guoxiong, Wang, Yutong, Jiang, Jiedong, Gao, Qi, Qin, Zihan, Xu, Tianyi, Dong, Bin
Verifiable formal languages like Lean have profoundly impacted mathematical reasoning, particularly through the use of large language models (LLMs) for automated reasoning. A significant challenge in training LLMs for these formal languages is the lack of parallel datasets that align natural language with formal language proofs. To address this challenge, this paper introduces a novel framework for translating the Mathlib4 corpus (a unified library of mathematics in the formal language Lean 4) into natural language. Building upon this, we employ a dual augmentation strategy that combines tactic-based and informal-based approaches, leveraging the Lean-jixia system, a Lean 4 analyzer. We present the results of this pipeline on Mathlib4 as Herald (Hierarchy and Retrieval-based Translated Lean Dataset). We also propose the Herald Translator, which is fine-tuned on Herald. The Herald Translator achieves 93.2% accuracy (Pass@128) on formalizing statements in the miniF2F-test and 22.5% accuracy on our internal graduate-level textbook dataset, outperforming InternLM2-Math-Plus-7B (74.0% and 7.5%) and TheoremLlama (50.1% and 4.0%). Furthermore, we propose a section-level translation framework for real-world applications. As a direct application of the Herald Translator, we have successfully translated a template section in the Stack project, marking notable progress in the automatic formalization of graduate-level mathematical literature. Our model, along with the datasets, will be open-sourced to the public soon.
Reviews: HOUDINI: Lifelong Learning as Program Synthesis
The authors present an algorithm for transfer learning using a symbolic program synthesizer for finding the most adequate neural network architecture and selecting relevant neural network modules from previous tasks for transfer. The approach is heavily based on concepts from programming languages, but also studies the relevant concept of high-level transfer that is crucial for true lifelong learning. Results show how the algorithm is capable of selectively transferring (high- and low-level) knowledge in a meaningful way, and numerical results validate the significance of the approach. The authors claim that their method targets the lifelong learning problem, but theirs is really a transfer learning approach. Solving catastrophic forgetting by completely freezing the network parameters precludes the method from being true lifelong learning, in which the learning of subsequent tasks affects the performance of earlier tasks.
Reviews: Virtual Class Enhanced Discriminative Embedding Learning
The paper proposes a simple technique for improved feature learning in convolutional neural networks. The technique consists of adding a "negative" virtual class to CNN training on classification tasks with the softmax loss function. The authors evaluate their approach on a range of computer vision datasets (CIFAR10/100/100, LFW, SLLFW, CUB200, ImageNet32) and find that it outperforms simple baselines on all of them, and outperforms more complicated state-of-the-art techniques on most of them. The authors also present an analysis from a few different standpoints as to why their method is effective. Strengths: - The technique proposed by the authors is extremely simple to implement (just a one-line change in existing code would suffice, as far as I can tell).
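To make the "one-line change" concrete, here is a minimal sketch of a softmax loss with one extra virtual class. The review does not spell out how the virtual class is built, so the construction used here is an assumption: the virtual weight vector points along the feature x and has the norm of the true-class weight, which makes the virtual logit equal to ||W_y|| * ||x||. The function name `virtual_softmax_loss` is hypothetical.

```python
import math
from typing import List


def virtual_softmax_loss(x: List[float], weights: List[List[float]], label: int) -> float:
    """Cross-entropy over the real class logits plus one virtual-class logit."""
    # Ordinary class logits: dot product of each class weight with the feature.
    logits = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in weights]
    x_norm = math.sqrt(sum(v * v for v in x))
    wy_norm = math.sqrt(sum(v * v for v in weights[label]))
    # Assumed virtual class: weight aligned with x, same norm as the true-class weight.
    virtual_logit = wy_norm * x_norm
    all_logits = logits + [virtual_logit]
    # Numerically stable log-sum-exp over real and virtual logits.
    m = max(all_logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in all_logits))
    return log_z - logits[label]


loss = virtual_softmax_loss(x=[1.0, 0.0], weights=[[1.0, 0.0], [0.0, 1.0]], label=0)
```

The extra term only enlarges the denominator, so the loss is never smaller than the plain softmax loss on the same input; the virtual class is used during training only and would be dropped at inference.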
FACMIC: Federated Adaptative CLIP Model for Medical Image Classification
Wu, Yihang, Desrosiers, Christian, Chaddad, Ahmad
Federated learning (FL) has emerged as a promising approach to medical image analysis that allows deep model training using decentralized data while ensuring data privacy. However, in the field of FL, communication cost plays a critical role in evaluating the performance of the model. Thus, transferring vision foundation models can be particularly challenging due to the significant resource costs involved. In this paper, we introduce a federated adaptive Contrastive Language Image Pretraining (CLIP) model designed for classification tasks. We employ a lightweight and efficient feature attention module for CLIP that selects suitable features for each client's data. Additionally, we propose a domain adaptation technique to reduce differences in data distribution between clients. Experimental results on four publicly available datasets demonstrate the superior performance of FACMIC in dealing with real-world and multi-source medical imaging data. Our code is available at https://github.com/AIPMLab/FACMIC.
An Innovative Solution: AI-Based Digital Screen-Integrated Tables for Educational Settings
In this paper, we review AI-based frameworks used for educational tasks such as digital customized assignment allotment, performance monitoring, and identifying slow and fast learners. This application describes a novel invention, digital screen-integrated tables, designed specifically for educational settings. The tables feature integrated digital screens controlled by a central processing unit (CPU), enabling synchronized display of educational content such as textbooks, presentations, exam questions, and interactive learning materials. Additionally, the invention facilitates the collection of student performance data during classroom activities and assessments. The gathered data is analyzed using machine learning models to identify patterns and trends in student learning behaviours. By leveraging machine learning algorithms, educators can ascertain whether a student is a fast learner or a slow learner, and the teacher can then allocate more resources to the slow learners. This innovative approach aims to address the evolving needs of modern classrooms by providing a dynamic and data-driven learning environment. The unique integration of digital screens into traditional classroom furniture represents a significant advancement in educational technology. This patent filing encompasses the design, functionality, and method of operation of the digital screen-integrated tables, emphasizing their innovative features and applications in educational institutions.
Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching
Yao, Gongxin, Li, Xinyang, Fu, Luowei, Pan, Yu
Achieving monocular camera localization within pre-built LiDAR maps can bypass the simultaneous mapping process of visual SLAM systems, potentially reducing the computational overhead of autonomous localization. To this end, one of the key challenges is cross-modal place recognition, which involves retrieving 3D scenes (point clouds) from a LiDAR map according to online RGB images. In this paper, we introduce an efficient framework to learn descriptors for both RGB images and point clouds. It takes the visual state space model (VMamba) as the backbone and employs a pixel-view-scene joint training strategy for cross-modal contrastive learning. To address the field-of-view differences, independent descriptors are generated from multiple evenly distributed viewpoints for point clouds. A visible 3D points overlap strategy is then designed to quantify the similarity between point cloud views and RGB images for multi-view supervision. Additionally, when generating descriptors from pixel-level features using NetVLAD, we compensate for the loss of geometric information and introduce an efficient scheme for multi-view generation. Experimental results on the KITTI and KITTI-360 datasets demonstrate the effectiveness and generalization of our method. The code will be released upon acceptance.
Provable Methods for Searching with an Imperfect Sensor
Chakraborty, Nilanjan, Kasthurirangan, Prahlad Narasimhan, Mitchell, Joseph S. B., Nguyen, Linh, Perk, Michael
Assume that a target is known to be present at an unknown point among a finite set of locations in the plane. We search for it using a mobile robot that has imperfect sensing capabilities. It takes time for the robot to move between locations and search a location; we have a total time budget within which to conduct the search. We study the problem of computing a search path/strategy for the robot that maximizes the probability of detection of the target. Considering non-uniform travel times between points (e.g., based on the distance between them) is crucial for search and rescue applications; such problems have been investigated to a limited extent due to their inherent complexity. In this paper, we describe fast algorithms with performance guarantees for this search problem and some variants, complement them with complexity results, and perform experiments to observe their performance.
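The objective described above, the probability of detecting the target along a search path within a time budget, can be sketched as follows. The abstract does not specify the sensor model, so this sketch assumes a sensor with false negatives only: each look at the correct location finds the target with a fixed probability `detect_prob`, independently of earlier looks. Euclidean travel time, a fixed `search_time` per look, and the function name `detection_probability` are likewise assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def detection_probability(
    path: List[Point],
    prior: Dict[Point, float],
    start: Point,
    detect_prob: float,
    search_time: float,
    budget: float,
    speed: float = 1.0,
) -> float:
    """Probability that the target is found while following `path` within `budget`."""
    looks = {loc: 0 for loc in prior}  # number of earlier looks at each location
    pos, elapsed, total = start, 0.0, 0.0
    for loc in path:
        travel = math.dist(pos, loc) / speed
        if elapsed + travel + search_time > budget:
            break  # the next move-and-search would exceed the time budget
        elapsed += travel + search_time
        pos = loc
        # Target is here, was missed on every earlier look, and is found on this one.
        total += prior[loc] * (1.0 - detect_prob) ** looks[loc] * detect_prob
        looks[loc] += 1
    return total


prior = {(0.0, 0.0): 0.5, (1.0, 0.0): 0.5}
visit_both = detection_probability(
    [(0.0, 0.0), (1.0, 0.0)], prior, (0.0, 0.0), 0.8, 1.0, 3.0
)
stay_put = detection_probability(
    [(0.0, 0.0)] * 3, prior, (0.0, 0.0), 0.8, 1.0, 3.0
)
```

In this toy instance, visiting both locations once beats searching the same location three times, which illustrates the trade-off between travel time and repeated looks that a search strategy has to resolve.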
Edit Distances and Their Applications to Downstream Tasks in Research and Commercial Contexts
Carmo, Félix do, Kanojia, Diptesh
Edit distances are a class of metrics used to quantify the similarity between two text sequences by calculating the minimum number of operations required to transform one sequence into another. These operations typically include insertion, deletion, substitution, and movement of characters or words. The application of edit distances extends beyond simple string comparison and is used extensively in evaluating machine-translated text against human references, quality estimation, and post-editing tasks. This tutorial is targeted at researchers of machine translation and of human translation, as well as corporate members of AMTA. It focuses on the uses of edit distances, such as TER - Translation Edit Rate (Snover et al., 2006), as proxies of translation effort and as informants of other downstream tasks, such as MT evaluation and post-editing, error annotation with MQM (Burchardt, 2013), quality estimation - QE (Specia et al., 2022) and automatic post-editing - APE (do Carmo et al., 2021). The application of edit distances in downstream tasks often assumes that these accurately represent work done by post-editors and real errors that need to be corrected in MT output. We will discuss how imperfect edit distances are in capturing the details of this error correction work and the implications for researchers and for commercial applications of these uses of edit distances. In terms of commercial applications, we will discuss their integration in computer-assisted translation tools and how the perception of the connection between edit distances and post-editor effort affects the definition of translator rates.
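As a concrete reference point, the following minimal sketch computes the classic Levenshtein distance (insertions, deletions, and substitutions only) over characters or words. It deliberately omits the movement (shift) operation, so it is not an implementation of TER; the helper `edit_rate`, which divides the distance by the reference length, is a hypothetical simplification introduced here for illustration.

```python
from typing import Sequence


def edit_distance(source: Sequence, target: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions (no shifts)."""
    # prev[j] holds the distance between the processed source prefix and target[:j].
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        curr = [i]  # transforming source[:i] into an empty target takes i deletions
        for j, t in enumerate(target, start=1):
            cost = 0 if s == t else 1
            curr.append(min(
                prev[j] + 1,         # delete s
                curr[j - 1] + 1,     # insert t
                prev[j - 1] + cost,  # substitute s with t, or keep a match
            ))
        prev = curr
    return prev[-1]


def edit_rate(hypothesis: Sequence, reference: Sequence) -> float:
    """Edits per reference token; a simplified, shift-free stand-in for an edit rate."""
    if len(reference) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(hypothesis, reference) / len(reference)


char_level = edit_distance("kitten", "sitting")
word_level = edit_distance("the cat sat".split(), "the cat sat down".split())
rate = edit_rate(["a", "b"], ["a", "c"])
```

Because the same function works on strings and on token lists, it makes visible one reason such metrics are imperfect proxies for effort: a reordered phrase that a post-editor moves in a single action is counted here as several separate deletions and insertions.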