Goto

Collaborating Authors

 Instructional Material


Reviews: Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies

Neural Information Processing Systems

The authors present an algorithm for lifelong representation learning that adapts variational autoencoders to the lifelong learning setting. The framework is presented as a full generative process, where a set of latent factors are (selectively) shared across tasks, and the tasks themselves are generated by an unknown distribution. The algorithm optimizes for the reconstruction error with a regularization based on the MDL principle that has been studied for learning disentangled representations. The algorithm automatically detects distribution shifts (i.e., task changes) and avoids catastrophic forgetting by "hallucinating" data for previous tasks while training on a new one. The authors show empirically that their algorithm is able to extract relevant semantic knowledge from one task and transfer it to the next.


Do AI assistants help students write formal specifications? A study with ChatGPT and the B-Method

arXiv.org Artificial Intelligence

This paper investigates the role of AI assistants, specifically OpenAI's ChatGPT, in teaching formal methods (FM) to undergraduate students, using the B-method as a formal specification technique. While existing studies demonstrate the effectiveness of AI in coding tasks, no study reports on its impact on formal specifications. We examine whether ChatGPT provides an advantage when writing B-specifications and analyse student trust in its outputs. Our findings indicate that the AI does not help students to enhance the correctness of their specifications, with low trust correlating to better outcomes. Additionally, we identify a behavioural pattern with which to interact with ChatGPT which may influence the correctness of B-specifications.


TutorLLM: Customizing Learning Recommendations with Knowledge Tracing and Retrieval-Augmented Generation

arXiv.org Artificial Intelligence

The integration of AI in education offers significant potential to enhance learning efficiency. Large Language Models (LLMs), such as ChatGPT, Gemini, and Llama, allow students to query a wide range of topics, providing unprecedented flexibility. However, LLMs face challenges, such as handling varying content relevance and lack of personalization. To address these challenges, we propose TutorLLM, a personalized learning recommender LLM system based on Knowledge Tracing (KT) and Retrieval-Augmented Generation (RAG). The novelty of TutorLLM lies in its unique combination of KT and RAG techniques with LLMs, which enables dynamic retrieval of context-specific knowledge and provides personalized learning recommendations based on the student's personal learning state. Specifically, this integration allows TutorLLM to tailor responses based on individual learning states predicted by the Multi-Features with Latent Relations BERT-based KT (MLFBK) model and to enhance response accuracy with a Scraper model. The evaluation includes user assessment questionnaires and performance metrics, demonstrating a 10\% improvement in user satisfaction and a 5\% increase in quiz scores compared to using general LLMs alone.


Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

Neural Information Processing Systems

Procedural activity understanding requires perceiving human actions in terms of a broader task, where multiple keysteps are performed in sequence across a long video to reach a final goal state---such as the steps of a recipe or the steps of a DIY fix-it task. Prior work largely treats keystep recognition in isolation of this broader structure, or else rigidly confines keysteps to align with a particular sequential script. We propose discovering a task graph automatically from how-to videos to represent probabilistically how people tend to execute keysteps, then leverage this graph to regularize keystep recognition in novel videos. On multiple datasets of real-world instructional video, we show the impact: more reliable zero-shot keystep localization and improved video representation learning, exceeding the state of the art.


Coordinating Distributed Example Orders for Provably Accelerated Training

Neural Information Processing Systems

Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale-up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed RR on a variety of benchmark tasks.


A case for reframing automated medical image classification as segmentation

Neural Information Processing Systems

Image classification and segmentation are common applications of deep learning to radiology. While many tasks can be framed using either classification or segmentation, classification has historically been cheaper to label and more widely used. However, recent work has drastically reduced the cost of training segmentation networks. First, we use an information theoretic approach to analyze why segmentation vs. classification models may achieve different performance on the same dataset and overarching task. We use our analysis and experiments to summarize the benefits of switching from segmentation to classification, including: improved sample efficiency, enabling improved performance with fewer labeled images (up to an order of magnitude lower), on low-prevalence classes, and on certain rare subgroups (up to 161.1\% improved recall); improved robustness to spurious correlations (up to 44.8\% improved robust AUROC); and improved model interpretability, evaluation, and error analysis.


Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning

Neural Information Processing Systems

Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, the incorporation of online fine-tuning can intensify the well-known distributional shift problem. Existing solutions tackle this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning. They typically advocate a single balance between policy improvement and constraints across diverse data collections. This one-size-fits-all manner may not optimally leverage each collected sample due to the significant variation in data quality across different states.


Margin-Independent Online Multiclass Learning via Convex Geometry

Neural Information Processing Systems

We consider the problem of multi-class classification, where a stream of adversarially chosen queries arrive and must be assigned a label online. Unlike traditional bounds which seek to minimize the misclassification rate, we minimize the total distance from each query to the region corresponding to its assigned label. When the true labels are determined via a nearest neighbor partition -- i.e. the label of a point is given by which of k centers it is closest to in Euclidean distance -- we show that one can achieve a loss that is independent of the total number of queries. We complement this result by showing that learning general convex sets requires an almost linear loss per query. Our results build off of regret guarantees for the problem of contextual search.


Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Neural Information Processing Systems

However, existing algorithms and theories for learning near-optimal policies in these two settings are rather different and disconnected. Towards bridging this gap, this paper initiates the theoretical study of *policy finetuning*, that is, online RL where the learner has additional access to a "reference policy" \mu close to the optimal policy \pi_\star in a certain sense. We consider the policy finetuning problem in episodic Markov Decision Processes (MDPs) with S states, A actions, and horizon length H . This offline result is the first that matches the sample complexity lower bound in this setting, and resolves a recent open question in offline RL. We then establish an \Omega(H 3S\min\{C \star, A\}/\varepsilon 2) sample complexity lower bound for *any* policy finetuning algorithm, including those that can adaptively explore the environment.


No-Regret Online Reinforcement Learning with Adversarial Losses and Transitions

Neural Information Processing Systems

Existing online learning algorithms for adversarial Markov Decision Processes achieve \mathcal{O}(\sqrt{T}) regret after T rounds of interactions even if the loss functions are chosen arbitrarily by an adversary, with the caveat that the transition function has to be fixed.This is because it has been shown that adversarial transition functions make no-regret learning impossible.Despite such impossibility results, in this work, we develop algorithms that can handle both adversarial losses and adversarial transitions, with regret increasing smoothly in the degree of maliciousness of the adversary.More concretely, we first propose an algorithm that enjoys \widetilde{\mathcal{O}}(\sqrt{T} C {P}) regret where C {P} measures how adversarial the transition functions are and can be at most \mathcal{O}(T) .While this algorithm itself requires knowledge of C {P}, we further develop a black-box reduction approach that removes this requirement.Moreover, we also show that further refinements of the algorithm not only maintains the same regret bound, but also simultaneously adapts to easier environments (where losses are generated in a certain stochastically constrained manner as in [Jin et al. 2021]) and achieves \widetilde{\mathcal{O}}(U \sqrt{UC {L}} C {P}) regret, where U is some standard gap-dependent coefficient and C {L} is the amount of corruption on losses.