Reviews: Online Continual Learning with Maximal Interfered Retrieval
This paper describes an approach to improve rehearsal-based continual learning techniques (either replay-based or with a generative model) by identifying samples that are most useful to avoid forgetting. This is achieved by computing the increase in loss on the replayed samples, and using this to determine which samples should be used during learning. It is a simple and intuitive idea, the paper is clearly written, and experiments on multiple datasets are compelling. I think it could make a nice addition to the conference, but needs a few improvements first. My main criticism is that the approach requires a separate virtual gradient step for each actual step, to compute the change in loss on the replay samples.
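The selection criterion described above can be sketched in a few lines. This is a minimal sketch, assuming a linear logistic-regression model with labels in {-1, +1} and a plain SGD virtual step. The paper applies the idea to neural networks and generative replay, and the names `mir_select`, `lr`, and `k` are hypothetical. `X_mem` plays the role of the candidate replay samples. The sketch also makes the reviewer's cost concern concrete: each real update needs one extra virtual update plus two loss evaluations on the candidates.

```python
import numpy as np

def per_sample_logistic_loss(w, X, y):
    """Logistic loss log(1 + exp(-y * x.w)) per row, labels y in {-1, +1}."""
    z = X @ w
    return np.logaddexp(0.0, -y * z)  # numerically stable log(1 + exp(.))

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss with respect to w."""
    z = X @ w
    s = -y / (1.0 + np.exp(y * z))  # d/dz log(1 + exp(-y z))
    return X.T @ s / len(y)

def mir_select(w, X_in, y_in, X_mem, y_mem, lr=0.1, k=2):
    """Pick the k memory samples whose loss increases most after a virtual step.

    Hypothetical helper: the virtual SGD step uses only the incoming batch,
    then each candidate is scored by (loss after) - (loss before).
    """
    w_virtual = w - lr * logistic_grad(w, X_in, y_in)
    before = per_sample_logistic_loss(w, X_mem, y_mem)
    after = per_sample_logistic_loss(w_virtual, X_mem, y_mem)
    interference = after - before
    # Indices of the k largest increases, largest first.
    return np.argsort(-interference)[:k], interference
```

In a full training loop, the selected memory samples would then be combined with the incoming batch for the real parameter update.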
Review for NeurIPS paper: Neural Networks Fail to Learn Periodic Functions and How to Fix It
Weaknesses: A significant shortcoming of the approach is the lack of proper and thorough validation against recurrent neural networks. The stated problem (i.i.d. in Fourier space, restricted to a compact region in $\mathbb{R}^d$) can be tackled with the auto-regressive approach provided by RNNs. However, this is neither mentioned nor compared against in the paper. I note that RNNs are compared against to some extent in the appendix, but only as an RNN with Snake versus a standard RNN. Above all, this paper needs a comparison showing that feedforward Snake networks can outperform vanilla RNNs.
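For readers unfamiliar with the activation under discussion: as we understand the paper, Snake is defined as snake_a(x) = x + sin^2(a x) / a, where a > 0 controls the frequency of the periodic component. The minimal NumPy sketch below is illustrative, and the function names are our own.

```python
import numpy as np

def snake(x, a=1.0):
    """Snake activation: x + sin^2(a*x) / a, with frequency parameter a > 0."""
    return x + np.sin(a * x) ** 2 / a

def snake_grad(x, a=1.0):
    """Derivative 1 + sin(2*a*x); it lies in [0, 2], so Snake is non-decreasing."""
    return 1.0 + np.sin(2.0 * a * x)
```

The linear term lets the output grow without bound outside the training range, while the squared-sine term adds the periodic bias that the paper argues standard activations lack.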
Review for NeurIPS paper: Neural Networks Fail to Learn Periodic Functions and How to Fix It
I think this is an interesting submission that led to a detailed discussion among the reviewers. The work is novel and looks at an interesting question regarding extrapolation (and dealing with periodic functions). I agree with some of the reviewers that the motivation and, more generally, the write-up of the work could be improved, but I think there is already value in the work. I would like to highlight a few points that are worth considering:
* a further discussion of how the proposed approach compares to RNNs or autoregressive models when it comes to modeling periodicity
* there are some concerns regarding the methodology used (e.g.
Review for NeurIPS paper: Improved Schemes for Episodic Memory-based Lifelong Learning
There has been a plethora of recent and historical work on this topic, finding different ways to help networks alleviate catastrophic forgetting, where a network trained on tasks A_0 through A_i forgets them to differing degrees when trained on tasks A_{i+1} onward. Most methods can be divided into regularisation-based, memory-based, or meta-learning-based approaches. One relatively recent work is GEM (Gradient Episodic Memory) and, relatedly, A-GEM (Averaged GEM). GEM stores examples from seen tasks in an episodic memory. When learning a new task, the gradient update is modified so that it does not increase the loss on examples from previous tasks (as represented by the examples in memory).
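GEM enforces this with one inequality constraint per previous task and solves a quadratic program; A-GEM relaxes it to a single constraint on the average gradient over the episodic memory, which admits a closed-form projection. The sketch below shows that A-GEM-style projection on flattened gradient vectors. It is illustrative only, and `agem_project` is a hypothetical name.

```python
import numpy as np

def agem_project(g, g_ref):
    """A-GEM-style projection of the proposed gradient g.

    g     : gradient on the current batch (flattened)
    g_ref : average gradient on the episodic-memory examples (flattened)
    """
    dot = float(g @ g_ref)
    if dot >= 0.0:
        # No conflict: to first order, the step does not increase memory loss.
        return g
    # Remove the component of g that points against g_ref.
    return g - (dot / float(g_ref @ g_ref)) * g_ref
```

After projection, the updated gradient is orthogonal to `g_ref` whenever a conflict occurred, so a step of -lr times the projected gradient leaves the average memory loss unchanged to first order instead of increasing it.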
Review for NeurIPS paper: Improved Schemes for Episodic Memory-based Lifelong Learning
The paper introduces a clear, simple generalisation of two established continual learning methods (GEM and A-GEM) which performs very well in a thorough empirical evaluation. All reviewers and the AC value the effort that the authors put in their response. There is consensus that the work has merit and all reviewers recommend accepting the paper (R1 and R4 raised their score).
LLM-Assisted Knowledge Graph Completion for Curriculum and Domain Modelling in Personalized Higher Education Recommendations
Abu-Rasheed, Hasan, Jumbo, Constance, Amin, Rashed Al, Weber, Christian, Wiese, Veit, Obermaisser, Roman, Fathi, Madjid
While learning personalization offers great potential for learners, modern practices in higher education require deeper consideration of domain models and learning contexts to develop effective personalization algorithms. This paper introduces an innovative approach to higher education curriculum modelling that utilizes large language models (LLMs) for knowledge graph (KG) completion, with the goal of creating personalized learning-path recommendations. Our research focuses on modelling university subjects and linking their topics to corresponding domain models, enabling the integration of learning modules from different faculties and institutions in the student's learning path. Central to our approach is a collaborative process, where LLMs assist human experts in extracting high-quality, fine-grained topics from lecture materials. We develop domain, curriculum, and user models for university modules and stakeholders. We implement this model to create the KG from two study modules: Embedded Systems and Development of Embedded Systems Using FPGA. The resulting KG structures the curriculum and links it to the domain models. We evaluate our approach through qualitative expert feedback and quantitative graph quality metrics. Domain experts validated the relevance and accuracy of the model, while the graph quality metrics measured the structural properties of our KG. Our results show that the LLM-assisted graph completion approach enhances the ability to connect related courses across disciplines to personalize the learning experience. Expert feedback also showed high acceptance of the proposed collaborative approach for concept extraction and classification.
CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification
Patrício, Cristiano, Rio-Torto, Isabel, Cardoso, Jaime S., Teixeira, Luís F., Neves, João C.
The main challenges limiting the adoption of deep learning-based solutions in medical workflows are the availability of annotated data and the lack of interpretability of such systems. Concept Bottleneck Models (CBMs) tackle the latter by constraining the final disease prediction on a set of predefined and human-interpretable concepts. However, the increased interpretability achieved through these concept-based explanations implies a higher annotation burden. Moreover, if a new concept needs to be added, the whole system needs to be retrained. Inspired by the remarkable performance shown by Large Vision-Language Models (LVLMs) in few-shot settings, we propose a simple, yet effective, methodology, CBVLM, which tackles both of the aforementioned challenges. First, for each concept, we prompt the LVLM to answer whether the concept is present in the input image. Then, we ask the LVLM to classify the image based on the previous concept predictions. Moreover, in both stages, we incorporate a retrieval module responsible for selecting the best examples for in-context learning. By grounding the final diagnosis on the predicted concepts, we ensure explainability, and by leveraging the few-shot capabilities of LVLMs, we drastically lower the annotation cost. We validate our approach with extensive experiments across four medical datasets and twelve LVLMs (both generic and medical) and show that CBVLM consistently outperforms CBMs and task-specific supervised methods without requiring any training and using just a few annotated examples. More information on our project page: https://cristianopatricio.github.io/CBVLM/.
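The two-stage pipeline described above can be sketched as prompt construction around a retrieval step. This is a minimal, text-only sketch under several assumptions: retrieval is plain cosine similarity over precomputed image embeddings, the same demonstrations are reused in both stages, the images themselves are not passed, and `ask` is a hypothetical stand-in for an LVLM call (any function from a prompt string to a reply string). All function names and prompt templates are illustrative; the paper's actual retrieval module and prompts may differ.

```python
import numpy as np

def retrieve_demos(query_emb, pool, k=2):
    """Return the k annotated pool examples most cosine-similar to the query."""
    embs = np.stack([ex["emb"] for ex in pool])
    sims = embs @ query_emb / (np.linalg.norm(embs, axis=1) * np.linalg.norm(query_emb))
    return [pool[i] for i in np.argsort(-sims)[:k]]

def concept_prompt(concept, demos):
    """Stage 1: ask whether one concept is present; demos show annotated answers."""
    lines = [f"[{d['image_id']}] {concept}? {'yes' if concept in d['concepts'] else 'no'}" for d in demos]
    lines.append(f"[query] {concept}? Answer yes or no.")
    return "\n".join(lines)

def diagnosis_prompt(present, labels, demos):
    """Stage 2: classify using only the concepts predicted in stage 1."""
    lines = [f"[{d['image_id']}] concepts={sorted(d['concepts'])} -> {d['label']}" for d in demos]
    lines.append(f"[query] concepts={sorted(present)} -> choose one of {labels}.")
    return "\n".join(lines)

def cbvlm_sketch(query_emb, pool, concepts, labels, ask, k=2):
    """Training-free two-stage pipeline; `ask` is a hypothetical LVLM stand-in."""
    demos = retrieve_demos(query_emb, pool, k)
    present = {c for c in concepts
               if ask(concept_prompt(c, demos)).strip().lower().startswith("yes")}
    diagnosis = ask(diagnosis_prompt(present, labels, demos)).strip()
    return present, diagnosis
```

Because the diagnosis prompt contains only the predicted concepts, the final label can be traced back to them, and adding a concept means adding one more stage-1 prompt rather than retraining a model.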
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Qin, Yujia, Ye, Yining, Fang, Junjie, Wang, Haoming, Liang, Shihao, Tian, Shizuo, Zhang, Junda, Li, Jiahao, Li, Yunxin, Huang, Shijue, Zhong, Wanjun, Li, Kuanye, Yang, Jiale, Miao, Yu, Lin, Woyu, Liu, Longxiang, Jiang, Xu, Ma, Qianli, Li, Jingyu, Xiao, Xiaojun, Cai, Kai, Li, Chuang, Zheng, Yaowei, Jin, Chaolin, Li, Chen, Zhou, Xiao, Wang, Minchao, Chen, Haoli, Li, Zhaojian, Yang, Haihua, Liu, Haifeng, Lin, Feng, Peng, Tao, Liu, Xin, Shi, Guang
This paper introduces UI-TARS, a native GUI agent model that perceives only screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks. Experiments demonstrate its superior performance: UI-TARS achieves SOTA results on 10+ GUI agent benchmarks evaluating perception, grounding, and GUI task execution. Notably, on the OSWorld benchmark, UI-TARS achieves scores of 24.6 with 50 steps and 22.7 with 15 steps, outperforming Claude's 22.0 and 14.9, respectively. On AndroidWorld, UI-TARS achieves 46.6, surpassing GPT-4o's 34.5. UI-TARS incorporates several key innovations: (1) Enhanced Perception, which leverages a large-scale dataset of GUI screenshots for context-aware understanding of UI elements and precise captioning; (2) Unified Action Modeling, which standardizes actions into a unified space across platforms and achieves precise grounding and interaction through large-scale action traces; (3) System-2 Reasoning, which incorporates deliberate reasoning into multi-step decision making, involving multiple reasoning patterns such as task decomposition, reflection thinking, and milestone recognition; and (4) Iterative Training with Reflective Online Traces, which addresses the data bottleneck by automatically collecting, filtering, and reflectively refining new interaction traces on hundreds of virtual machines. Through iterative training and reflection tuning, UI-TARS continuously learns from its mistakes and adapts to unforeseen situations with minimal human intervention. We also analyze the evolution path of GUI agents to guide the further development of this domain.
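To make "a unified action space across platforms" concrete, here is a hypothetical sketch: platform-specific events are mapped onto a shared set of action kinds, and pixel coordinates are normalized by screen size so that the same click means the same thing on a desktop and on a phone. The `Action` schema, the event names, and `to_unified` are all illustrative assumptions, not UI-TARS's actual action format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Action:
    """One action in a shared, platform-independent space (hypothetical schema)."""
    kind: str                                    # e.g. "click", "type", "scroll"
    point: Optional[Tuple[float, float]] = None  # screen position normalized to [0, 1]
    text: Optional[str] = None                   # payload for "type"

# Hypothetical platform-specific event names mapped onto shared action kinds.
DESKTOP_KINDS = {"left_click": "click", "key_type": "type", "wheel": "scroll"}
MOBILE_KINDS = {"tap": "click", "input_text": "type", "swipe": "scroll"}

def to_unified(event, screen_size, kind_map):
    """Translate a raw event dict into an Action, normalizing pixel coordinates."""
    kind = kind_map[event["name"]]
    point = None
    if "x" in event and "y" in event:
        width, height = screen_size
        point = (event["x"] / width, event["y"] / height)
    return Action(kind=kind, point=point, text=event.get("text"))
```

With this mapping, a left click at the centre of a 1920x1080 desktop and a tap at the centre of a 1080x2400 phone both become `Action("click", (0.5, 0.5))`, so a single model can learn one action vocabulary from traces collected on both platforms.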
Reviews: Lifelong Learning with Weighted Majority Votes
From my very personal point of view, the lifelong learning paradigm is a vague concept and is sometimes invoked for studying different scenarios that could be named otherwise (like transfer learning). However, I think that the framework studied here (from Balcan et al., 2015) is a very pertinent "lifelong problem". The authors present an honest work in the right direction. The proofs are not trivial but, as I explain below, the contribution appears to me insufficient for NIPS. The risk bound minimized by the learning algorithm may be very high, as it relies on the VC-dimension of the predictors.
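For reference, a weighted majority vote over binary classifiers predicts the sign of the weighted sum of their votes. The sketch below shows only this prediction rule, with labels in {-1, +1} and ties broken towards +1 as an arbitrary convention. How the weights are learned and transferred across tasks in the reviewed paper is not modeled, and the function name is hypothetical.

```python
import numpy as np

def weighted_majority_vote(weights, votes):
    """Predict sign(sum_i w_i * h_i(x)) for each sample.

    weights : shape (n_voters,), non-negative voter weights
    votes   : shape (n_voters, n_samples), each entry in {-1, +1}
    """
    scores = np.asarray(weights) @ np.asarray(votes)
    return np.where(scores >= 0, 1, -1)  # ties go to +1 by convention
```

For example, with weights (0.6, 0.3, 0.1), a sample on which only the first voter says +1 is still labelled +1, because that voter's weight exceeds the combined weight of the other two.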
Optimizing LLM test-time compute involves solving a meta-RL problem
Figure 1: Training models to optimize test-time compute and learn "how to discover" correct responses, as opposed to the traditional learning paradigm of learning "what answer" to output.
The major strategy to improve large language models (LLMs) thus far has been to use more and more high-quality data for supervised fine-tuning (SFT) or reinforcement learning (RL). Unfortunately, it seems this form of scaling will soon hit a wall: scaling laws for pre-training are plateauing, and reports suggest that high-quality text data for training may be exhausted by 2028. This is particularly pressing for more difficult tasks, like solving reasoning problems, which seem to require scaling current data by about 100x to see any significant improvement. The current performance of LLMs on problems from these hard tasks remains underwhelming. There is thus a pressing need for data-efficient methods for training LLMs that extend beyond data scaling and can address more complex challenges.