Instructional Material
Curriculum Fine-tuning of Vision Foundation Model for Medical Image Classification Under Label Noise
Deep neural networks have demonstrated remarkable performance in various vision tasks, but their success heavily depends on the quality of the training data. Noisy labels are a critical issue in medical datasets and can significantly degrade model performance. Previous clean sample selection methods have not utilized the well pre-trained features of vision foundation models (VFMs) and assumed that training begins from scratch. In this paper, we propose CUFIT, a curriculum fine-tuning paradigm of VFMs for medical image classification under label noise. Our method is motivated by the fact that linear probing of VFMs is relatively unaffected by noisy samples, as it does not update the feature extractor of the VFM, thus robustly classifying the training samples. Subsequently, curriculum fine-tuning of two adapters is conducted, starting with clean sample selection from the linear probing phase.
The Power of Resets in Online Reinforcement Learning
Simulators are a pervasive tool in reinforcement learning, but most existing algorithms cannot efficiently exploit simulator access -- particularly in high-dimensional domains that require general function approximation. We explore the power of simulators through online reinforcement learning with local simulator access (or, local planning), an RL protocol where the agent is allowed to reset to previously observed states and follow their dynamics during training. We use local simulator access to unlock new statistical guarantees that were previously out of reach:- We show that MDPs with low coverability (Xie et al. 2023) -- a general structural condition that subsumes Block MDPs and Low-Rank MDPs -- can be learned in a sample-efficient fashion with only Q -realizability (realizability of the optimal state-value function); existing online RL algorithms require significantly stronger representation conditions.- As a consequence, we show that the notorious Exogenous Block MDP problem (Efroni et al. 2022) is tractable under local simulator access.The results above are achieved through a computationally inefficient algorithm. We complement them with a more computationally efficient algorithm, RVFS (Recursive Value Function Search), which achieves provable sample complexity guarantees under a strengthened statistical assumption known as pushforward coverability.
Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions
Xu, Yizhou, Krzakala, Florent, Zdeborovรก, Lenka
The Restricted Boltzmann Machine (RBM) is one of the simplest generative neural networks capable of learning input distributions. Despite its simplicity, the analysis of its performance in learning from the training data is only well understood in cases that essentially reduce to singular value decomposition of the data. Here, we consider the limit of a large dimension of the input space and a constant number of hidden units. In this limit, we simplify the standard RBM training objective into a form that is equivalent to the multi-index model with non-separable regularization. This opens a path to analyze training of the RBM using methods that are established for multi-index models, such as Approximate Message Passing (AMP) and its state evolution, and the analysis of Gradient Descent (GD) via the dynamical mean-field theory. We then give rigorous asymptotics of the training dynamics of RBM on data generated by the spiked covariance model as a prototype of a structure suitable for unsupervised learning. We show in particular that RBM reaches the optimal computational weak recovery threshold, aligning with the BBP transition, in the spiked covariance model.
Evaluating the Performance of Nigerian Lecturers using Multilayer Perceptron
Ezeibe, I. E., Okide, S. O., Asogwa, D. C.
Evaluating the performance of a lecturer has been essential for enhancing teaching quality, improving student learning outcomes, and strengthening the institution's reputation. The absence of such a system brings about lecturer performance evaluation which was neither comprehensive nor holistic. This system was designed using a web-based platform, created a secure database, and by using a custom dataset, captured some performance metrics which included student evaluation scores, Research Publications, Years of Experience, and Administrative Duties. Multilayer Perceptron (MLP) algorithm was utilized due to its ability to process complex data patterns and generates accurate predictions in a lecturer's performance based on historical data. This research focused on designing multiple performance metrics beyond the standard ones, incorporating student participation, and integrating analytical tools to deliver a comprehensive and holistic evaluation of lecturers' performance and was developed using Object-Oriented Analysis and Design (OOAD) methodology. Lecturers' performance is evaluated by the model, and the evaluation accuracy is about 91% compared with actual performance. Finally, by evaluating the performance of the MLP model, it is concluded that MLP enhanced lecturer performance evaluation by providing accurate predictions, reducing bias, and supporting data-driven decisions, ultimately improving the fairness and efficiency of the evaluation process. The MLP model's performance was evaluated using Mean Squared Error (MSE) and Mean Absolute Error (MAE), achieved a test loss (MSE) of 256.99 and a MAE of 13.76, and reflected a high level of prediction accuracy. The model also demonstrated an estimated accuracy rate of approximately 96%, validated its effectiveness in predicting lecturer performance.
Are LLMs Ready for English Standardized Tests? A Benchmarking and Elicitation Perspective
Tang, Luoxi, Sundar, Tharunya, Yang, Shuai, Patra, Ankita, Chippada, Manohar, Zhao, Giqi, Li, Yi, Zhang, Riteng, Zhao, Tunan, Yang, Ting, Meng, Yuqiao, Ma, Weicheng, Xi, Zhaohan
AI is transforming education by enabling powerful tools that enhance learning experiences. Among recent advancements, large language models (LLMs) hold particular promise for revolutionizing how learners interact with educational content. In this work, we investigate the potential of LLMs to support standardized test preparation by focusing on English Standardized Tests (ESTs). Specifically, we assess their ability to generate accurate and contextually appropriate solutions across a diverse set of EST question types. We introduce ESTBOOK, a comprehensive benchmark designed to evaluate the capabilities of LLMs in solving EST questions. ESTBOOK aggregates five widely recognized tests, encompassing 29 question types and over 10,576 questions across multiple modalities, including text, images, audio, tables, and mathematical symbols. Using ESTBOOK, we systematically evaluate both the accuracy and inference efficiency of LLMs. Additionally, we propose a breakdown analysis framework that decomposes complex EST questions into task-specific solution steps. This framework allows us to isolate and assess LLM performance at each stage of the reasoning process. Evaluation findings offer insights into the capability of LLMs in educational contexts and point toward targeted strategies for improving their reliability as intelligent tutoring systems.
Supervised Models Can Generalize Also When Trained on Random Labels
Allerbo, Oskar, Schรถn, Thomas B.
The success of unsupervised learning raises the question of whether also supervised models can be trained without using the information in the output $y$. In this paper, we demonstrate that this is indeed possible. The key step is to formulate the model as a smoother, i.e. on the form $\hat{f}=Sy$, and to construct the smoother matrix $S$ independently of $y$, e.g. by training on random labels. We present a simple model selection criterion based on the distribution of the out-of-sample predictions and show that, in contrast to cross-validation, this criterion can be used also without access to $y$. We demonstrate on real and synthetic data that $y$-free trained versions of linear and kernel ridge regression, smoothing splines, and neural networks perform similarly to their standard, $y$-based, versions and, most importantly, significantly better than random guessing.
PIIvot: A Lightweight NLP Anonymization Framework for Question-Anchored Tutoring Dialogues
Zent, Matthew, Smith, Digory, Woodhead, Simon
Personally identifiable information (PII) anonymization is a high-stakes task that poses a barrier to many open-science data sharing initiatives. While PII identification has made large strides in recent years, in practice, error thresholds and the recall/precision trade-off still limit the uptake of these anonymization pipelines. We present PIIvot, a lighter-weight framework for PII anonymization that leverages knowledge of the data context to simplify the PII detection problem. To demonstrate its effectiveness, we also contribute QATD-2k, the largest open-source real-world tutoring dataset of its kind, to support the demand for quality educational dialogue data.
Bayesian Ensembling: Insights from Online Optimization and Empirical Bayes
Waxman, Daniel, Llorente, Fernando, Djuriฤ, Petar M.
We revisit the classical problem of Bayesian ensembles and address the challenge of learning optimal combinations of Bayesian models in an online, continual learning setting. To this end, we reinterpret existing approaches such as Bayesian model averaging (BMA) and Bayesian stacking through a novel empirical Bayes lens, shedding new light on the limitations and pathologies of BMA. Further motivated by insights from online optimization, we propose Online Bayesian Stacking (OBS), a method that optimizes the log-score over predictive distributions to adaptively combine Bayesian models. A key contribution of our work is establishing a novel connection between OBS and portfolio selection, bridging Bayesian ensemble learning with a rich, well-studied theoretical framework that offers efficient algorithms and extensive regret analysis. We further clarify the relationship between OBS and online BMA, showing that they optimize related but distinct cost functions. Through theoretical analysis and empirical evaluation, we identify scenarios where OBS outperforms online BMA and provide principled guidance on when practitioners should prefer one approach over the other.
AI-guided Antibiotic Discovery Pipeline from Target Selection to Compound Identification
Schuh, Maximilian G., Hesse, Joshua, Sieber, Stephan A.
Antibiotic resistance presents a growing global health crisis, demanding new therapeutic strategies that target novel bacterial mechanisms. Recent advances in protein structure prediction and machine learning-driven molecule generation offer a promising opportunity to accelerate drug discovery. However, practical guidance on selecting and integrating these models into real-world pipelines remains limited. In this study, we develop an end-to-end, artificial intelligence-guided antibiotic discovery pipeline that spans target identification to compound realization. We leverage structure-based clustering across predicted proteomes of multiple pathogens to identify conserved, essential, and non-human-homologous targets. We then systematically evaluate six leading 3D-structure-aware generative models$\unicode{x2014}$spanning diffusion, autoregressive, graph neural network, and language model architectures$\unicode{x2014}$on their usability, chemical validity, and biological relevance. Rigorous post-processing filters and commercial analogue searches reduce over 100 000 generated compounds to a focused, synthesizable set. Our results highlight DeepBlock and TamGen as top performers across diverse criteria, while also revealing critical trade-offs between model complexity, usability, and output quality. This work provides a comparative benchmark and blueprint for deploying artificial intelligence in early-stage antibiotic development.
How Managers Perceive AI-Assisted Conversational Training for Workplace Communication
Wilhelm, Lance T., Ding, Xiaohan, Knutsen, Kirk McInnis, Carik, Buse, Rho, Eugenia H.
Effective workplace communication is essential for managerial success, yet many managers lack access to tailored and sustained training. Although AI-assisted communication systems may offer scalable training solutions, little is known about how managers envision the role of AI in helping them improve their communication skills. To investigate this, we designed a conversational role-play system, CommCoach, as a functional probe to understand how managers anticipate using AI to practice their communication skills. Through semi-structured interviews, participants emphasized the value of adaptive, low-risk simulations for practicing difficult workplace conversations. They also highlighted opportunities, including human-AI teaming, transparent and context-aware feedback, and greater control over AI-generated personas. AI-assisted communication training should balance personalization, structured learning objectives, and adaptability to different user styles and contexts. However, achieving this requires carefully navigating tensions between adaptive and consistent AI feedback, realism and potential bias, and the open-ended nature of AI conversations versus structured workplace discourse.