Education
This Indigenous Language Survived Russian Occupation. Can It Survive YouTube?
This Indigenous Language Survived Russian Occupation. YouTube's search and recommendation algorithms are driving children to Russian-language content even when they seek out videos in Kyrgyz, creating a cultural shift that concerns some parents. When anthropology researcher Ashley McDermott was doing fieldwork in Kyrgyzstan a few years ago, she says many people voiced the same concern: Children were losing touch with their indigenous language. The Central Asian country of 7 million people was under Russian control for a century until 1991, but Kyrgyz (pronounced kur-giz) survived and remains widely spoken among adults. McDermott, a doctoral student at the University of Michigan, says she also heard that some kids in rural villages where Kyrgyz dominated had spontaneously learned to speak Russian.
Threshold Learning for Optimal Decision Making
Decision making under uncertainty is commonly modelled as a process of competitive stochastic evidence accumulation to threshold (the drift-diffusion model). However, it is unknown how animals learn these decision thresholds. We examine threshold learning by constructing a reward function that averages over many trials to Wald's cost function that defines decision optimality. These rewards are highly stochastic and hence challenging to optimize, which we address in two ways: first, a simple two-factor reward-modulated learning rule derived from Williams' REINFORCE method for neural networks; and second, Bayesian optimization of the reward function with a Gaussian process. Bayesian optimization converges in fewer trials than REINFORCE but is slower computationally with greater variance. The REINFORCE method is also a better model of acquisition behaviour in animals and a similar learning rule has been proposed for modelling basal ganglia function.
Transfer Learning on Heterogeneous Feature Spaces for Treatment Effects Estimation
Consider the problem of improving the estimation of conditional average treatment effects (CATE) for a target domain of interest by leveraging related information from a source domain with a different feature space. This heterogeneous transfer learning problem for CATE estimation is ubiquitous in areas such as healthcare where we may wish to evaluate the effectiveness of a treatment for a new patient population for which different clinical covariates and limited data are available. In this paper, we address this problem by introducing several building blocks that use representation learning to handle the heterogeneous feature spaces and a flexible multi-task architecture with shared and private layers to transfer information between potential outcome functions across domains. Then, we show how these building blocks can be used to recover transfer learning equivalents of the standard CATE learners. On a new semi-synthetic data simulation benchmark for heterogeneous transfer learning we not only demonstrate performance improvements of our heterogeneous transfer causal effect learners across datasets, but also provide insights into the differences between these learners from a transfer perspective.
The Bernstein-von Mises theorem for Bayesian one-pass online learning
Lee, Jeyong, Choi, Junhyeok, Kim, Dongguen, Chae, Minwoo
Bayesian online learning provides a coherent framework for sequential inference. However, its theoretical understanding remains limited, particularly in the one-pass setting. Existing theoretical guarantees typically require the mini-batch sample size to diverge, a condition that fails in the one-pass regime. In this paper, we propose a new Bayesian online learning algorithm tailored to the one-pass setting, which incorporates a warm-start phase to ensure stable sequential updates. For this algorithm, we show that the sequentially updated posterior attains the optimal convergence rate. Building on this, we establish an online analogue of the Bernstein-von Mises theorem, which guarantees valid uncertainty quantification without diverging mini-batch sample sizes. Our analysis is based on a novel theoretical framework that differs fundamentally from existing approaches in the online learning literature. Numerical experiments on generalized linear models show that the proposed method matches the performance of the batch estimator while outperforming existing online procedures.
Optimized Deferral for Imbalanced Settings
Cortes, Corinna, Mao, Anqi, Mohri, Mehryar, Zhong, Yutao
Learning algorithms can be significantly improved by routing complex or uncertain inputs to specialized experts, balancing accuracy with computational cost. This approach, known as learning to defer, is essential in domains like natural language generation, medical diagnosis, and computer vision, where an effective deferral can reduce errors at low extra resource consumption. However, the two-stage learning to defer setting, which leverages existing predictors such as a collection of LLMs or other classifiers, often faces challenges due to an expert imbalance problem. This imbalance can lead to suboptimal performance, with deferral algorithms favoring the majority expert. We present a comprehensive study of two-stage learning to defer in expert imbalance settings. We cast the deferral loss optimization as a novel cost-sensitive learning problem over the input-expert domain. We derive new margin-based loss functions and guarantees tailored to this setting, and develop novel algorithms for cost-sensitive learning. Leveraging these results, we design principled deferral algorithms, MILD (Margin-based Imbalanced Learning to Defer), specifically suited for expert imbalance settings. Extensive experiments demonstrate the effectiveness of our approach, showing clear improvements over existing baselines on both image classification and real-world Large Language Model (LLM) routing tasks.
Sequential Inference for Gaussian Processes: A Signal Processing Perspective
Waxman, Daniel, Llorente, Fernando, Djurić, Petar M.
The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in its nearly 100-year history. ML models support the development of SP systems that represent complex, nonlinear relationships with high predictive accuracy. Adapting these models often requires sequential inference, which differs both theoretically and methodologically from the usual paradigm of ML, where data are often assumed independent and identically distributed. Gaussian processes (GPs) are a flexible yet principled framework for modeling random functions, and they have become increasingly relevant to SP as statistical and ML methods assume a more prominent role. We provide a self-contained, tutorial-style overview of GPs, with a particular focus on recent methodological advances in sequential, incremental, or streaming inference. We introduce these techniques from a signal-processing perspective while bridging them to recent advances in ML. Many of the developments we survey have direct applications to state-space modeling, sequential regression and forecasting, anomaly detection in time series, sequential Bayesian optimization, adaptive and active sensing, and sequential detection and decision-making. By organizing these advances from a signal-processing perspective, we intend to equip practitioners with practical tools and a coherent roadmap for deploying sequential GP models in real-world systems.
Improved Algorithms for Online Submodular Maximization via First-order Regret Bounds
We consider the problem of nonnegative submodular maximization in the online setting. At time step t, an algorithm selects a set St C 2V where C is a feasible family of sets. An adversary then reveals a submodular function ft. The goal is to design an efficient algorithm for minimizing the expected approximate regret. In this work, we give a general approach for improving regret bounds in online submodular maximization by exploiting "first-order" regret bounds for online linear optimization. For monotone submodular maximization subject to a matroid, we give an efficient algorithm which achieves a (1 c/e ε)-regret of O( p kTln(n/k)) where n is the size of the ground set, k is the rank of the matroid, ε > 0 is a constant, and cis the average curvature. Even without assuming any curvature (i.e., taking c = 1), this regret bound improves on previous results of Streeter et al. (2009) and Golovin et al. (2014). For nonmonotone, unconstrained submodular functions, we give an algorithm with 1/2-regret O( nT), improving on the results of Roughgarden and Wang (2018). Our approach is based on Blackwell approachability; in particular, we give a novel first-order regret bound for the Blackwell instances that arise in this setting.