Statistical Learning
ABayesian Approach to Contextual Dynamic Pricing using the Proportional Hazards Model with Discrete Price Data
Dynamic pricing algorithms typically assume continuous price variables, which may not reflect real-world scenarios where prices are often discrete. This paper demonstrates that leveraging discrete price information within a semi-parametric model can substantially improve performance, depending on the size of the support set of the price variable relative to the time horizon. Specifically, we propose a novel semi-parametric contextual dynamic pricing algorithm, namely BayesCoxCP, based on a Bayesian approach to the Cox proportional hazards model. Our theoretical analysis establishes high-probability regret bounds that adapt to the sparsity level ฮณ, proving that our algorithm achieves a regret upper bound of eO(T(1+ฮณ)/2 + dT) for ฮณ < 1/3 and eO(T2/3 + dT) for ฮณ 1/3, where ฮณ represents the sparsity of the price grid relative to the time horizon T. Through numerical experiments, we demonstrate that our proposed algorithm significantly outperforms an existing method, particularly in scenarios with sparse discrete price points.
Online Functional Tensor Decomposition via Continual Learning for Streaming Data Completion
Online tensor decompositions are powerful and proven techniques that address the challenges in processing high-velocity streaming tensor data, such as traffic flow and weather system. The main aim of this work is to propose a novel online functional tensor decomposition (OFTD) framework, which represents a spatialtemporal continuous function using the CP tensor decomposition parameterized by coordinate-based implicit neural representations (INRs). The INRs allow for natural characterization of continually expanded streaming data by simply adding new coordinates into the network. Particularly, our method transforms the classical online tensor decomposition algorithm into a more dynamic continual learning paradigm of updating the INR weights to fit the new data without forgetting the previous tensor knowledge. To this end, we introduce a long-tail memory replay method that adapts to the local continuity property of INR. Extensive experiments for streaming tensor completion using traffic, weather, user-item, and video data verify the effectiveness of the OFTD approach for streaming data analysis. This endeavor serves as a pivotal inspiration for future research to connect classical online tensor tools with continual learning paradigms to better explore knowledge underlying streaming tensor data.
Next Semantic Scale Prediction via Hierarchical Diffusion Language Models
In this paper we introduce Hierarchical Diffusion Language Models (HDLM) - a novel family of discrete diffusion models for language modeling. HDLM builds on a hierarchical vocabulary where low-level tokens with detailed semantics are surjectively mapped to high-level tokens with coarse-grained meanings. In the forward process, each token is independently perturbed to its higher-level ancestor with more abstract semantics according to the scheduler, while in the reverse process the model progressively predicts the next, more detailed semantics. Taken together, HDLM provides a general time-varying next semantic scale prediction process for language modeling. We derive closed-form expressions for the diffusion Evidence Lower Bound (ELBO), and show that HDLM can be implemented in a flexible manner while including the existing MDLM as a special case. We also propose practical training techniques based on the insights. Extensive text generation experiments validate the effectiveness of HDLM, which demonstrates consistently lower validation and generative perplexity than baselines.
Reward-oriented Causal Representation Learning
Causal representation learning (CRL) is the process of disentangling the latent low-dimensional causally-related generating factors underlying high-dimensional observable data. Extensive recent studies have characterized CRL identifiability and perfect recovery of the latent variables and their attendant causal graph. This paper introduces the notion of reward-oriented CRL, the purpose of which is to move away from perfectly learning the latent representation and instead learning it to the extent needed for optimizing a desired downstream task (reward). In reward-oriented CRL, perfectly learning the latent representation can be excessive; instead, it must be learned at the coarsest level sufficient for optimizing the desired task. Reward-oriented CRL is formalized as the optimization of a desired function of the observable data over the space of all possible interventions and focuses on linear causal and transformation models. To sequentially identify the optimal subset of interventions, an adaptive exploration algorithm is designed that learns the latent causal graph and the variables needed to identify the best intervention. It is shown that for an n-dimensional latent space and a d-dimensional observation space, over a horizon T the algorithm's regret scales as O(d
Unified Transferability Metrics for Time Series Foundation Models
With the increasing number of time series pre-trained models, designing transferability evaluation metrics for time series has become an urgent problem to address. While transferability evaluation has been extensively studied in computer vision, we aim to address a critical gap by developing tailored metrics for time series analysis. In this paper, we introduce TEMPLATE, a transferability estimation framework specifically tailored for versatile time series analysis, comprising three complementary metrics: (1) Dependency Learning Score quantifies a model's capacity to capture temporal dependencies.
Bi-Directional Communication-Efficient Stochastic FL via Remote Source Generation
The literature largely focuses on lossy compression of model updates in deterministic FL. In contrast, stochastic (Bayesian) FL considers distributions over parameters, enabling uncertainty quantification, better generalization, and, crucially, inherent communication-regularized training through a mirror-descent structure. In this paper, we consider both uplink and downlink communication in stochastic FL, and propose a communication framework based on remote source generation. Employing Minimal Random Coding (MRC) for remote generation, we allow the server and the clients to sample from local and global posteriors (sources), respectively, rather than transmitting locally sampled updates. The framework encompasses communication-regularized local optimization and principled compression of model updates, leveraging gradually updated prior distributions as side information. Through extensive simulations, we show that our method achieves 5 32 reduction in total communication cost while preserving accuracy. We further analyze the communication cost, refining existing MRC bounds and enabling precise quantification of uplink and downlink trade-offs. We also extend our method to conventional FL via stochastic quantization and prove a contraction property for the biased MRC compressor to facilitate convergence analysis.
ACloser Look at Model Collapse: From a Generalization-to-Memorization Perspective
The widespread use of diffusion models has led to an abundance of AI-generated data, raising concerns about model collapse--a phenomenon in which recursive iterations of training on synthetic data lead to performance degradation. Prior work primarily characterizes this collapse via variance shrinkage or distribution shift, but these perspectives miss practical manifestations of model collapse. This paper identifies a transition from generalization to memorization during model collapse in diffusion models, where models increasingly replicate training data instead of generating novel content during iterative training on synthetic samples. This transition is directly driven by the declining entropy of the synthetic training data produced in each training cycle, which serves as a clear indicator of model degradation. Motivated by this insight, we propose an entropy-based data selection strategy to mitigate the transition from generalization to memorization and alleviate model collapse. Empirical results show that our approach significantly enhances visual quality and diversity in recursive generation, effectively preventing collapse.
SAFE: Multitask Failure Detection for Vision-Language-Action Models
While vision-language-action models (VLAs) have shown promising robotic behaviors across a diverse set of manipulation tasks, they achieve limited success rates when deployed on novel tasks out of the box. To allow these policies to safely interact with their environments, we need a failure detector that gives a timely alert such that the robot can stop, backtrack, or ask for help. However, existing failure detectors are trained and tested only on one or a few specific tasks, while generalist VLAs require the detector to generalize and detect failures also in unseen tasks and novel environments. In this paper, we introduce the multitask failure detection problem and propose SAFE, a failure detector for generalist robot policies such as VLAs. We analyze the VLA feature space and find that VLAs have sufficient highlevel knowledge about task success and failure, which is generic across different tasks.
Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs
The selection of hyperparameters, such as prompt templates in large language models (LLMs), must often strike a balance between reliability and cost. In many cases, structural relationships between the expected reliability levels of the hyperparameters can be inferred from prior information and held-out data - e.g., longer prompt templates may be more detailed and thus more reliable. However, existing hyperparameter selection methods either do not provide formal reliability guarantees or are unable to incorporate structured knowledge in the hyperparameter space. This paper introduces reliability graph-based Pareto testing (RG-PT), a novel multi-objective hyperparameter selection framework that maintains formal reliability guarantees in terms of false discovery rate (FDR), while accounting for known relationships among hyperparameters via a directed acyclic graph. Edges in the graph reflect expected reliability and cost trade-offs among hyperparameters, which are inferred via the Bradley-Terry (BT) ranking model from prior information and held-out data. Experimental evaluations demonstrate that RG-PT significantly outperforms existing methods such as learn-then-test (LTT) and Pareto testing (PT) through a more efficient exploration of the hyperparameter space.
Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
Different gradient-based methods for optimizing overparameterized models can all achieve zero training error yet converge to distinctly different solutions inducing different generalization properties. We provide the first complete characterization of implicit optimization bias for p-norm normalized steepest descent (NSD) and momentum steepest descent (NMD) algorithms in multi-class linear classification with cross-entropy loss. Our key theoretical contribution is proving that these algorithms converge to solutions maximizing the margin with respect to the classifier matrix's p-norm, with established convergence rates. These results encompass important special cases including Spectral Descent and Muon, which we show converge to max-margin solutions with respect to the spectral norm. A key insight of our contribution is that the analysis of general entry-wise and Schatten p-norms can be reduced to the analysis of NSD/NMD with max-norm by exploiting a natural ordering property between all p-norms relative to the max-norm and its dual sumnorm. For the specific case of descent with respect to the max-norm, we further extend our analysis to include preconditioning, showing that Adam converges to the matrix's max-norm solution. Our results demonstrate that the multi-class linear setting, which is inherently richer than the binary counterpart, provides the most transparent framework for studying implicit biases of matrix-parameter optimization algorithms.