Learning to Bid in Repeated Second-Price Auctions with Dynamic Values and Aggregated Feedback
Heymann, Benjamin, Sakhi, Otmane
We study the problem of learning to bid when the bidder's value is dynamic, i.e., when the current value depends on past outcomes. Specifically, we consider a bidder participating in repeated second-price auctions whose value depends on the time elapsed since their last successful bid, with auctions arriving in continuous time and only aggregated feedback revealed at the end of the horizon. Such a bidder must (1) balance the immediate benefit of winning the current auction against its impact on future values and (2) learn unknown environmental parameters. We derive regret bounds for a class of learning methods that combine plug-in estimators with a differential-equation characterization of the optimal policy, and show that a specific confidence bound algorithm learns the optimal policy with a near-optimal regret of $\widetilde{O}(\log N)$ for piecewise linear primitives and $\widetilde{O}(N^{1/3})$ for general, smooth primitives, achieving these regrets without explicit randomization. These theoretical results are supported by numerical experiments.
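To make the setting concrete, here is a minimal sketch of a second-price auction with a value that depends on the time since the last win. It is not the authors' algorithm: the capped linear value function `value_since_last_win`, the arrival times, and the competing bids are all hypothetical, and the bidder simply bids its current value.

```python
def second_price_outcome(bid, competing_bid, value):
    """Return (won, payoff) for one second-price auction.

    The bidder wins when its bid strictly exceeds the highest competing
    bid, and then pays that competing bid rather than its own bid.
    """
    won = bid > competing_bid
    payoff = value - competing_bid if won else 0.0
    return won, payoff


def value_since_last_win(elapsed, slope=0.5, cap=2.0):
    """Hypothetical dynamic value: grows linearly with elapsed time, capped."""
    return min(cap, slope * elapsed)


def run(arrivals, competing_bids, bid_fn):
    """Simulate auctions arriving at the given times; return total payoff."""
    last_win, total = 0.0, 0.0
    for t, d in zip(arrivals, competing_bids):
        v = value_since_last_win(t - last_win)
        won, payoff = second_price_outcome(bid_fn(v), d, v)
        total += payoff
        if won:
            last_win = t  # winning resets the clock, lowering future values
    return total


# Myopic bidding (bid = current value) on a tiny hand-made instance.
total = run([1.0, 2.0, 4.0], [0.4, 0.6, 0.9], lambda v: v)
```

In this toy run the win at time 1.0 resets the value, so the bidder loses the auction at time 2.0. This is the trade-off between immediate gain and future value that the abstract describes.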
Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems
Ikemoto, Junya, Maruyama, Satoshi, Hashimoto, Kazumune
This paper proposes a deep reinforcement learning (DRL)-based event-triggered controller design for networked artificial pancreas (AP) systems. Although existing DRL-based AP controllers typically assume periodic control updates, networked control systems (NCSs) require a reduction in communication frequency to achieve energy-efficient operation, which is directly tied to control updates. However, jointly learning both insulin dosing and update timing significantly increases the complexity of the learning problem. To alleviate this complexity, we develop a practical DRL-based controller design that avoids explicitly learning update timing by introducing a rule-based criterion defined by changes in blood glucose. As a result, decision-making occurs at irregular intervals, and the problem is naturally formulated as a semi-Markov decision process (SMDP), for which we extend a standard DRL algorithm. Numerical experiments demonstrate that the proposed method improves communication efficiency while maintaining control performance.
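As a rough illustration of a rule-based trigger, the sketch below fires a control update only when blood glucose has moved by at least a threshold since the last update. The abstract does not state the exact criterion, so this rule, the function name `event_triggered_updates`, and the readings are assumptions.

```python
def event_triggered_updates(glucose, threshold):
    """Return indices at which a control update is triggered.

    Assumed rule (hypothetical): trigger when the glucose reading has moved
    by at least `threshold` since the reading at the last update.
    """
    if not glucose:
        return []
    updates = [0]              # always act on the first measurement
    reference = glucose[0]
    for k in range(1, len(glucose)):
        if abs(glucose[k] - reference) >= threshold:
            updates.append(k)
            reference = glucose[k]
    return updates


readings = [100, 102, 104, 111, 112, 125, 124, 110]  # mg/dL, made-up values
updates = event_triggered_updates(readings, threshold=10)
# Gaps between consecutive updates are the irregular decision intervals.
sojourn_times = [b - a for a, b in zip(updates, updates[1:])]
```

Because the gaps between updates vary, decisions occur at irregular intervals, which is why a semi-Markov formulation is the natural fit.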
Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation
Morisset, Lucas, Durmus, Alain, Hardy, Adrien
Data augmentation (DA) is now a standard ingredient in modern machine learning pipelines, with extensive empirical evidence reporting improvements in generalization across modalities and tasks Mumuni and Mumuni (2022); Wang et al. (2025). It is often used to encode task-relevant symmetries directly into the training procedure, for instance by encouraging invariance to image rotations or other transformations of the input Shorten and Khoshgoftaar (2019); Chen et al. (2020). It has also been identified as one of the most effective regularization techniques across both supervised learning settings Bishop (1995); Cubuk et al. (2019); Mumuni and Mumuni (2022); Wang et al. (2025) and self-supervised/unsupervised learning Feng et al. (2021); Van Assel et al. (2025). Domain-specific augmentation pipelines have been central to progress in computer vision Shorten and Khoshgoftaar (2019); Kumar et al. (2024), natural language processing Feng et al. (2021); Shorten et al. (2021); Bayer et al. (2022), and time-series or audio applications Wen et al. (2020); Iwana and Uchida (2021); Iglesias et al. (2023). Despite these empirical successes, the benefits of DA remain highly task- and data-dependent, and augmentation schemes are often engineered in an ad hoc manner Fawzi et al. (2016); Cubuk et al. (2019); Lim et al. (2019); Hataya et al. (2020). In contrast with this rich empirical literature, comprehensive theoretical analyses of DA remain relatively scarce. Two classical starting points are, first, the interpretation of additive Gaussian noise as a form of explicit (ridge-like) regularization Bishop (1995); Lin et al. (2024), and second, the idea that leveraging distributional invariances and group structure in the learning objective helps decrease the variance of the model without increasing its bias Chen et al. (2020). Yet, when applied to modern and complex augmentation schemes, these works either provide only upper bounds on the generalization error Lin et al. (2024), or require very strong assumptions on the data distribution (e.g.
Spectral Lens: Activation and Gradient Spectra as Diagnostics of LLM Optimization
Liu, Andy Zeyi, Paquette, Elliot, Sous, John
Training loss and throughput can hide distinct internal representations in language-model training. To examine these hidden mechanics, we use spectral measurements as practical and operational diagnostics. Using a controlled family of decoder-only models adapted from the modded NanoGPT codebase, we introduce an empirical protocol based on activation covariance and per-sample gradient SVD spectra. This dual view reveals three empirical findings and one mechanistic explanation. First, batch size acts as a latent determinant of representation geometry: runs that reach equal loss settle into systematically distinct activation spectra. Second, the activation covariance tail measured early in training reliably forecasts downstream token efficiency. Third, movement of the activation spectrum head (leading modes), together with gradient spectra, characterizes underlying learning-dynamics changes, separating learning-side architectural improvements from primarily execution-side gains. These predictive and diagnostic signals persist across the 12-, 36-, and 48-layer model tiers. Finally, a mechanistic model proves the main observations and explains how activation covariance spectra correlate with task-aligned feature learning.
Bayesian inference with sources of uncertainty: from confidence modelling to sparse estimation
Rosa, Rafael Mouallem, Arbel, Julyan, Nguyen, Hien Duy
We introduce a general framework that extends Bayesian inference by allowing the researcher to explicitly encode confidence in each source of uncertainty within the model. This mechanism provides a new handle for model design and regularisation control. Building on this framework, we develop a general approach for inducing sparsity in statistical models and illustrate its use in linear and logistic regression, as well as in Bayesian neural networks.
Graph Convolutional Support Vector Regression for Robust Spatiotemporal Forecasting of Urban Air Pollution
Jahan, Nourin, Panja, Madhurima, T, Muhammed Navas, Chakraborty, Tanujit
Urban air quality forecasting is challenging because pollutant concentrations are nonlinear, nonstationary, spatiotemporally dependent, and often affected by anomalous observations caused by traffic congestion, industrial emissions, and seasonal meteorological variability. This study proposes a Graph Convolutional Support Vector Regression (GCSVR) framework for robust spatiotemporal forecasting of urban air pollution. The model combines graph convolutional learning to capture inter-station spatial dependence with support vector regression to model nonlinear temporal dynamics while reducing sensitivity to outlier observations. The proposed framework is evaluated using air quality records from 37 monitoring stations in Delhi and 18 stations in Mumbai, representing inland and coastal metropolitan environments in India. Forecasting performance is assessed across multiple horizons and compared with established temporal and spatiotemporal benchmarks. The results show that GCSVR consistently improves predictive accuracy and maintains stable performance across seasons and outlier-prone pollution episodes. Statistical testing further confirms the reliability of the proposed approach across the two cities. Finally, conformal prediction is integrated with GCSVR to generate calibrated prediction intervals, enhancing its practical value for uncertainty-aware air quality monitoring and public health decision-making.
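The abstract does not say which conformal variant is used. As one common possibility, the sketch below shows split conformal prediction: a held-out calibration set of forecast residuals gives a quantile, which is then used as the half-width of an interval around any point forecast. The residuals, the forecast value, and the function names are made up for illustration.

```python
import math


def conformal_quantile(residuals, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of absolute residuals."""
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf                       # too few calibration points
    return scores[rank - 1]


def prediction_interval(point_forecast, q):
    """Symmetric interval around a point forecast."""
    return point_forecast - q, point_forecast + q


# Made-up calibration residuals (observed minus forecast concentration).
calibration_residuals = [-4.0, 2.5, 1.0, -7.5, 3.0, 6.0, -0.5, 5.0, -2.0]
q = conformal_quantile(calibration_residuals, alpha=0.2)
low, high = prediction_interval(80.0, q)
```

Split conformal wraps any point forecaster without retraining it. Its coverage guarantee assumes exchangeable data, which time series generally violate, so forecasting applications often need an adapted variant.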
Adaptive Principal Component Regression with Applications to Panel Data
Principal component regression (PCR) is a popular technique for fixed-design error-in-variables regression, a generalization of the linear regression setting in which the observed covariates are corrupted with random noise. We provide the first time-uniform finite sample guarantees for online (regularized) PCR whenever data is collected adaptively. Since the proof techniques for analyzing PCR in the fixed design setting do not readily extend to the online setting, our results rely on adapting tools from modern martingale concentration to the error-in-variables setting. As an application of our bounds, we provide a framework for experiment design in panel data settings when interventions are assigned adaptively. Our framework may be thought of as a generalization of the synthetic control and synthetic interventions frameworks, where data is collected via an adaptive intervention assignment policy.
Neural Circuits for Fast Poisson Compressed Sensing in the Olfactory Bulb
Within a single sniff, the mammalian olfactory system can decode the identity and concentration of odorants wafted on turbulent plumes of air. Yet, it must do so given access only to the noisy, dimensionally-reduced representation of the odor world provided by olfactory receptor neurons. As a result, the olfactory system must solve a compressed sensing problem, relying on the fact that only a handful of the millions of possible odorants are present in a given scene. Inspired by this principle, past works have proposed normative compressed sensing models for olfactory decoding. However, these models have not captured the unique anatomy and physiology of the olfactory bulb, nor have they shown that sensing can be achieved within the 100-millisecond timescale of a single sniff. Here, we propose a rate-based Poisson compressed sensing circuit model for the olfactory bulb.
4b5deb9a14d66ab0acc3b8a2360cde7c-Supplemental.pdf
What can linearized neural networks actually say about generalization?
As mentioned in the main text, all our models are trained using the same scheme, which was selected without any hyperparameter tuning beyond ensuring good performance on CIFAR2 for the neural networks. Namely, we train using stochastic gradient descent (SGD) to optimize a binary cross-entropy loss, with a decaying learning rate starting at 0.05 and momentum set to 0.9. Furthermore, we use a batch size of 128 and train for 100 epochs. This is enough to obtain close-to-zero training losses for the neural networks and to converge to a stable test accuracy for the linearized models.
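The stated hyperparameters can be sketched on a toy problem as follows. This is not the authors' training code: the model is a one-dimensional logistic regression on synthetic data rather than a network on CIFAR2, the linear decay in `learning_rate` is an assumption (the text only says the rate decays from 0.05), and the momentum update uses the classical heavy-ball form, which differs slightly between frameworks.

```python
import math
import random

# Values stated in the text.
INITIAL_LR, MOMENTUM, BATCH_SIZE, EPOCHS = 0.05, 0.9, 128, 100


def learning_rate(epoch):
    """Hypothetical schedule: linear decay from INITIAL_LR towards zero."""
    return INITIAL_LR * (1.0 - epoch / EPOCHS)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(w, b, data):
    """Mean binary cross-entropy of a 1-D logistic model on `data`."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(data, seed=0):
    """Mini-batch SGD with heavy-ball momentum on the toy model."""
    rng = random.Random(seed)
    data = list(data)  # shuffle a copy, not the caller's list
    w = b = vw = vb = 0.0
    for epoch in range(EPOCHS):
        lr = learning_rate(epoch)
        rng.shuffle(data)
        for start in range(0, len(data), BATCH_SIZE):
            batch = data[start:start + BATCH_SIZE]
            # Gradients of the mean BCE with respect to w and b.
            gw = sum((sigmoid(w * x + b) - y) * x for x, y in batch) / len(batch)
            gb = sum(sigmoid(w * x + b) - y for x, y in batch) / len(batch)
            vw = MOMENTUM * vw - lr * gw
            vb = MOMENTUM * vb - lr * gb
            w, b = w + vw, b + vb
    return w, b


# Toy, linearly separable stand-in for a binary task (not CIFAR2).
data_rng = random.Random(1)
toy = [(x, 1 if x > 0 else 0) for x in (data_rng.uniform(-1, 1) for _ in range(256))]
w, b = train(toy)
initial_loss, final_loss = bce(0.0, 0.0, toy), bce(w, b, toy)
```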