Asia
The invisibility cloak inventor now has better tricks up his sleeve
John Pendry is known for creating an invisibility cloak. John Pendry's kitchen is dominated by a huge photograph of what looks like the view through a kaleidoscope: dizzying shards of purple, green, yellow and white. Given that Pendry is famous above all else for inventing an invisibility cloak - a device that can bend light around objects - I wonder if I am looking at something related to that. But no, he tells me, the image simply shows crystals of vitamin C magnified many times. All that invisibility-cloak stuff is in the past, he says, and he has moved on to "more exciting things".
Efficient machine unlearning with minimax optimality
Xie, Jingyi, Zhang, Linjun, Li, Sai
There is a growing demand for efficient data removal to comply with regulations like the GDPR and to mitigate the influence of biased or corrupted data. This has motivated the field of machine unlearning, which aims to eliminate the influence of specific data subsets without the cost of full retraining. In this work, we propose a statistical framework for machine unlearning with generic loss functions and establish theoretical guarantees. For squared loss, especially, we develop Unlearning Least Squares (ULS) and establish its minimax optimality for estimating the model parameter of remaining data when only the pre-trained estimator, forget samples, and a small subsample of the remaining data are available. Our results reveal that the estimation error decomposes into an oracle term and an unlearning cost determined by the forget proportion and the forget model bias. We further establish asymptotically valid inference procedures without requiring full retraining. Numerical experiments and real-data applications demonstrate that the proposed method achieves performance close to retraining while requiring substantially less data access.
StrADiff: A Structured Source-Wise Adaptive Diffusion Framework for Linear and Nonlinear Blind Source Separation
This paper presents a Structured Source-Wise Adaptive Diffusion Framework for linear and nonlinear blind source separation. The framework interprets each latent dimension as a source component and assigns to it an individual adaptive diffusion mechanism, thereby establishing source-wise latent modeling rather than relying on a single shared latent prior. The resulting formulation learns source recovery and the mixing/reconstruction process jointly within a unified end-to-end objective, allowing model parameters and latent sources to adapt simultaneously during training. This yields a common framework for both linear and nonlinear blind source separation. In the present instantiation, each source is further equipped with its own adaptive Gaussian process (GP) prior to impose source-wise temporal structure on the latent trajectories, while the overall framework is not restricted to Gaussian process priors and can in principle accommodate other structured source priors. The proposed model thus provides a general structured diffusion-based route to unsupervised source recovery, with potential relevance beyond blind source separation to interpretable latent modeling, source-wise disentanglement, and potentially identifiable nonlinear latent-variable learning under appropriate structural conditions.
Lipschitz regularity in Flow Matching and Diffusion Models: sharp sampling rates and functional inequalities
Under general assumptions on the target distribution $p^\star$, we establish a sharp Lipschitz regularity theory for flow-matching vector fields and diffusion-model scores, with optimal dependence on time and dimension. As applications, we obtain Wasserstein discretization bounds for Euler-type samplers in dimension $d$: with $N$ discretization steps, the error achieves the optimal rate $\sqrt{d}/N$ up to logarithmic factors. Moreover, the constants do not deteriorate exponentially with the spatial extent of $p^\star$. We also show that the one-sided Lipschitz control yields a globally Lipschitz transport map from the standard Gaussian to $p^\star$, which implies Poincaré and log-Sobolev inequalities for a broad class of probability measures.
Hierarchical Contrastive Learning for Multimodal Data
Li, Huichao, Yu, Junhan, Zhou, Doudou
Multimodal representation learning is commonly built on a shared-private decomposition, treating latent information as either common to all modalities or specific to one. This binary view is often inadequate: many factors are shared by only subsets of modalities, and ignoring such partial sharing can over-align unrelated signals and obscure complementary information. We propose Hierarchical Contrastive Learning (HCL), a framework that learns globally shared, partially shared, and modality-specific representations within a unified model. HCL combines a hierarchical latent-variable formulation with structural sparsity and a structure-aware contrastive objective that aligns only modalities that genuinely share a latent factor. Under uncorrelated latent variables, we prove identifiability of the hierarchical decomposition, establish recovery guarantees for the loading matrices, and derive parameter estimation and excess-risk bounds for downstream prediction. Simulations show accurate recovery of hierarchical structure and effective selection of task-relevant components. On multimodal electronic health records, HCL yields more informative representations and consistently improves predictive performance.
Individual-heterogeneous sub-Gaussian Mixture Models
The classical Gaussian mixture model assumes homogeneity within clusters, an assumption that often fails in real-world data where observations naturally exhibit varying scales or intensities. To address this, we introduce the individual-heterogeneous sub-Gaussian mixture model, a flexible framework that assigns each observation its own heterogeneity parameter, thereby explicitly capturing the heterogeneity inherent in practical applications. Built upon this model, we propose an efficient spectral method that provably achieves exact recovery of the true cluster labels under mild separation conditions, even in high-dimensional settings where the number of features far exceeds the number of samples. Numerical experiments on both synthetic and real data demonstrate that our method consistently outperforms existing clustering algorithms, including those designed for classical Gaussian mixture models.
The Hiremath Early Detection (HED) Score: A Measure-Theoretic Evaluation Standard for Temporal Intelligence
We introduce the Hiremath Early Detection (HED) Score, a principled, measure-theoretic evaluation criterion for quantifying the time-value of information in systems operating over non-stationary stochastic processes subject to abrupt regime transitions. Existing evaluation paradigms, chiefly the ROC/AUC framework and its downstream variants, are temporally agnostic: they assign identical credit to a detection at t + 1 and a detection at t + tau for arbitrarily large tau. This indifference to latency is a fundamental inadequacy in time-critical domains including cyber-physical security, algorithmic surveillance, and epidemiological monitoring. The HED Score resolves this by integrating a baseline-neutral, exponentially decaying kernel over the posterior probability stream of a target regime, beginning precisely at the onset of the regime shift. The resulting scalar simultaneously encodes detection acuity, temporal lead, and pre-transition calibration quality. We prove that the HED Score satisfies three axiomatic requirements: (A1) Temporal Monotonicity, (A2) Invariance to Pre-Attack Bias, and (A3) Sensitivity Decomposability. We further demonstrate that the HED Score admits a natural parametric family indexed by the Hiremath Decay Constant (lambda_H), whose domain-specific calibration constitutes the Hiremath Standard Table. As an empirical vehicle, we present PARD-SSM (Probabilistic Anomaly and Regime Detection via Switching State-Space Models), which couples fractional Stochastic Differential Equations (fSDEs) with a Switching Linear Dynamical System (S-LDS) inference backend. On the NSL-KDD benchmark, PARD-SSM achieves a HED Score of 0.0643, representing a 388.8 percent improvement over a Random Forest baseline (0.0132), with statistical significance confirmed via block-bootstrap resampling (p < 0.001). We propose the HED Score as the successor evaluation standard to ROC/AUC.
Learning Nonlinear Regime Transitions via Semi-Parametric State-Space Models
We develop a semi-parametric state-space model for time-series data with latent regime transitions. Classical Markov-switching models use fixed parametric transition functions, such as logistic or probit links, which restrict flexibility when transitions depend on nonlinear and context-dependent effects. We replace this assumption with learned functions $f_0, f_1 \in \calH$, where $\calH$ is either a reproducing kernel Hilbert space or a spline approximation space, and define transition probabilities as $p_{jk,t} = \sigmoid(f(\bx_{t-1}))$. The transition functions are estimated jointly with emission parameters using a generalized Expectation-Maximization algorithm. The E-step uses the standard forward-backward recursion, while the M-step reduces to a penalized regression problem with weights from smoothed occupation measures. We establish identifiability conditions and provide a consistency argument for the resulting estimators. Experiments on synthetic data show improved recovery of nonlinear transition dynamics compared to parametric baselines. An empirical study on financial time series demonstrates improved regime classification and earlier detection of transition events.
Sequential Audit Sampling with Statistical Guarantees
Financial statement auditing is conducted under a risk-based evidence approach to obtain reasonable assurance. In practice, auditors often perform additional sampling or related procedures when an initial sample does not provide a sufficient basis for a conclusion. Across jurisdictions, current standards and practice manuals acknowledge such extensions, while the statistical design of sequential audit procedures has not been fully explored. This study formulates audit sampling with additional, sequentially collected items as a sequential testing problem for a finite population under sampling without replacement. We define null and alternative hypotheses in terms of a tolerable deviation rate, specify stopping and decision rules, and formulate exact sequential boundary conditions in terms of finite-population error probabilities. For practical implementation, we calibrate those boundaries by Monte Carlo simulation at least-favorable deviation rates. The exact design yields ex ante control of decision error probabilities, and the simulation-based implementation approximates that design while allowing the computation of expected stopping times. The framework is most naturally suited to attribute auditing and deviation-rate auditing, especially tests of controls, and it can be extended to one-sided, two-stage, and truncated designs.
Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification
Franzen, Courtney, Pourkamali-Anaraki, Farhad
Neural network classifiers trained with cross-entropy loss achieve strong predictive accuracy but lack the capability to provide inherent predictive uncertainty estimates, thus requiring external techniques to obtain these estimates. In addition, softmax scores for the true class can vary substantially across independent training runs, which limits the reliability of uncertainty-based decisions in downstream tasks. Evidential Deep Learning aims to address these limitations by producing uncertainty estimates in a single pass, but evidential training is highly sensitive to design choices including loss formulation, prior regularization, and activation functions. Therefore, this work introduces an alternative Dirichlet parameter estimation strategy by applying a method of moments estimator to ensembles of softmax outputs, with an optional maximum-likelihood refinement step. This ensemble-based construction decouples uncertainty estimation from the fragile evidential loss design while also mitigating the variability of single-run cross-entropy training, producing explicit Dirichlet predictive distributions. Across multiple datasets, we show that the improved stability and predictive uncertainty behavior of these ensemble-derived Dirichlet estimates translate into stronger performance in downstream uncertainty-guided applications such as prediction confidence scoring and selective classification.