Goto

Collaborating Authors

 coefficient


Causality-Encoded Diffusion Models for Interventional Sampling and Edge Inference

Chen, Li, Shen, Xiaotong, Pan, Wei

arXiv.org Machine Learning

Diffusion models [1, 2, 3] have emerged as a powerful class of generative models, achieving state-of-the-art performance across a wide range of applications, including imaging [2] and scientific-data synthesis [4]. From a statistical perspective, they can be viewed as flexible nonparametric estimators of a (conditional) distribution via score estimation and reverse-time stochastic differential equations (SDEs) [5, 6]. Despite this expressive power, standard diffusion models are typically causality-agnostic: they learn a joint law without encoding the directional asymmetries required for causal interpretation. As a consequence, they do not, on their own, provide principled answers to interventional queries or support broader causal analyses, which are central to structural causal models (SCMs) [7]. When a causal ordering (or a directed acyclic graph) is available, it is natural to construct generative procedures that sample variables sequentially according to the causal factorisation. Such iterative, ordering-respecting approaches have been proposed using a variety of generative models, including generative adversarial networks [8], variational autoencoders [9], normalising flows [10], and diffusion-based constructions such as DDIM [11]. However, a rigorous statistical understandingof the advantages of exploitingsuch causalstructureand the inferential use of the resulting generator remain less developed.


Algebraic Invariants of Lightning Self-Attention

Alexandr, Yulia, Duan, Hao, Montúfar, Guido

arXiv.org Machine Learning

We study the polynomial coefficients of lightning self-attention as coordinates of an algebraic variety. We identify linear and nonlinear families of algebraic invariants, including Chow-type, low-rank, Veronese-type, and Sylvester resultant-based constraints.


A proposal for PU classification under Non-SCAR using clustering and logistic model

Furmanczyk, Konrad, Paczutkowski, Kacper

arXiv.org Machine Learning

The present study aims to investigate a cluster cleaning algorithm that is both computationally simple and capable of solving the PU classification when the SCAR condition is unsatisfied. A secondary objective of this study is to determine the robustness of the LassoJoint method to perturbations of the SCAR condition. In the first step of our algorithm, we obtain cleaning labels from 2-means clustering. Subsequently, we perform logistic regression on the cleaned data, assigning positive labels from the cleaning algorithm with additional true positive observations. The remaining observations are assigned the negative label. The proposed algorithm is evaluated by comparing 11 real data sets from machine learning repositories and a synthetic set. The findings obtained from this study demonstrate the efficacy of the clustering algorithm in scenarios where the SCAR condition is violated and further underscore the moderate robustness of the LassoJoint algorithm in this context.


Extraction of informative statistical features in the problem of forecasting time series generated by It{ô}-type processes

Korolev, Victor, Ivanov, Mikhail, Kukanova, Tatiana, Rukavitsa, Artyom, Vakshin, Alexander, Solomonov, Peter, Zeifman, Alexander

arXiv.org Machine Learning

In this paper, we consider the problem of extraction of most informative features from time series that are regarded as observed values of stochastic processes satisfying the It{ô} stochastic differential equations with unknown random drift and diffusion coefficients. We do not attract any additional information and use only the information contained in the time series as it is. Therefore, as additional features, we use the parameters of statistically adjusted mixture-type models of the observed regularities of the behavior of the time series. Several algorithms of construction of these parameters are discussed. These algorithms are based on statistical reconstruction of the coefficients which, in turn, is based on statistical separation of normal mixtures. We obtain two types of parameters by the techniques of the uniform and non-uniform statistical reconstruction of the coefficients of the underlying It{ô} process. The reconstructed coefficients obtained by uniform techniques do not depend on the current value of the process, while the non-uniform techniques reconstruct the coefficients with the account of their dependence on the value of the process. Actually, the non-uniform techniques used in this paper represent a stochastic analog of the Taylor expansion for the time series. The efficiency of the obtained additional features is compared by using them in the autoregressive algorithms of prediction of time series. In order to obtain pure conclusion that is not affected by unwanted factors, say, related to a special choice of the architecture of the neural network prediction methods, we used only simple autoregressive algorithms. We show that the use of additional statistical features improves the prediction.


Heat and Matérn Kernels on Matchings

Eremeev, Dmitry, Said, Salem, Borovitskiy, Viacheslav

arXiv.org Machine Learning

Applying kernel methods to matchings is challenging due to their discrete, non-Euclidean nature. In this paper, we develop a principled framework for constructing geometric kernels that respect the natural geometry of the space of matchings. To this end, we first provide a complete characterization of stationary kernels, i.e. kernels that respect the inherent symmetries of this space. Because the class of stationary kernels is too broad, we specifically focus on the heat and Matérn kernel families, adding an appropriate inductive bias of smoothness to stationarity. While these families successfully extend widely popular Euclidean kernels to matchings, evaluating them naively incurs a prohibitive super-exponential computational cost. To overcome this difficulty, we introduce and analyze a novel, sub-exponential algorithm leveraging zonal polynomials for efficient kernel evaluation. Finally, motivated by the known bijective correspondence between matchings and phylogenetic trees-a crucial data modality in biology-we explore whether our framework can be seamlessly transferred to the space of trees, establishing novel negative results and identifying a significant open problem.


Theta-regularized Kriging: Modelling and Algorithms

Xie, Xuelin, Lu, Xiliang

arXiv.org Machine Learning

To obtain more accurate model parameters and improve prediction accuracy, we proposed a regularized Kriging model that penalizes the hyperparameter theta in the Gaussian stochastic process, termed the Theta-regularized Kriging. We derived the optimization problem for this model from a maximum likelihood perspective. Additionally, we presented specific implementation details for the iterative process, including the regularized optimization algorithm and the geometric search cross-validation tuning algorithm. Three distinct penalty methods, Lasso, Ridge, and Elastic-net regularization, were meticulously considered. Meanwhile, the proposed Theta-regularized Kriging models were tested on nine common numerical functions and two practical engineering examples. The results demonstrate that, compared with other penalized Kriging models, the proposed model performs better in terms of accuracy and stability.


Cross-Spectral Witness for Hidden Nonequilibrium Beyond the Scalar Ceiling

Bi, Yuda, Calhoun, Vince D

arXiv.org Machine Learning

Partial observation is a pervasive obstacle in nonequilibrium physics: coarse graining may absorb hidden forcing into an apparently equilibrium-like reduced description, so a driven system can look reversible through the only variables one can measure. For scalar Gaussian observables of linear stochastic systems, no time-irreversibility statistic can detect the underlying drive. The Lucente--Crisanti ceiling constrains what one channel carries; what two channels carry is a different question, with a sharp closed-form answer. Two simultaneously observed channels retain an off-diagonal cross-spectral sector inaccessible to any scalar reduction; under channel-separable multiplicative structure the observed-channel response factors cancel identically, leaving a closed-form cross-spectral witness controlled only by the hidden spectrum, the loadings, and the innovation scales, strictly positive at every nonzero cross-coupling including at exact timescale coalescence where every scalar reduction is blind. Within general CSM this certifies shared hidden-sector drive; under the additional one-way coupling assumption the witness identifies the total entropy production rate at leading order with a square-root scaling.


Online Quantile Regression for Nonparametric Additive Models

Zhan, Haoran

arXiv.org Machine Learning

This paper introduces a projected functional gradient descent algorithm (P-FGD) for training nonparametric additive quantile regression models in online settings. This algorithm extends the functional stochastic gradient descent framework to the pinball loss. An advantage of P-FGD is that it does not need to store historical data while maintaining $O(J_t\ln J_t)$ computational complexity per step where $J_t$ denotes the number of basis functions. Besides, we only need $O(J_t)$ computational time for quantile function prediction at time $t$. These properties show that P-FGD is much better than the commonly used RKHS in online learning. By leveraging a novel Hilbert space projection identity, we also prove that the proposed online quantile function estimator (P-FGD) achieves the minimax optimal consistency rate $O(t^{-\frac{2s}{2s+1}})$ where $t$ is the current time and $s$ denotes the smoothness degree of the quantile function. Extensions to mini-batch learning are also established.


Data-Efficient Non-Gaussian Semi-Nonparametric Density Estimation for Nonlinear Dynamical Systems

Liao, Aaron R., Oguri, Kenshiro, Carpenter, Michele D.

arXiv.org Machine Learning

Accurate representation of non-Gaussian distributions of quantities of interest in nonlinear dynamical systems is critical for estimation, control, and decision-making, but can be challenging when forward propagations are expensive to carry out. This paper presents an approach for estimating probability density functions of states evolving under nonlinear dynamics using Seminonparametric (SNP), or Gallant-Nychka, densities. SNP densities employ a probabilists' Hermite polynomial basis to model non-Gaussian behavior and are positive everywhere on the support by construction. We use Monte Carlo to approximate the expectation integrals that arise in the maximum likelihood estimation of SNP coefficients, and introduce a convex relaxation to generate effective initial estimates. The method is demonstrated on density and quantile estimation for the chaotic Lorenz system. The results demonstrate that the proposed method can accurately capture non-Gaussian density structure and compute quantiles using significantly fewer samples than raw Monte Carlo sampling.


Are Stochastic Multi-objective Bandits Harder than Single-objective Bandits?

Guan, Changkun, Xu, Mengfan

arXiv.org Machine Learning

Multi-objective bandits have attracted increasing attention because of their broad applicability and mathematical elegance, where the reward of each arm is a multi-dimensional vector rather than a scalar. This naturally introduces Pareto order relations and Pareto regret. A long-standing question in this area is whether performance is fundamentally harder to optimize because of this added complexity. A recent surprising result shows that, in the adversarial setting, Pareto regret is no larger than classical regret; however, in the stochastic setting, where the regret notion is different, the picture remains unclear. In fact, existing work suggests that Pareto regret in the stochastic case increases with the dimensionality. This controversial yet subtle phenomenon motivates our central question: \emph{are multi-objective bandits actually harder than single-objective ones?} We answer this question in full by showing that, in the stochastic setting, Pareto regret is in fact governed by the maximum sub-optimality gap \(g^\dagger\), and hence by the minimum marginal regret of order \(Ω(\frac{K\log T}{g^\dagger})\). We further develop a new algorithm that achieves Pareto regret of order \(O(\frac{K\log T}{g^\dagger})\), and is therefore optimal. The algorithm leverages a nested two-layer uncertainty quantification over both arms and objectives through upper and lower confidence bound estimators. It combines a top-two racing strategy for arm selection with an uncertainty-greedy rule for dimension selection. Together, these components balance exploration and exploitation across the two layers. We also conduct comprehensive numerical experiments to validate the proposed algorithm, showing the desired regret guarantee and significant gains over benchmark methods.