tBayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data
Ibenegbu, Amuche, de Micheaux, Pierre Lafaye, Chandra, Rohitash
Time-series analysis is often affected by missing data, a common problem across several fields, including healthcare and environmental monitoring. Multiple Imputation by Chained Equations (MICE) has been prominent for imputing missing values through "fully conditional specification". We extend MICE using the Bayesian framework (tBayes-MICE), utilising Bayesian inference to impute missing values via Markov Chain Monte Carlo (MCMC) sampling to account for uncertainty in MICE model parameters and imputed values. We also include temporally informed initialisation and time-lagged features in the model to respect the sequential nature of time-series data. We evaluate the tBayes-MICE method using two real-world datasets (AirQuality and PhysioNet), and using both the Random Walk Metropolis (RWM) and the Metropolis-Adjusted Langevin Algorithm (MALA) samplers. Our results demonstrate that tBayes-MICE reduces imputation errors relative to the baseline methods over all variables and accounts for uncertainty in the imputation process, thereby providing a more accurate measure of imputation error. We also found that MALA mixed better than RWM across most variables, achieving comparable accuracy while providing more consistent posterior exploration. Overall, these findings suggest that the tBayes-MICE framework represents a practical and efficient approach to time-series imputation, balancing increased accuracy with meaningful quantification of uncertainty in various environmental and clinical settings.
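The sampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' tBayes-MICE implementation: it assumes a single series, a Gaussian linear model on a lag-1 feature, linear interpolation as the "temporally informed initialisation", weak Gaussian priors, and a plain Random Walk Metropolis sampler. All function names (`log_posterior`, `rwm`, `impute_with_lag`) and tuning values are hypothetical.

```python
import numpy as np

def log_posterior(theta, X, y, prior_sd=10.0):
    """Unnormalised log posterior; theta = [coefficients..., log sigma]."""
    beta, log_sigma = theta[:-1], theta[-1]
    resid = y - X @ beta
    log_lik = -0.5 * np.sum(resid**2) / np.exp(2 * log_sigma) - y.size * log_sigma
    log_prior = -0.5 * np.sum(theta**2) / prior_sd**2  # assumed weak Gaussian prior
    return log_lik + log_prior

def rwm(X, y, n_samples=3000, step=0.05, seed=0):
    """Random Walk Metropolis: Gaussian proposal, accept/reject on the log scale."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1] + 1)
    current = log_posterior(theta, X, y)
    draws = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.size)
        candidate = log_posterior(proposal, X, y)
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        draws[i] = theta
    return draws

def impute_with_lag(series, n_samples=3000, burn_in=1000, seed=0):
    """Return (positions, draws): posterior predictive draws for each missing value."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    missing = np.isnan(series)
    # Assumed initialisation: fill gaps by linear interpolation over time.
    filled = series.copy()
    filled[missing] = np.interp(idx[missing], idx[~missing], series[~missing])
    # Design matrix: intercept plus the lag-1 value of the (filled) series.
    X = np.column_stack([np.ones(series.size - 1), filled[:-1]])
    y = filled[1:]
    observed = ~missing[1:]
    draws = rwm(X[observed], y[observed], n_samples, seed=seed)[burn_in:]
    # One imputation per retained draw, so parameter uncertainty carries through.
    rng = np.random.default_rng(seed + 1)
    targets = np.flatnonzero(missing[1:])
    mean = draws[:, :-1] @ X[targets].T
    sigma = np.exp(draws[:, -1])[:, None]
    return targets + 1, mean + sigma * rng.standard_normal(mean.shape)
```

Each column of the returned array is a set of plausible values for one gap, so a point imputation (the column mean) and an interval (column quantiles) come from the same draws. A value missing at position 0 has no lag and is left out of this sketch.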
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Oceania > Australia > New South Wales (0.04)
- Europe > Italy (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)
Identification and Inference in Nonlinear Dynamic Network Models
We study identification and inference in nonlinear dynamic systems defined on unknown interaction networks. The system evolves through an unobserved dependence matrix governing cross-sectional shock propagation via a nonlinear operator. We show that the network structure is not generically identified, and that identification requires sufficient spectral heterogeneity. In particular, identification arises when the network induces non-exchangeable covariance patterns through heterogeneous amplification of eigenmodes. When the spectrum is concentrated, dependence becomes observationally equivalent to common shocks or scalar heterogeneity, leading to non-identification. We provide necessary and sufficient conditions for identification, characterize observational equivalence classes, and propose a semiparametric estimator with asymptotic theory. We also develop tests for network dependence whose power depends on spectral properties of the interaction matrix. The results apply to a broad class of economic models, including production networks, contagion models, and dynamic interaction systems.
Adaptive Gaussian Process Search for Simulation-Based Sample Size Estimation in Clinical Prediction Models: Validation of the pmsims R Package
Olaniran, Oyebayo Ridwan, Shamsutdinova, Diana, Markham, Sarah, Zimmer, Felix, Stahl, Daniel, Forbes, Gordon, Carr, Ewan
Background: Determining an adequate sample size is essential for developing reliable and generalisable clinical prediction models, yet practical guidance on selecting appropriate methods remains limited. Existing analytical and simulation-based approaches often rely on restrictive assumptions and focus on mean-based criteria. We present and validate pmsims, an R package that uses Gaussian process surrogate modelling to provide a flexible and computationally efficient simulation-based framework for sample size determination across diverse prediction settings. Methods: We conducted a comprehensive simulation study with two aims. First, we compared three search engines implemented in pmsims: a Gaussian process-based adaptive method, a deterministic bisection method, and a hybrid approach, across binary, continuous, and survival outcomes. Second, we benchmarked the best-performing pmsims engine against existing analytical (pmsampsize) and simulation-based (samplesizedev) methods, evaluating recommended sample sizes, computational time, and achieved performance on large independent validation datasets. Results: The Gaussian process-based method consistently produced the most stable sample size estimates, particularly in low-signal, high-dimensional settings. In benchmarking, pmsims achieved performance close to prespecified targets across all outcome types, matching simulation-based approaches and outperforming analytical methods in more challenging scenarios. Conclusions: pmsims provides an efficient and flexible framework for principled sample size planning in clinical prediction modelling, requiring fewer model evaluations than non-adaptive simulation approaches.
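As a rough illustration of the deterministic bisection engine mentioned in the Methods, the sketch below searches for the smallest training size whose simulated performance meets a target. It is not the pmsims implementation or API: the simulator (`mean_validation_r2`, ordinary least squares on Gaussian data scored by validation R-squared), the fixed seed, and every default value are assumptions chosen only to make the example self-contained.

```python
import numpy as np

def mean_validation_r2(n, p=10, signal=0.5, n_reps=20, n_val=2000, seed=0):
    """Hypothetical simulator: mean validation R^2 of OLS trained on n rows."""
    rng = np.random.default_rng(seed)  # fixed seed keeps the search deterministic
    beta = np.full(p, signal / np.sqrt(p))
    scores = []
    for _ in range(n_reps):
        X = rng.standard_normal((n, p))
        y = X @ beta + rng.standard_normal(n)
        Xv = rng.standard_normal((n_val, p))
        yv = Xv @ beta + rng.standard_normal(n_val)
        coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)
        pred = coef[0] + Xv @ coef[1:]
        scores.append(1 - np.sum((yv - pred) ** 2) / np.sum((yv - yv.mean()) ** 2))
    return float(np.mean(scores))

def bisect_min_n(performance, target, lo, hi):
    """Bisection on sample size, assuming performance roughly increases with n.

    Invariant: performance(hi) >= target. The caller is trusted that lo is
    too small; if it is not, the result still meets the target but may not
    be the minimum.
    """
    if performance(hi) < target:
        raise ValueError("upper bound does not reach the target")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if performance(mid) >= target:
            hi = mid  # target met: try smaller sizes
        else:
            lo = mid  # target missed: need more data
    return hi

if __name__ == "__main__":
    print(bisect_min_n(mean_validation_r2, target=0.17, lo=20, hi=2000))
```

Because simulated performance is noisy, it is not strictly monotone in n, and bisection can settle on a slightly different answer when the seed changes. That instability is the kind of behaviour a surrogate-model search is meant to smooth over.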
- Research Report > New Finding (0.94)
- Research Report > Experimental Study (0.93)
Chain-of-Trajectories: Unlocking the Intrinsic Generative Optimality of Diffusion Models via Graph-Theoretic Planning
Chen, Ping, Liu, Xiang, Zhang, Xingpeng, Shen, Fei, Gong, Xun, Liu, Zhaoxiang, Chen, Zezhou, Hu, Huan, Wang, Kai, Lian, Shiguo
Diffusion models operate in a reflexive System 1 mode, constrained by a fixed, content-agnostic sampling schedule. This rigidity arises from the curse of state dimensionality, where the combinatorial explosion of possible states in the high-dimensional noise manifold renders explicit trajectory planning intractable and leads to systematic computational misallocation. To address this, we introduce Chain-of-Trajectories (CoTj), a train-free framework enabling System 2 deliberative planning. Central to CoTj is Diffusion DNA, a low-dimensional signature that quantifies per-stage denoising difficulty and serves as a proxy for the high-dimensional state space, allowing us to reformulate sampling as graph planning on a directed acyclic graph. Through a Predict-Plan-Execute paradigm, CoTj dynamically allocates computational effort to the most challenging generative phases. Experiments across multiple generative models demonstrate that CoTj discovers context-aware trajectories, improving output quality and stability while reducing redundant computation. This work establishes a new foundation for resource-aware, planning-based diffusion modeling. The code is available at https://github.com/UnicomAI/CoTj.
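The phrase "graph planning on a directed acyclic graph" can be made concrete with a small sketch. It is not the CoTj algorithm: it assumes a per-stage difficulty vector is already available, treats timesteps 0..T as nodes with an edge from every earlier node to every later one, and uses a hypothetical edge cost (squared total difficulty skipped in one jump). Dynamic programming then finds the cheapest path from 0 to T that uses a fixed number of jumps.

```python
import numpy as np

def plan_schedule(difficulty, n_steps):
    """Min-cost path 0 -> T with exactly n_steps edges in the timestep DAG."""
    d = np.asarray(difficulty, dtype=float)
    T = d.size
    if not 1 <= n_steps <= T:
        raise ValueError("n_steps must be between 1 and the number of stages")
    prefix = np.concatenate([[0.0], np.cumsum(d)])

    def edge_cost(i, j):
        # Assumed cost: squared difficulty covered by jumping from i to j.
        return (prefix[j] - prefix[i]) ** 2

    # best[k, j]: cheapest way to reach node j using exactly k jumps.
    best = np.full((n_steps + 1, T + 1), np.inf)
    back = np.zeros((n_steps + 1, T + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_steps + 1):
        for j in range(1, T + 1):
            for i in range(j):
                cost = best[k - 1, i] + edge_cost(i, j)
                if cost < best[k, j]:
                    best[k, j] = cost
                    back[k, j] = i
    # Walk the back-pointers from T to recover the visited timesteps.
    path = [T]
    for k in range(n_steps, 0, -1):
        path.append(int(back[k, path[-1]]))
    return path[::-1]

if __name__ == "__main__":
    print(plan_schedule([1, 1, 1, 1, 4, 4], n_steps=3))
```

With this cost, the planner takes one long jump over the four easy stages and single-stage jumps through the two hard ones, which is the qualitative behaviour the abstract describes: more steps where denoising is difficult, fewer where it is not.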
- Asia > China (0.04)
- North America > United States > New York (0.04)
- Asia > Singapore (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany (0.04)
- North America > United States > California (0.04)
- North America > Canada > Alberta (0.14)
- North America > United States > Texas (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Mathematics of Computing (0.64)
- Information Technology > Data Science > Data Mining > Big Data (0.46)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- North America > United States > New York (0.04)
- Information Technology > Game Theory (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.46)
Supplement to "Estimating Riemannian Metric with Noise-Contaminated Intrinsic Distance"
Unlike distance metric learning, where the usual focus is on downstream tasks that use the estimated distance metric, our proposal focuses on the estimated metric itself as a characterization of geometric structure. Beyond the taxi and MNIST examples illustrated, finding more compelling applications that target the geometry of the data space remains an open problem. Interpreting mathematical concepts such as the Riemannian metric and geodesics in the context of potential applications (e.g., cognition and perception research, where similarity measures are common) could be inspiring. Our proposal requires sufficiently dense data, which can be demanding, especially for high-dimensional data, because of the curse of dimensionality. Dimension reduction (e.g., manifold embedding, as in the MNIST example) can substantially alleviate the curse of dimensionality, making the dense-data requirement more likely to hold.
- Europe > Austria > Vienna (0.14)
- North America > United States > New York > Richmond County > New York City (0.04)
- North America > United States > New York > New York County > New York City (0.04)