
Integrative Learning of Dynamically Evolving Multiplex Graphs and Nodal Attributes Using Neural Network Gaussian Processes with an Application to Dynamic Terrorism Graphs

Rodriguez-Acosta, Jose, Guha, Sharmistha, Patel, Lekha, Shuler, Kurtis

arXiv.org Machine Learning

Exploring the dynamic co-evolution of multiplex graphs and nodal attributes is a compelling question in criminal and terrorism networks. This article is motivated by the study of dynamically evolving interactions among prominent terrorist organizations, considering various organizational attributes like size, ideology, leadership, and operational capacity. Statistically principled integration of multiplex graphs with nodal attributes is challenging because it requires leveraging shared information within and across layers, accounting for uncertainty in predicting unobserved links, and capturing the temporal evolution of node attributes. These difficulties increase when layers are partially observed, as in terrorism networks where connections are deliberately hidden to obscure key relationships. To address these challenges, we present a principled methodological framework to integrate the multiplex graph layers and nodal attributes. The approach employs time-varying stochastic latent factor models, leveraging shared latent factors to capture graph structure and its co-evolution with node attributes. Latent factors are modeled using Gaussian processes with an infinitely wide deep neural network-based covariance function, termed neural network Gaussian processes (NN-GP). The NN-GP framework on latent factors exploits the predictive power of a Bayesian deep neural network architecture while propagating uncertainty for reliability. Simulation studies highlight the superior performance of the proposed approach in achieving inferential objectives. The approach, termed the dynamic joint learner, enables predictive inference (with uncertainty) of diverse unobserved dynamic relationships among prominent terrorist organizations and their organization-specific attributes, as well as clustering behavior in terms of friend-and-foe relationships, which could be informative in counter-terrorism research.
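To make the idea of a covariance function induced by an infinitely wide deep network concrete, here is a minimal sketch of the well-known closed-form NN-GP kernel recursion for a fully connected ReLU network, used as a prior over latent factor trajectories in time. The abstract does not state the activation, depth, or variance hyperparameters, so the ReLU choice, `depth`, `sigma_w2`, `sigma_b2`, and both function names are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def nngp_relu_kernel(x, depth=3, sigma_w2=2.0, sigma_b2=0.1):
    """Covariance of an infinitely wide fully connected ReLU network.

    ReLU, depth, and the variance values are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n points with one feature
    # Covariance of the first-layer pre-activations.
    k = sigma_b2 + sigma_w2 * (x @ x.T) / x.shape[1]
    for _ in range(depth):
        std = np.sqrt(np.diag(k))
        norm = np.outer(std, std)
        cos_theta = np.clip(k / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # Closed form of E[relu(u) relu(v)] for a centered bivariate Gaussian.
        k = sigma_b2 + sigma_w2 / (2.0 * np.pi) * norm * (
            np.sin(theta) + (np.pi - theta) * cos_theta
        )
    return k

def sample_latent_trajectories(times, n_factors=2, jitter=1e-6, seed=0):
    """Draw latent factor paths over time from the GP prior (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    k = nngp_relu_kernel(times)
    # A small jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(k + jitter * np.eye(len(times)))
    return chol @ rng.standard_normal((len(times), n_factors))

times = np.linspace(0.0, 1.0, 20)
latent = sample_latent_trajectories(times)  # shape (20, 2)
```

Each column of `latent` is one latent factor evaluated at the 20 time points. In a latent factor model of the kind described above, such factors would enter the likelihood of both the graph layers and the node attributes; that part is omitted here.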



Supplementary Materials of "BAST: Bayesian Additive Regression Spanning Trees for Complex Constrained Domain"

Neural Information Processing Systems

These appendices provide supplementary details and results for BAST. Appendix A contains additional details on Bayesian estimation and prediction. Prediction at u is then performed as stated in Section 3.2. The experiment setup is the same as in Section 4.1. Table S3 shows the performance of BAST and BART using the hyperparameters chosen by CV (referred to as BAST-cv and BART-cv, respectively). As a benchmark, the performance metrics for BAST and BART using the hyperparameters in Section 4.1 are also included (referred to as ...). Standard errors are given in parentheses.


MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution

Neural Information Processing Systems

Although Vision Transformers (ViTs) have recently advanced computer vision tasks significantly, an important real-world problem was overlooked: adapting to variable input resolutions. Typically, images are resized to a fixed resolution, such as 224x224, for efficiency during training and inference. However, uniform input size conflicts with real-world scenarios where images naturally vary in resolution. In this work, we propose to enhance the model adaptability to resolution variation by optimizing the patch embedding. The proposed method, called Multi-Scale Patch Embedding (MSPE), substitutes the standard patch embedding with multiple variable-sized patch kernels and selects the best parameters for different resolutions, eliminating the need to resize the original image. Our method does not require high-cost training or modifications to other parts, making it easy to apply to most ViT models.
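As a rough illustration of replacing a single fixed patch embedding with several patch kernels of different sizes, the sketch below keeps one projection matrix per patch size and picks one based on the input resolution. The abstract does not say how the parameters are selected, so the nearest-size rule, the class name `MultiScalePatchEmbed`, the `target_grid` parameter, and the random weights (standing in for trained kernels) are all assumptions, not the MSPE method itself.

```python
import numpy as np

class MultiScalePatchEmbed:
    """Toy multi-kernel patch embedding (illustrative assumption, not MSPE itself)."""

    def __init__(self, patch_sizes=(8, 16, 32), channels=3, dim=64,
                 target_grid=14, seed=0):
        rng = np.random.default_rng(seed)
        self.target_grid = target_grid
        # One projection matrix per patch size; random stand-ins for trained kernels.
        self.kernels = {
            p: rng.normal(0.0, 0.02, size=(p * p * channels, dim))
            for p in patch_sizes
        }

    def select_patch_size(self, height, width):
        # Assumed rule: pick the size that yields a grid closest to target_grid.
        ideal = min(height, width) / self.target_grid
        return min(self.kernels, key=lambda p: abs(p - ideal))

    def __call__(self, image):
        h, w, c = image.shape
        p = self.select_patch_size(h, w)
        gh, gw = h // p, w // p
        image = image[: gh * p, : gw * p]  # drop any remainder rows and columns
        # Split into non-overlapping p x p patches and flatten each patch.
        patches = (
            image.reshape(gh, p, gw, p, c)
            .transpose(0, 2, 1, 3, 4)
            .reshape(gh * gw, p * p * c)
        )
        return patches @ self.kernels[p]  # (num_tokens, dim)

embed = MultiScalePatchEmbed()
tokens_224 = embed(np.zeros((224, 224, 3)))  # uses the 16 x 16 kernel
tokens_448 = embed(np.zeros((448, 448, 3)))  # uses the 32 x 32 kernel
```

Both calls yield 196 tokens of width 64 without resizing the image, which is the property the abstract emphasizes: the rest of the ViT sees a comparable token sequence at either resolution.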


Improving the Convergence Rates of Forward Gradient Descent with Repeated Sampling

Dexheimer, Niklas, Schmidt-Hieber, Johannes

arXiv.org Artificial Intelligence

Forward gradient descent (FGD) has been proposed as a biologically more plausible alternative to gradient descent, as it can be computed without a backward pass. For the linear model with $d$ parameters, previous work has found, however, that the prediction error of FGD converges more slowly than that of stochastic gradient descent (SGD) by a factor of $d$. In this paper we show that by computing $\ell$ FGD steps based on each training sample, this suboptimality factor becomes $d/(\ell \wedge d)$, and thus the suboptimality of the rate disappears if $\ell \gtrsim d$. We also show that FGD with repeated sampling can adapt to low-dimensional structure in the input distribution. The main mathematical challenge lies in controlling the dependencies arising from the repeated sampling process.
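A minimal sketch of FGD with repeated sampling for the linear model may help fix ideas. It assumes the squared loss $\tfrac{1}{2}(y - x^\top\theta)^2$, standard Gaussian random directions, and a constant step size; the abstract specifies none of these, and the function name and all numeric values below are hypothetical.

```python
import numpy as np

def fgd_repeated_sampling(X, y, ell, step_size, seed=0):
    """Run ell forward-gradient steps on each training sample in turn."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for x_i, y_i in zip(X, y):
        for _ in range(ell):
            xi = rng.standard_normal(d)  # fresh random direction per step
            # Gradient of 0.5 * (y - x^T theta)^2, written out for clarity.
            # Only its projection on xi is used; forward-mode differentiation
            # can compute that projection without a backward pass.
            grad = -(y_i - x_i @ theta) * x_i
            theta = theta - step_size * (grad @ xi) * xi
    return theta

# Synthetic data with assumed sizes and noise level.
rng = np.random.default_rng(1)
d, n = 5, 1000
theta_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + 0.1 * rng.standard_normal(n)

theta_hat = fgd_repeated_sampling(X, y, ell=d, step_size=0.01)
```

Because $\mathbb{E}[(g^\top\xi)\,\xi] = g$ for a standard Gaussian $\xi$, each update is an unbiased but noisier version of an SGD step. Taking $\ell$ steps per sample averages out part of that extra noise, which is the intuition behind the factor $d/(\ell \wedge d)$.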


Invariant Subspace Decomposition

Lazzaretto, Margherita, Peters, Jonas, Pfister, Niklas

arXiv.org Machine Learning

We consider the task of predicting a response Y from a set of covariates X in settings where the conditional distribution of Y given X changes over time. For this to be feasible, assumptions on how the conditional distribution changes over time are required. Existing approaches assume, for example, that changes occur smoothly over time so that short-term prediction using only the recent past becomes feasible. In this work, we propose a novel invariance-based framework for linear conditionals, called Invariant Subspace Decomposition (ISD), that splits the conditional distribution into a time-invariant and a residual time-dependent component. As we show, this decomposition can be utilized both for zero-shot and time-adaptation prediction tasks, that is, settings where either no or a small amount of training data is available at the time points we want to predict Y at, respectively. We propose a practical estimation procedure, which automatically infers the decomposition using tools from approximate joint matrix diagonalization. Furthermore, we provide finite sample guarantees for the proposed estimator and demonstrate empirically that it indeed improves on approaches that do not use the additional invariant structure.
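The abstract does not spell out the estimation procedure, so the following is only a sketch of the basic idea behind joint matrix diagonalization, on which the estimator is said to build. It treats the exact case, where symmetric matrices share a common eigenbasis, and recovers that basis from the eigendecomposition of a random linear combination. This is a simple stand-in technique rather than the approximate joint diagonalization used for ISD, and all names and sizes are hypothetical.

```python
import numpy as np

def joint_diagonalizer(matrices, seed=0):
    """Return an orthogonal U such that U.T @ M @ U is diagonal for every M.

    Assumes the symmetric matrices are exactly jointly diagonalizable.
    """
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal(len(matrices))
    # A generic linear combination has distinct eigenvalues, so its
    # eigenvectors coincide with the shared eigenbasis.
    combo = sum(w * m for w, m in zip(weights, matrices))
    _, u = np.linalg.eigh(combo)
    return u

def off_diagonal_norm(m):
    """Frobenius norm of the off-diagonal part of m."""
    return np.linalg.norm(m - np.diag(np.diag(m)))

# Build five symmetric matrices with a common, randomly drawn eigenbasis q.
rng = np.random.default_rng(1)
q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
mats = [q @ np.diag(rng.uniform(1.0, 3.0, size=4)) @ q.T for _ in range(5)]

u = joint_diagonalizer(mats)
residuals = [off_diagonal_norm(u.T @ m @ u) for m in mats]
```

With matrices estimated from data, no basis diagonalizes all of them exactly, so one would instead minimize a criterion such as the sum of `off_diagonal_norm` values over orthogonal `u`.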


Supplementary Materials of "BAST: Bayesian Additive Regression Spanning Trees for Complex Constrained Domain"

Neural Information Processing Systems

These appendices provide supplementary details and results for BAST. Appendix A contains additional details on Bayesian estimation and prediction. Supplementary simulation details and results, including hyperparameter tuning and computation time, can be found in Appendix B. Finally, Appendix C provides the proof of Proposition 1. Appendix A.1: Estimation. This appendix provides details on the Markov chain Monte Carlo (MCMC) algorithm discussed in Section 3.1. This probability specification works well in our experiments, but one can modify it if desired. Appendix A.2: Prediction in Two-dimensional Constrained Domains. In this subsection we provide details on specifying the neighbor set N. To sample the cluster membership of u, we need to determine the cluster memberships for vertices on the domain boundary. This can be done, for instance, by assigning a boundary vertex to the same cluster as its nearest vertex in S with respect to the graph distance in the CDT mesh (when the number of vertices in the CDT graph is large, we expect this to approximate the geodesic distance well).
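The nearest-vertex rule described above can be sketched as a multi-source breadth-first search over the mesh graph. The sketch assumes that graph distance means hop count along mesh edges (the excerpt does not say whether edges are weighted), and the function name, the adjacency-dict representation, and the toy graph are hypothetical.

```python
from collections import deque

def assign_by_nearest_labeled(adjacency, labels):
    """Give each unlabeled vertex the cluster of its nearest labeled vertex.

    adjacency: dict mapping vertex -> iterable of neighboring vertices.
    labels: dict mapping the labeled vertices (the set S) -> cluster id.
    Distance is the number of edges; ties are broken by visiting order.
    """
    assigned = dict(labels)
    queue = deque(labels)  # start the search from all labeled vertices at once
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency[vertex]:
            if neighbor not in assigned:
                # The first search front to arrive comes from a nearest source.
                assigned[neighbor] = assigned[vertex]
                queue.append(neighbor)
    return assigned

# Toy path graph 0 - 1 - 2 - 3 - 4 with labeled end points.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
clusters = assign_by_nearest_labeled(adjacency, {0: "a", 4: "b"})
```

Here vertex 1 inherits cluster `"a"` and vertex 3 inherits cluster `"b"`. If edges carried lengths, Dijkstra's algorithm started from all labeled vertices would play the same role.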


Sparse high-dimensional linear mixed modeling with a partitioned empirical Bayes ECM algorithm

Zgodic, Anja, Bai, Ray, Zhang, Jiajia, McLain, Alexander C.

arXiv.org Machine Learning

While high-dimensional data has been ubiquitous for some time, the use of longitudinal or grouped (clustered) high-dimensional data has recently been increasing in research. For example, some genetic studies gather gene expression levels for an individual on multiple occasions in response to an exposure over time (Banchereau et al., 2016). Other ongoing studies - like the UK Biobank and the Adolescent Brain Cognitive Development Study - collect high-dimensional genetic/imaging information longitudinally to learn how individual changes in these markers are related to outcomes (Cole, 2020; Saragosa-Harris et al., 2022). Such data usually violates the traditional linear regression assumption that observations are independently and identically distributed. Data analysis should account for the dependence between observations belonging to the same individual. For the low-dimensional setting where $n > p$, extensive methodology is available for handling such data structures, e.g., linear mixed models (LMMs). The fields of LMMs and high-dimensional linear regression have extensive bodies of literature. However, they are largely separate, with a very narrow body of literature existing at the intersection of LMMs and high-dimensional longitudinal data (where $p \gg n$). Unlike low-dimensional ($p < n$) LMMs, for which restricted maximum likelihood (REML) methods are readily available, fitting high-dimensional LMMs is considerably more challenging due to the non-convexity of the optimization function, which requires the inversion of large matrices in addition to iterative approaches. The few available methods for high-dimensional LMMs rely on sparsity-inducing penalizations (e.g.