Stepwise Variational Inference with Vine Copulas
Griesbauer, Elisabeth, Rønneberg, Leiv, Frigessi, Arnoldo, Czado, Claudia, Haff, Ingrid Hobæk
We propose stepwise variational inference (VI) with vine copulas: a universal VI procedure that combines vine copulas with a novel stepwise procedure for estimating the variational parameters. Vine copulas consist of a nested sequence of trees built from copulas, where more complex latent dependence can be modeled as the number of trees increases. We propose to estimate the vine copula approximate posterior in a stepwise fashion, tree by tree along the vine structure. Further, we show that the usual backward Kullback-Leibler divergence cannot recover the correct parameters in the vine copula model; we therefore define the evidence lower bound based on the Rényi divergence. Finally, an intuitive stopping criterion for adding further trees to the vine eliminates the need to pre-define a complexity parameter of the variational distribution, as most other approaches require. Thus, our method interpolates between mean-field VI (MFVI) and full latent dependence. In many applications, in particular sparse Gaussian processes, our method is parsimonious with parameters while outperforming MFVI.
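For reference, the textbook Rényi divergence of order $\alpha$ and the variational bound built on it are given below. The abstract does not state which order $\alpha$ or which exact form of the bound the authors use, so the symbols $q$, $p$, $z$, $x$, and $\mathcal{L}_\alpha$ are assumptions introduced here for illustration, not the paper's notation.

```latex
D_\alpha(q \,\|\, p) = \frac{1}{\alpha - 1} \log \int q(z)^{\alpha}\, p(z)^{1-\alpha}\, dz,
\qquad \alpha > 0,\ \alpha \neq 1,

\mathcal{L}_\alpha(q; x)
  = \frac{1}{1-\alpha} \log \mathbb{E}_{q(z)}\!\left[\left(\frac{p(x, z)}{q(z)}\right)^{1-\alpha}\right]
  = \log p(x) - D_\alpha\bigl(q(z) \,\|\, p(z \mid x)\bigr).
```

In the limit $\alpha \to 1$, $D_\alpha$ reduces to the Kullback-Leibler divergence $\mathrm{KL}(q \,\|\, p)$ and $\mathcal{L}_\alpha$ reduces to the usual evidence lower bound.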
LassoFlexNet: Flexible Neural Architecture for Tabular Data
Lui, Kry Yik Chau, Chi, Cheng, Basu, Kishore, Cao, Yanshuai
Despite their dominance in vision and language, deep neural networks often underperform relative to tree-based models on tabular data. To bridge this gap, we incorporate five key inductive biases into deep learning: robustness to irrelevant features, axis alignment, localized irregularities, feature heterogeneity, and training stability. We propose \emph{LassoFlexNet}, an architecture that evaluates the linear and nonlinear marginal contribution of each input via Per-Feature Embeddings, and sparsely selects relevant variables using a Tied Group Lasso mechanism. Because these components introduce optimization challenges that destabilize standard proximal methods, we develop a \emph{Sequential Hierarchical Proximal Adaptive Gradient optimizer with exponential moving averages (EMA)} to ensure stable convergence. Across $52$ datasets from three benchmarks, LassoFlexNet matches or outperforms leading tree-based models, achieving up to a $10$\% relative gain, while maintaining Lasso-like interpretability. We substantiate these empirical results with ablation studies and theoretical proofs confirming the architecture's enhanced expressivity and structural breaking of undesired rotational invariance.
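The abstract does not spell out the Tied Group Lasso mechanism or the proposed optimizer. As background only, the sketch below shows the proximal step for a plain group-lasso penalty (block soft-thresholding), which is the standard way such penalties set whole groups of weights to exactly zero. The function name `group_soft_threshold` and the convention that each row of `W` is one feature's group are assumptions made for this example; this is not the LassoFlexNet implementation.

```python
import numpy as np

def group_soft_threshold(W, threshold):
    """Proximal operator of threshold * sum_j ||W[j]||_2, with rows as groups.

    Hypothetical helper for illustration: each row is shrunk toward zero,
    and rows whose Euclidean norm is at most `threshold` become exactly zero.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Scale factor max(0, 1 - threshold / ||w_j||); the floor avoids division by zero.
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(norms, 1e-12))
    return scale * W

W = np.array([[3.0, 4.0],   # norm 5: kept, shrunk by a factor of 0.8
              [0.1, 0.2]])  # norm below the threshold: removed entirely
W_new = group_soft_threshold(W, threshold=1.0)
```

A proximal gradient method would apply this step after each gradient update, with `threshold` equal to the step size times the regularization strength.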
A Simple Approach to Automated Spectral Clustering: Appendices
Let $\hat{c}$ be the optimal solution of $\min_{c} \tfrac{1}{2}\|\phi(y) - \phi(X)c\|^2 + \tfrac{\lambda}{2}\|c\|^2$, where $\phi$ is the feature map induced by the Gaussian kernel and $y$ is arbitrary. It is worth noting that Algorithm 1 can easily be implemented in parallel, which reduces the time complexity to $O(\max(m,r)n^2 + kmn)$. Denote by $v_i = (v_{i1},\dots,v_{in})$ the $i$-th row of $V$, and let $(v_{i1},\dots,v_{id})$, where $d < n$, be its truncation to the first $d$ entries. Clustering the columns of $X$ given by Definition C.1 according to the polynomials is actually a manifold clustering problem, beyond the setting of subspace clustering. The following theorem verifies the effectiveness of (15) followed by the truncation operation in manifold detection.
Theorem C.3.
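The regularized least-squares problem that opens this excerpt has a closed-form solution: setting the gradient to zero gives $\hat{c} = (K + \lambda I)^{-1} k_y$, where $K = \phi(X)^\top \phi(X)$ and $k_y = \phi(X)^\top \phi(y)$ are Gaussian-kernel evaluations. The sketch below computes it, assuming the data points are the columns of `X` and assuming the kernel parametrization $\exp(-\|a-b\|^2 / (2\sigma^2))$. The function names and the bandwidth `sigma` are hypothetical and do not come from the source.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Kernel matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2)).

    Columns of A and B are treated as data points (assumed convention).
    """
    sq = (A**2).sum(axis=0)[:, None] + (B**2).sum(axis=0)[None, :] - 2.0 * A.T @ B
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def kernel_ridge_coefficients(X, y, lam, sigma):
    """Solve (K + lam * I) c = k_y, the stationarity condition of the objective."""
    K = gaussian_kernel(X, X, sigma)                    # n x n
    k_y = gaussian_kernel(X, y[:, None], sigma)[:, 0]   # length n
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), k_y)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))   # 5 data points in R^3, stored as columns
y = rng.normal(size=3)        # arbitrary query point
c_hat = kernel_ridge_coefficients(X, y, lam=0.1, sigma=1.0)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is the usual choice for numerical stability.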