Synthetic-to-Real Pose Estimation with Geometric Reconstruction

Lin, Qiuxia, Gu, Kerui, Yang, Linlin, Yao, Angela

Neural Information Processing Systems

The warping estimation module W is based on an hourglass network with five conv3×3-bn-relu-pool2×2 blocks in the encoder and five upsample2×2-conv3×3-bn-relu blocks in the decoder. For G, we use the Johnson architecture [3] with two down-sampling blocks, six residual blocks, and two up-sampling blocks; the design follows [7]. Its inputs are the base image, the displacement field, and the inpainting map. G downsamples the input by a factor of 4 and then upsamples it by a factor of 4 to produce the output, i.e., the reconstructed image.
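The resolution bookkeeping for W and G can be traced with a short sketch. It is not the authors' code: it assumes every conv3×3 uses "same" padding (so only pooling, down-sampling, and up-sampling layers change the spatial size), that each down-sampling block of G halves the resolution, and it uses a hypothetical 256×256 input. The names `Stage`, `hourglass_w`, `johnson_g`, and `trace` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Hypothetical layer descriptor: a name plus how it changes H and W."""
    name: str
    op: str  # "down" halves H and W, "up" doubles them, "same" keeps them


def hourglass_w(n_blocks: int = 5) -> list:
    """Warping module W: conv3x3-bn-relu-pool2x2 encoder blocks followed by
    upsample2x2-conv3x3-bn-relu decoder blocks (convs assumed 'same' padded)."""
    enc = [Stage(f"enc{i}: conv3x3-bn-relu-pool2x2", "down") for i in range(n_blocks)]
    dec = [Stage(f"dec{i}: upsample2x2-conv3x3-bn-relu", "up") for i in range(n_blocks)]
    return enc + dec


def johnson_g(n_down: int = 2, n_res: int = 6, n_up: int = 2) -> list:
    """Generator G in the Johnson style: down-sampling blocks, size-preserving
    residual blocks, then up-sampling blocks."""
    down = [Stage(f"down{i}", "down") for i in range(n_down)]
    res = [Stage(f"res{i}", "same") for i in range(n_res)]
    up = [Stage(f"up{i}", "up") for i in range(n_up)]
    return down + res + up


def trace(stages: list, h: int, w: int) -> list:
    """Return the (H, W) of the input and after every stage."""
    sizes = [(h, w)]
    for stage in stages:
        if stage.op == "down":
            if h % 2 or w % 2:
                raise ValueError(f"{stage.name}: odd size {h}x{w} cannot be halved exactly")
            h, w = h // 2, w // 2
        elif stage.op == "up":
            h, w = h * 2, w * 2
        sizes.append((h, w))
    return sizes


if __name__ == "__main__":
    w_sizes = trace(hourglass_w(), 256, 256)
    g_sizes = trace(johnson_g(), 256, 256)
    print("W bottleneck:", min(w_sizes), "output:", w_sizes[-1])  # (8, 8), (256, 256)
    print("G bottleneck:", min(g_sizes), "output:", g_sizes[-1])  # (64, 64), (256, 256)
```

Under these assumptions, W's five halvings require the input height and width to be divisible by 2^5 = 32, while G's two down-sampling blocks give the factor-of-4 bottleneck mentioned above before the two up-sampling blocks restore the input size.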





Missing-Data-Induced Phase Transitions in Spectral PLS for Multimodal Learning

Gjølbye, Anders, Kargaard, Ida, Kargaard, Emma, Hansen, Lars Kai

arXiv.org Machine Learning

Partial Least Squares (PLS) learns shared structure from paired data via the top singular vectors of the empirical cross-covariance (PLS-SVD), but multimodal datasets often have missing entries in both views. We study PLS-SVD under independent entry-wise missing-completely-at-random masking in a proportional high-dimensional spiked model. After appropriate normalization, the masked cross-covariance behaves like a spiked rectangular random matrix whose effective signal strength is attenuated by $\sqrt{\rho}$, where $\rho$ is the joint entry retention probability. As a result, PLS-SVD exhibits a sharp BBP-type phase transition: below a critical signal-to-noise threshold the leading singular vectors are asymptotically uninformative, while above it they achieve nontrivial alignment with the latent shared directions, with closed-form asymptotic overlap formulas. Simulations and semi-synthetic multimodal experiments corroborate the predicted phase diagram and recovery curves across aspect ratios, signal strengths, and missingness levels.
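A small Monte Carlo sketch can make the attenuation concrete. It is an illustration under stated assumptions, not the paper's experiment: a rank-one spiked model x = a·z·u + noise, y = a·z·v + noise with standard Gaussian noise; each entry of each view is kept independently with probability r and zero-filled otherwise, so a product x_j y_k is observed with probability r², which is taken here as the joint retention probability $\rho = r^2$. Rescaling the cross-covariance by a constant does not change its singular vectors, so the sketch skips normalization. It computes the leading left singular vector by power iteration and reports its overlap |⟨û, u⟩| with the planted direction. The function `simulate_overlap` and all sizes are hypothetical.

```python
import math
import random


def unit(vec):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def simulate_overlap(n, p, q, signal, r, seed=0, iters=100):
    """Overlap |<u_hat, u>| between the top left singular vector of the
    zero-filled, MCAR-masked cross-covariance and the planted direction u.

    Assumed model: x = a*z*u + e, y = a*z*v + f with a = sqrt(signal) and
    z, e, f standard Gaussian; every entry is kept with probability r.
    """
    rng = random.Random(seed)
    u = unit([rng.gauss(0, 1) for _ in range(p)])
    v = unit([rng.gauss(0, 1) for _ in range(q)])
    a = math.sqrt(signal)
    cross = [[0.0] * q for _ in range(p)]  # (1/n) * sum_i x_i y_i^T
    for _ in range(n):
        z = rng.gauss(0, 1)
        # Each entry is observed with probability r, otherwise zero-filled.
        x = [a * z * uj + rng.gauss(0, 1) if rng.random() < r else 0.0 for uj in u]
        y = [a * z * vk + rng.gauss(0, 1) if rng.random() < r else 0.0 for vk in v]
        for j, xj in enumerate(x):
            if xj:
                row = cross[j]
                for k, yk in enumerate(y):
                    row[k] += xj * yk / n
    # Power iteration on cross @ cross^T yields the leading left singular vector.
    est = unit([rng.gauss(0, 1) for _ in range(p)])
    for _ in range(iters):
        right = [sum(cross[j][k] * est[j] for j in range(p)) for k in range(q)]
        est = unit([sum(cross[j][k] * right[k] for k in range(q)) for j in range(p)])
    return abs(sum(e * t for e, t in zip(est, u)))


if __name__ == "__main__":
    for r in (1.0, 0.5, 0.2, 0.05):
        ov = simulate_overlap(n=400, p=40, q=40, signal=4.0, r=r, seed=1)
        print(f"rho = {r * r:.4f}  overlap = {ov:.3f}")
```

The printed overlap should drop as r decreases, in line with the $\sqrt{\rho}$ attenuation described above; exact values depend on the seed and the sizes, and the sketch does not implement the paper's closed-form overlap formulas.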


Demystifying Prediction Powered Inference

Song, Yilin, Kluger, Dan M., Parikh, Harsh, Gu, Tian

arXiv.org Machine Learning

Machine learning predictions are increasingly used to supplement incomplete or costly-to-measure outcomes in fields such as biomedical research, environmental science, and social science. However, treating predictions as ground truth introduces bias, while ignoring them wastes valuable information. Prediction-Powered Inference (PPI) offers a principled framework that leverages predictions from large unlabeled datasets to improve statistical efficiency while maintaining valid inference through explicit bias correction using a smaller labeled subset. Despite its potential, the growing number of PPI variants and the subtle distinctions between them have made it challenging for practitioners to determine when and how to apply these methods responsibly. This paper demystifies PPI by synthesizing its theoretical foundations, methodological extensions, connections to the existing statistics literature, and diagnostic tools into a unified practical workflow. Using the Mosaiks housing price data, we show that PPI variants produce tighter confidence intervals than complete-case analysis, but that double-dipping, i.e., reusing training data for inference, leads to anti-conservative confidence intervals with below-nominal coverage. Under missing-not-at-random mechanisms, all methods, including classical inference using only labeled data, yield biased estimates. We provide a decision flowchart linking assumption violations to appropriate PPI variants, a summary table of selected methods, and practical diagnostic strategies for evaluating core assumptions. By framing PPI as a general recipe rather than a single estimator, this work bridges methodological innovation and applied practice, helping researchers responsibly integrate predictions into valid inference.
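For the simplest target, a population mean, the bias-correction idea can be sketched as follows. The sketch uses the standard prediction-powered mean estimator (the mean of the predictions on the unlabeled set plus a "rectifier", the mean labeled residual Y − f(X)) with a normal-approximation interval. The weight `lam` follows the power-tuned form popularized as PPI++, where `lam = 1` is plain PPI and `lam = 0` recovers the classical labeled-only mean; no automatic tuning of `lam` is attempted. The function name `ppi_mean_ci` and the toy numbers are hypothetical, and the predictions are assumed to come from a model that was not trained on the labeled sample, since reusing those points is the double-dipping described above.

```python
import math
from statistics import NormalDist, fmean, variance


def ppi_mean_ci(y_lab, f_lab, f_unl, alpha=0.05, lam=1.0):
    """Prediction-powered estimate and (1 - alpha) CI for E[Y].

    y_lab, f_lab: outcomes and predictions on the n labeled points.
    f_unl:        predictions on the N unlabeled points.
    lam:          weight on the predictions (lam = 0 gives the classical mean).
    Predictions must come from a model NOT fit on the labeled points.
    """
    n, big_n = len(y_lab), len(f_unl)
    # Rectifier terms: labeled outcomes minus weighted predictions.
    resid = [y - lam * f for y, f in zip(y_lab, f_lab)]
    estimate = lam * fmean(f_unl) + fmean(resid)
    # Labeled and unlabeled samples are independent, so the variances add.
    se = math.sqrt(variance(resid) / n + lam ** 2 * variance(f_unl) / big_n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate, (estimate - z * se, estimate + z * se)


if __name__ == "__main__":
    # Made-up toy values, for illustration only.
    y_lab = [2.1, 3.9, 6.2, 7.8, 10.1]
    f_lab = [2.0, 4.0, 6.0, 8.0, 10.0]
    f_unl = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 4.0, 6.0]
    print(ppi_mean_ci(y_lab, f_lab, f_unl))           # prediction-powered
    print(ppi_mean_ci(y_lab, f_lab, f_unl, lam=0.0))  # classical labeled-only
```

Note that the interval is only valid under the assumptions the abstract highlights: it does not fix missing-not-at-random labeling, and it becomes anti-conservative if `f_lab` was produced by a model trained on the same labeled points.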


Biclustering Using Message Passing

Neural Information Processing Systems

Biclustering is the analog of clustering on a bipartite graph. Existing methods infer biclusters through local search strategies that find one cluster at a time; a common technique is to update the row memberships based on the current column memberships, and vice versa. We propose a biclustering algorithm that maximizes a global objective function using message passing. Our objective function closely approximates a general likelihood function, separating a cluster-size penalty term into row- and column-count penalties. Because we use a global optimization framework, our approach excels at resolving overlaps between biclusters, which are important features of biclusters in practice. Moreover, Expectation-Maximization can be used to learn the model parameters if they are unknown. In simulations, we find that our method outperforms two of the best existing biclustering algorithms, ISA and LAS, when the planted clusters overlap. Applied to three gene expression datasets, our method finds coregulated gene clusters that are of high quality in terms of cluster size and density.
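The alternating local search that the abstract attributes to existing methods can be sketched in a few lines: given the current columns, keep the rows whose mean over those columns exceeds a threshold, then update the columns the same way from the new rows, and repeat until nothing changes. This is a simplified illustration of that baseline idea, not ISA, LAS, or the authors' message-passing algorithm; the threshold rule, the toy matrix, and the name `alternating_bicluster` are assumptions.

```python
def alternating_bicluster(matrix, seed_cols, threshold, max_iter=100):
    """Grow one bicluster by alternating row and column updates.

    matrix:    list of rows (lists of numbers).
    seed_cols: initial column indices.
    threshold: a row (column) is kept if its mean over the current
               columns (rows) exceeds this value.
    Returns (rows, cols) as sorted lists; empty lists if the search dies out.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    cols = set(seed_cols)
    rows = set()
    for _ in range(max_iter):
        if not cols:
            return [], []
        # Update row memberships given the current columns.
        new_rows = {i for i in range(n_rows)
                    if sum(matrix[i][j] for j in cols) / len(cols) > threshold}
        if not new_rows:
            return [], []
        # Update column memberships given the new rows.
        new_cols = {j for j in range(n_cols)
                    if sum(matrix[i][j] for i in new_rows) / len(new_rows) > threshold}
        if new_rows == rows and new_cols == cols:
            break  # fixed point reached
        rows, cols = new_rows, new_cols
    return sorted(rows), sorted(cols)


if __name__ == "__main__":
    # Toy 0/1 matrix with two planted blocks that overlap at cell (2, 2).
    m = [[0] * 6 for _ in range(6)]
    for i in range(0, 3):
        for j in range(0, 3):
            m[i][j] = 1
    for i in range(2, 5):
        for j in range(2, 5):
            m[i][j] = 1
    print(alternating_bicluster(m, seed_cols=[0], threshold=0.5))  # ([0, 1, 2], [0, 1, 2])
    print(alternating_bicluster(m, seed_cols=[4], threshold=0.5))  # ([2, 3, 4], [2, 3, 4])
```

Each run grows a single bicluster from its seed, so the two overlapping blocks are recovered only by separate runs and their shared cell is never reasoned about jointly; a global objective, as proposed above, scores all biclusters together instead.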


Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Neural Information Processing Systems

Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal.