Collaborating Authors

Probabilistic Geometric Alignment via Bayesian Latent Transport for Domain-Adaptive Foundation Models

Aueawatthanaphisut, Aueaphum, Auewattanapisut, Kuepon

arXiv.org Machine Learning

Adapting large-scale foundation models to new domains with limited supervision remains a fundamental challenge due to latent distribution mismatch, unstable optimization dynamics, and miscalibrated uncertainty propagation. This paper introduces an uncertainty-aware probabilistic latent transport framework that formulates domain adaptation as a stochastic geometric alignment problem in representation space. A Bayesian transport operator is proposed to redistribute latent probability mass along Wasserstein-type geodesic trajectories, while a PAC-Bayesian regularization mechanism constrains posterior model complexity to mitigate catastrophic overfitting. The proposed formulation yields theoretical guarantees on convergence stability, loss landscape smoothness, and sample efficiency under distributional shift. Empirical analyses demonstrate substantial reduction in latent manifold discrepancy, accelerated transport energy decay, and improved covariance calibration compared with deterministic fine-tuning and adversarial domain adaptation baselines. Furthermore, bounded posterior uncertainty evolution indicates enhanced probabilistic reliability during cross-domain transfer. By establishing a principled connection between stochastic optimal transport geometry and statistical generalization theory, the proposed framework provides new insights into robust adaptation of modern foundation architectures operating in heterogeneous environments. These findings suggest that uncertainty-aware probabilistic alignment constitutes a promising paradigm for reliable transfer learning in next-generation deep representation systems.
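The abstract's core idea, redistributing latent probability mass between source and target representations along an optimal-transport coupling, can be illustrated with a minimal entropy-regularized (Sinkhorn) transport sketch. This is a generic illustration of Wasserstein-style latent alignment, not the paper's Bayesian transport operator; the point clouds, regularization strength, and barycentric mapping are all illustrative assumptions.

```python
import numpy as np

def sinkhorn_transport(X_s, X_t, reg=0.05, n_iter=200):
    """Entropy-regularized optimal transport plan between two latent clouds.

    Rows of the returned plan P couple each source latent to a soft
    assignment over target latents (uniform marginals assumed).
    """
    n, m = len(X_s), len(X_t)
    # Squared-Euclidean cost matrix, rescaled to [0, 1] for numerical stability.
    C = ((X_s[:, None, :] - X_t[None, :, :]) ** 2).sum(-1)
    C = C / C.max()
    K = np.exp(-C / reg)
    a, b = np.ones(n) / n, np.ones(m) / m  # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
X_s = rng.normal(0.0, 1.0, size=(50, 4))  # "source domain" latents
X_t = rng.normal(2.0, 1.0, size=(50, 4))  # shifted "target domain" latents
P = sinkhorn_transport(X_s, X_t)
# Barycentric projection: move each source latent toward the target cloud.
X_mapped = (P / P.sum(1, keepdims=True)) @ X_t
print(np.abs(X_mapped.mean(0) - X_t.mean(0)).max())  # small after alignment
```

The barycentric projection is the simplest deterministic stand-in for "redistributing latent probability mass"; the paper's probabilistic formulation would additionally track posterior uncertainty over this map.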



Direct Multi-view Multi-person 3D Pose Estimation

Neural Information Processing Systems

Multi-view multi-person 3D pose estimation aims to localize 3D skeleton joints for each person instance in a scene from multi-view camera inputs. It is a fundamental task that benefits many real-world applications (such as surveillance, sportscast, gaming, and mixed reality) and is mainly tackled by reconstruction-based [6, 14, 4] and volumetric [40] approaches in previous literature, as shown in Fig. 1 (a) and (b).
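The reconstruction-based approaches mentioned here typically triangulate 2D joint detections across calibrated views. A minimal two-view linear (DLT) triangulation of a single joint looks like the following; the camera matrices and the 3D point are toy values chosen for illustration, not from the paper.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D joint from two camera views.

    P1, P2: 3x4 projection matrices; x1, x2: 2D pixel observations.
    Each observation contributes two rows of the homogeneous system A X = 0.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # Solution: right singular vector for the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy pinhole cameras observing a joint at (0.5, 0.2, 4.0).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                   # camera at origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])   # baseline of 1 on x
X_true = np.array([0.5, 0.2, 4.0])
X_hat = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.allclose(X_hat, X_true))  # → True
```

With noise-free detections the DLT solution is exact; real pipelines add per-view confidence weighting and match person instances across views before triangulating each joint.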


Learning to Edit Visual Programs with Self-Supervision

Neural Information Processing Systems

We design a system that learns how to edit visual programs. Our edit network consumes a complete input program and a visual target. From this input, we task our network with predicting a local edit operation that could be applied to the input program to improve its similarity to the target.
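The inference loop this abstract implies, repeatedly applying the local edit that most improves similarity to the target, can be sketched generically. Everything below is a toy stand-in: `propose_edits` plays the role of the learned edit network, programs are plain strings, and the similarity function is character overlap rather than a visual metric.

```python
def greedy_edit_loop(program, target, propose_edits, similarity, max_steps=50):
    """Repeatedly apply the single local edit that most improves similarity.

    `propose_edits` stands in for the edit network: it enumerates candidate
    local edits for the current program; the loop greedily keeps the best one.
    """
    for _ in range(max_steps):
        best = max(propose_edits(program),
                   key=lambda p: similarity(p, target),
                   default=program)
        if similarity(best, target) <= similarity(program, target):
            break  # no candidate edit improves the match; stop
        program = best
    return program

def propose_edits(prog):
    # Toy edit space: substitute one character at one position.
    for i in range(len(prog)):
        for c in "abc":
            if c != prog[i]:
                yield prog[:i] + c + prog[i + 1:]

def similarity(prog, target):
    # Toy similarity: number of matching characters.
    return sum(a == b for a, b in zip(prog, target))

print(greedy_edit_loop("aaaa", "abca", propose_edits, similarity))  # → "abca"
```

In the described system, the network predicts a single promising edit conditioned on the program and the visual target rather than exhaustively scoring all candidates, but the stopping criterion, edit until similarity stops improving, is the same shape.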




Learning to see the physical world: an interview with Jiajun Wu

AIHub

What is your research area? My research topic, at a high level, hasn't changed much since my dissertation. It has always been the problem of physical scene understanding: building machines that see, reason about, and interact with the physical world. Besides learning algorithms, what are the levels of abstraction needed by AI systems in their representations, and where do they come from? I aim to answer these fundamental questions, drawing inspiration from nature, i.e., the physical world itself, and from human cognition.