Goto

Collaborating Authors

 mag


The Geometric Alignment Tax: Tokenization vs. Continuous Geometry in Scientific Foundation Models

Raju, Prashant C.

arXiv.org Machine Learning

Foundation models for biology and physics optimize predictive accuracy, but their internal representations systematically fail to preserve the continuous geometry of the systems they model. We identify the root cause: the Geometric Alignment Tax, an intrinsic cost of forcing continuous manifolds through discrete categorical bottlenecks. Controlled ablations on synthetic dynamical systems demonstrate that replacing cross-entropy with a continuous head on an identical encoder reduces geometric distortion by up to 8.5x, while learned codebooks exhibit a non-monotonic double bind where finer quantization worsens geometry despite improving reconstruction. Under continuous objectives, three architectures differ by 1.3x; under discrete tokenization, they diverge by 3,000x. Evaluating 14 biological foundation models with rate-distortion theory and MINE, we identify three failure regimes: Local-Global Decoupling, Representational Compression, and Geometric Vacuity. A controlled experiment confirms that Evo 2's reverse-complement robustness on real DNA reflects conserved sequence composition, not learned symmetry. No model achieves simultaneously low distortion, high mutual information, and global coherence.


Complete Causal Identification from Ancestral Graphs under Selection Bias

Chen, Leihao, Mooij, Joris M.

arXiv.org Machine Learning

Many causal discovery algorithms, including the celebrated FCI algorithm, output a Partial Ancestral Graph (PAG). PAGs serve as an abstract graphical representation of the underlying causal structure, modeled by directed acyclic graphs with latent and selection variables. This paper develops a characterization of the set of extended-type conditional independence relations that are invariant across all causal models represented by a PAG. This theory allows us to formulate a general measure-theoretic version of Pearl's causal calculus and a sound and complete identification algorithm for PAGs under selection bias. Our results also apply when PAGs are learned by certain algorithms that integrate observational data with experimental data and incorporate background knowledge.





CollaborativeCausalDiscovery withAtomicInterventions

Neural Information Processing Systems

Asinterventions areexpensive(require carefully controlled experiments) andperforming multiple interventions is time-consuming, an important goal in causal discovery is to design algorithms that utilize simple (preferably, single variable) and fewer interventions [Shanmugam et al.,2015]. However, when there are latents or unobserved variables in the system, in the worst-case, it is not possible to learn the exact causal DAG without intervening on every variable at least once.