Goto

Collaborating Authors

 infinity



A New Bridge Links the Strange Math of Infinity to Computer Science

WIRED

Descriptive set theorists study the niche mathematics of infinity. Now, they've shown that their problems can be rewritten in the concrete language of algorithms. All of modern mathematics is built on the foundation of set theory, the study of how to organize abstract collections of objects. But in general, research mathematicians don't need to think about it when they're solving their problems. They can take it for granted that sets behave the way they'd expect, and carry on with their work. Descriptive set theorists are an exception. This small community of mathematicians never stopped studying the fundamental nature of sets--particularly the strange infinite ones that other mathematicians ignore. Their field just got a lot less lonely. In 2023, a mathematician named Anton Bernshteyn published a deep and surprising connection between the remote mathematical frontier of descriptive set theory and modern computer science.


Implicit Bias of Mirror Flow on Separable Data

Neural Information Processing Systems

We examine the continuous-time counterpart of mirror descent, namely mirror flow, on classification problems which are linearly separable. Such problems are minimised'at infinity' and have many possible solutions; we study which solution is preferred by the algorithm depending on the mirror potential. For exponential tailed losses and under mild assumptions on the potential, we show that the iterates converge in direction towards a $\phi_\infty$-maximum margin classifier. The function $\phi_\infty$ is the horizon function of the mirror potential and characterises its shape'at infinity'. When the potential is separable, a simple formula allows to compute this function. We analyse several examples of potentials and provide numerical experiments highlighting our results.



Dynamical Properties of Tokens in Self-Attention and Effects of Positional Encoding

Pham, Duy-Tung, Nguyen, An The, Tran, Viet-Hoang, Chung, Nhan-Phu, Tong, Xin T., Nguyen, Tan M., Vo, Thieu N.

arXiv.org Artificial Intelligence

This paper investigates the dynamical properties of tokens in pre-trained Transformer models and explores their application to improving Transformers. To this end, we analyze the dynamical system governing the continuous-time limit of the pre-trained model and characterize the asymptotic behavior of its solutions. Specifically, we characterize when tokens move closer to or farther from one another over time, depending on the model parameters. We provide sufficient conditions, based on these parameters, to identify scenarios where tokens either converge to zero or diverge to infinity. Unlike prior works, our conditions are broader in scope and more applicable to real-world models. Furthermore, we investigate how different forms of positional encoding -- specifically absolute and rotary -- affect these dynamical regimes. Empirical evidence reveals that the convergence scenario adversely impacts model performance. Motivated by these insights, we propose simple refinements to Transformer architectures that mitigate convergence behavior in models with absolute or rotary positional encoding. These findings support theoretical foundations and design principles for improving Transformer models.