
 cusp


Evidence of Phase Transitions in Small Transformer-Based Language Models

Hong, Noah, Hong, Tao

arXiv.org Artificial Intelligence

Phase transitions have been proposed as the origin of emergent abilities in large language models (LLMs), where new capabilities appear abruptly once models surpass critical thresholds of scale. Prior work, such as that of Wei et al., demonstrated these phenomena under model and data scaling, with transitions revealed after applying a log scale to training compute. In this work, we ask three complementary questions: (1) Are phase transitions unique to large models, or can they also be observed in small transformer-based language models? (2) Can such transitions be detected directly in linear training space, rather than only after log rescaling? and (3) Can these transitions emerge at early stages of training? To investigate, we train a small GPT-style transformer on a character-level corpus and analyze the evolution of vocabulary usage throughout training. We track the average word length, the number of correct versus incorrect words, and shifts in vocabulary diversity. Building on these measures, we apply Poisson and sub-Poisson statistics to quantify how words connect and reorganize. This combined analysis reveals a distinct transition point during training. Notably, these transitions are not apparent in standard loss or validation curves, but become visible through our vocabulary- and statistics-based probes. Our findings suggest that phase-transition reorganizations are a general feature of language model training, observable even in modest models, detectable directly in linear training space, and occurring surprisingly early as coherence emerges. This perspective provides new insight into the nonlinear dynamics of language model training and underscores the importance of tailored metrics for uncovering phase transition behaviors.
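The abstract does not say exactly how its probes are computed. The sketch below is a hedged illustration of probes of the kind it lists, applied to text sampled from one checkpoint. It computes average word length, counts of valid versus invalid words, and type-token ratio (one possible diversity measure). It also computes a Fano factor, the variance over the mean of a word's per-sample counts. That ratio equals 1 for Poisson counts and falls below 1 for sub-Poisson ones. The reference `lexicon`, the tokenizing regex, and all function names are assumptions for illustration and are not the authors' code.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, List, Set

# Hypothetical tokenizer: runs of lowercase letters and apostrophes count as words.
WORD_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    return WORD_RE.findall(text.lower())


def word_stats(text: str, lexicon: Set[str]) -> Dict[str, float]:
    """Vocabulary probes for one block of generated text.

    `lexicon` is an assumed reference word list that decides whether a
    generated word counts as "correct".
    """
    words = tokenize(text)
    if not words:
        return {"avg_word_length": 0.0, "valid": 0, "invalid": 0, "type_token_ratio": 0.0}
    valid = sum(w in lexicon for w in words)
    return {
        "avg_word_length": sum(map(len, words)) / len(words),
        "valid": valid,
        "invalid": len(words) - valid,
        # Distinct words divided by total words: one simple diversity measure.
        "type_token_ratio": len(set(words)) / len(words),
    }


def fano_factor(counts: Iterable[int]) -> float:
    """Variance-to-mean ratio: 1 for Poisson, < 1 sub-Poisson, > 1 super-Poisson."""
    xs = list(counts)
    if not xs:
        return math.nan
    m = sum(xs) / len(xs)
    if m == 0:
        return math.nan  # undefined when the word never occurs
    var = sum((x - m) ** 2 for x in xs) / len(xs)  # population variance
    return var / m


def word_fano(samples: List[str], word: str) -> float:
    """Fano factor of one word's occurrence count across generated samples."""
    return fano_factor(Counter(tokenize(s))[word] for s in samples)
```

One way to look for the abrupt change the abstract describes is to record these values at every checkpoint and plot them against the raw training step, with no log rescaling.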


e6ff107459d435e38b54ad4c06202c33-Supplemental.pdf

Neural Information Processing Systems

Supplementary Material: Can we have it all? This section provides the proof for Proposition 1. We now prove Theorem 2 on the trade-off between spatial and adversarial robustness. Now if the adversarial robustness (i.e., the LHS above) is at least … The other side of the trade-off can be proved similarly. For m = d/2, this accuracy is as bad as that of a random classifier.



Spectro-Riemannian Graph Neural Networks

Grover, Karish, Yu, Haiyang, Song, Xiang, Zhu, Qi, Xie, Han, Ioannidis, Vassilis N., Faloutsos, Christos

arXiv.org Machine Learning

Can integrating spectral and curvature signals unlock new potential in graph representation learning? Non-Euclidean geometries, particularly Riemannian manifolds such as hyperbolic (negative curvature) and spherical (positive curvature), offer powerful inductive biases for embedding complex graph structures like scale-free, hierarchical, and cyclic patterns. Meanwhile, spectral filtering excels at processing signal variations across graphs, making it effective in homophilic and heterophilic settings. Leveraging both can significantly enhance the learned representations. To this end, we propose Spectro-Riemannian Graph Neural Networks (CUSP): the first graph representation learning paradigm that unifies both CUrvature (geometric) and SPectral insights. CUSP is a mixed-curvature spectral GNN that learns spectral filters to optimize node embeddings in products of constant-curvature manifolds (hyperbolic, spherical, and Euclidean). Specifically, CUSP introduces three novel components: (a) Cusp Laplacian, an extension of the traditional graph Laplacian based on Ollivier-Ricci curvature, designed to better capture curvature signals; (b) Cusp Filtering, which employs multiple Riemannian graph filters to obtain cues from various bands in the eigenspectrum; and (c) Cusp Pooling, a hierarchical attention mechanism combined with a curvature-based positional encoding to assess the relative importance of differently curved substructures in our graph. Empirical evaluation across eight homophilic and heterophilic datasets demonstrates the superiority of CUSP in node classification and link prediction tasks, with a gain of up to 5.3% over state-of-the-art models.
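Here is a minimal NumPy sketch of a fixed band-pass filter bank, to make "cues from various bands in the eigenspectrum" more concrete. It departs from CUSP in several ways. It uses the standard symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}, whose eigenvalues lie in [0, 2], rather than the curvature-based Cusp Laplacian. It works in Euclidean rather than mixed-curvature space. It applies hard band masks instead of learned filters. The function `band_filter_bank` and its equal-width bands are hypothetical choices.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric, non-negative adjacency matrix."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5  # isolated nodes keep a zero scaling
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def band_filter_bank(adj: np.ndarray, signal: np.ndarray, num_bands: int = 3) -> list:
    """Split a node signal into components from equal-width bands of the Laplacian spectrum."""
    lap = normalized_laplacian(adj)
    eigvals, eigvecs = np.linalg.eigh(lap)   # symmetric matrix: real eigenvalues, orthonormal eigenvectors
    eigvals = np.clip(eigvals, 0.0, 2.0)     # remove round-off just outside [0, 2]
    coeffs = eigvecs.T @ signal              # graph Fourier transform
    edges = np.linspace(0.0, 2.0, num_bands + 1)
    outputs = []
    for k in range(num_bands):
        # The last band is closed on the right so eigenvalue 2 (bipartite graphs) is kept.
        if k == num_bands - 1:
            upper = eigvals <= edges[k + 1]
        else:
            upper = eigvals < edges[k + 1]
        mask = (eigvals >= edges[k]) & upper         # hard 0/1 band-pass response
        outputs.append(eigvecs @ (mask * coeffs))    # inverse transform of the kept band
    return outputs
```

The masks partition the spectrum, so the band outputs sum back to the input signal. A learned model would replace the hard masks with trainable frequency responses.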


Machine Learning Gravity Compactifications on Negatively Curved Manifolds

De Luca, G. Bruno

arXiv.org Artificial Intelligence

In theories with extra dimensions, four-dimensional vacua are non-trivial even classically. In fact, even a configuration that looks like a vacuum in four dimensions can have (and often has) a very rich structure for the geometry and the matter fields in the extra dimensions. The study of these structures, which are simultaneously rich and highly constrained by the UV completion of the theory, is important for extracting four-dimensional physics, as well as for holographic approaches to quantum gravity. As we will discuss in detail below, directly solving the equations of motion for vacuum compactifications is computationally challenging, and a popular way to bypass this challenge is to exploit supersymmetry in some form. This is the case, for example, for the famous KKLT proposal [1] for obtaining de Sitter vacua, which takes a supersymmetric Calabi-Yau compactification as its starting point and then adds supersymmetry-breaking effects on top of it. But this is also true for AdS: most of the explicitly known AdS compactifications of string/M-theory are either supersymmetric, obtained by starting from supersymmetric compactifications and turning on supersymmetry-breaking effects, or non-supersymmetric vacua of lower-dimensional supergravities obtained by reduction around supersymmetric vacua.


Cat got your tongue? How AI is on cusp of breakthrough that'd allow people and ANIMALS to talk to each other in '12 to 36 months'

Daily Mail - Science & tech

It sounds like the plot of a new Disney movie, but experts predict AI will allow people to communicate with household pets and even wild animals. Researchers around the world are using 'digital bioacoustics' (tiny, portable digital recorders) to capture the sounds, tics and behaviors of animals that are too quiet or nuanced for humans to pick up on. These databases will be used to train artificial intelligence to decipher these miniature communications and translate them into something more comprehensible to us, almost like a 'ChatGPT for animals'. Projects such as the Earth Species Project expect a breakthrough in the next 12 to 36 months. Founded in 2017, the AI non-profit aims to record, understand and 'talk back' to animals, from cats and dogs to more unusual species such as whales and crows.


AI revolution puts skilled jobs at highest risk, OECD says

The Guardian

Major economies are on the "cusp of an AI revolution" that could trigger job losses in skilled professions such as law, medicine and finance, according to an influential international organisation. The Organisation for Economic Co-operation and Development (OECD) said the occupations at highest risk from AI-driven automation were highly skilled jobs and represented about 27% of employment across its 38 member countries, which include the UK, Japan, Germany, the US, Australia and Canada. The body said it was "clear that the potential for [AI-driven jobs] substitution remains significant, raising fears of decreasing wages and job losses". However, it added that for the time being AI was changing jobs rather than replacing them. "Occupations in finance, medicine and legal activities which often require many years of education, and whose core functions rely on accumulated experience to reach decisions, may suddenly find themselves at risk of automation from AI," said the OECD.


They "Cloned" Bruce Willis. Who's Next?

Slate

Getting digitally cloned was easier than Devin Finley expected it to be. The voice-over artist, who also works as a model and bar manager, entered a studio in Manhattan last spring and read a script from a teleprompter. Across the room, a man with a large camera working for Hour One, a Tel Aviv–based video agency specializing in providing clients with lifelike virtual humans, filmed Finley from the waist up. Over Zoom, a director offered instructions about how much to move his hands. He was done in less than an hour.


Parallel curves of cubic Béziers

#artificialintelligence

The problem of parallel or offset curves has remained challenging for a long time. Parallel curves have applications in 2D graphics (for drawing strokes and for adding weight to fonts), as well as in robotic path planning and manufacturing, among others. The exact offset curve of a cubic Bézier can be described (it is an analytic curve of degree 10), but it is not tractable to work with. Thus, in practice the approach is almost always to compute an approximation to the true parallel curve. A single cubic Bézier might not be a good enough approximation to the parallel curve of the source cubic Bézier. In those cases the source is subdivided, and the parallel curve is approximated by multiple Bézier segments. A number of algorithms, of varying quality, have been published.
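The sketch below illustrates the approximate-then-subdivide strategy in plain Python. The exact parallel curve is O(t) = B(t) + d·n(t), where n(t) is the unit normal.

- **Per-segment fit.** It uses a deliberately crude rule: shift P0 and P1 along the start normal, and P2 and P3 along the end normal. This is not one of the published algorithms.
- **Error estimate.** It checks how far sampled points of the approximation lie from the source curve, compared with |d|.
- **Subdivision.** It splits the source at t = 0.5 with de Casteljau until the estimate is within tolerance.

Function names, the sampling density, and `max_depth` are assumptions. The code also assumes the source has no cusps, so B'(t) never vanishes, and that |d| is smaller than the curve's radius of curvature.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Cubic = List[Point]  # control points P0, P1, P2, P3


def bezier(p: Cubic, t: float) -> Point:
    """Evaluate B(t) in Bernstein form."""
    mt = 1.0 - t
    w = (mt * mt * mt, 3 * mt * mt * t, 3 * mt * t * t, t * t * t)
    return (sum(wi * q[0] for wi, q in zip(w, p)),
            sum(wi * q[1] for wi, q in zip(w, p)))


def derivative(p: Cubic, t: float) -> Point:
    """B'(t) = 3(1-t)^2 (P1-P0) + 6(1-t)t (P2-P1) + 3t^2 (P3-P2)."""
    mt = 1.0 - t
    w = (3 * mt * mt, 6 * mt * t, 3 * t * t)
    return (sum(wi * (p[i + 1][0] - p[i][0]) for i, wi in enumerate(w)),
            sum(wi * (p[i + 1][1] - p[i][1]) for i, wi in enumerate(w)))


def unit_normal(p: Cubic, t: float) -> Point:
    """Unit tangent rotated 90 degrees counterclockwise (assumes B'(t) != 0)."""
    dx, dy = derivative(p, t)
    length = math.hypot(dx, dy)
    return (-dy / length, dx / length)


def exact_offset_point(p: Cubic, t: float, dist: float) -> Point:
    """Point on the true parallel curve O(t) = B(t) + dist * n(t)."""
    (x, y), (nx, ny) = bezier(p, t), unit_normal(p, t)
    return (x + dist * nx, y + dist * ny)


def naive_offset(p: Cubic, dist: float) -> Cubic:
    """Crude single-cubic fit: endpoints land exactly on the offset curve and
    end tangent directions are kept, but handle lengths are left unchanged."""
    n0, n1 = unit_normal(p, 0.0), unit_normal(p, 1.0)

    def shift(q: Point, n: Point) -> Point:
        return (q[0] + dist * n[0], q[1] + dist * n[1])

    return [shift(p[0], n0), shift(p[1], n0), shift(p[2], n1), shift(p[3], n1)]


def offset_error(src: Cubic, approx: Cubic, dist: float, samples: int = 32) -> float:
    """Estimated max | distance(approx(t), src) - |dist| | over sampled t.
    Distance to src uses its nearest dense sample, so this is an estimate."""
    dense = [bezier(src, i / (4 * samples)) for i in range(4 * samples + 1)]
    worst = 0.0
    for i in range(samples + 1):
        ax, ay = bezier(approx, i / samples)
        d = min(math.hypot(ax - sx, ay - sy) for sx, sy in dense)
        worst = max(worst, abs(d - abs(dist)))
    return worst


def split(p: Cubic, t: float = 0.5) -> Tuple[Cubic, Cubic]:
    """de Casteljau subdivision of a cubic at parameter t."""
    def lerp(a: Point, b: Point) -> Point:
        return (a[0] + (b[0] - a[0]) * t, a[1] + (b[1] - a[1]) * t)

    p01, p12, p23 = lerp(p[0], p[1]), lerp(p[1], p[2]), lerp(p[2], p[3])
    p012, p123 = lerp(p01, p12), lerp(p12, p23)
    mid = lerp(p012, p123)
    return [p[0], p01, p012, mid], [mid, p123, p23, p[3]]


def offset_curve(p: Cubic, dist: float, tol: float = 1e-2,
                 depth: int = 0, max_depth: int = 8) -> List[Cubic]:
    """Approximate the parallel curve, subdividing the source until each piece fits."""
    approx = naive_offset(p, dist)
    if depth >= max_depth or offset_error(p, approx, dist) <= tol:
        return [approx]
    left, right = split(p)
    return (offset_curve(left, dist, tol, depth + 1, max_depth)
            + offset_curve(right, dist, tol, depth + 1, max_depth))
```

Consider a quarter-circle approximation such as `[(1, 0), (1, 0.5523), (0.5523, 1), (0, 1)]` offset by 0.1. A tighter `tol` yields at least as many segments. Production code would use a better per-segment fit and an exact nearest-point query instead of dense sampling.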


With AI And 5G, We're On The Cusp Of A New Era In Innovation

#artificialintelligence

While each technology revolutionizes sectors and creates new experiences on its own, the combination of 5G and AI will be truly disruptive. On-device computing, the edge cloud, and 5G work together to form a ubiquitous connectivity fabric of smart devices and services. This point of convergence is critical to our concept of the intelligent wireless edge. The commercial deployment of 5G has begun, but 5G isn't just another G. It's a total ecosystem shift in how networks are managed and administered, as well as in how apps function on them.