bst
Diffusion Models Are Statistically Optimal for Learning Low-Dimensional Multi-Modal Distributions
Score-based diffusion models have demonstrated remarkable empirical success in learning high-dimensional distributions, particularly those exhibiting low-dimensional and multi-modal structures. However, theoretical understanding of their statistical efficiency remains limited. Existing theories typically rely on strong regularity assumptions, such as uniformly bounded densities or globally smooth score functions, which fail to capture such intrinsic structures. In this work, we study the sample complexity of diffusion models for learning distributions supported on a union of low-dimensional subspaces. Assuming that the data distribution within each subspace is subgaussian, we show that diffusion models require at most $\widetilde{O}(\varepsilon^{-(k \vee 2)})$ samples to achieve $\varepsilon$ error in 1-Wasserstein distance, where $k$ is the intrinsic dimension. This near-optimal convergence rate depends only on the intrinsic dimension and significantly improves upon prior theoretical guarantees that suffer from the curse of dimensionality. Notably, our analysis applies to a broad collection of distributions without imposing smoothness, bounded-density, or log-concavity assumptions. Overall, our results show that diffusion models can statistically adapt to intrinsic low-dimensional structure while naturally accommodating multi-modal data, offering a rigorous theoretical justification for their success in complex high-dimensional learning tasks.
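To make the stated rate concrete, the following minimal sketch evaluates only the polynomial part of the bound, reading $k \vee 2$ as $\max(k, 2)$. The function name `sample_bound` is a hypothetical helper introduced here for illustration; constants and the logarithmic factors hidden by $\widetilde{O}$ are ignored, so the numbers indicate scaling only, not actual sample sizes.

```python
import math


def sample_bound(eps: float, k: int) -> float:
    """Polynomial part of the rate, eps ** -(max(k, 2)).

    Assumption: constants and log factors hidden by the O-tilde are dropped.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps is expected to lie in (0, 1)")
    if k < 1:
        raise ValueError("the intrinsic dimension k must be at least 1")
    return eps ** (-max(k, 2))


# The exponent is driven by the intrinsic dimension k alone;
# the ambient dimension never enters the expression.
for k in (1, 2, 4, 8):
    print(f"k={k}: about {sample_bound(0.1, k):.3g} samples (up to logs and constants)")
```

Under this reading, intrinsic dimensions 1 and 2 share the same exponent, and the bound grows polynomially in $1/\varepsilon$ with degree $k$ once $k > 2$.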
Sutton's predictions v Only The Poets frontman Tommy Longhurst
The 197th Manchester derby takes place at Etihad Stadium on Sunday, but will it be the Blues or the Reds who claim the points - and local bragging rights? "This is so hard to call, for many reasons," said BBC Sport football expert Chris Sutton. "Manchester United could be buoyed by their win over Burnley before the international break, but I actually have bigger doubts about what we will see from Manchester City after seeing them capitulate the way they did against Brighton." Sutton is making predictions for all 380 Premier League games this season, against AI, BBC Sport readers and a variety of guests. For week four, he takes on Only The Poets frontman Tommy Longhurst. The Reading band are charging £1 a ticket when they play the O2 Academy Brixton in February 2026.
From new Call of Duty to Star Wars Outlaws, it's a massive few days for game reveals
For the best part of 15 years, every June I would get on a plane to Los Angeles to cover E3. It was the giant video games conference where most of the major games and consoles of the past few decades were first shown, from the PlayStation to the Wii U, Fallout 4 to Final Fantasy VII Remake. Alas, the pandemic killed E3, and so this year we have a cluster of loosely affiliated and competing events instead: Summer Game Fest, run by Geoff Keighley of the Game Awards; the Xbox Games Showcase; indie-driven event Day of the Devs and many more. It all kicks off tomorrow, 6 June.
Block-State Transformers
Mahan Fathi, Jonathan Pilault, Orhan Firat, Christopher Pal, Pierre-Luc Bacon, Ross Goroshin
State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies and efficiently scale to long sequences owing to their subquadratic runtime complexity. Originally designed for continuous signals, SSMs have shown superior performance on a plethora of tasks in vision and audio; however, SSMs still lag behind Transformer performance on language modeling tasks. In this work, we propose a hybrid layer named Block-State Transformer (BST) that internally combines an SSM sublayer for long-range contextualization and a Block Transformer sublayer for short-term representation of sequences. We study three different, fully parallelizable variants that integrate SSMs and block-wise attention. We show that our model outperforms similar Transformer-based architectures on language modeling perplexity and generalizes to longer sequences. In addition, the Block-State Transformer demonstrates a more than tenfold increase in speed at the layer level compared to the Block-Recurrent Transformer when model parallelization is employed.
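The idea of pairing an SSM sublayer with block-wise attention can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the names `ssm_scan` and `hybrid_layer` are hypothetical, the SSM is assumed to be a simple diagonal linear recurrence computed with a sequential loop, and learned projections, multiple heads, and the three integration variants mentioned in the abstract are all omitted. Each block attends causally to its own tokens plus a single context vector taken from the SSM state just before the block.

```python
import numpy as np


def ssm_scan(x, a, b):
    """Diagonal linear recurrence h_t = a * h_{t-1} + b * x_t (assumed toy SSM)."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def hybrid_layer(x, a, b, block_len):
    """Toy hybrid layer: causal attention inside each block plus one SSM context vector."""
    seq_len, dim = x.shape
    if seq_len % block_len != 0:
        raise ValueError("seq_len must be a multiple of block_len")
    context = ssm_scan(x, a, b)  # long-range summary of the whole prefix
    # Column 0 is the context vector (always visible); the remaining columns
    # are block tokens, masked so that position i only sees positions <= i.
    mask = np.hstack([
        np.zeros((block_len, 1), dtype=bool),
        np.triu(np.ones((block_len, block_len), dtype=bool), k=1),
    ])
    out = np.empty_like(x)
    for start in range(0, seq_len, block_len):
        block = x[start:start + block_len]
        # SSM state at the last position before this block (zeros for the first block).
        ctx = context[start - 1] if start > 0 else np.zeros(dim)
        keys = np.vstack([ctx[None, :], block])
        scores = block @ keys.T / np.sqrt(dim)
        scores = np.where(mask, -np.inf, scores)
        out[start:start + block_len] = softmax(scores) @ keys
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
y = hybrid_layer(x, a=0.9, b=1.0, block_len=4)
print(y.shape)  # (8, 4)
```

In this sketch the attention cost per block depends on `block_len` rather than on the full sequence length, while information from earlier blocks reaches later ones only through the SSM context vector.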