Education
LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging
Maziane, Yassine, Mahran, Ammar, Maranjyan, Artavazd, Richtárik, Peter
Communication is a major bottleneck in distributed learning, especially in large-scale settings and in federated learning environments with slow links. Three standard ways to reduce this cost are communication compression, local training, and communication-computation overlap. Methods that combine these ingredients are used in practice and have been found to be effective for large-scale training, but there is little theory for methods that combine all three. We study a heterogeneous-compute setting in which different workers may take different numbers of local steps, and we propose LOSCAR-SGD, a Local SGD method that communicates only a sparse subset of model coordinates and continues optimizing while communication is in flight. A key ingredient is a delay-corrected merge rule that incorporates delayed synchronized information without discarding the progress made during the overlap phase. We give convergence guarantees for smooth non-convex objectives and show how sparsity, overlap, and worker heterogeneity affect the rate. To the best of our knowledge, this is the first theory for this combination of ingredients. Experiments further show that communication-computation overlap reduces training time and that the delay-corrected merge outperforms naive overwriting.
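The abstract does not state the merge rule itself, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that all workers share the same sparse index set, that the average is taken over the snapshots sent when communication started, and that the "correction" re-applies the local progress made while the message was in flight. The function names and toy numbers are hypothetical.

```python
import numpy as np

def delay_corrected_merge(x_now, x_sent, avg_sent, idx):
    """Hypothetical merge: on the communicated coordinates, take the (stale)
    average and add back the local progress made since the snapshot was sent."""
    merged = x_now.copy()
    merged[idx] = avg_sent + (x_now[idx] - x_sent[idx])
    return merged

def naive_overwrite(x_now, avg_sent, idx):
    """Baseline: overwrite the communicated coordinates with the stale average,
    discarding whatever was learned during the overlap phase."""
    merged = x_now.copy()
    merged[idx] = avg_sent
    return merged

# Two workers snapshot their models and start communicating coordinates 0 and 2.
w0_sent = np.array([1.0, 2.0, 3.0, 4.0])
w1_sent = np.array([3.0, 2.0, 1.0, 0.0])
idx = np.array([0, 2])  # assumed shared sparse subset
avg_sent = np.mean([w0_sent[idx], w1_sent[idx]], axis=0)

# While the message is in flight, worker 0 keeps training (toy progress of +0.5).
w0_now = w0_sent + 0.5

corrected = delay_corrected_merge(w0_now, w0_sent, avg_sent, idx)
naive = naive_overwrite(w0_now, avg_sent, idx)
print(corrected)  # overlap-phase progress is kept on coordinates 0 and 2
print(naive)      # overlap-phase progress is lost on coordinates 0 and 2
```

In this toy case the two rules differ only on the communicated coordinates, which is where the abstract's comparison between the delay-corrected merge and naive overwriting applies.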
Screens would be banned until 2nd grade under draft LAUSD plan
Children and parents at a recent L.A. Unified school board meeting where screen-time limits were discussed. The L.A. Board of Education got its first look at proposed screen-time limits for students, including a total ban until second grade.
Inside the trans athlete podium controversy sending political shockwaves in California ahead of elections
OutKick contributor Riley Gaines discusses female high school athletes speaking out after a transgender participant won multiple events at a California track meet on 'Fox & Friends.' MOORPARK, Calif. - On the morning of a day that would live in girls' sports infamy, a letter was handed out to coaches who entered a championship track and field meet at Moorpark High School. The letter announced that any girl who finished behind a biological male trans athlete would be bumped up by one spot on the podium. The letter was dated May 16 -- the day of the event.
The California Interscholastic Federation (CIF) will implement the pilot entry process introduced last season at the 2025 CIF Track and Field State Championships.
Top LAUSD academic chiefs leaving as test scores rise and FBI raid sidelines Carvalho
Alberto Carvalho sits with third-grade students as he visits classrooms at Lenicia B. Weemes Elementary School on the first day of classes for LAUSD students in 2023. Leaders who helped drive L.A. Unified's recent test-score gains are exiting as Supt.
The Enrollment Cliff Is Here. Which Schools Will Survive It?
As the number of new high-school graduates drops, colleges will close, some will merge, and others may change beyond recognition. This series on the future of higher education started with a simple question: Should I still be contributing to my children's college funds? My first attempt to answer that question centered on the growing disillusionment with higher education in general.
Canonical Regularisation of Wide Feature-Learning Neural Networks
Whittle, George, Vaidhyanathan, Pranav, Ziomek, Juliusz, Ares, Natalia, Osborne, Maike A.
Wide neural networks in the feature-learning regime drive modern deep learning, and yet they remain far less studied than their kernel-regime counterparts. We consider a critical yet under-explored difference between these two regimes: the regulariser and prior implied by gradient flow training. This canonical regularisation property is well-studied in kernel regime networks -- of all the infinite global minima, gradient flow selects exactly the vanishing ridge solution -- and underpins the celebrated NN-GP correspondence, precisely allowing the modelling of noise during training. However, we prove ridge regularisation biases gradient flow in feature-learning regime networks, even in the infinitesimal limit of vanishing regularisation. Over training, ridge distorts the inductive bias of the network, with particular damage done to pretrained networks where the implicit prior is informative. We resolve this by axiomatising the canonical regulariser as a regime-agnostic function-space energy and lift, which uniquely identifies ridge in the kernel regime, and crucially generalises to the feature-learning regime. By studying the Riemannian geometry of feature-learning networks, we derive geodesic ridge from our framework, generalising ridge to the feature-learning regime. Correspondingly, we prove the canonical function-space prior is a Riemannian Gibbs Process, generalising the more familiar Gaussian Process. As a practical contribution, we propose arc ridge as a minimax-robust, scalable surrogate to geodesic ridge, revealing a deep relationship between early stopping and canonical regularisation across learning regimes. Finally, we demonstrate the consequences of our theory empirically on both image processing and NLP transfer-learning problems.
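The kernel-regime statement above (among infinitely many global minima, gradient flow picks the vanishing ridge solution) can be checked numerically in the simplest stand-in for that regime. The sketch below assumes an overparameterised linear model trained from a zero initialisation with plain gradient descent as a discretisation of gradient flow. It is not the paper's construction and says nothing about the feature-learning case; all names are illustrative.

```python
import numpy as np

def gradient_descent_solution(X, y, steps=5000):
    """Gradient descent on 0.5 * ||Xw - y||^2 from w = 0, with a step size of
    1 / sigma_max(X)^2 so that the iteration is stable."""
    lr = 1.0 / np.linalg.norm(X, 2) ** 2
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)
    return w

def ridge_solution(X, y, lam):
    """Ridge regression in dual form: w = X^T (X X^T + lam * I)^{-1} y."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))  # 5 samples, 20 parameters: many exact fits exist
y = rng.normal(size=5)

w_gd = gradient_descent_solution(X, y)
w_ridge = ridge_solution(X, y, lam=1e-8)  # ridge with a vanishing penalty
w_min_norm = np.linalg.pinv(X) @ y        # minimum-norm interpolating solution

print(np.max(np.abs(w_gd - w_min_norm)))
print(np.max(np.abs(w_ridge - w_min_norm)))
```

Both printed gaps should be close to zero: in this linear setting gradient descent from zero and ridge with a vanishing penalty agree on the minimum-norm interpolant, which is the property the abstract says fails once features are learned.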
Continuous Diffusion Scales Competitively with Discrete Diffusion for Language
Yang, Zhihan, Guo, Wei, Zhang, Shuibai, Sahoo, Subham Sekhar, Chen, Yongxin, Vahdat, Arash, Mardani, Morteza, Thickstun, John
While diffusion has drawn considerable recent attention from the language modeling community, continuous diffusion has appeared less scalable than discrete approaches. To challenge this belief we revisit Plaid, a likelihood-based continuous diffusion language model (DLM), and construct RePlaid by aligning the architecture of Plaid with modern discrete DLMs. In this unified setting, we establish the first scaling law for continuous DLMs that rivals discrete DLMs: RePlaid exhibits a compute gap of only $20\times$ compared to autoregressive models, outperforms Duo while using fewer parameters, and outperforms MDLM in the over-trained regime. We benchmark RePlaid against recent continuous DLMs: on OpenWebText, RePlaid achieves a new state-of-the-art PPL bound of $22.1$ among continuous DLMs and superior generation quality. These results suggest that continuous diffusion, when trained via likelihood, is a highly competitive and scalable alternative to discrete DLMs. Moreover, we offer theoretical insights to understand the advantage of likelihood-based training. We show that optimizing the noise schedule to minimize the ELBO's variance naturally yields linear cross-entropy (information loss) over time. This evenly distributes denoising difficulty without any case-specific time reparameterization. In addition, we find that optimizing embeddings via likelihood creates structured geometries and drives the most significant likelihood gain.
Third of university students in Great Britain think AI job losses will cause social unrest, poll finds
People attend a jobs fair in London. Only 24% of the members of the public surveyed thought AI was a positive thing for humanity. One in three university students think AI will wipe out jobs so rapidly it will trigger civil unrest, according to a survey by King's College London (KCL).
Agentic AI for Robot Teams
This presentation highlights recent efforts at the Johns Hopkins Applied Physics Laboratory to advance agentic AI for collaborative robotic teams. It begins by framing the core challenges of enabling autonomy, coordination, and adaptability across heterogeneous systems, then introduces a scalable architecture designed to support agentic behaviors in multi-robot environments. The talk concludes with key challenges encountered and practical lessons learned from ongoing research and development.
Harnessing Unimodality in Semiparametric Contextual Pricing via Oracle Price Map Learning
Fan, Yingying, Han, Yuxuan, Lv, Jinchi, Xu, Xiaocong, Zhou, Zhengyuan
We study contextual dynamic pricing in a semiparametric scalar-index valuation model where the latent value is $v_t=\mu_\ast(\mathsf c_t)+\xi_t$, with an unknown utility map $\mu_\ast$ and an unknown additive noise distribution. The key decision object is the one-dimensional oracle price map $u\mapsto p^\ast(u)$ induced by the scalar index $u=\mu_\ast(\mathsf c)$ and the noise tail. Under the $\beta$-Hölder smoothness of the tail function for $\beta\geq 2$ and a revenue-geometry condition that gives a unique, stable, interior maximizer, this oracle map is itself $(\beta-1)$-smooth. We exploit such structure through $\mathsf{ORBIT}$, a modular coarse-to-fine policy that takes a scalar pilot index as input, localizes a benchmark price in each active bin, and learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. For the baseline linear utility model $\mu_\ast(\mathsf c)=\mathsf c^\top\theta_\ast$, an adaptive elliptical exploration scheme constructs the required scalar pilot online without distributional assumptions on the contexts. The resulting policy achieves regret $\widetilde{O}\big(T^{\frac{2\beta-1}{4\beta-3}}+\sqrt{dT}\big)$. For fixed $d$, we establish a matching lower bound in the horizon dependence, unveiling that the nonparametric oracle-map learning term is minimax sharp. The same scalar-pilot interface also yields extensions to sparse high-dimensional linear utility and nonparametric Hölder utility.
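To make the horizon exponent in the regret bound concrete, the following short calculation uses only the stated rate and the stated range of the smoothness parameter. It is a reader's arithmetic check, not a result taken from the paper.

```latex
\[
\frac{2\beta-1}{4\beta-3}
  \;=\; \frac{1}{2} + \frac{1}{2(4\beta-3)},
\qquad
\frac{d}{d\beta}\,\frac{2\beta-1}{4\beta-3}
  \;=\; -\frac{2}{(4\beta-3)^{2}} \;<\; 0 .
\]
\[
\beta = 2:\quad \frac{2\beta-1}{4\beta-3} = \frac{3}{5},
\qquad
\beta \to \infty:\quad \frac{2\beta-1}{4\beta-3} \to \frac{1}{2}.
\]
```

The exponent therefore decreases from $3/5$ at $\beta=2$ toward $1/2$ as the tail function becomes smoother. It stays strictly above $1/2$ for every finite $\beta$, so for fixed $d$ the nonparametric term dominates the $\sqrt{dT}$ term as $T$ grows.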