Privacy-Accuracy Trade-offs in High-Dimensional LASSO under Perturbation Mechanisms

arXiv.org Machine Learning

We study privacy-preserving sparse linear regression in the high-dimensional regime, focusing on the LASSO estimator. We analyze two widely used mechanisms for differential privacy: output perturbation, which injects noise into the estimator, and objective perturbation, which adds a random linear term to the loss function. Using approximate message passing (AMP), we characterize the typical behavior of these estimators under random design and privacy noise. To quantify privacy, we adopt typical-case measures, including the on-average KL divergence, which admits a hypothesis-testing interpretation in terms of distinguishability between neighboring datasets. Our analysis reveals that sparsity plays a central role in shaping the privacy-accuracy trade-off: stronger regularization can improve privacy by stabilizing the estimator against single-point data changes. We further show that the two mechanisms exhibit qualitatively different behaviors. In particular, for objective perturbation, increasing the noise level can have non-monotonic effects, and excessive noise may destabilize the estimator, leading to increased sensitivity to data perturbations. Our results demonstrate that AMP provides a powerful framework for analyzing privacy-accuracy trade-offs in high-dimensional sparse models.
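As a rough illustration of the two mechanisms named above, the sketch below assumes an orthonormal design (X^T X = I), under which the LASSO solution reduces to soft-thresholding z = X^T y. This is a simplification chosen for brevity, not the random-design setting the abstract analyzes. The function names, the Gaussian noise, and the `sigma` parameter are illustrative assumptions, and no attempt is made to calibrate the noise to a privacy budget.

```python
import random


def soft_threshold(v, lam):
    """Soft-thresholding operator: shrink v toward zero by lam."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def lasso_orthonormal(z, lam):
    """LASSO solution when X^T X = I, where z = X^T y (assumed setting)."""
    return [soft_threshold(zj, lam) for zj in z]


def output_perturbation(z, lam, sigma, rng):
    """Solve the LASSO first, then add Gaussian noise to the estimate."""
    w = lasso_orthonormal(z, lam)
    return [wj + rng.gauss(0.0, sigma) for wj in w]


def objective_perturbation(z, lam, sigma, rng):
    """Add a random linear term b.w to the loss, then solve.

    With an orthonormal design, minimizing
    0.5*||y - Xw||^2 + b.w + lam*||w||_1 gives soft_threshold(z - b, lam).
    """
    b = [rng.gauss(0.0, sigma) for _ in z]
    return [soft_threshold(zj - bj, lam) for zj, bj in zip(z, b)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    z = [3.0, -0.2, 0.1, -2.5, 0.05]
    print(output_perturbation(z, 1.0, 0.3, rng))
    print(objective_perturbation(z, 1.0, 0.3, rng))
```

In this toy setting, output perturbation puts noise on every coordinate and so generally loses exact sparsity, whereas the objective-perturbed estimate is still a thresholded vector and can keep exact zeros.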


Silicon Valley Is in a Frenzy Over Bots That Build Themselves

The Atlantic - Technology

How close are we really to self-improving AI? Late last month, a large crowd gathered in downtown San Francisco to demand that the AI industry stop developing more powerful bots. Holding signs and banners reading "Stop the AI Race" and "Don't Build Skynet," the protesters marched through the city and gave speeches outside the offices of Anthropic, OpenAI, and xAI. The crowd demanded that these companies halt efforts to create superintelligent machines--and, in particular, AI models that can develop future AI models. Such a technology, attendees said, could extinguish all human life. At AI protests and happy hours, inside start-ups and major companies, the tech world is in a frenzy over the same thing: computers that make themselves smarter.


Caveman casino! Humans began gambling 12,000 YEARS ago, scientists say - as they discover ancient dice in the western Great Plains

Daily Mail - Science & tech

Humans began gambling 12,000 years ago, experts say - after discovering dice that date back to the last Ice Age.
A team from Colorado State University have unearthed the earliest evidence of two-sided dice crafted from small pieces of bone. They were originally found at an archaeological site on the western Great Plains of America, predating the current oldest known dice by more than 6,000 years. The discovery indicates that gambling and games of chance have been a persistent feature of North American culture since the end of the last Ice Age, experts say. 'Historians have traditionally treated dice and probability as Old World innovations,' researcher Robert Madden said.


Forecast collapse of transformer-based models under squared loss in financial time series

arXiv.org Machine Learning

We study trajectory forecasting under squared loss for time series with weak conditional structure, using highly expressive prediction models. Building on the classical characterization of squared-loss risk minimization, we emphasize regimes in which the conditional expectation of future trajectories is effectively degenerate, leading to trivial Bayes-optimal predictors (flat for prices and zero for returns in standard financial settings). In this regime, increased model expressivity does not improve predictive accuracy but instead introduces spurious trajectory fluctuations around the optimal predictor. These fluctuations arise from the reuse of noise and result in increased prediction variance without any reduction in bias. This provides a process-level explanation for the degradation of Transformer-based forecasts on financial time series. We complement these theoretical results with numerical experiments on high-frequency EUR/USD exchange rate data, analyzing the distribution of trajectory-level forecasting errors. The results show that Transformer-based models yield larger errors than a simple linear benchmark on a large majority of forecasting windows, consistent with the variance-driven mechanism identified by the theory.
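The classical characterization mentioned above is the standard decomposition of squared-loss risk around the conditional mean. The notation below (scalar target $Y$, inputs $X$, predictor $f$, conditional mean $m$) is introduced here only for illustration and is not taken from the paper:

```latex
\begin{align}
m(x) &= \mathbb{E}[\,Y \mid X = x\,], \\
\mathbb{E}\big[(Y - f(X))^2\big]
  &= \mathbb{E}\big[(Y - m(X))^2\big]
   + \mathbb{E}\big[(m(X) - f(X))^2\big].
\end{align}
```

If the conditional mean is degenerate, for example $m \equiv 0$ for returns, the second term reduces to $\mathbb{E}[f(X)^2]$. Any fluctuation of $f$ away from zero then adds to the risk, while the first term is irreducible and cannot be lowered by a more expressive model.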


Closed-form conditional diffusion models for data assimilation

arXiv.org Machine Learning

We propose closed-form conditional diffusion models for data assimilation. Diffusion models use data to learn the score function (defined as the gradient of the log-probability density of a data distribution), allowing them to generate new samples from the data distribution by reversing a noise injection process. While it is common to train neural networks to approximate the score function, we leverage the analytical tractability of the score function to assimilate the states of a system with measurements. To enable the efficient evaluation of the score function, we use kernel density estimation to model the joint distribution of the states and their corresponding measurements. The proposed approach also inherits the capability of conditional diffusion models of operating in black-box settings, i.e., the proposed data assimilation approach can accommodate systems and measurement processes without their explicit knowledge. The ability to accommodate black-box systems combined with the superior capabilities of diffusion models in approximating complex, non-Gaussian probability distributions means that the proposed approach offers advantages over many widely used filtering methods. We evaluate the proposed method on nonlinear data assimilation problems based on the Lorenz-63 and Lorenz-96 systems of moderate dimensionality and nonlinear measurement models. Results show the proposed approach outperforms the widely used ensemble Kalman and particle filters when small to moderate ensemble sizes are used.
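To make the idea of an analytically tractable score concrete, here is a minimal one-dimensional sketch. It assumes a Gaussian kernel density estimate over (state, measurement) pairs and a variance-exploding noising of the state only. The function name, the bandwidth defaults, and the noising convention are assumptions made for illustration and may differ from the paper's formulation.

```python
import math


def conditional_kde_score(x, y, t_var, samples, h_x=0.1, h_y=0.1):
    """Score d/dx log p_t(x | y) for a Gaussian-KDE joint model (1-D sketch).

    samples: list of (x_i, y_i) state/measurement pairs.
    t_var:   variance of the Gaussian noise added to the state at this
             diffusion time (variance-exploding convention, an assumption).
    h_x, h_y: kernel bandwidths (hypothetical defaults).
    """
    s2 = h_x ** 2 + t_var  # kernel variance plus diffusion noise variance
    # Log-weight of each mixture component given the noisy state x
    # and the observed measurement y.
    logw = [
        -((x - xi) ** 2) / (2.0 * s2) - ((y - yi) ** 2) / (2.0 * h_y ** 2)
        for xi, yi in samples
    ]
    m = max(logw)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)
    # Weighted average of the per-component Gaussian scores (x_i - x) / s2.
    return sum(wi * (xi - x) for wi, (xi, _) in zip(w, samples)) / (total * s2)


if __name__ == "__main__":
    pairs = [(-1.0, -5.0), (1.0, 5.0)]
    # A measurement near 5 pulls the state toward the sample at x = 1.
    print(conditional_kde_score(0.0, 5.0, 0.5, pairs))
```

No network is trained here: the score is a softmax-weighted average over stored samples, and the measurement enters only through the weights.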


Breaking Data Symmetry is Needed For Generalization in Feature Learning Kernels

arXiv.org Machine Learning

Grokking occurs when a model achieves high training accuracy but generalization to unseen test points happens long after that. This phenomenon was initially observed on a class of algebraic problems, such as learning modular arithmetic (Power et al., 2022). We study grokking on algebraic tasks in a class of feature learning kernels via the Recursive Feature Machine (RFM) algorithm (Radhakrishnan et al., 2024), which iteratively updates feature matrices through the Average Gradient Outer Product (AGOP) of an estimator in order to learn task-relevant features. Our main experimental finding is that generalization occurs only when a certain symmetry in the training set is broken. Furthermore, we empirically show that RFM generalizes by recovering the underlying invariance group action inherent in the data. We find that the learned feature matrices encode specific elements of the invariance group, explaining the dependence of generalization on symmetry.
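The Average Gradient Outer Product itself is simple to state: it is the mean, over data points, of the outer product of the predictor's gradient with itself. The sketch below computes it for an arbitrary scalar function using finite differences. The helper names are hypothetical, and the full RFM loop (refitting a kernel predictor with the updated feature matrix) is not reproduced.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at x."""
    grad = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += eps
        xm[j] -= eps
        grad.append((f(xp) - f(xm)) / (2.0 * eps))
    return grad


def agop(f, points):
    """Average Gradient Outer Product: mean over points of grad f grad f^T."""
    d = len(points[0])
    M = [[0.0] * d for _ in range(d)]
    for x in points:
        g = numerical_gradient(f, x)
        for a in range(d):
            for b in range(d):
                M[a][b] += g[a] * g[b] / len(points)
    return M


if __name__ == "__main__":
    # A function that depends only on its first coordinate.
    f = lambda x: x[0] ** 2
    print(agop(f, [[1.0, 0.0], [2.0, 3.0], [-1.0, 5.0]]))
```

For a function that ignores a coordinate, the corresponding row and column of the matrix vanish, which is the sense in which the AGOP highlights task-relevant directions.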


OpenAI Is Doing Everything … Poorly

The Atlantic - Technology

The company's sudden decision to pull the plug on Sora is a sign of deeper trouble. When I opened Sora this morning, I was met with a flood of strange and disturbing AI-generated videos. On OpenAI's video app, I scrolled through fabricated scenes of the Iran war and a barrage of fake Donald Trumps blabbering about Jeffrey Epstein. In my least favorite clip, I watched a man deep-fry an infant. The app lets users create fairly realistic-looking AI-generated clips--including of their own likeness--and then post them on a TikTok-like feed.


OpenAI shutters AI video generator Sora in abrupt announcement

The Guardian

Tech firm 'says goodbye' to Sora, made publicly available in 2024, just six months after its launch of a stand-alone app. In an abrupt announcement on Tuesday, OpenAI said it was "saying goodbye" to its AI video generator Sora. The move comes just six months after the company's splashy launch of a stand-alone app with which people could make and share hyper-realistic AI videos in a scrolling social feed. "To everyone who created with Sora, shared it, and built community around it: thank you," the company wrote in a post on X. "What you made with Sora mattered, and we know this news is disappointing." OpenAI first made Sora publicly available in late 2024, but it wasn't until the company launched Sora 2 and its stand-alone app last September that the video generator reached mainstream attention.


Double Machine Learning for Static Panel Data with Instrumental Variables: New Method and Applications

arXiv.org Machine Learning

Panel data methods are widely used in empirical analysis to address unobserved heterogeneity, but causal inference remains challenging when treatments are endogenous and confounding variables high-dimensional and potentially nonlinear. Standard instrumental variables (IV) estimators, such as two-stage least squares (2SLS), become unreliable when instrument validity requires flexibly conditioning on many covariates with potentially non-linear effects. This paper develops a Double Machine Learning estimator for static panel models with endogenous treatments (panel IV DML), and introduces weak-identification diagnostics for it. We revisit three influential migration studies that use shift-share instruments. In these settings, instrument validity depends on a rich covariate adjustment. In one application, panel IV DML strengthens the predictive power of the instrument and broadly confirms 2SLS results. In the other cases, flexible adjustment makes the instruments weak, leading to substantially more cautious causal inference than conventional 2SLS. Monte Carlo evidence supports these findings, showing that panel IV DML improves estimation accuracy under strong instruments and delivers more reliable inference under weak identification.
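For readers unfamiliar with the 2SLS baseline that the abstract compares against, the sketch below shows the textbook two-stage procedure in its simplest form: one instrument, one endogenous treatment, no covariates, and no panel structure. It is not the panel IV DML estimator, and the function and variable names are illustrative assumptions.

```python
def mean(v):
    return sum(v) / len(v)


def ols_slope_intercept(x, y):
    """Slope and intercept of a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def two_stage_least_squares(z, d, y):
    """2SLS with one instrument z, one endogenous treatment d, outcome y."""
    # Stage 1: project the treatment on the instrument.
    a1, a0 = ols_slope_intercept(z, d)
    d_hat = [a0 + a1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted treatment.
    beta, _ = ols_slope_intercept(d_hat, y)
    return beta


if __name__ == "__main__":
    z = [0.0, 1.0, 2.0, 3.0]            # instrument
    u = [1.0, -1.0, -1.0, 1.0]          # confounder, uncorrelated with z
    d = [zi + ui for zi, ui in zip(z, u)]
    y = [2.0 * di + 3.0 * ui for di, ui in zip(d, u)]  # true effect is 2
    print(two_stage_least_squares(z, d, y))
    print(ols_slope_intercept(d, y)[0])  # naive OLS, biased by u
```

In the constructed data the confounder biases the naive regression upward, while the instrumented estimate recovers the coefficient used to generate the outcome.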


Generalized Discrete Diffusion from Snapshots

arXiv.org Machine Learning

We introduce Generalized Discrete Diffusion from Snapshots (GDDS), a unified framework for discrete diffusion modeling that supports arbitrary noising processes over large discrete state spaces. Our formulation encompasses all existing discrete diffusion approaches, while allowing significantly greater flexibility in the choice of corruption dynamics. The forward noising process relies on uniformization and enables fast arbitrary corruption. For the reverse process, we derive a simple evidence lower bound (ELBO) based on snapshot latents, instead of the entire noising path, that allows efficient training of standard generative modeling architectures with clear probabilistic interpretation. Our experiments on large-vocabulary discrete generation tasks suggest that the proposed framework outperforms existing discrete diffusion methods in terms of training efficiency and generation quality, and beats autoregressive models for the first time at this scale. We provide the code along with a blog post on the project page: https://oussamazekri.fr/gdds.
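Uniformization, in its textbook form, samples a continuous-time Markov chain by drawing a Poisson number of steps and applying a discrete-time transition matrix that many times. The sketch below illustrates that generic technique on a small rate matrix. It is not the GDDS implementation; the function names are hypothetical, and it assumes a generator matrix `Q` with non-negative off-diagonal rates, rows summing to zero, and at least one non-zero rate.

```python
import random


def uniformized_kernel(Q):
    """Return (rate, P) with P = I + Q / rate, rate = max exit rate of Q."""
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    return rate, P


def sample_poisson(mean, rng):
    """Poisson sample by counting unit-rate exponential arrivals before mean."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def corrupt(state, Q, t, rng):
    """Sample the chain's state at time t, started from `state`."""
    rate, P = uniformized_kernel(Q)
    for _ in range(sample_poisson(rate * t, rng)):
        state = rng.choices(range(len(Q)), weights=P[state])[0]
    return state


if __name__ == "__main__":
    Q = [[-1.0, 1.0, 0.0],
         [0.5, -1.0, 0.5],
         [0.0, 2.0, -2.0]]
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print([corrupt(0, Q, 1.5, rng) for _ in range(10)])
```

The corrupted state at time t is obtained without discretizing time, and only the state at t is returned rather than the whole path.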