
BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning

arXiv.org Machine Learning

Reinforcement learning with verifiable rewards has become a standard recipe for improving the reasoning abilities of large language models. Existing algorithms face a tradeoff between computational efficiency and sample efficiency in value estimation and policy learning. We introduce BASIS, a critic-free post-training algorithm designed to address this tradeoff. At each online training step, BASIS samples only one rollout per prompt, but leverages rich information across prompts in the entire batch to improve value function estimation. Our experiments demonstrate that BASIS reduces MSE in value function estimation by 69% compared to REINFORCE++, a representative single-rollout baseline, and achieves lower MSE with one rollout than group mean estimators with 8 rollouts. This improvement in value estimation translates to better policy optimization: using substantially less training time, BASIS achieves performance close to multi-rollout GRPO-type baselines and often outperforms single-rollout REINFORCE-type baselines.
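The abstract contrasts group mean estimators, which need several rollouts per prompt, with single-rollout baselines. The sketch below illustrates only that contrast, using generic baselines; it is not the BASIS estimator, whose construction the abstract does not describe. The function names and the toy 0/1 rewards are assumptions made for illustration.

```python
from statistics import mean


def group_mean_advantages(rewards_per_prompt):
    """Multi-rollout baseline: each prompt has several rollout rewards,
    and the baseline is the mean reward within that prompt's group."""
    return [[r - mean(group) for r in group] for group in rewards_per_prompt]


def batch_mean_advantages(rewards):
    """Single-rollout baseline: one reward per prompt, and the baseline
    is the mean reward over the whole batch of prompts."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Hypothetical verifiable rewards (1.0 = correct answer, 0.0 = incorrect).
group_adv = group_mean_advantages([[1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.0, 1.0]])
batch_adv = batch_mean_advantages([1.0, 0.0, 0.0, 1.0])
```

The group version costs four rollouts per prompt here, while the batch version costs one. The batch mean ignores per-prompt difficulty, and that is the kind of estimation error a method sharing information across prompts would aim to reduce.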


From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models

arXiv.org Machine Learning

Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, but, especially for uniform-rate models, they often require many steps to generate a single sample. Existing acceleration methods either rely on training additional quantities or suffer from slow mixing. In this work, we propose a novel Gibbs-based corrector for discrete diffusion models, termed Gibbs-Accelerated Discrete Diffusion (GADD). GADD leverages the structure of the concrete score function to construct Gibbs posterior likelihoods directly, without requiring any additional training beyond standard score estimation. We show that GADD achieves an overall sampling complexity of $\mathcal{O}(\mathrm{polylog} (\varepsilon^{-1}))$, yielding the first such rate for diffusion-based samplers for uniform-rate discrete diffusion models. We also conduct numerical experiments demonstrating the practical advantages of GADD across synthetic data, zero-shot text sampling, and zero-shot conditional music generation. These results corroborate the theory and show that GADD consistently improves sample quality and wall-clock efficiency over standard baselines, including vanilla Euler methods and CTMC correctors. Beyond this, our theoretical analysis introduces a novel framework for analyzing predictor-corrector methods in discrete diffusion models, which may be of independent interest. Unlike existing approaches that rely on the Girsanov change-of-measure technique, our method is based on an induction argument that tracks error propagation across predictor iterations while accounting for inaccuracies in the corrector updates.


Sam Altman Says AI 'Jobs Apocalypse' He Once Predicted Probably Won't Happen. What Changed?

TIME - Tech

OpenAI CEO Sam Altman speaks during the BlackRock Infrastructure Summit on March 11, 2026 in Washington, DC. Throughout his rise to becoming one of the most influential CEOs in artificial intelligence, OpenAI's Sam Altman made repeated bold assertions about the impact that the new technology would have on jobs. He has said that AI will "probably replace most of the jobs people do today," that entire job categories will be "totally, totally gone," and that those impacted by the dramatic shifts will "find all sorts of new things to do." Now, however, Altman appears to have changed his tune, saying he is "delighted to be wrong" about the impact AI would have on employment. "I don't think we're going to have the kind of jobs apocalypse that some of the companies in our space advocate or talk about," he said during a virtual interview at a Commonwealth Bank of Australia (CBA) conference in Sydney on Tuesday. "I thought there would have been more impact on entry-level white-collar jobs being eliminated by now than has actually happened," Altman said.


Seed-size sea slug looks like an everything bagel

Popular Science

An undergraduate student first spotted the translucent species off the coast of Taiwan. These are some of the ingredients that come together to make, a newly identified species of sea slug, or nudibranch, found swimming in Taiwan. "Taiwanese divers call it 'sesame' in Chinese and it is also small like a sesame seed, hence the name," researchers explain in a statement.


Ever wish your dog could speak to you? AI collar can translate your pet's barks with 95% accuracy, experts claim

Daily Mail - Science & tech

If you've ever wondered what your dog's barks really mean, a new 'AI collar' claims to translate their noises with remarkable accuracy. Chinese startup Meng Xiaoyi has launched a device that it alleges can translate animal sounds into human language.


Extremely rare 1924 Olympic gold medal up for auction

Popular Science

The medals were the first to feature the iconic interlocking rings. The medals were designed by sculptor André Rivaud. An extremely rare piece of Olympics history hits the auction block this week.


Musk and Altman's AI rivalry reaches boiling point as IPO race heats up

The Guardian

Elon Musk attends Donald Trump's inauguration in Washington DC on 20 January 2025. Sam Altman attends a press conference at the White House on 21 January 2025. Let's recap a whirlwind five days that may determine the future of AI.


Spotify is adding long-form articles to its audiobook library

Engadget

Its first rollout includes over 650 long-form narrated articles. Spotify is expanding its offerings with a pretty wide selection of narrated long-form magazine articles from several publications that are most likely already familiar to you. The audio streaming service has announced that it's adding over 650 long-form articles to its audiobook library. While all the pieces it added are in the English language only, they will be available in all of Spotify's regions where audiobooks are available. The articles included in this rollout include pieces from and .


Score-Repellent Monte Carlo: Toward Efficient Non-Markovian Sampler with Constant Memory in General State Spaces

arXiv.org Machine Learning

History-dependent sampling can reduce long-run Monte Carlo variance by discouraging redundant revisits, but existing schemes typically encode history through an empirical measure on finite state spaces, which is infeasible in high-dimensional discrete configuration spaces and ill-posed in continuous domains. We propose the Score-Repellent Monte Carlo (SRMC) framework, which summarizes trajectory history by a running average of score evaluations in $\mathbb{R}^d$, where $d$ is the dimension of the score and state representation. This history is converted into a surrogate target through an exponential score tilt indexed by $α$, which controls the strength of the history-based repulsion. The surrogate family is normalization-free in the standard MCMC sense, yielding a generic wrapper: at each iteration, any base kernel targeting $π$ can instead be run on the current surrogate $π_{θ_n}$ while the history is updated online. We analyze the coupled evolution of the history recursion and Monte Carlo estimators using stochastic approximation with controlled Markovian noise, establishing almost sure convergence and a joint central limit theorem. We further identify regimes in which the asymptotic covariance decreases as $α$ increases, with scaling $O(1/α)$, extending the near-zero-variance effect of finite-state history-dependent samplers to general state spaces with constant memory. Experiments on continuous targets and discrete energy-based models demonstrate improved estimator variance and mode coverage, while retaining $O(d)$ memory usage and modest per-iteration overhead.


Efficient Preference Poisoning Attack on Offline RLHF

arXiv.org Machine Learning

Offline Reinforcement Learning from Human Feedback (RLHF) pipelines such as Direct Preference Optimization (DPO) train on a pre-collected preference dataset, which makes them vulnerable to preference poisoning attacks. We study label flip attacks against log-linear DPO. We first show that flipping one preference label induces a parameter-independent shift in the DPO gradient. Using this key property, we then convert the targeted poisoning problem into a structured binary sparse approximation problem. To solve this problem, we develop two attack methods: Binary-Aware Lattice Attack (BAL-A) and Binary Matching Pursuit Attack (BMP-A). BAL-A embeds the binary flip selection problem into a binary-aware lattice and applies Lenstra-Lenstra-Lovász reduction and Babai's nearest plane algorithm; we provide sufficient conditions that enforce binary coefficients and recover the minimum-flip objective. BMP-A adapts binary matching pursuit to our non-normalized gradient dictionary and yields coherence-based recovery guarantees and robustness (impossibility) certificates for $K$-flip budgets. Experiments on synthetic dictionaries and the Stanford Human Preferences dataset validate the theory and highlight how dictionary geometry governs attack success.
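The claim that a label flip shifts the DPO gradient by a parameter-independent amount can be checked numerically under one common set of assumptions, which may differ from the paper's exact setup: the standard DPO loss $-\log σ(m)$, a log-linear policy and reference so that the margin is $m = β(θ - θ_{\mathrm{ref}})^\top Δφ$, and a flip that replaces $m$ with $-m$. The unflipped gradient is then $-β\,σ(-m)\,Δφ$ and the flipped one is $β\,σ(m)\,Δφ$, so their difference is $β\,Δφ$ because $σ(m) + σ(-m) = 1$. The feature difference `dphi` and all numeric values below are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dpo_grad(theta, theta_ref, dphi, beta, flipped=False):
    """Gradient w.r.t. theta of -log sigmoid(sign * m), where
    m = beta * (theta - theta_ref) . dphi and sign is -1 for a flipped label."""
    sign = -1.0 if flipped else 1.0
    margin = beta * sum((t - r) * d for t, r, d in zip(theta, theta_ref, dphi))
    coeff = -sign * beta * sigmoid(-sign * margin)
    return [coeff * d for d in dphi]


random.seed(0)
dim, beta = 5, 0.1
dphi = [random.gauss(0, 1) for _ in range(dim)]       # assumed feature difference
theta_ref = [random.gauss(0, 1) for _ in range(dim)]  # assumed reference parameters


def flip_shift(theta):
    """Gradient after the flip minus gradient before the flip."""
    g = dpo_grad(theta, theta_ref, dphi, beta)
    g_flip = dpo_grad(theta, theta_ref, dphi, beta, flipped=True)
    return [b - a for a, b in zip(g, g_flip)]


# Evaluate the shift at three unrelated parameter vectors.
shifts = [flip_shift([random.gauss(0, 1) for _ in range(dim)]) for _ in range(3)]
```

Under these assumptions every entry of `shifts` equals `beta * dphi` up to floating-point error, whichever `theta` is used. A fixed per-example shift vector of this kind is what would let flip selection be posed as choosing a binary combination of known vectors.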