 ais


AIS: Adaptive Importance Sampling for Quantized RL

arXiv.org Machine Learning

Reinforcement learning (RL) for large language models (LLMs) is dominated by the cost of rollout generation, which has motivated the use of low-precision rollouts (e.g., FP8) paired with a BF16 trainer to improve throughput and reduce memory pressure. This introduces a rollout-training mismatch that biases the policy gradient and can cause training to collapse outright on reasoning benchmarks. We show that the mismatch is non-stationary and acts as a double-edged sword: early in training it provides a stochastic exploration bonus, exposing the gradient to trajectories the trainer would otherwise under-sample, but the same perturbation transitions into a destabilizing source of bias as the policy concentrates. To solve this, we propose Adaptive Importance Sampling (AIS), a correction framework that adjusts the strength of its intervention on a per-batch basis. AIS combines three real-time diagnostics, namely weight reliability, divergence severity, and variance amplification, into a single mixing coefficient that interpolates between the uncorrected and fully importance-weighted gradients, suppressing the destabilizing component of the mismatch while preserving its exploratory benefit. We integrate AIS into GRPO and evaluate it on the diffusion-based LLaDA-8B-Instruct and the autoregressive Qwen3-8B and Qwen3.5-9B across mathematical reasoning and planning benchmarks. AIS matches the BF16 baseline on most tasks while retaining the 1.5 to 2.76x rollout speedup of FP8.
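The abstract describes the mixing coefficient only at a high level, so the following is a minimal sketch of the general idea and not the authors' formula. It assumes per-sample log-probabilities from the BF16 trainer and the FP8 rollout engine are available. The function name `adaptive_is_weights`, the `kl_scale` parameter, the concrete definition of each diagnostic, and the way the three diagnostics are multiplied together are all illustrative assumptions.

```python
import numpy as np

def adaptive_is_weights(logp_train, logp_rollout, advantages, kl_scale=0.05):
    """Hypothetical blend of uncorrected and importance-weighted samples.

    Everything here is an assumption made for illustration; the paper's
    actual diagnostics and combination rule are not given in the abstract.
    """
    log_ratio = np.asarray(logp_train, float) - np.asarray(logp_rollout, float)
    adv = np.asarray(advantages, float)
    w = np.exp(log_ratio)  # importance weights pi_train / pi_rollout

    # Weight reliability (assumed): normalized effective sample size in (0, 1].
    ess = w.sum() ** 2 / (w.size * np.sum(w ** 2))

    # Divergence severity (assumed): Monte Carlo estimate of KL(rollout || train),
    # clamped at zero because the sample estimate can be slightly negative.
    kl = max(float(np.mean(-log_ratio)), 0.0)

    # Variance amplification (assumed): how much weighting inflates the
    # variance of the per-sample advantage signal.
    amp = float(np.var(w * adv) / (np.var(adv) + 1e-8))

    # Correct more when divergence is large, less when weights are unreliable
    # or when weighting would amplify variance.
    need = 1.0 - np.exp(-kl / kl_scale)
    alpha = float(need * ess / max(amp, 1.0))

    # alpha = 0 gives the uncorrected gradient, alpha = 1 full importance weighting.
    mixed = (1.0 - alpha) + alpha * w
    return alpha, mixed
```

In a GRPO-style loss, `mixed` would multiply each sample's advantage term. When the two sets of log-probabilities agree exactly, the sketch returns `alpha = 0.0` and leaves every sample's weight at 1.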



AI Is Getting Scary Good at Making Predictions

The Atlantic - Technology

Even superforecasters are guessing that they'll soon be obsolete. To live in time is to wonder what will happen next. In every human society, there are people who obsess over the world's patterns to predict the future. In antiquity, they told kings which stars would appear at nightfall. Today they build the quantitative models that nudge governments into opening spigots of capital.


Learning Optimal Flows for Non-Equilibrium Importance Sampling

Neural Information Processing Systems

On the theory side, we discuss how to tailor the velocity field to the target and establish general conditions under which the proposed estimator is a perfect estimator with zero variance.




The office block where AI 'doomers' gather to predict the apocalypse

The Guardian

In a building in central Berkeley, not far from the university campus, a group of modern-day Cassandras are looking into concerns around the latest AI models. On the other side of San Francisco bay from Silicon Valley, where the world's biggest technology companies tear towards superhuman artificial intelligence, looms a tower from which fearful warnings emerge. At 2150 Shattuck Avenue, in the heart of Berkeley, is the home of a group of modern-day Cassandras who rummage under the hood of cutting-edge AI models and predict what calamities may be unleashed on humanity - from AI dictatorships to robot coups. Here you can hear an AI expert express sympathy with an unnerving idea: San Francisco may be the new Wuhan, the Chinese city where Covid originated and wreaked havoc on the world.


Co-Generation with GANs using AIS based HMC

Neural Information Processing Systems

Inferring the most likely configuration for a subset of variables of a joint distribution given the remaining ones -- which we refer to as co-generation -- is an important challenge that is computationally demanding for all but the simplest settings. This task has received a considerable amount of attention, particularly for classical ways of modeling distributions like structured prediction. In contrast, almost nothing is known about this task when considering recently proposed techniques for modeling high-dimensional distributions, particularly generative adversarial nets (GANs). Therefore, in this paper, we study the occurring challenges for co-generation with GANs. To address those challenges we develop an annealed importance sampling based Hamiltonian Monte Carlo co-generation algorithm. The presented approach significantly outperforms classical gradient based methods on a synthetic and on the CelebA and LSUN datasets.


Are AlphaZero-like Agents Robust to Adversarial Perturbations?

Neural Information Processing Systems

The success of AlphaZero (AZ) has demonstrated that neural-network-based Go AIs can surpass human performance by a large margin. Given that the state space of Go is extremely large and a human player can play the game from any legal state, we ask whether adversarial states exist for Go AIs that may lead them to play surprisingly wrong actions. In this paper, we first extend the concept of adversarial examples to the game of Go: we generate perturbed states that are "semantically" equivalent to the original state by adding meaningless moves to the game, and an adversarial state is a perturbed state leading to an undoubtedly inferior action that is obvious even for Go beginners. However, searching for adversarial states is challenging due to the large, discrete, and non-differentiable search space. To tackle this challenge, we develop the first adversarial attack on Go AIs that can efficiently search for adversarial states by strategically reducing the search space. This method can also be extended to other board games such as NoGo. Experimentally, we show that the actions taken by both the Policy-Value neural network (PV-NN) and Monte Carlo tree search (MCTS) can be misled by adding one or two meaningless stones; for example, on 58% of the AlphaGo Zero self-play games, our method can make the widely used KataGo agent with 50 simulations of MCTS play a losing action by adding two meaningless stones. We additionally evaluated the adversarial examples found by our algorithm with amateur human Go players, and 90% of examples indeed lead the Go agent to play an obviously inferior action.


What will your life look like in 2035?

The Guardian

When AIs become consistently more capable than humans, life could change in strange ways. It could happen in the next few years, or a little longer. If and when it comes, our domestic routines - trips to the doctor, farming, work and justice systems - could all look very different. The 'AI' doctor will see you now: in 2035, AIs are more than co-pilots in medicine, they have become the frontline for much primary care.