


Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

Huang, Xun, Li, Zhengqi, He, Guande, Zhou, Mingyuan, Shechtman, Eli

arXiv.org Artificial Intelligence

We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their own imperfect outputs during inference. Unlike prior methods that denoise future frames based on ground-truth context frames, Self Forcing conditions each frame's generation on previously self-generated outputs by performing autoregressive rollout with key-value (KV) caching during training. This strategy enables supervision through a holistic loss at the video level that directly evaluates the quality of the entire generated sequence, rather than relying solely on traditional frame-wise objectives. To ensure training efficiency, we employ a few-step diffusion model along with a stochastic gradient truncation strategy, effectively balancing computational cost and performance. We further introduce a rolling KV cache mechanism that enables efficient autoregressive video extrapolation. Extensive experiments demonstrate that our approach achieves real-time streaming video generation with sub-second latency on a single GPU, while matching or even surpassing the generation quality of significantly slower and non-causal diffusion models. Project website: http://self-forcing.github.io/
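As a rough illustration of the training-time rollout the abstract describes, the sketch below runs a toy few-step denoiser autoregressively, conditions each frame on the model's own previous outputs through a cached state, and applies stochastic gradient truncation. The `FewStepDenoiser` class, the use of a GRU hidden state as a stand-in for a KV cache, the truncation probability, and the placeholder video-level loss are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a Self Forcing-style training rollout (hypothetical API).
import torch
import torch.nn as nn

class FewStepDenoiser(nn.Module):
    """Toy stand-in for a few-step causal video diffusion model."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.GRU(dim, dim, batch_first=True)

    def forward(self, noise, cache):
        # Denoise one frame conditioned on the cached (self-generated) context.
        out, new_cache = self.net(noise, cache)
        return out, new_cache

def rollout(model, num_frames=16, dim=64, batch=2):
    frames, cache = [], None
    for _ in range(num_frames):
        noise = torch.randn(batch, 1, dim)
        # Condition on previously *self-generated* frames via the cache,
        # mirroring inference-time behavior during training.
        frame, cache = model(noise, cache)
        # Stochastic gradient truncation: detach the cache with some
        # probability to bound backprop cost through the rollout.
        if torch.rand(()) < 0.5:
            cache = cache.detach()
        frames.append(frame)
    return torch.cat(frames, dim=1)

model = FewStepDenoiser()
video = rollout(model)          # (batch, frames, dim)
loss = video.pow(2).mean()      # placeholder for a holistic video-level loss
loss.backward()
```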


MaskTune: Mitigating Spurious Correlations by Forcing to Explore

Neural Information Processing Systems

A fundamental challenge of over-parameterized deep learning models is learning meaningful data representations that yield good performance on a downstream task without over-fitting spurious input features. This work proposes MaskTune, a masking strategy that prevents over-reliance on spurious (or a limited number of) features. MaskTune forces the trained model to explore new features during a single epoch of finetuning by masking previously discovered features. MaskTune, unlike earlier approaches for mitigating shortcut learning, does not require any supervision, such as annotating spurious features or labels for subgroup samples in a dataset. Our empirical results on the biased MNIST, CelebA, Waterbirds, and ImageNet-9L datasets show that MaskTune is effective on tasks that often suffer from the existence of spurious correlations.
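The masking idea lends itself to a short sketch. The loop below uses plain input-gradient saliency as the "discovered feature" signal and a fixed quantile threshold; both are assumptions made for illustration (the paper's actual masking procedure may differ), as are the toy model and data.

```python
# Illustrative MaskTune-style finetuning step (assumed saliency + threshold).
import torch
import torch.nn as nn

def masktune_step(model, x, y, loss_fn, quantile=0.9):
    # 1. Saliency pass: which inputs does the trained model rely on?
    x_req = x.clone().requires_grad_(True)
    loss_fn(model(x_req), y).backward()
    saliency = x_req.grad.abs()
    model.zero_grad()  # discard parameter grads from the saliency pass
    # 2. Mask the most salient inputs so finetuning must explore new features.
    thr = torch.quantile(saliency.flatten(1), quantile, dim=1)
    mask = (saliency <= thr.view(-1, *([1] * (x.dim() - 1)))).float()
    # 3. Finetuning loss on the masked inputs.
    return loss_fn(model(x * mask), y)

model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
x, y = torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
opt.zero_grad()
masktune_step(model, x, y, nn.CrossEntropyLoss()).backward()
opt.step()
```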


Regional Ocean Forecasting with Hierarchical Graph Neural Networks

Holmberg, Daniel, Clementi, Emanuela, Roos, Teemu

arXiv.org Artificial Intelligence

Accurate ocean forecasting systems are vital for understanding marine dynamics, which play a crucial role in environmental management and climate adaptation strategies. Traditional numerical solvers, while effective, are computationally expensive and time-consuming. Recent advancements in machine learning have revolutionized weather forecasting, offering fast and energy-efficient alternatives. Building on these advancements, we introduce SeaCast, a neural network designed for high-resolution, medium-range ocean forecasting. SeaCast employs a graph-based framework to effectively handle the complex geometry of ocean grids and integrates external forcing data tailored to the regional ocean context. Our approach is validated through experiments at a high spatial resolution using the operational numerical model of the Mediterranean Sea provided by the Copernicus Marine Service, along with both numerical and data-driven atmospheric forcings.
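To illustrate how external forcing data can be injected into a graph-based forecaster, here is a minimal message-passing layer that concatenates atmospheric forcing features into each node update. The layer structure, dimensions, and the name `ForcedGNNLayer` are illustrative assumptions, not SeaCast's hierarchical architecture.

```python
# Hedged sketch: one message-passing step over an ocean-grid graph,
# conditioned on external (atmospheric) forcing features per node.
import torch
import torch.nn as nn

class ForcedGNNLayer(nn.Module):
    def __init__(self, state_dim=32, forcing_dim=4):
        super().__init__()
        self.msg = nn.Linear(2 * state_dim, state_dim)
        self.upd = nn.Linear(2 * state_dim + forcing_dim, state_dim)

    def forward(self, h, edges, forcing):
        # h: (nodes, state_dim); edges: (2, E) sender/receiver indices;
        # forcing: (nodes, forcing_dim) external atmospheric inputs.
        send, recv = edges
        m = self.msg(torch.cat([h[send], h[recv]], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, recv, m)  # sum messages per node
        # Node update sees its state, aggregated messages, and the forcing.
        return h + self.upd(torch.cat([h, agg, forcing], dim=-1))

layer = ForcedGNNLayer()
h = torch.randn(100, 32)                 # ocean-grid node states
edges = torch.randint(0, 100, (2, 300))  # toy mesh connectivity
forcing = torch.randn(100, 4)            # e.g., wind and pressure features
h_next = layer(h, edges, forcing)
```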


Professor Forcing: A New Algorithm for Training Recurrent Networks

Neural Information Processing Systems

The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training, while multi-step sampling uses the network's own one-step-ahead predictions. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps. We apply Professor Forcing to language modeling, vocal synthesis on raw waveforms, handwriting generation, and image generation. Empirically, we find that Professor Forcing acts as a regularizer, improving test likelihood on character-level Penn Treebank and sequential MNIST. We also find that the model qualitatively improves samples, especially when sampling for a large number of time steps. This is supported by human evaluation of sample quality. Trade-offs between Professor Forcing and Scheduled Sampling are discussed. We produce t-SNE visualizations showing that Professor Forcing successfully makes the dynamics of the network during training and sampling more similar.
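The adversarial setup can be sketched compactly: run the network once with teacher forcing and once free-running, then train a discriminator to tell the two hidden-state distributions apart while the network learns to fool it. Everything below (the GRU/readout pair, the discriminator, all dimensions) is a toy stand-in under those assumptions, not the paper's exact architecture.

```python
# Toy Professor Forcing losses: match teacher-forced vs. free-running dynamics.
import torch
import torch.nn as nn

rnn = nn.GRU(16, 32, batch_first=True)
readout = nn.Linear(32, 16)
disc = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))

x = torch.randn(4, 20, 16)               # observed sequences

# Teacher-forced pass: inputs are the ground-truth values.
h_tf, _ = rnn(x)

# Free-running pass: inputs are the model's own one-step-ahead predictions.
inp, h, states = x[:, :1], None, []
for _ in range(20):
    out, h = rnn(inp, h)
    states.append(out)
    inp = readout(out)                    # feed back the model's own output
h_fr = torch.cat(states, dim=1)

bce = nn.BCEWithLogitsLoss()
# Discriminator: classify which regime each hidden state came from.
d_loss = (bce(disc(h_tf.detach()), torch.ones(4, 20, 1))
          + bce(disc(h_fr.detach()), torch.zeros(4, 20, 1)))
# Generator (the RNN): make free-running dynamics look teacher-forced.
g_loss = bce(disc(h_fr), torch.ones(4, 20, 1))
```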


Parallel Attention Forcing for Machine Translation

Dou, Qingyun, Gales, Mark

arXiv.org Artificial Intelligence

Attention-based autoregressive models have achieved state-of-the-art performance in various sequence-to-sequence tasks, including Text-To-Speech (TTS) and Neural Machine Translation (NMT), but can be difficult to train. The standard training approach, teacher forcing, guides a model with the reference back-history; during inference, however, the model must rely on its own generated back-history, and this mismatch limits evaluation performance. Attention forcing has been introduced to address the mismatch, guiding the model with the generated back-history and the reference attention. While successful in tasks with continuous outputs such as TTS, attention forcing faces additional challenges in tasks with discrete outputs such as NMT. This paper introduces two extensions of attention forcing to tackle these challenges. (1) Scheduled attention forcing automatically turns attention forcing on and off, which is essential for tasks with discrete outputs. (2) Parallel attention forcing makes training parallel, and is applicable to Transformer-based models. Experiments show that the proposed approaches improve the performance of models based on RNNs and Transformers.
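A minimal sketch of the core mechanism, under assumed toy shapes: the decoder consumes its own generated back-history, while its attention distribution can be overridden with weights recorded from a teacher-forced pass, and a scheduled switch decides per step whether to force. `ToyAttnDecoderStep` and the `attn_override` parameter are hypothetical names introduced for illustration.

```python
# Sketch of (scheduled) attention forcing for one decoder step.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyAttnDecoderStep(nn.Module):
    """One attention + output step over a fixed encoder memory."""
    def __init__(self, dim=16):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, y_prev, memory, attn_override=None):
        scores = self.q(y_prev) @ memory.transpose(-2, -1)   # (B, 1, S)
        # Attention forcing: use the reference attention when provided,
        # otherwise let the model compute its own.
        attn = F.softmax(scores, dim=-1) if attn_override is None else attn_override
        ctx = attn @ memory                                   # (B, 1, dim)
        return self.out(torch.cat([y_prev, ctx], dim=-1)), attn

step = ToyAttnDecoderStep()
memory = torch.randn(2, 10, 16)          # encoder states
y_prev = torch.randn(2, 1, 16)           # model's own previous output
ref_attn = F.softmax(torch.randn(2, 1, 10), dim=-1)  # from a teacher-forced pass

use_forcing = True  # scheduled on/off switch, key for discrete outputs
y, attn = step(y_prev, memory, attn_override=ref_attn if use_forcing else None)
```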


US Is Forcing a Chinese Firm to Sell Gay Dating App Grindr

WIRED

The US government says a Chinese gaming company's ownership of the gay dating app Grindr poses a national security risk, according to a report from Reuters. Beijing Kunlun Tech acquired a 60 percent stake in Grindr in 2016 and bought the rest in 2018. But, Reuters reports, the Chinese firm didn't clear the acquisition with the agency known as the Committee on Foreign Investment in the United States, or CFIUS, which evaluates the national security impacts of foreign investments in US companies. Kunlun is now seeking to sell Grindr following the CFIUS assessment, according to Reuters. Grindr declined to comment; CFIUS and Kunlun did not respond to requests for comment.


Google Home, Now Forcing You To Listen To Adverts With Your Coffee

Forbes - Tech

[Photo caption: People visit the new Google pop-up shop in the SoHo neighborhood on October 20, 2016 in New York City. The shop lets people try out new Google products such as the Pixel phone, Google Home, and Daydream VR, which are available for purchase offsite at Verizon and Best Buy retail stores.]

Imagine the scene: you're relaxing at home, about to head out for the day, so you ask Google "what's the weather like" or "what meetings do I have," and what you get back is a spoken-word advert hastily stuck in the middle of your itinerary. A new Beauty & the Beast promo is one way Google could monetize Home.