AITopics

Country:

North America > United States > Illinois (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Belgium (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Neural Information Processing Systems, Feb-12-2026, 10:12:07 GMT

Bootstrapped Transformer for Offline Reinforcement Learning

Kerong Wang (Shanghai Jiao Tong University), Hanye Zhao (Shanghai Jiao Tong University), Xufang Luo (Microsoft Research Asia), Kan Ren

The work was conducted during the internship of Kerong Wang and Hanye Zhao at Microsoft Research.

machine learning, reinforcement learning, trajectory, (15 more...)

Country: Asia > China > Shanghai > Shanghai (0.76)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Neural Information Processing Systems, Dec-24-2025, 08:22:31 GMT

NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis

Infinite visual synthesis aims to generate high-resolution images, long-duration videos, and even visual content of unbounded size. Some recent work approaches this task by first dividing the data into processable patches and then training models on them without considering the dependencies between patches. However, because these methods fail to model global dependencies between patches, the quality and consistency of the generation can be limited. To address this issue, we propose NUWA-Infinity, a patch-level "render-and-optimize" strategy for infinite visual synthesis. Given a large image or a long video, NUWA-Infinity first splits it into non-overlapping patches and uses the ordered patch chain as a complete training instance; a rendering model then autoregressively predicts each patch based on its contexts. Once a patch is predicted, it is optimized immediately and its hidden states are saved as contexts for the next "render-and-optimize" step. This brings two advantages: (i) the autoregressive rendering process, with information transfer between contexts, provides implicit modeling of the global probability distribution; (ii) the timely optimization alleviates the optimization stress on the model and helps convergence. Based on these designs, NUWA-Infinity shows strong synthesis ability on high-resolution images and long-duration videos.
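The patch-chain idea in the abstract above can be illustrated with a toy sketch. This is not the NUWA-Infinity model: `ToyRenderer` is a hypothetical elementwise linear predictor, its "hidden state" is simply its own prediction, and the mean-squared-error loss and learning rate are assumptions chosen only to make the loop runnable. What the sketch does show is the control flow: split into non-overlapping patches, predict each patch from saved context, update the model immediately, then carry the state forward.

```python
from typing import List

Patch = List[float]


def split_into_patches(image: List[List[float]], patch: int) -> List[Patch]:
    """Split a 2-D grid into non-overlapping patch x patch tiles, in raster order."""
    rows, cols = len(image), len(image[0])
    assert rows % patch == 0 and cols % patch == 0, "grid must tile evenly"
    chain = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            chain.append([image[r + i][c + j]
                          for i in range(patch) for j in range(patch)])
    return chain


class ToyRenderer:
    """Hypothetical stand-in for a rendering model: pred = w * context + b."""

    def __init__(self, dim: int, lr: float = 0.1) -> None:
        self.w = [0.0] * dim
        self.b = [0.0] * dim
        self.lr = lr

    def predict(self, context: Patch) -> Patch:
        return [w * c + b for w, c, b in zip(self.w, context, self.b)]

    def optimize(self, context: Patch, target: Patch) -> float:
        """Predict one patch, take one SGD step on its MSE, return the loss."""
        pred = self.predict(context)
        n = len(target)
        loss = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
        for k, (p, t, c) in enumerate(zip(pred, target, context)):
            grad = 2.0 * (p - t) / n
            self.w[k] -= self.lr * grad * c
            self.b[k] -= self.lr * grad
        return loss


def render_and_optimize(chain: List[Patch], model: ToyRenderer) -> List[float]:
    """Walk the ordered patch chain, optimizing right after each prediction."""
    context = [0.0] * len(chain[0])   # empty context before the first patch
    losses = []
    for target in chain:
        losses.append(model.optimize(context, target))  # render, then optimize
        context = model.predict(context)  # saved state becomes the next context
    return losses
```

For example, `render_and_optimize(split_into_patches(image, 2), ToyRenderer(dim=4))` returns one loss per patch; on a constant image the per-patch loss shrinks along the chain because each update happens before the next patch is rendered.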

autoregressive generation, infinite visual synthesis, nuwa-infinity, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.59)

Neural Information Processing Systems, Oct-10-2025, 11:20:15 GMT

Cascade Speculative Drafting for Even Faster LLM Inference

Cascade optimizes time allocation in drafting for improved efficiency.

cascade, draft model, drafting, (15 more...)

Country:

North America > United States > Illinois (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Belgium (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Tymkow, Ryan T., Schnapp, Benjamin D., Valipour, Mojtaba, Ghodsi, Ali

Symbolic-Diffusion: Deep Learning Based Symbolic Regression with D3PM Discrete Token Diffusion

arXiv.org Artificial Intelligence, Oct-10-2025

Symbolic regression refers to the task of finding a closed-form mathematical expression to fit a set of data points. Genetic-programming-based techniques are the most common algorithms used to tackle this problem, but recently, neural-network-based approaches have gained popularity. Most of the leading neural-network-based models used for symbolic regression utilize transformer-based autoregressive models to generate an equation conditioned on encoded input points. However, autoregressive generation is limited to generating tokens left-to-right, and each generated token is conditioned only on previously generated tokens. Motivated by the desire to generate all tokens simultaneously to produce improved closed-form equations, we propose Symbolic Diffusion, a D3PM-based discrete state-space diffusion model which generates all tokens of the equation simultaneously using discrete token diffusion. Using the bivariate dataset developed for SymbolicGPT, we compared our diffusion-based generation approach to an autoregressive model based on SymbolicGPT, using equivalent encoder and transformer architectures. We demonstrate that our novel approach of using diffusion-based generation for symbolic regression can offer comparable and, by some metrics, improved performance over autoregressive generation in models using similar underlying architectures, opening new research opportunities in neural-network-based symbolic regression.
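As a rough illustration of discrete token diffusion, the sketch below implements only the forward (noising) half of a D3PM-style process with a uniform transition: at each step, every token is independently resampled from the vocabulary with probability beta. The abstract does not say which transition matrix or noise schedule Symbolic Diffusion uses, so the uniform choice, the function names, and the example equation tokens are all assumptions. The learned reverse model, which would denoise all positions in parallel, is not shown.

```python
import random
from typing import List, Sequence


def corrupt_step(tokens: Sequence[str], vocab: Sequence[str],
                 beta: float, rng: random.Random) -> List[str]:
    """One forward step: with probability beta, replace a token by a
    uniform draw from the vocabulary (which may return the same token)."""
    return [rng.choice(vocab) if rng.random() < beta else tok
            for tok in tokens]


def forward_diffuse(tokens: Sequence[str], vocab: Sequence[str],
                    betas: Sequence[float],
                    rng: random.Random) -> List[List[str]]:
    """Apply the corruption steps in order and return every intermediate
    sequence, starting with the clean one."""
    trajectory = [list(tokens)]
    for beta in betas:
        trajectory.append(corrupt_step(trajectory[-1], vocab, beta, rng))
    return trajectory


if __name__ == "__main__":
    # Hypothetical prefix-notation tokens for an equation such as x1 * x2 + C.
    vocab = ["add", "mul", "sin", "x1", "x2", "C"]
    clean = ["add", "mul", "x1", "x2", "C"]
    for step in forward_diffuse(clean, vocab, [0.1, 0.3, 0.6, 1.0],
                                random.Random(0)):
        print(" ".join(step))
```

Every position is corrupted independently, so sequence length is preserved at each step; this is what lets a reverse model predict all positions at once rather than left to right.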

artificial intelligence, machine learning, symbolic regression, (15 more...)

2510.0757

Country: North America > Canada (0.15)

Genre:

Research Report > New Finding (0.68)
Research Report > Promising Solution (0.67)
Research Report > Experimental Study (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Szewczyk, Konrad, Fernández, Daniel Gallo, Townsend, James

Linear RNNs for autoregressive generation of long music samples

arXiv.org Artificial Intelligence, Oct-6-2025

Directly learning to generate audio waveforms in an autoregressive manner is a challenging task, due to the length of the raw sequences and the existence of important structure on many different timescales. Traditional approaches based on recurrent neural networks, as well as causal convolutions and self-attention, have had only limited success on this task. However, recent work has shown that deep state space models, also referred to as linear RNNs, can be highly efficient in this context. In this work, we push the boundaries of linear RNNs applied to raw audio modeling, investigating the effects of different architectural choices and using context-parallelism to enable training on sequences up to one minute (1M tokens) in length. We present a model, HarmonicRNN, which attains state-of-the-art log-likelihoods and perceptual metrics on small-scale datasets.
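One reason linear RNNs suit very long sequences is that the recurrence h_t = a_t * h_{t-1} + b_t is a composition of affine maps, and that composition is associative. The sketch below is a scalar toy under that assumption, not the HarmonicRNN architecture or its context-parallel scheme, which the abstract does not describe. It shows that a sequence can be cut into chunks, each chunk summarized independently, and the summaries combined in order to recover the same final state as a step-by-step scan. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Sequence, Tuple

Affine = Tuple[float, float]  # (a, b) represents the map h -> a * h + b


def sequential_scan(a: Sequence[float], b: Sequence[float],
                    h0: float = 0.0) -> List[float]:
    """Reference recurrence, one step at a time."""
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def combine(left: Affine, right: Affine) -> Affine:
    """Compose two affine maps, applying `left` first and `right` second."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def chunked_final_state(a: Sequence[float], b: Sequence[float],
                        chunk: int, h0: float = 0.0) -> float:
    """Summarize each chunk on its own (this part could run in parallel),
    then fold the per-chunk summaries together in order.
    Assumes a non-empty sequence."""
    pairs = list(zip(a, b))
    summaries = [reduce(combine, pairs[i:i + chunk])
                 for i in range(0, len(pairs), chunk)]
    a_total, b_total = reduce(combine, summaries)
    return a_total * h0 + b_total
```

Because `combine` is associative, the chunk boundaries do not change the result; a nonlinear RNN offers no such shortcut, since its step function cannot be composed in closed form.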

artificial intelligence, machine learning, natural language, (16 more...)

2510.02401

Country: Europe (0.15)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

arXiv.org Artificial Intelligence, Aug-20-2025

QuickMerge++: Fast Token Merging with Autoregressive Prior

Liu, Dong, Yu, Yanxuan

As generative models scale to larger inputs across language, vision, and video domains, the cost of token-level computation has become a key bottleneck. While prior work suggests that only a subset of tokens significantly influences downstream predictions, most token selection methods are static, modality-specific, or incompatible with autoregressive generation. In this paper, we propose QuickMerge, a lightweight token merging framework designed for efficient next-token prediction. QuickMerge dynamically selects a reduced number of tokens based on attention norm magnitude, guided by an entropy-based budget estimator. To preserve autoregressive compatibility, we introduce a lightweight transformer prior trained over the merged token sequence. By combining semantic salience estimation, flexible token budgets, and AR alignment, QuickMerge enables accurate generation with fewer tokens. We evaluate QuickMerge across multi-modality domains, demonstrating consistent improvements in compute-accuracy tradeoffs. Specifically, QuickMerge reduces token counts substantially while matching or exceeding the performance of learned tokenizers and fixed-patch baselines.
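The selection-and-merge step described above might look roughly like the following sketch. None of it is taken from the QuickMerge implementation: the saliency scores are assumed to be given (standing in for attention norm magnitude), the budget rule `round(exp(entropy))` is one plausible reading of "entropy-based budget", and merging each dropped token into the nearest kept position by averaging is an illustrative choice. The transformer prior over the merged sequence is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]


def entropy_budget(saliency: Sequence[float]) -> int:
    """Assumed budget rule: the 'effective number' of salient tokens,
    exp(H), where H is the entropy of the normalized saliency scores."""
    total = sum(saliency)
    if total <= 0:
        return len(saliency)          # no signal: keep everything
    probs = [s / total for s in saliency if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return max(1, min(len(saliency), round(math.exp(entropy))))


def merge_tokens(tokens: Sequence[Vector],
                 saliency: Sequence[float]) -> List[Vector]:
    """Keep the top-k tokens by saliency (k from entropy_budget) and average
    every dropped token into the nearest kept position."""
    k = entropy_budget(saliency)
    ranked = sorted(range(len(tokens)), key=lambda i: -saliency[i])
    keep = sorted(ranked[:k])         # restore original order for AR use
    groups = {i: [tokens[i]] for i in keep}
    for i, tok in enumerate(tokens):
        if i not in groups:
            nearest = min(keep, key=lambda j: abs(j - i))
            groups[nearest].append(tok)
    dim = len(tokens[0])
    return [[sum(v[d] for v in groups[i]) / len(groups[i])
             for d in range(dim)] for i in keep]
```

With flat saliency the budget equals the sequence length and nothing is merged; with sharply peaked saliency the budget collapses toward a single token, which is the input-dependent behavior the abstract contrasts with static selection.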

arxiv preprint arxiv, machine learning, natural language, (17 more...)

2508.13204

Genre: Research Report (1.00)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Aug-15-2025, 15:18:53 GMT

CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers

Ming Ding

A lion man is typing in the office. A beautiful girl is hugging a husky. A lion teacher wearing a suit is in front of a blackboard. A robot is riding under the blue and cloudy sky. Several youths are talking in a bar. A young woman is taking photos.

arxiv preprint arxiv, latexit sha1, mask region, (15 more...)

Country: North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.30)

arXiv.org Artificial Intelligence, Jul-17-2025

Quantize More, Lose Less: Autoregressive Generation from Residually Quantized Speech Representations

Han, Yichen, Hao, Xiaoyang, Chen, Keming, Xiong, Weibo, He, Jun, Zhang, Ruonan, Cao, Junjie, Liu, Yue, Li, Bowen, Zhang, Dongrui, Xia, Hui, Fu, Huilei, Jia, Kai, Guo, Kaixuan, Jin, Mingli, Meng, Qingyun, Ma, Ruidong, Fang, Ruiqian, Guo, Shaotong, Li, Xuhui, Xiang, Yang, Zhang, Ying, Liu, Yulong, Li, Yunfeng, Zhang, Yuyi, Zhou, Yuze, Wang, Zhen, Chen, Zhaowen

Text-to-speech (TTS) synthesis has seen renewed progress under the discrete modeling paradigm. Existing autoregressive approaches often rely on single-codebook representations, which suffer from significant information loss. Even with post-hoc refinement techniques such as flow matching, these methods fail to recover fine-grained details (e.g., prosodic nuances, speaker-specific timbres), especially in challenging scenarios like singing voice or music synthesis. We propose QTTS, a novel TTS framework built upon our new audio codec, QDAC. The core innovation of QDAC lies in its end-to-end training of an ASR-based autoregressive network with a GAN, which achieves superior semantic feature disentanglement for scalable, near-lossless compression. QTTS models these discrete codes using two innovative strategies: the Hierarchical Parallel architecture, which uses a dual-AR structure to model inter-codebook dependencies for higher-quality synthesis, and the Delay Multihead approach, which employs parallelized prediction with a fixed delay to accelerate inference. Our experiments demonstrate that the proposed framework achieves higher synthesis quality and better preserves expressive content compared to the baseline. This suggests that scaling up compression via multi-codebook modeling is a promising direction for high-fidelity, general-purpose speech and audio generation.
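The contrast between single-codebook and multi-codebook representations can be seen in a minimal residual quantization sketch. This is a scalar toy with hand-picked codebooks, not the QDAC codec: real codecs use learned vector codebooks, and the function names here are hypothetical. Each stage quantizes whatever error the previous stages left behind, so adding a codebook lets the decoder recover detail that one codebook alone discards.

```python
from typing import List, Sequence


def nearest(codebook: Sequence[float], x: float) -> int:
    """Index of the codebook entry closest to x."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))


def rq_encode(x: float, codebooks: Sequence[Sequence[float]]) -> List[int]:
    """Quantize x stage by stage; each stage encodes the remaining residual."""
    residual, codes = x, []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual -= codebook[idx]
    return codes


def rq_decode(codes: Sequence[int],
              codebooks: Sequence[Sequence[float]]) -> float:
    """Reconstruct by summing the selected entry from every stage."""
    return sum(cb[i] for cb, i in zip(codebooks, codes))


if __name__ == "__main__":
    # Assumed toy codebooks: a coarse stage followed by a finer one.
    codebooks = [[-1.0, 0.0, 1.0], [-0.25, 0.0, 0.25]]
    x = 0.8
    one_stage = rq_decode(rq_encode(x, codebooks[:1]), codebooks[:1])
    two_stage = rq_decode(rq_encode(x, codebooks), codebooks)
    print(abs(x - one_stage), abs(x - two_stage))
```

An autoregressive model over such codes has to predict several indices per frame instead of one, which is the inter-codebook dependency problem that the Hierarchical Parallel and Delay Multihead strategies in the abstract are meant to address.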

large language model, machine learning, natural language, (17 more...)

2507.12197

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.51)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)