Country
ADifference-of-Convex Functions Approach to Energy-Based Iterative Reasoning
While energy-based models have recently proven to be a powerful framework for learning to reason with neural networks, their practical application is still limited by computational cost. That is, existing methods for energy-based iterative reasoning suffer from computational bottlenecks by relying on expensive optimization routines during training and especially during inference.
Focus-Then-Reuse: Fast Adaptation in Visual Perturbation Environments
Visual reinforcement learning has shown promise in various real-world applications. However, deploying policies in complex real-world environments with visual perturbations remains a significant challenge. We notice that humans tend to filter information at the object level prior to decision-making, facilitating efficient skill transfer across different contexts. Inspired by this, we introduce Focus-ThenReuse (FTR), a method utilizing a novel object selection mechanism to focus on task-relevant objects, and directly reuse the simulation-trained policy on them.
QFFT, Question-Free Fine-Tuning for Adaptive Reasoning
Recent advancements in Long Chain-of-Thought (CoT) reasoning models have improved performance on complex tasks, but they suffer from overthinking, which generates redundant reasoning steps, especially for simple questions. This paper revisits the reasoning patterns of Long and Short CoT models, observing that the Short CoT patterns offer concise reasoning efficiently, while the Long CoT patterns excel in challenging scenarios where the Short CoT patterns struggle. To enable models to leverage both patterns, we propose Question-Free Fine-Tuning (QFFT), a fine-tuning approach that removes the input question during training and learns exclusively from Long CoT responses. This approach enables the model to adaptively employ both reasoning patterns: it prioritizes the Short CoT patterns and activates the Long CoT patterns only when necessary. Experiments on various mathematical datasets demonstrate that QFFT reduces average response length by more than 50%, while achieving performance comparable to Supervised FineTuning (SFT). Additionally, QFFT exhibits superior performance compared to SFT in noisy, out-of-domain, and low-resource scenarios.
Decomposition based Loss Function for Time Series Forecasting
Time series forecasting holds significant value in various domains such as economics, traffic, energy, and AIOps, as accurate predictions facilitate informed decision-making. However, the existing Mean Squared Error (MSE) loss function sometimes fails to accurately capture the seasonality or trend within the forecasting horizon, even when decomposition modules are used in the forward propagation to model the trend and seasonality separately. To address these challenges, we propose a simple yet effective Decomposition-Based Loss function called DBLoss. This method uses exponential moving averages to decompose the time series into seasonal and trend components within the forecasting horizon, and then calculates the loss for each of these components separately, followed by weighting them. As a general loss function, DBLoss can be combined with any deep learning forecasting model. Extensive experiments demonstrate that DBLoss significantly improves the performance of state-of-the-art models across diverse real-world datasets and provides a new perspective on the design of time series loss functions.
E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
Recent advancements in speech synthesis technology have enriched our daily lives, with high-quality and human-like audio widely adopted across real-world applications. However, malicious exploitation like voice-cloning fraud poses severe security risks. Existing defense techniques struggle to address the production large language model (LLM)-based speech synthesis. While previous studies have considered the protection for fine-tuning synthesizers, they assume manually annotated transcripts. Given the labor intensity of manual annotation, end-to-end (E2E) systems leveraging automatic speech recognition (ASR) to generate transcripts are becoming increasingly prevalent, e.g., voice cloning via commercial APIs.
Conditional Panoramic Image Generation via Masked Autoregressive Modeling
Recent progress in panoramic image generation has underscored two critical limitations in existing approaches. First, most methods are built upon diffusion models, which are inherently ill-suited for equirectangular projection (ERP) panoramas due to the violation of the identically and independently distributed (i.i.d.) Gaussian noise assumption caused by their spherical mapping. Second, these methods often treat text-conditioned generation (text-to-panorama) and imageconditioned generation (panorama outpainting) as separate tasks, relying on distinct architectures and task-specific data. In this work, we propose a unified framework, Panoramic AutoRegressive model (PAR), which leverages masked autoregressive modeling to address these challenges.
Aligning Text-to-Image Diffusion Models to Human Preference by Classification
Text-to-image diffusion models are typically trained on large-scale web data, often resulting in outputs that misalign with human preferences. Inspired by preference learning in large language models, we propose ABC (Alignment by Classification), a simple yet effective framework for aligning diffusion models with human preferences. In contrast to prior DPO-based methods that depend on suboptimal supervised fine-tuned (SFT) reference models, ABC assumes access to an ideal reference model perfectly aligned with human intent and reformulates alignment as a classification problem. Under this classification view, we recognize that preference data naturally forms a semi-supervised classification setting. To address this, we propose a data augmentation strategy that transforms preference comparisons into fully supervised training signals. We then introduce a classification-based ABC loss to guide alignment. Our alignment by classification approach could effectively steer the diffusion model toward the behavior of the ideal reference. Experiments on various diffusion models show that our ABC consistently outperforms existing baselines, offering a scalable and robust solution for preference-based text-to-image fine-tuning. Code is available at https://github.com/dailongquan/abc.
Large language models can learn and generalize steganographic chain-of-thought under process supervision
Chain-of-thought (CoT) reasoning not only enhances large language model performance but also provides critical insights into decision-making processes, marking it as a useful tool for monitoring model intent and planning. However, recent works have shown that banning the mention of a specific example of reward hacking causes obfuscation of the undesired reasoning traces but the persistence of the undesired behavior, threatening the reliability of CoT monitoring. We provide an extension to these results with regard to the ability of models to learn a specific type of obfuscated reasoning: steganography. First, we show that penalizing the use of specific strings within load-bearing reasoning traces causes models to substitute alternative strings. Crucially, this does not alter the underlying method by which the model performs the task, demonstrating that the model can learn to steganographically encode its reasoning. We further demonstrate that models can generalize an encoding scheme. When the penalized strings belong to an overarching class, the model learns not only to substitute strings seen in training, but also develops a general encoding scheme for all members of the class which it can apply to held-out testing strings.
US judge dismisses Musk's xAI trade secret lawsuit against OpenAI
US judge dismisses Musk's xAI trade secret lawsuit against OpenAI A United States federal judge has dismissed a lawsuit by Elon Musk's artificial intelligence company xAI that accused rival Sam Altman's OpenAI of stealing trade secrets for chatbots. US District Judge Rita Lin in San Francisco said on Monday that xAI failed to show that OpenAI induced former xAI senior engineer Xuechen Li to divulge confidential information related to its Grok chatbot, or that OpenAI engineers knew Li might have disclosed any. She dismissed an earlier version in February. The lawsuit originally filed last September focused on broader alleged misappropriation of confidential information, including source code, by xAI employees who left for jobs at OpenAI. Monday's decision is Musk's second legal loss against OpenAI in four weeks. On May 18, a federal jury ruled against Musk, the world's richest person, in his $150bn lawsuit accusing OpenAI and Altman of "stealing a charity" by betraying the company's original mission as a nonprofit to enrich themselves.