AITopics | Genre

Genre

NTKMTL: Mitigating Task Imbalance in Multi-Task Learning from Neural Tangent Kernel Perspective

Neural Information Processing Systems, Jun-23-2026, 07:04:09 GMT

Multi-Task Learning (MTL) enables a single model to learn multiple tasks simultaneously, leveraging knowledge transfer among tasks for enhanced generalization, and has been widely applied across various domains. However, task imbalance remains a major challenge in MTL. Although balancing the convergence speeds of different tasks is an effective approach to address this issue, it is highly challenging to accurately characterize the training dynamics and convergence speeds of multiple tasks within the complex MTL system. To this end, we attempt to analyze the training dynamics in MTL by leveraging Neural Tangent Kernel (NTK) theory and propose a new MTL method, NTKMTL. Specifically, we introduce an extended NTK matrix for MTL and adopt spectral analysis to balance the convergence speeds of multiple tasks, thereby mitigating task imbalance. Based on the approximation via shared representation, we further propose NTKMTL-SR, achieving training efficiency while maintaining competitive performance. Extensive experiments demonstrate that our methods achieve state-of-the-art performance across a wide range of benchmarks, including both multi-task supervised learning and multi-task reinforcement learning.
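The idea of using kernel spectra to compare task convergence speeds can be illustrated with a small sketch. It is not the NTKMTL algorithm: the abstract does not give the weighting rule, so the inverse-eigenvalue rule and the function names below are assumptions, and the cross-task blocks of an extended NTK matrix are ignored. It relies only on the standard NTK observation that, under gradient flow on a squared loss, the residual along a kernel eigen-direction decays at a rate proportional to the corresponding eigenvalue.

```python
import numpy as np

def ntk_top_eigenvalue(jacobian):
    """Largest eigenvalue of the empirical per-task NTK, K = J J^T."""
    kernel = jacobian @ jacobian.T
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(kernel)[-1])

def balancing_weights(jacobians):
    """Hypothetical rule: weight each task loss by 1 / lambda_max.

    Weights are normalized to sum to the number of tasks.
    """
    lams = np.array([ntk_top_eigenvalue(j) for j in jacobians])
    inv = 1.0 / lams
    return inv * len(jacobians) / inv.sum(), lams

# Two toy tasks: rows are samples, columns are shared parameters.
# The second task has much larger gradients, so it would dominate training.
rng = np.random.default_rng(0)
jacobians = [rng.normal(size=(8, 20)), 5.0 * rng.normal(size=(8, 20))]

weights, lams = balancing_weights(jacobians)
# After weighting, the leading decay rate w_i * lambda_i is equal across tasks.
effective = weights * lams
```

In this toy setting the task with the larger gradients receives the smaller weight, which is the qualitative behaviour a convergence-balancing method aims for.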

artificial intelligence, deep learning, machine learning, (15 more...)

Country: Europe (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

OnlineSplatter: Pose-Free Online 3D Reconstruction for Free-Moving Objects

Neural Information Processing Systems, Jun-23-2026, 07:04:00 GMT

Free-moving object reconstruction from monocular video remains challenging, particularly without reliable pose or depth cues and under arbitrary object motion.

large language model, machine learning, natural language, (19 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots (0.67)
(3 more...)

HiFC: High-efficiency Flash-based KV Cache Swapping for Scaling LLM Inference

Neural Information Processing Systems, Jun-23-2026, 07:03:33 GMT

Large-language-model inference with long contexts often produces key-value (KV) caches whose footprint exceeds the capacity of high-bandwidth memory on a GPU. Prior LLM inference frameworks such as vLLM mitigate this pressure by swapping KV cache pages to host DRAM. However, the high cost of large DRAM pools makes this solution economically unattractive. Although offloading to SSDs can be a cost-effective way to expand memory capacity relative to DRAM, conventional frameworks such as FlexGen experience a substantial throughput drop since the data path that routes SSD traffic through CPU to GPU is severely bandwidth-constrained. To overcome these limitations, we introduce HiFC, a novel DRAM-free swapping scheme that enables direct access to SSD-resident memory with low latency and high effective bandwidth. HiFC stores KV pages in pseudo-SLC (pSLC) regions of commodity NVMe SSDs, sustaining high throughput under sequential I/O and improving write endurance by up to 8×. Leveraging GPU Direct Storage, HiFC enables direct transfers between SSD and GPU, bypassing host DRAM and alleviating PCIe bottlenecks. HiFC employs fine-grained block mapping to confine writes to high-performance pSLC zones, stabilizing latency and throughput under load. HiFC achieves inference throughput comparable to DRAM-based swapping under diverse long-context workloads, such as NarrativeQA, while significantly lowering the memory expansion cost of a GPU server system by 4.5× over three years.
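To see why long contexts push KV caches past GPU memory, a back-of-the-envelope size calculation helps. The sketch below uses the usual accounting for a decoder-only Transformer (one key and one value vector per layer, per KV head, per token). The model dimensions are illustrative assumptions, not figures from the HiFC paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to fp16 or bf16 storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)

# Hypothetical model: 32 layers, 32 KV heads of dimension 128, fp16,
# serving a single 128K-token (131072) sequence.
total = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                       seq_len=131072)
gib = total / 2**30  # 64.0 GiB for this configuration
```

Under these assumptions a single long sequence needs 64 GiB of KV cache on top of the model weights, which is the kind of pressure that swapping schemes are meant to relieve.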

large language model, machine learning, natural language, (20 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Banking & Finance (0.93)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

TITAN: A Trajectory-Informed Technique for Adaptive Parameter Freezing in Large-Scale VQE

Neural Information Processing Systems, Jun-23-2026, 07:03:12 GMT

Variational Quantum Eigensolver (VQE) is a leading candidate for harnessing quantum computers to advance quantum chemistry and materials simulations, yet its training efficiency deteriorates rapidly for large Hamiltonians. Two issues underlie this bottleneck: (i) the no-cloning theorem imposes a linear growth in circuit evaluations with the number of parameters per gradient step; and (ii) deeper circuits encounter barren plateaus (BPs), leading to exponentially increasing measurement overheads. To address these challenges, here we propose a deep learning framework, dubbed TITAN, which identifies and freezes inactive parameters of a given ansatz at initialization for a specific class of Hamiltonians, reducing the optimization overhead without sacrificing accuracy. The motivation of TITAN starts with our empirical findings that a subset of parameters consistently has negligible influence on training dynamics. Its design combines a theoretically grounded data construction strategy, ensuring each training example is informative and BP-resilient, with an adaptive neural architecture that generalizes across ansätze of varying sizes. Across benchmark transverse-field Ising models, Heisenberg models, and multiple molecule systems up to 30 qubits, TITAN achieves up to 3× faster convergence and 40-60% fewer circuit evaluations than state-of-the-art baselines, while matching or surpassing their estimation accuracy. By proactively trimming parameter space, TITAN lowers hardware demands and offers a scalable path toward utilizing VQE to advance practical quantum chemistry and materials science.
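The linear cost per gradient step, and the saving from freezing parameters, can be sketched with the parameter-shift rule, a common gradient estimator for circuits built from Pauli rotations. The abstract does not name the estimator, so its use here is an assumption. The expectation function is a classical toy stand-in for a circuit, and the freeze mask is hypothetical (TITAN's learned selection model is not shown).

```python
import numpy as np

def toy_expectation(theta):
    """Toy stand-in for a circuit expectation value.

    Equals <Z...Z> after applying RX(theta_i) to qubit i of |0...0>,
    which evaluates to the product of cos(theta_i).
    """
    return float(np.prod(np.cos(theta)))

def parameter_shift_gradient(f, theta, active):
    """Gradient via the parameter-shift rule, skipping frozen parameters.

    Each active parameter costs two evaluations of f, so the total cost
    grows linearly with the number of active parameters.
    """
    grad = np.zeros_like(theta)
    evaluations = 0
    for i in np.flatnonzero(active):
        shift = np.zeros_like(theta)
        shift[i] = np.pi / 2
        grad[i] = 0.5 * (f(theta + shift) - f(theta - shift))
        evaluations += 2
    return grad, evaluations

theta = np.array([0.3, 1.1, -0.7, 2.0])
active = np.array([True, False, True, True])  # hypothetical freeze mask
grad, evaluations = parameter_shift_gradient(toy_expectation, theta, active)
```

With one of four parameters frozen, the step uses 6 evaluations instead of 8, and the frozen entry of the gradient stays at zero.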

artificial intelligence, hamiltonian, machine learning, (16 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Energy (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Entropy Rectifying Guidance for Diffusion and Flow Models

Neural Information Processing Systems, Jun-23-2026, 06:58:44 GMT

Guidance techniques are commonly used in diffusion and flow models to improve image quality and input consistency for conditional generative tasks such as class-conditional and text-to-image generation. In particular, classifier-free guidance (CFG) is the most widely adopted guidance technique. It results, however, in trade-offs across quality, diversity and consistency: improving some at the expense of others. While recent work has shown that it is possible to disentangle these factors to some extent, such methods come with an overhead of requiring an additional (weaker) model, or require more forward passes per sampling step. In this paper, we propose Entropy Rectifying Guidance (ERG), a simple and effective guidance method based on inference-time changes in the attention mechanism of state-of-the-art diffusion transformer architectures, which allows for simultaneous improvements over image quality, diversity and prompt consistency. ERG is more general than CFG and similar guidance techniques, as it extends to unconditional sampling. We show that ERG results in significant improvements in various tasks, including text-to-image, class-conditional and unconditional image generation. We also show that ERG can be seamlessly combined with other recent guidance methods such as CADS and APG, further improving generation results.
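For readers unfamiliar with the CFG baseline mentioned above, its standard combination step can be sketched as follows. This shows only CFG, not ERG, whose attention-level modification is not specified in the abstract. The arrays stand in for the denoiser's conditional and unconditional predictions at one sampling step, and the function name is illustrative.

```python
import numpy as np

def cfg_combine(pred_uncond, pred_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one.

    guidance_scale = 0 gives the unconditional prediction, 1 gives the
    conditional prediction, and values above 1 push further along the
    conditional direction.
    """
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)

# Toy predictions for a 3-element latent.
pred_uncond = np.array([0.0, 1.0, -1.0])
pred_cond = np.array([1.0, 1.0, 0.0])

guided = cfg_combine(pred_uncond, pred_cond, guidance_scale=2.0)
```

Because the rule needs a conditional prediction to extrapolate toward, it has no direct analogue for unconditional sampling, which is the limitation the abstract contrasts ERG against.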

artificial intelligence, machine learning, natural language, (17 more...)

Country:

Europe (0.46)
North America > Canada (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry:

Media (0.46)
Information Technology (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation

Neural Information Processing Systems, Jun-23-2026, 06:57:43 GMT

Video generation has emerged as a promising tool for world simulation, leveraging visual data to replicate real-world environments. Within this context, egocentric video generation, which centers on the human perspective, holds significant potential for enhancing applications in virtual reality, augmented reality, and gaming. However, the generation of egocentric videos presents substantial challenges due to the dynamic nature of egocentric viewpoints, the intricate diversity of actions, and the complex variety of scenes encountered. Existing datasets are inadequate for addressing these challenges effectively. To bridge this gap, we present EgoVid-5M, the first high-quality dataset specifically curated for egocentric video generation. EgoVid-5M encompasses 5 million egocentric video clips and is enriched with detailed action annotations, including 5M high-level textual descriptions and 65K fine-grained kinematic control annotations. To ensure the integrity and usability of the dataset, we implement a sophisticated data cleaning pipeline designed to maintain frame consistency, action coherence, and motion smoothness under egocentric conditions. Furthermore, we introduce EgoDreamer, which is capable of generating egocentric videos driven simultaneously by action descriptions and kinematic control signals. The EgoVid-5M dataset, associated action annotations, and all data cleansing metadata will be released for the advancement of research in egocentric video generation.

artificial intelligence, data quality, video generation, (16 more...)

Country: Asia (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.86)
Information Technology > Data Science > Data Quality > Data Cleaning (0.54)

Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models

Neural Information Processing Systems, Jun-23-2026, 06:57:20 GMT

A long-standing goal in AI is to develop agents capable of solving diverse tasks across a range of environments, including those never seen during training. Two dominant paradigms address this challenge: (i) reinforcement learning (RL), which learns policies via trial and error, and (ii) optimal control, which plans actions using a known or learned dynamics model. However, their comparative strengths in the offline setting--where agents must learn from reward-free trajectories--remain underexplored. In this work, we systematically evaluate RL and control-based methods on a suite of navigation tasks, using offline datasets of varying quality. On the RL side, we consider goal-conditioned and zero-shot methods. On the control side, we train a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning. We investigate how factors such as data diversity, trajectory quality, and environment variability influence the performance of these approaches. Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts and is more data-efficient, while achieving trajectory stitching performance comparable to leading model-free methods. Notably, planning with a latent dynamics model proves to be a strong approach for handling suboptimal offline data and adapting to diverse environments.

machine learning, reinforcement learning, trajectory, (18 more...)

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.86)

Industry: Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

RANK++LETR: Learn to Rank and Optimize Candidates for Line Segment Detection

Neural Information Processing Systems, Jun-23-2026, 06:48:11 GMT

In previous proposal-based line segment detection methods, the confidence score may fail to reflect prediction quality accurately, since the scores and the line locations are predicted simultaneously. We find that line segment detection performance can be further improved by a learning-based strategy for ranking and optimizing line candidates. To this end, we build a novel end-to-end line detection model named RANK++LETR upon the deformable DETR architecture, where the encoder selects the line candidates while the decoder ranks and optimizes them. We design a line-aware deformable attention (LADA) module in which attention positions are distributed in a long, narrow area and align well with the elongated geometry of line segments. Moreover, we apply ranking-based supervision to the line segment detection task, designing contiguous labels according to detection quality. Experimental results demonstrate that our method outperforms previous SOTA methods in prediction accuracy and achieves faster inference than other Transformer-based methods.

artificial intelligence, machine learning, natural language, (17 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
(2 more...)

3b00db522fbd628390f41a010d0eaf1f-Paper-Conference.pdf

Neural Information Processing Systems, Jun-23-2026, 06:47:20 GMT

Explicit noise-level conditioning is widely regarded as essential for the effective operation of Graph Diffusion Models (GDMs). In this work, we challenge this assumption by investigating whether denoisers can implicitly infer noise levels directly from corrupted graph structures, potentially eliminating the need for explicit noise conditioning. To this end, we develop a theoretical framework centered on Bernoulli edge-flip corruptions and extend it to encompass more complex scenarios involving coupled structure-attribute noise. Extensive empirical evaluations on both synthetic and real-world graph datasets, using models such as GDSS and DiGress, provide strong support for our theoretical findings. Notably, unconditional GDMs achieve performance comparable or superior to their conditioned counterparts, while also offering reductions in parameters (4-6%) and computation time (8-10%). Our results suggest that the high-dimensional nature of graph data itself often encodes sufficient information for the denoising process, opening avenues for simpler, more efficient GDM architectures.
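A small simulation gives some intuition for why a corrupted graph can reveal its own noise level. The sketch below applies Bernoulli edge flips to a random sparse graph and recovers the flip probability from edge density alone. It is a simplified illustration, not the paper's framework: the estimator assumes the clean graph's density is known, and all function names are hypothetical. If the clean density is d and each potential edge flips with probability p, the expected noisy density is d + p(1 - 2d), which can be inverted for p.

```python
import numpy as np

def flip_edges(adj, p, rng):
    """Flip each potential undirected edge independently with probability p."""
    n = adj.shape[0]
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)
    flips = (rng.random((n, n)) < p) & upper
    flips = flips | flips.T  # keep the graph symmetric, diagonal untouched
    return np.where(flips, 1 - adj, adj)

def edge_density(adj):
    """Fraction of node pairs that are connected."""
    n = adj.shape[0]
    return adj[np.triu_indices(n, k=1)].mean()

def estimate_flip_prob(noisy_density, clean_density):
    """Invert E[noisy density] = d + p * (1 - 2d) for p."""
    return (noisy_density - clean_density) / (1.0 - 2.0 * clean_density)

rng = np.random.default_rng(0)
n = 200
upper = np.triu(rng.random((n, n)) < 0.05, k=1)
clean = (upper | upper.T).astype(int)  # sparse random graph, about 5% density

noisy = flip_edges(clean, p=0.1, rng=rng)
p_hat = estimate_flip_prob(edge_density(noisy), edge_density(clean))
```

With roughly 20,000 node pairs the density statistic concentrates tightly, so the estimate lands close to the true flip probability of 0.1. This is consistent with the abstract's remark that high-dimensional graph data carries information about the noise level.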

artificial intelligence, bayesian inference, machine learning, (19 more...)

Country: North America > United States > California (0.92)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
(2 more...)

Flow-GRPO: Training Flow Matching Models via Online RL

Neural Information Processing Systems, Jun-23-2026, 06:46:35 GMT

We propose Flow-GRPO, the first method to integrate online policy gradient reinforcement learning (RL) into flow matching models. Our approach uses two key strategies: (1) an ODE-to-SDE conversion that transforms a deterministic Ordinary Differential Equation (ODE) into an equivalent Stochastic Differential Equation (SDE) that matches the original model's marginal distribution at all timesteps, enabling statistical sampling for RL exploration; and (2) a Denoising Reduction strategy that reduces training denoising steps while retaining the original number of inference steps, significantly improving sampling efficiency without sacrificing performance. Empirically, Flow-GRPO is effective across multiple text-to-image tasks. For compositional generation, RL-tuned SD3.5-M generates nearly perfect object counts, spatial relations, and fine-grained attributes, increasing GenEval accuracy from 63% to 95%. In visual text rendering, accuracy improves from 59% to 92%, greatly enhancing text generation. Flow-GRPO also achieves substantial gains in human preference alignment. Notably, very little reward hacking occurred, meaning rewards did not increase at the cost of appreciable image quality or diversity degradation.
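The abstract does not spell out how rewards become a training signal. As background, GRPO-style methods typically score each sample relative to the other samples drawn for the same prompt, which is one reason stochastic sampling matters: a deterministic sampler with fixed initial noise would give a single output per prompt and nothing to compare. The sketch below shows that group-relative normalization under the assumption that Flow-GRPO follows this common formulation. The reward values are made up for illustration.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Samples above the group mean get positive advantages, samples below
    it get negative ones. eps guards against a zero standard deviation.
    """
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Hypothetical rewards for four images generated from one prompt.
advantages = group_relative_advantages([0.2, 0.9, 0.5, 0.4])
```

No learned value function is needed in this formulation, since the group mean serves as the baseline.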

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
