AITopics | Genre

Collaborating Authors

Genre

Interactive and Hybrid Imitation Learning: Provably Beating Behavior Cloning

Neural Information Processing SystemsJun-16-2026, 21:14:09 GMT

Imitation learning (IL) is a paradigm for learning sequential decision-making policies from experts, leveraging offline demonstrations, interactive annotations, or both. Recent advances show that when annotation cost is tallied per trajectory, Behavior Cloning (BC)--which relies solely on offline demonstrations--cannot be improved in general, leaving limited conditions for interactive methods such as DAgger to help. We revisit this conclusion and prove that when the annotation cost is measured per state, algorithms using interactive annotations can provably outperform BC. Specifically: (1) we show that STAGGER, a one-sample-per-round variant of DAgger, provably beats BC under low-recovery-cost settings; (2) we initiate the study of hybrid IL where the agent learns from offline demonstrations and interactive annotations. We propose WARM-STAGGER whose learning guarantee is not much worse than using either data source alone.

annotation, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.93)
Research Report > New Finding (0.67)

Industry:

Education (0.46)
Information Technology (0.46)
Transportation > Ground > Road (0.46)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback

Adversarial Paraphrasing: AUniversal Attack for Humanizing AI-Generated Text

Neural Information Processing SystemsJun-16-2026, 21:12:49 GMT

The increasing capabilities of Large Language Models (LLMs) have raised concerns about their misuse in AI-generated plagiarism and social engineering. While various AI-generated text detectors have been proposed to mitigate these risks, many remain vulnerable to simple evasion techniques such as paraphrasing. However, recent detectors have shown greater robustness against such basic attacks. In this work, we introduce Adversarial Paraphrasing, a training-free attack framework that universally humanizes any AI-generated text to evade detection more effectively. Our approach leverages an off-the-shelf instruction-following LLM to paraphrase AI-generated content under the guidance of an AI text detector, producing adversarial examples that are specifically optimized to bypass detection. Extensive experiments show that our attack is both broadly effective and highly transferable across several detection systems. For instance, compared to simple paraphrasing attack--which, ironically, increases the true positive at 1% false positive (T@1%F) by 8.57% on RADAR and 15.03% on Fast-DetectGPT--adversarial paraphrasing, guided by OpenAI-RoBERTa-Large, reduces T@1%F by 64.49% on RADAR and a striking 98.96% on Fast-DetectGPT. Across a diverse set of detectors--including neural network-based, watermark-based, and zero-shot approaches--our attack achieves an average T@1%F reduction of 87.88% under the guidance of OpenAI-RoBERTa-Large. We also analyze the tradeoff between text quality and attack success to find that our method can significantly reduce detection rates, with mostly a slight degradation in text quality. Our adversarial setup highlights the need for more robust and resilient detection strategies in the light of increasingly sophisticated evasion techniques.

detector, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Consumer Health (1.00)
Government > Regional Government (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.45)

Add feedback

Rao-Blackwell Gradient Estimators for Equivariant Denoising Diffusion

Neural Information Processing SystemsJun-16-2026, 21:11:31 GMT

In domains such as molecular and protein generation, physical systems exhibit inherent symmetries that are critical to model. Two main strategies have emerged for learning invariant distributions: designing equivariant network architectures and using data augmentation to approximate equivariance. While equivariant architectures preserve symmetry by design, they often involve greater complexity and pose optimization challenges. Data augmentation, on the other hand, offers flexibility but may fall short in fully capturing symmetries. Our framework enhances both approaches by reducing training variance and providing a provably lower-variance gradient estimator.

machine learning, natural language, qgt, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Europe > Germany (0.28)
Europe > Austria (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Realms for Integrated Agent Intelligence

Neural Information Processing SystemsJun-16-2026, 21:03:47 GMT

AI agents today are mostly siloed -- they either retrieve and reason over vast amount of digital information and knowledge obtained online; or interact with the physical world through embodied perception, planning and action -- but rarely both. This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigating with dynamic map data, or interpreting real-world landmarks using web knowledge. We introduce EMBODIEDWEBAGENTS, a novel paradigm for AI agents that fluidly bridge embodiment and web-scale reasoning.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Asia (0.93)
North America > United States > California (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Retail (0.67)
Information Technology (0.67)
Consumer Products & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Show-o2: Improved Native Unified Multimodal Models

Neural Information Processing SystemsJun-16-2026, 21:03:28 GMT

This paper presents improved native unified multimodal models, i.e., Show-o2, that leverage autoregressive modeling and flow matching. Built upon a 3D causal variational autoencoder space, unified visual representations are constructed through a dual-path of spatial (-temporal) fusion, enabling scalability across image and video modalities while ensuring effective multimodal understanding and generation. Based on a language model, autoregressive modeling and flow matching are natively applied to the language head and flow head, respectively, to facilitate text token prediction and image/video generation. A two-stage training recipe is designed to effectively learn and scale to larger models. The resulting Show-o2 models demonstrate versatility in handling a wide range of multimodal understanding and generation tasks across diverse modalities, including text, images, and videos. Code and models are released at https://github.com/showlab/Show-o.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Driven Adaptive Video with Prior Task Awareness

Neural Information Processing SystemsJun-16-2026, 21:02:13 GMT

Recent advances in visual tokenizers have demonstrated their effectiveness for multimodal large language models and autoregressive generative models. However, most existing visual tokenizers rely on a fixed downsampling rate at a given visual resolution, and consequently produce a constant number of visual tokens, ignoring the fact that visual information of varying complexity warrant different token budgets. Motivated by this observation, we propose an adaptive video tokenizer "VaporTok" with two core contributions: Probabilistic Taildrop: We introduce a novel taildrop mechanism that learns a truncation index sampling distribution conditioned on visual complexity of the video.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding

Neural Information Processing SystemsJun-16-2026, 21:02:05 GMT

Large language models (LLMs) achieve remarkable performance in code generation tasks. However, a significant performance disparity persists between popular programming languages (e.g., Python, C++) and others. To address this capability gap, we leverage the code translation task to train LLMs, thereby facilitating the transfer of coding proficiency across diverse programming languages. Moreover, we introduce OORL for training, a novel reinforcement learning (RL) framework that integrates on-policy and off-policy strategies. Within OORL, on-policy RL is applied during code translation, guided by a rule-based reward signal derived from unit tests. Complementing this coarse-grained rule-based reward, we propose Group Equivalent Preference Optimization (GEPO), a novel preference optimization method.

large language model, natural language, programming language, (17 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

MODEM: AMorton-Order Degradation Estimation Mechanism for Adverse Weather Image Recovery

Neural Information Processing SystemsJun-16-2026, 21:01:52 GMT

Restoring images degraded by adverse weather remains a significant challenge due to the highly non-uniform and spatially heterogeneous nature of weather-induced artifacts, e.g., fine-grained rain streaks versus widespread haze. Accurately estimating the underlying degradation can intuitively provide restoration models with more targeted and effective guidance, enabling adaptive processing strategies. To this end, we propose a Morton-Order Degradation Estimation Mechanism (MODEM) for adverse weather image restoration. Central to MODEM is the Morton-Order 2D-Selective-Scan Module (MOS2D), which integrates Morton-coded spatial ordering with selective state-space models to capture long-range dependencies while preserving local structural coherence. Complementing MOS2D, we introduce a Dual Degradation Estimation Module (DDEM) that disentangles and estimates both global and local degradation priors.

artificial intelligence, machine learning, restoration, (14 more...)

Neural Information Processing Systems

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Communications > Networks (0.94)
(2 more...)

Add feedback

Learning to Instruct for Visual Instruction Tuning 64.50 34.63

Neural Information Processing SystemsJun-16-2026, 20:57:53 GMT

We propose L2T, an advancement of visual instruction tuning (VIT).

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Asia (0.93)
North America > United States (0.46)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)
Overview (0.67)

Industry:

Health & Medicine (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Add feedback

Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking

Neural Information Processing SystemsJun-16-2026, 20:57:37 GMT

Classifier-Free Guidance (CFG) significantly enhances controllability in generative models by interpolating conditional and unconditional predictions. However, standard CFG often employs a static unconditional input, which can be suboptimal for iterative generation processes where model uncertainty varies dynamically. We introduce Adaptive Classifier-Free Guidance (A-CFG), a novel method that tailors the unconditional input by leveraging the model's instantaneous predictive confidence. At each step of an iterative (masked) diffusion language model, A-CFG identifies tokens in the currently generated sequence for which the model exhibits low confidence. These tokens are temporarily re-masked to create a dynamic, localized unconditional input.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
(2 more...)

Add feedback