 curriculum


Ditch the Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum

Neural Information Processing Systems

Self-Supervised Learning (SSL) has become a powerful solution to extract rich representations from unlabeled data. Yet, SSL research is mostly focused on clean, curated and high-quality datasets. As a result, applying SSL on noisy data remains a challenge, despite being crucial to applications such as astrophysics, medical imaging, geophysics or finance. In this work, we present a fully self-supervised framework that enables noise-robust representation learning without requiring a denoiser at inference or downstream fine-tuning. Our method first trains an SSL denoiser on noisy data, then uses it to construct a denoised-to-noisy data curriculum (i.e., training first on denoised, then noisy samples) for pretraining an SSL backbone (e.g., DINOv2), combined with a teacher-guided regularization that anchors noisy embeddings to their denoised counterparts. This process encourages the model to internalize noise robustness. Notably, the denoiser can be discarded after pretraining, simplifying deployment. On ImageNet-1k with ViT-B under extreme Gaussian noise (σ = 255, SNR = 0.72 dB), our method improves linear probing accuracy by 4.8% over DINOv2, demonstrating that denoiser-free robustness can emerge from noise-aware pretraining.
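The denoised-to-noisy curriculum and the anchoring term described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the switching schedule or the form of the regularizer, so the hard switch at a fixed fraction of training (`switch_fraction`) and the mean squared distance used in `anchor_penalty` are hypothetical choices, as are both function names.

```python
from typing import Sequence


def curriculum_stage(epoch: int, total_epochs: int, switch_fraction: float = 0.5) -> str:
    """Return which data the backbone trains on at this epoch.

    Assumption: a single hard switch from denoised to noisy samples after
    `switch_fraction` of training; the real schedule may differ.
    """
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs)")
    return "denoised" if epoch < switch_fraction * total_epochs else "noisy"


def anchor_penalty(noisy_emb: Sequence[float], denoised_emb: Sequence[float]) -> float:
    """Mean squared distance between a noisy embedding and its denoised counterpart.

    Assumption: a plain squared-error anchor stands in for the
    teacher-guided regularization mentioned in the abstract.
    """
    if len(noisy_emb) != len(denoised_emb) or len(noisy_emb) == 0:
        raise ValueError("embeddings must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(noisy_emb, denoised_emb)) / len(noisy_emb)


if __name__ == "__main__":
    for epoch in range(4):
        print(epoch, curriculum_stage(epoch, total_epochs=4))
    print(anchor_penalty([1.0, 2.0], [1.0, 4.0]))
```

In this sketch the penalty would only be added to the SSL loss during the noisy stage, when the embeddings of noisy inputs can drift away from those of their denoised versions.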


Imagined Autocurricula

Neural Information Processing Systems

Training agents to act in embodied environments typically requires vast training data or access to accurate simulation, neither of which exists for many cases in the real world. Instead, world models are emerging as an alternative: by leveraging offline, passively collected data, they make it possible to generate diverse worlds for training agents in simulation. In this work, we harness world models to generate "imagined" environments to train robust agents capable of generalizing to novel task variations. One of the challenges in doing this is ensuring the agent trains on useful generated data. We thus propose a novel approach, IMAC (Imagined Autocurricula), which leverages Unsupervised Environment Design (UED) to induce an automatic curriculum over generated worlds. In a series of challenging, procedurally generated environments, we show it is possible to achieve strong transfer performance on held-out environments having trained only inside a world model learned from a narrower dataset. We believe this opens the path to utilizing larger-scale, foundation world models for generally capable agents.


Systematic Reward Gap Optimization for Mitigating VLM Hallucinations

Neural Information Processing Systems

A core difficulty lies in precisely characterizing and strategically manipulating the overall reward gap configuration, that is, the deliberate design of how to shape these reward gaps within each preference pair across the data. To address this, we introduce Topic-level Preference Rewriting (TPR), a novel framework designed for the systematic optimization of reward gap configuration. By selectively replacing semantic topics within VLM responses with the model's own resampled candidates for targeted rewriting, TPR can provide topic-level control over fine-grained semantic details. This precise control enables advanced data curation strategies, such as progressively adjusting the difficulty of rejected responses, thereby sculpting an effective reward gap configuration that guides the model to overcome challenging hallucinations. Comprehensive experiments demonstrate that TPR achieves state-of-the-art performance on multiple hallucination benchmarks, outperforming previous methods by an average of 20%. Notably, it reduces hallucinations by up to 93% on ObjectHal-Bench, and also exhibits superior data efficiency towards robust and cost-effective VLM alignment.


Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment

Neural Information Processing Systems

Most multimodal models treat every negative pair alike, ignoring the ambiguous negatives that differ from the positive by only a small detail. We propose Boundary-Aware Curriculum with Local Attention (BACL), a lightweight add-on that turns these borderline cases into a curriculum signal. A Boundary-aware Negative Sampler gradually raises difficulty, while a Contrastive Local Attention loss highlights where the mismatch occurs. The two modules are fully differentiable and work with any off-the-shelf dual encoder. Theory predicts a fast O(1/n) error rate; practice shows up to +32% R@1 over CLIP and new SOTA on four large-scale benchmarks, all without extra labels.
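The idea of a negative sampler that gradually raises difficulty can be illustrated with a small sketch. It is not the BACL sampler (which the abstract describes as differentiable); it is a simple non-differentiable stand-in that assumes each negative has a similarity score to the positive, that higher similarity means a harder negative, and that the admissible pool grows linearly with training progress. The function name `pick_negatives` and the `start_fraction` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def pick_negatives(
    similarities: Sequence[float],
    progress: float,
    k: int,
    start_fraction: float = 0.5,
) -> List[int]:
    """Return indices of the k hardest negatives inside the current pool.

    The pool holds the easiest `fraction` of negatives, where `fraction`
    grows linearly from `start_fraction` (progress=0) to 1.0 (progress=1).
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    # Sort indices from easiest (least similar) to hardest (most similar).
    order = sorted(range(len(similarities)), key=lambda i: similarities[i])
    fraction = start_fraction + (1.0 - start_fraction) * progress
    pool_size = max(k, math.ceil(fraction * len(order)))
    pool = order[:pool_size]
    return pool[-k:]


if __name__ == "__main__":
    sims = [0.1, 0.9, 0.5, 0.3]
    print(pick_negatives(sims, progress=0.0, k=2))  # early: easier negatives
    print(pick_negatives(sims, progress=1.0, k=2))  # late: borderline negatives
```

Early in training the borderline negatives are excluded from the pool; by the end, the most similar negatives are the ones selected.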


Learning from Demonstrations via Capability-Aware Goal Sampling

Neural Information Processing Systems

Despite its promise, imitation learning often fails in long-horizon environments where perfect replication of demonstrations is unrealistic and small errors can accumulate catastrophically. We introduce Cago (Capability-Aware Goal Sampling), a novel learning-from-demonstrations method that mitigates the brittle dependence on expert trajectories for direct imitation. Unlike prior methods that rely on demonstrations only for policy initialization or reward shaping, Cago dynamically tracks the agent's competence along expert trajectories and uses this signal to select intermediate steps (goals that are just beyond the agent's current reach) to guide learning. This results in an adaptive curriculum that enables steady progress toward solving the full task. Empirical results demonstrate that Cago significantly improves sample efficiency and final performance across a range of sparse-reward, goal-conditioned tasks, consistently outperforming existing learning-from-demonstrations baselines.
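Selecting a goal "just beyond the agent's current reach" can be sketched as a threshold rule over per-step success rates. This is an illustrative assumption, not Cago's actual sampling procedure: the abstract does not say how competence is measured or how goals are drawn, so the success-rate list, the `threshold` value, and the function name `next_goal_index` are all hypothetical.

```python
from typing import Sequence


def next_goal_index(success_rates: Sequence[float], threshold: float = 0.8) -> int:
    """Pick the first demonstration step the agent cannot yet reach reliably.

    `success_rates[i]` is the agent's estimated success rate at reaching the
    state at step i of an expert trajectory. If every step is already
    mastered, the final step (the full task) is returned.
    """
    if len(success_rates) == 0:
        raise ValueError("need at least one demonstration step")
    for i, rate in enumerate(success_rates):
        if rate < threshold:
            return i
    return len(success_rates) - 1


if __name__ == "__main__":
    # The agent reliably reaches steps 0 and 1, so step 2 becomes the goal.
    print(next_goal_index([0.95, 0.9, 0.4, 0.0]))
```

As the agent improves, the success rates of early steps rise above the threshold and the selected goal moves further along the demonstration, which yields the adaptive curriculum effect described in the abstract.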


The Right to Red-Team: Adversarial AI Literacy as a Civic Imperative in K-12 Education

Neural Information Processing Systems

The increasing societal integration of Large Language Models (LLMs) and agent-based AI demands a new civic competency: adversarial reasoning. This position paper argues that K-12 AI education must move beyond passive literacy to actively equip students with skills in responsible adversarial prompting and ethical system "hacking." Such capabilities are essential for citizens to critically probe AI systems, understand their inherent limitations, identify manipulative patterns, and hold them accountable. We posit that cultivating a generation skilled in "red-teaming" AI is vital for maintaining transparency, preventing undue influence, and fostering a democratic engagement with these transformative technologies.


Heterogeneous Adversarial Play in Interactive Environments

Neural Information Processing Systems

Self-play constitutes a fundamental paradigm for autonomous skill acquisition, whereby agents iteratively enhance their capabilities through self-directed environmental exploration (Silver et al., 2018). Conventional self-play frameworks exploit agent symmetry within zero-sum competitive settings (Balduzzi et al., 2019), yet this approach proves inadequate for open-ended learning scenarios characterized by inherent asymmetry. Human pedagogical systems exemplify asymmetric instructional frameworks wherein educators systematically construct challenges calibrated to individual learners' developmental trajectories (Bobbitt, 1918; Bengio et al., 2009). The principal challenge resides in operationalizing these asymmetric, adaptive pedagogical mechanisms within artificial systems capable of autonomously synthesizing appropriate curricula without predetermined task hierarchies. Here we present Heterogeneous Adversarial Play (HAP), an adversarial Automatic Curriculum Learning (ACL) framework that formalizes teacher-student interactions as a minimax optimization wherein task-generating instructor and problem-solving learner co-evolve through adversarial dynamics. In contrast to prevailing ACL methodologies that employ static curricula or unidirectional task selection mechanisms, HAP establishes a bidirectional feedback system wherein instructors continuously recalibrate task complexity in response to real-time learner performance metrics. Experimental validation across multi-task learning domains demonstrates that our framework achieves performance parity with state-of-the-art (SOTA) baselines while generating curricula that enhance learning efficacy in both artificial agents and human subjects.
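One generic way to write a teacher-student minimax objective of this kind is sketched below. The symbols are assumptions for illustration and do not come from the abstract: $\theta$ denotes the learner's parameters, $\phi$ the instructor's parameters, $p_{\phi}$ the instructor's distribution over tasks $\tau$, and $\mathcal{L}$ the learner's loss on a task.

```latex
\min_{\theta} \; \max_{\phi} \; \mathbb{E}_{\tau \sim p_{\phi}} \big[ \mathcal{L}(\theta; \tau) \big]
```

Under this reading the instructor shifts probability toward tasks the learner currently handles poorly, while the learner minimizes its loss on those tasks; the paper's actual objective may include additional terms or constraints.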


Sample Efficient Multi Round Generative Data Augmentation for Long Tail Instance Segmentation

Neural Information Processing Systems

Data synthesis has become increasingly crucial for long-tail instance segmentation tasks to mitigate class imbalance and high annotation costs. Previous methods have primarily prioritized the selection of data from a pre-generated image object pool, which frequently leads to the inefficient utilization of generated data. To address this inefficiency, we propose a collaborative approach that incorporates feedback from an instance segmentation model to guide the augmentation process. Specifically, the diffusion model uses feedback to generate objects that exhibit high uncertainty. The number and size of synthesized objects for each class are dynamically adjusted based on the model state to improve learning in underrepresented classes. This augmentation process is further strengthened by running multiple rounds, allowing feedback to be refined throughout training. In summary, multi-round collaborative augmentation (MRCA) enhances sample efficiency by providing optimal synthetic data at the right moment. Our framework requires only 6% of the data generation needed by state-of-the-art methods while outperforming them.
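The per-class adjustment of how many objects to synthesize can be illustrated with a simple allocation rule. This is a hedged sketch rather than the MRCA procedure: it assumes the "model state" is summarized by a per-class score in [0, 1] (for example, a validation AP), and it splits a fixed generation budget in proportion to each class's deficit `1 - score`. The function name `allocate_objects` and the proportional rule are hypothetical.

```python
from typing import Dict


def allocate_objects(class_scores: Dict[str, float], budget: int) -> Dict[str, int]:
    """Split a generation budget across classes, favoring weak classes.

    Each class receives a share proportional to its deficit (1 - score).
    Objects left over after rounding down go to the weakest classes first.
    """
    deficits = {c: 1.0 - s for c, s in class_scores.items()}
    total = sum(deficits.values())
    if total == 0:
        return {c: 0 for c in class_scores}
    counts = {c: int(budget * d / total) for c, d in deficits.items()}
    leftover = budget - sum(counts.values())
    weakest_first = sorted(deficits, key=deficits.get, reverse=True)
    for c in weakest_first[:leftover]:
        counts[c] += 1
    return counts


if __name__ == "__main__":
    # The rare class scores poorly, so it receives most of the budget.
    print(allocate_objects({"cat": 0.75, "rare": 0.25}, budget=8))
```

Re-running this allocation at the start of each round, with scores refreshed from the current segmentation model, would correspond to the multi-round feedback loop the abstract describes.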


MyoChallenge 2024: A New Benchmark for Physiological Dexterity and Agility in Bionic Humans

Neural Information Processing Systems

Recent advancements in bionic prosthetic technology offer transformative opportunities to restore mobility and functionality for individuals with missing limbs. Users of bionic limbs, or bionic humans, learn to seamlessly integrate prosthetic extensions into their motor repertoire, regaining critical motor abilities. The remarkable movement generalization and environmental adaptability demonstrated by these individuals highlight motor intelligence capabilities unmatched by current artificial intelligence systems. Addressing these limitations, MyoChallenge'24 at NeurIPS 2024 established a benchmark for human-robot coordination with an emphasis on joint control of both biological and mechanical limbs. The competition featured two distinct tracks: a manipulation task utilizing the myoMPL model, integrating a virtual biological arm and the Modular Prosthetic Limb (MPL) for a passover task; and a locomotion task using the novel myoOSL model, combining a bilateral virtual biological leg with a trans-femoral amputation and the Open Source Leg (OSL) to navigate varied terrains. Marking the third iteration of the MyoChallenge, the event attracted over 50 teams with more than 290 submissions from all around the globe, with diverse participants ranging from independent researchers to high school students. The competition facilitated the development of several state-of-the-art control algorithms for bionic musculoskeletal systems, leveraging techniques such as imitation learning, muscle synergy, and model-based reinforcement learning that surpassed our proposed baseline performance by a factor of 10. By providing the open-source simulation framework of MyoSuite, standardized tasks, and physiologically realistic models, MyoChallenge serves as a reproducible testbed and benchmark for bridging ML and biomechanics.


REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Neural Information Processing Systems

[...] complexity, unlike most previous reasoning datasets, which are typically [...]. This procedural generation approach allows for continuous evaluation across varying difficulty levels. Our experimental results demonstrate the efficacy of RG in both evaluating and reinforcement learning of reasoning models.