Checklists Are Better Than Reward Models For Aligning Language Models

Neural Information Processing Systems

Language models must be adapted to understand and follow user instructions. Reinforcement learning is widely used to facilitate this, typically using fixed criteria such as "helpfulness" and "harmfulness". In our work, we instead propose using flexible, instruction-specific criteria as a means of broadening the impact that reinforcement learning can have in eliciting instruction following. We propose "Reinforcement Learning from Checklist Feedback" (RLCF). From instructions, we extract checklists and evaluate how well responses satisfy each item, using both AI judges and specialized verifier programs, then combine these scores to compute rewards for RL. We compare RLCF with other alignment methods on top of a strong instruction-following model (Qwen2.5-7B-Instruct).
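The scoring step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ChecklistItem` type, the per-item weights, the equal averaging of judge and verifier scores, and the weighted-mean aggregation are all assumptions made for illustration, and the `judge` argument stands in for an AI judge that returns a score in [0, 1].

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ChecklistItem:
    """One requirement extracted from an instruction (hypothetical type)."""
    text: str
    weight: float = 1.0  # assumed importance weight
    # Optional programmatic check; returns True when the item is satisfied.
    verifier: Optional[Callable[[str], bool]] = None


def score_item(item: ChecklistItem, response: str,
               judge: Callable[[str, str], float]) -> float:
    """Score one item in [0, 1] from the judge and, if present, the verifier."""
    judge_score = judge(item.text, response)
    if item.verifier is None:
        return judge_score
    verifier_score = 1.0 if item.verifier(response) else 0.0
    # Assumption: judge and verifier are averaged with equal weight.
    return 0.5 * (judge_score + verifier_score)


def checklist_reward(items: Sequence[ChecklistItem], response: str,
                     judge: Callable[[str, str], float]) -> float:
    """Combine per-item scores into a single reward as a weighted mean."""
    total_weight = sum(item.weight for item in items)
    if total_weight == 0:
        return 0.0
    weighted = sum(item.weight * score_item(item, response, judge)
                   for item in items)
    return weighted / total_weight
```

In an RL loop, a reward like this would be computed for each sampled response and passed to the policy update; how the scores are actually weighted and combined in RLCF is not stated in the snippet above.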


60ea0211b38a3ccd7a241f523dc7cf63-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems

Below we describe a few other prevalent multi-label datasets and explain how ML48S differs from them, which is why they were excluded from comparison in this paper. PASCAL VOC [11] was created for object detection and classification, covering 20 basic-level classes across 4,574 images, with most images containing a single prominent object. This dataset is much smaller than ML48S and also contains far fewer classes, all of which are coarse-grained. VG500 is a modification of the Visual Genome dataset [19], a dataset focused on dense annotations linking images to their respective captions. This dataset is not intended to be bounded by categories but has open-vocabulary annotations.


WEB-SHEPHERD: Advancing PRMs for Reinforcing Web Agents

Neural Information Processing Systems

Web navigation is a unique domain that can automate many repetitive real-life tasks and is challenging as it requires long-horizon sequential decision making beyond typical multimodal large language model (MLLM) tasks.


Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback

Neural Information Processing Systems

Paucity of medical data severely limits the generalizability of diagnostic ML models, as the full spectrum of disease variability cannot be represented by a small clinical dataset. To address this, diffusion models (DMs) have been considered as a promising avenue for synthetic image generation and augmentation. However, they frequently produce medically inaccurate images, degrading model performance. Expert domain knowledge is critical for synthesizing images that correctly encode clinical information, especially when data is scarce and quality outweighs quantity. Existing approaches for incorporating human feedback, such as reinforcement learning (RL) and Direct Preference Optimization (DPO), rely on robust reward functions or demand labor-intensive expert evaluations. Recent progress in Multimodal Large Language Models (MLLMs) reveals their strong visual reasoning capabilities, making them adept candidates as evaluators. In this work, we propose a novel framework, coined MAGIC (Medically Accurate Generation of Images through AI-Expert Collaboration), that synthesizes clinically accurate skin disease images for data augmentation.


SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data

Neural Information Processing Systems

Biodiversity is declining at an unprecedented rate, impacting ecosystem services necessary to ensure food, water, and human health and well-being. Understanding the distribution of species and their habitats is crucial for conservation policy planning. However, traditional methods in ecology for species distribution models (SDMs) generally focus either on narrow sets of species or narrow geographical areas, and there remain significant knowledge gaps about the distribution of species. A major reason for this is the limited availability of data traditionally used, due to the prohibitive amount of effort and expertise required for traditional field monitoring. The wide availability of remote sensing data and the growing adoption of citizen science tools to collect species observation data at low cost offer an opportunity for improving biodiversity monitoring and enabling the modelling of complex ecosystems. We introduce a novel task for mapping bird species to their habitats by predicting species encounter rates from satellite images, and present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird, considering summer (breeding) and winter seasons. We also provide a dataset in Kenya representing low-data regimes. We additionally provide environmental data and species range maps for each location.
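As a rough illustration of how an encounter-rate label could be derived from presence-absence observations, the sketch below computes, for one location, the fraction of observation lists that report each species. This definition and the `encounter_rates` function are assumptions for illustration; the snippet above does not specify how SatBird derives its labels from eBird data.

```python
from typing import Dict, Iterable, List, Set


def encounter_rates(observation_lists: List[Set[str]],
                    species: Iterable[str]) -> Dict[str, float]:
    """Fraction of observation lists at one location that report each species.

    Each element of `observation_lists` is the set of species recorded as
    present in one visit; a species missing from a set counts as absent.
    """
    if not observation_lists:
        raise ValueError("at least one observation list is required")
    n = len(observation_lists)
    return {
        s: sum(1 for obs in observation_lists if s in obs) / n
        for s in species
    }
```

A model for the task described above would then regress a vector of such rates, one entry per species, from the satellite image of the location.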


Checklist

Neural Information Processing Systems

For all authors... (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? While MARL algorithms may be implemented for potentially harmful applications, we do not believe this work uniquely enables such applications. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] In the supplemental material (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? If you used crowdsourcing or conducted research with human subjects... (a) Did you include the full text of instructions given to participants and screenshots, if applicable? [N/A] (b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A] (c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? Our allocation proposal network and Q network are illustrated in Figures 7 and 8. Low-level action utility functions and mixing networks are similar to those described in Iqbal et al. [10] with the only difference being a replacement of the RNN layers with standard fully connected layers.





Checklist

Neural Information Processing Systems

When drawing 1,000 z from the prior p(z) of the latent space learned by PLAS, only 4% of the samples are decoded as high-return actions, while in LAPO, 45% of the decoded actions are high-return actions.