Appendix A Task Definitions

Neural Information Processing Systems

Table 3 outlines the perception and reasoning tasks included in the MMPerspective benchmark. Sample cases and representative questions are included to illustrate the task format and input style. We also show examples of perspective-invariant image operations for robustness evaluation in Figure 17, including cropping, masking, flipping, and rotation.

Where is the vanishing point in this image?

Critical Line Perception (CLP) | 123 | Figure 9 | Determine which of the highlighted lines is the horizon line. | Which line highlighted in the image is the Horizon Line?


2 Preliminaries. Computational graph. Let $A$ be a deterministic algorithm and let $F_A$ be a set of deterministic primitive operations that can be used by $A$ during execution. Given an input $x$, we define the

Neural Information Processing Systems

We analyze the capabilities of Transformer language models in learning compositional discrete tasks. To this end, we evaluate training LLaMA models and prompting GPT-4 and Gemini on four tasks that require learning a composition of several discrete sub-tasks. In particular, we measure how well these models can reuse primitives observable in the sub-tasks to learn the composition task.


User-Level Differential Privacy With Few Examples Per User

Neural Information Processing Systems

STOC 2023] obtained generic algorithms that work for various learning tasks. However, their focus was on the *example-rich* regime, where the users have so many examples that each user could themselves solve the problem. In this work we consider the *example-scarce* regime, where each user has only a few examples, and obtain the following results:

* For approximate-DP, we give a generic transformation of any item-level DP algorithm to a user-level DP algorithm. Roughly speaking, the latter gives a (multiplicative) savings of $O_{\varepsilon,\delta}(\sqrt{m})$ in terms of the number of users required for achieving the same utility, where $m$ is the number of examples per user. This algorithm, while recovering most known bounds for specific problems, also gives new bounds, e.g., for PAC learning.
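
As a rough reading of the $\sqrt{m}$ savings (an illustration only: the symbols $n_{\text{item}}$, $n_{\text{user}}$, and the constant $c_{\varepsilon,\delta}$ are notation assumed here, not taken from the abstract), suppose the item-level DP algorithm needs $n_{\text{item}}$ users to reach a given utility. The stated savings then correspond to

$$
n_{\text{user}} \approx \frac{n_{\text{item}}}{c_{\varepsilon,\delta}\,\sqrt{m}},
$$

where $c_{\varepsilon,\delta}$ depends only on the privacy parameters. Ignoring that constant, $m = 100$ examples per user would cut the required number of users by a factor of about $\sqrt{100} = 10$.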


Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples

Neural Information Processing Systems

Learning controllers with offline data in decision-making systems is an essential area of research due to its potential to reduce the risk of applications in real-world systems. However, in responsibility-sensitive settings such as healthcare, decision accountability is of paramount importance, yet has not been adequately addressed by the literature. This paper introduces the Accountable Offline Controller (AOC) that employs the offline dataset as the Decision Corpus and performs accountable control based on a tailored selection of examples, referred to as the Corpus Subset. AOC operates effectively in low-data scenarios, can be extended to the strictly offline imitation setting, and displays qualities of both conservation and adaptability. We assess AOC's performance in both simulated and real-world healthcare scenarios, emphasizing its capability to manage offline control tasks with high levels of performance while maintaining accountability.


On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

Neural Information Processing Systems

Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set. OOD benchmarks are designed to present a different joint distribution of data and labels between training and test time. VQA-CP has become the standard OOD benchmark for visual question answering, but we discovered three troubling practices in its current use. First, most published methods rely on explicit knowledge of the construction of the OOD splits. They often rely on ``inverting'' the distribution of labels, e.g., answering mostly ``yes'' when the common training answer was ``no''.


Surrogate Modelling of Proton Dose with Monte Carlo Dropout Uncertainty Quantification

arXiv.org Machine Learning

Accurate proton dose calculation using Monte Carlo (MC) is computationally demanding in workflows like robust optimisation, adaptive replanning, and probabilistic inference, which require repeated evaluations. To address this, we develop a neural surrogate that integrates Monte Carlo dropout to provide fast, differentiable dose predictions along with voxelwise predictive uncertainty. The method is validated through a series of experiments, starting with a one-dimensional analytic benchmark that establishes accuracy, convergence, and variance decomposition. Two-dimensional bone-water phantoms, generated using TOPAS Geant4, demonstrate the method's behavior under domain heterogeneity and beam uncertainty, while a three-dimensional water phantom confirms scalability for volumetric dose prediction. Across these settings, we separate epistemic (model) from parametric (input) contributions, showing that epistemic variance increases under distribution shift, while parametric variance dominates at material boundaries. The approach achieves significant speedups over MC while retaining uncertainty information, making it suitable for integration into robust planning, adaptive workflows, and uncertainty-aware optimisation in proton therapy.
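
The separation of epistemic from parametric variance described above can be illustrated with the law of total variance. The following is a minimal sketch on a toy one-hidden-layer network, not the authors' surrogate or data: the functions `stochastic_forward` and `decompose_variance`, the random weights, and the Gaussian input draws are all hypothetical and chosen only for illustration. Dropout is left on at prediction time, so the spread across dropout masks for a fixed input stands in for epistemic variance, and the spread of the mask-averaged prediction across input draws stands in for parametric variance.

```python
import math
import random
from statistics import fmean, pvariance


def stochastic_forward(x, w1, w2, rng, p_drop=0.1):
    """One forward pass of a toy one-hidden-layer network with dropout left on."""
    out = 0.0
    for a, b in zip(w1, w2):
        # Keep each hidden unit with probability 1 - p_drop.
        if rng.random() >= p_drop:
            # Inverted-dropout scaling keeps the expected activation unchanged.
            out += b * math.tanh(a * x) / (1.0 - p_drop)
    return out


def decompose_variance(x_draws, w1, w2, rng, n_passes=200, p_drop=0.1):
    """Split predictive variance using the law of total variance."""
    # preds[i][t] is the prediction for input draw i under dropout mask t.
    preds = [
        [stochastic_forward(x, w1, w2, rng, p_drop) for _ in range(n_passes)]
        for x in x_draws
    ]
    # Mean over inputs of the variance across masks: E_x[Var(y | x)].
    epistemic = fmean([pvariance(row) for row in preds])
    # Variance over inputs of the mask-averaged prediction: Var_x(E[y | x]).
    parametric = pvariance([fmean(row) for row in preds])
    # Variance of all predictions pooled together.
    total = pvariance([y for row in preds for y in row])
    return epistemic, parametric, total


rng = random.Random(0)
w1 = [rng.gauss(0, 1) for _ in range(16)]        # hypothetical input weights
w2 = [rng.gauss(0, 1) for _ in range(16)]        # hypothetical output weights
x_draws = [rng.gauss(1.0, 0.2) for _ in range(30)]  # hypothetical uncertain input
epistemic, parametric, total = decompose_variance(x_draws, w1, w2, rng)
print(f"epistemic={epistemic:.4f} parametric={parametric:.4f} total={total:.4f}")
```

Because every input draw uses the same number of passes and population variances are used throughout, the two components sum to the pooled variance up to floating-point rounding, which gives a quick sanity check on the decomposition.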


Can Models Learn Skill Composition from Examples?

Neural Information Processing Systems

As large language models (LLMs) become increasingly advanced, their ability to exhibit compositional generalization---the capacity to combine learned skills in novel ways not encountered during training---has garnered significant attention. This type of generalization, particularly in scenarios beyond training data, is also of great interest in the study of AI safety and alignment. A recent study introduced the Skill-Mix evaluation, where models are tasked with composing a short paragraph demonstrating the use of a specified $k$-tuple of language skills. While small models struggled with composing even with $k=3$, larger models like GPT-4 performed reasonably well with $k=5$ and $6$. In this paper, we employ a setup akin to Skill-Mix to evaluate the capacity of smaller models to learn compositional generalization from examples. Utilizing a diverse set of language skills---including rhetorical, literary, reasoning, theory of mind, and common sense---GPT was used to generate text samples that exhibit random subsets of $k$ skills.



Towards Safer Operations: An Expert-involved Dataset of High-Pressure Gas Incidents for Preventing Future Failures

arXiv.org Artificial Intelligence

This paper introduces the new IncidentAI dataset for safety prevention. Unlike prior corpora, which usually contain a single task, our dataset comprises three tasks: named entity recognition, cause-effect extraction, and information retrieval. The dataset is annotated by domain experts who have at least six years of practical experience as high-pressure gas conservation managers. We validate the contribution of the dataset in the scenario of safety prevention. Preliminary results on the three tasks show that NLP techniques are beneficial for analyzing incident reports to prevent future failures. The dataset facilitates future research in NLP and incident management communities. The IncidentAI dataset is available at: https://github.com/Cinnamon/incident-ai-dataset.


Coding with ChatGPT (GPT-3.5 and GPT-4) -- A Quick Guide

#artificialintelligence

Given the new oracle that is ChatGPT, you may often find yourself tasked with creating prompts for various applications. One of the most significant challenges in this regard is crafting prompts that effectively communicate your requirements and elicit the desired response. In this article, I will provide a comprehensive guide on how to write high-quality prompts for software development, specifically for the ChatGPT language model. My aim is to help you improve your skills as a prompt engineer, moving beyond generic advice and offering practical tips and examples. To create effective prompts, it is essential to understand the AI language model you are working with.