Supplementary Materials for MEQA: A Benchmark for Multi-hop Event-centric Question Answering with Explanations

Neural Information Processing Systems

We utilize an open and widely used data format, JSON, for the MEQA dataset. Each instance contains a context (e.g., "Roadside IED kills Russian major general [...]"), a question (e.g., "Who died before Al-Monitor reported it online?"), a decomposition into sub-questions ("What event contains Al-Monitor as the communicator?", "What event is after #1 and has a victim?", "Who died in #2?"), and the answers ("major general, local commander, lieutenant general"). We present a Datasheet [Gebru et al., 2021] for the MEQA dataset, answering questions such as: For what purpose was the dataset created?
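To make the format concrete, the fields described above can be sketched as a single JSON instance. This is a minimal illustration only: the field names "decomposition" and "answers" are assumptions for readability, not the official MEQA schema.

```python
import json

# Hypothetical sketch of one MEQA instance. Only "context" and "question"
# appear verbatim in the description above; "decomposition" and "answers"
# are assumed field names, not the dataset's actual schema.
instance = {
    "context": "Roadside IED kills Russian major general [...]",
    "question": "Who died before Al-Monitor reported it online?",
    "decomposition": [
        "What event contains Al-Monitor as the communicator?",
        "What event is after #1 and has a victim?",
        "Who died in #2?",
    ],
    "answers": ["major general", "local commander", "lieutenant general"],
}

# JSON round-trips cleanly, which is one reason it suits dataset release.
serialized = json.dumps(instance, indent=2)
parsed = json.loads(serialized)
print(parsed["question"])
```

Note how the sub-questions reference earlier hops by index (`#1`, `#2`), which is what makes the multi-hop decomposition explicit and machine-checkable.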


Supplementary Material and Datasheet: Off to new Shores: A Dataset & Benchmark for (near-)coastal Flood Inundation Forecasting Contents

Neural Information Processing Systems

This supplementary document follows the Datasheets for Datasets template of [8] to document the Global Flood Forecasting (GFF) dataset and its creation. Further resources are provided in the accompanying publication (https://arxiv.org/abs/2409.18591) and in the GitHub repository (https://github.com/Multihuntr/GFF).


1 Datasheet for QM1B

Neural Information Processing Systems

As recommended by the NeurIPS Datasets and Benchmarks track, we document QM1B and its intended uses through the Datasheets for Datasets framework [1]. The goal of dataset datasheets, as outlined by [1], is to provide a standardized process for documenting datasets. The authors of [1] present a list of carefully selected questions which dataset authors should answer. We hope our answers to these questions will facilitate better communication between us (the dataset creators) and future users of QM1B. For what purpose was the dataset created? Prior Gaussian-based Density Functional Theory (DFT) datasets contained fewer than 20 million training examples.


NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating Large Language Models in Offensive Security Motivation

Neural Information Processing Systems

For what purpose was the dataset created? Was there a specific task in mind? Was there a specific gap that needed to be filled?

The dataset was created to evaluate the effectiveness of large language models (LLMs) in solving Capture the Flag (CTF) challenges within the domain of offensive security. There was a specific need to thoroughly assess the capabilities of LLMs in this context, as their potential for handling such tasks had not been systematically evaluated. The goal was to develop a scalable, open-source benchmark dataset specifically designed for these applications. The dataset includes diverse CTF challenges from popular competitions, with metadata to support LLM testing and adaptive learning. It addresses a critical gap by providing a comprehensive resource for the systematic evaluation of LLM performance on real-world cybersecurity tasks. The accompanying automated framework allows for continuous improvement and refinement of LLM-based approaches to vulnerability detection and resolution. By making the dataset open-source, the project aims to foster further research and development in this area, providing a platform for developing, testing, and refining LLM-based approaches to cybersecurity challenges.

Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?

The students listed above compiled and validated these challenges from all previous global CSAW competitions by manually checking their setup and ensuring they remain solvable despite software changes. This work was conducted in collaboration with the OSIRIS Lab and the Center for Cybersecurity at NYU, which organize CSAW and attract global participation [1].



Datasheet - SCAMPS

Neural Information Processing Systems

Datasheet for SCAMPS Dataset: Synthetics for Camera Measurement of Physiological Signals

Motivation: For what purpose was the dataset created? Was there a specific gap that needed to be filled?

Composition: What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)? If the dataset is a sample, then what is the larger set? However, we created a broad range of physiological parameters and appearance characteristics, which is one of the advantages of creating data from a simulation.


Supplemental Materials

Neural Information Processing Systems

We bear all responsibility in case of violation of rights, etc., and confirm the data license. This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International license, which permits sharing and adapting the work provided it is not used for commercial purposes and appropriate credit is given. Please refer to Section 3 for our hosting plan. In this section, we use the framework of Datasheets for Datasets [Gebru et al., 2021] to form a datasheet for CRAG, beginning with: For what purpose was the dataset created? Was there a specific task in mind?


Evaluating LLM-based Workflows for Switched-Mode Power Supply Design

Nau, Simon, Krummenauer, Jan, Zimmermann, André

arXiv.org Artificial Intelligence

Large language models (LLMs) have great potential to enhance productivity in many disciplines, such as software engineering. However, it is unclear to what extent they can assist in the design process of electronic circuits. This paper focuses on the application of LLMs to switched-mode power supply (SMPS) design for printed circuit boards (PCBs). We present multiple LLM-based workflows that combine reasoning, retrieval-augmented generation (RAG), and a custom toolkit that enables the LLM to interact with SPICE simulations to estimate the impact of circuit modifications. Two benchmark experiments are presented to analyze the performance of LLM-based assistants for different design tasks, including parameter tuning, topology adaptation, and optimization of SMPS circuits. Experiment results show that SPICE simulation feedback and current LLM advancements, such as reasoning, significantly increase the solve rate on 269 manually created benchmark tasks from 15% to 91%. Furthermore, our analysis reveals that most parameter-tuning design tasks can be solved, while limits remain for certain topology-adaptation tasks. Our experiments offer insights for improving current concepts, for example by adapting text-based circuit representations.
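The simulate-and-tune loop described above can be sketched in miniature. This is a hedged illustration, not the paper's toolkit: `simulate` here is a stand-in for a real SPICE call and uses the ideal continuous-conduction-mode buck-converter relation V_out = D * V_in so the example stays self-contained; the bisection tuner plays the role of the assistant iterating on a parameter against simulation feedback.

```python
# Hypothetical sketch of a parameter-tuning loop driven by simulation
# feedback. simulate() stands in for a SPICE run; it returns the ideal
# CCM buck-converter output voltage V_out = D * V_in.

def simulate(v_in: float, duty_cycle: float) -> float:
    """Stand-in for a SPICE transient simulation of a buck converter."""
    return duty_cycle * v_in

def tune_duty_cycle(v_in: float, v_target: float,
                    tol: float = 0.01, max_iters: int = 50) -> float:
    """Bisect the duty cycle until the simulated output is within tol."""
    lo, hi = 0.0, 1.0
    d = (lo + hi) / 2
    for _ in range(max_iters):
        d = (lo + hi) / 2
        v_out = simulate(v_in, d)
        if abs(v_out - v_target) < tol:
            break
        if v_out < v_target:
            lo = d
        else:
            hi = d
    return d

# Tune a 12 V input down to a 5 V output by adjusting the duty cycle.
duty = tune_duty_cycle(v_in=12.0, v_target=5.0)
print(f"duty cycle ≈ {duty:.3f}")
```

In the paper's actual workflows the tuning decisions are made by the LLM rather than a fixed search rule, but the structure (propose a parameter, simulate, compare against the target, refine) is the same feedback loop.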

