If labels are obtained from elsewhere: documentation discusses where they were obtained from, how they were reused, and how the collected annotations and labels are combined with existing ones.

DATA QUALITY
10. Suitability: a measure of a dataset's quality with regard to the defined purpose. Documentation discusses how the dataset is appropriate for the defined purpose.
The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track
Data curation is a field with origins in librarianship and archives, whose scholarship and thinking on data issues go back centuries, if not millennia. The field of machine learning is increasingly observing the importance of data curation to the advancement of both applications and fundamental understanding of machine learning models - evidenced not least by the creation of the Datasets and Benchmarks track itself. This work provides an analysis of recent dataset development practices at NeurIPS through the lens of data curation. We present an evaluation framework for dataset documentation, consisting of a rubric and toolkit developed through a thorough literature review of data curation principles. We use the framework to systematically assess the strengths and weaknesses in current dataset development practices of 60 datasets published in the NeurIPS Datasets and Benchmarks track from 2021-2023.
A Bandit Regret Bound Analysis
Before diving into the details, we first explain the overall idea and structure of our proof. After that, we prove Lemma 2: the first term of (18) comes from (10), and the second term follows from the Cauchy-Schwarz inequality. The main structure of this proof is similar to Proposition 3, Section C of the eluder dimension paper, and we only point out the subtle details that make the difference. Apart from the notation in Section 3, we introduce additional symbols for the regret analysis. B.1 Main Proof Sketch The overall structure is similar to the bandit setting; the main difference here is that we need to take care of the transition dynamics.
A Broader Impact
Our work proposes a novel acquisition function for Bayesian optimization. The approach is foundational and does not have direct societal or ethical consequences. However, JES will be used in the development of applications across a wide range of areas and will thus indirectly contribute to their impacts on society. As an algorithm that can be used for hyperparameter optimization (HPO), JES aims to cut the resource expenditure associated with model training while improving model performance. This can help reduce the environmental footprint of machine learning research.
Autoformalizing Mathematical Statements by Symbolic Equivalence and Semantic Consistency
Zenan Li1, Yifan Wu2, Zhaoyu Li3, Xinming Wei2
Autoformalization, the task of automatically translating natural language descriptions into a formal language, poses a significant challenge across various domains, especially in mathematics. Recent advancements in large language models (LLMs) have unveiled their promising capabilities to formalize even competition-level math problems. However, we observe a considerable discrepancy between pass@1 and pass@k accuracies in LLM-generated formalizations. To address this gap, we introduce a novel framework that scores and selects the best result from k autoformalization candidates based on two complementary self-consistency methods: symbolic equivalence and semantic consistency. Specifically, symbolic equivalence identifies the logical homogeneity among autoformalization candidates using automated theorem provers, and semantic consistency evaluates the preservation of the original meaning by informalizing the candidates and computing the similarity between the embeddings of the original and informalized texts. Our extensive experiments on the MATH and miniF2F datasets demonstrate that our approach significantly enhances autoformalization accuracy, achieving up to 0.22-1.35x
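The candidate-selection idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `prove_equivalent` stands in for an automated-theorem-prover equivalence check, `embed` stands in for an informalization step followed by an embedding model, and the two scores are combined by a simple product.

```python
import itertools
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def select_best(candidates, prove_equivalent, embed, original_text):
    """Score k formalization candidates and return the best one.

    prove_equivalent(a, b) -> bool : assumed ATP oracle for a <-> b
    embed(text) -> np.ndarray      : assumed informalize-then-embed model
    """
    k = len(candidates)
    # Symbolic equivalence: count how many peers each candidate is
    # provably equivalent to (size of its equivalence class).
    sym = np.zeros(k)
    for i, j in itertools.combinations(range(k), 2):
        if prove_equivalent(candidates[i], candidates[j]):
            sym[i] += 1
            sym[j] += 1
    sym = (sym + 1) / k  # each candidate is trivially equivalent to itself

    # Semantic consistency: similarity between each (informalized)
    # candidate and the original natural-language statement.
    ref = embed(original_text)
    sem = np.array([cosine(embed(c), ref) for c in candidates])

    score = sym * sem  # combine the two complementary signals
    return candidates[int(np.argmax(score))]
```

In practice the equivalence check would invoke a prover on the biconditional of the two formal statements, and the combination rule (here a product) is a free design choice.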
NAS-Bench-x11 and the Power of Learning Curves
Colin White
While early research in neural architecture search (NAS) required extreme computational resources, the recent releases of tabular and surrogate benchmarks have greatly increased the speed and reproducibility of NAS research. However, two of the most popular benchmarks do not provide the full training information for each architecture. As a result, on these benchmarks it is not possible to run many types of multi-fidelity techniques, such as learning curve extrapolation, that require evaluating architectures at arbitrary epochs. In this work, we present a method using singular value decomposition and noise modeling to create surrogate benchmarks, NAS-Bench-111, NAS-Bench-311, and NAS-Bench-NLP11, that output the full training information for each architecture, rather than just the final validation accuracy. We demonstrate the power of using the full training information by introducing a learning curve extrapolation framework to modify single-fidelity algorithms, showing that it leads to improvements over popular single-fidelity algorithms which claimed to be state-of-the-art upon release.
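The SVD-plus-noise-modeling idea can be sketched on synthetic learning curves as follows. This is a minimal sketch, not the released NAS-Bench-x11 code: the curve family, rank, and noise scale are illustrative, and the step where a surrogate predicts the low-dimensional code from an architecture encoding is replaced here by the true codes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "full training information": n architectures x T epochs of accuracy.
n, T, rank = 200, 50, 4
epochs = np.linspace(0, 1, T)
base = np.stack([1 - np.exp(-5 * s * epochs) for s in rng.uniform(0.5, 2.0, n)])
curves = base + 0.01 * rng.standard_normal((n, T))

# 1) SVD: compress each learning curve into a few coefficients over a
#    shared basis of curve shapes.
mean = curves.mean(axis=0)
U, S, Vt = np.linalg.svd(curves - mean, full_matrices=False)
components = Vt[:rank]                 # shared basis (rank x T)
codes = (curves - mean) @ components.T # low-dim code per architecture

# 2) Reconstruct full curves from the codes (a surrogate benchmark would
#    instead predict `codes` from an architecture encoding).
recon = codes @ components + mean

# 3) Noise model: estimate the residual scale so that sampled curves
#    reflect run-to-run variability, not a single deterministic trajectory.
sigma = (curves - recon).std()
sampled = recon + sigma * rng.standard_normal(recon.shape)
```

With full curves available, multi-fidelity methods such as learning curve extrapolation can query `sampled[:, :t]` at any epoch t rather than only the final validation accuracy.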
A The Contract Bridge Game
The game of Contract Bridge is played with a standard 52-card deck (4 suits, ♠, ♥, ♦, and ♣, with 13 cards in each suit) and 4 players (North, East, South, West). North-South and East-West form two competing partnerships. Each player is dealt 13 cards. The game has two phases, bidding and playing. After the game, scoring is based on the tricks won in the playing phase and on whether they match the contract made in the bidding phase. An example of contract bridge bidding and playing is shown in Figure 1.