A Broader impact
Our work proposes a novel acquisition function for Bayesian optimization. The approach is foundational and does not have direct societal or ethical consequences. However, JES will be used in the development of applications across a wide range of areas and will thus indirectly contribute to their impacts on society. As an algorithm that can be used for HPO, JES aims to cut the resource expenditure associated with model training while improving model performance. This can help reduce the environmental footprint of machine learning research.
Autoformalizing Mathematical Statements by Symbolic Equivalence and Semantic Consistency. Zenan Li, Yifan Wu, Zhaoyu Li, Xinming Wei
Autoformalization, the task of automatically translating natural language descriptions into a formal language, poses a significant challenge across various domains, especially in mathematics. Recent advancements in large language models (LLMs) have unveiled their promising capabilities to formalize even competition-level math problems. However, we observe a considerable discrepancy between pass@1 and pass@k accuracies in LLM-generated formalizations. To address this gap, we introduce a novel framework that scores and selects the best result from k autoformalization candidates based on two complementary self-consistency methods: symbolic equivalence and semantic consistency. Specifically, symbolic equivalence identifies the logical homogeneity among autoformalization candidates using automated theorem provers, and semantic consistency evaluates the preservation of the original meaning by informalizing the candidates and computing the similarity between the embeddings of the original and informalized texts. Our extensive experiments on the MATH and miniF2F datasets demonstrate that our approach significantly enhances autoformalization accuracy, achieving up to a 0.22-1.35x relative improvement.
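The semantic-consistency step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `embed` here is a toy bag-of-words stand-in for the sentence encoder the paper would use, and the candidate texts are assumed to have already been informalized by an LLM.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; a real system would use a sentence encoder."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def semantic_consistency(original, informalized_candidates):
    """Score each candidate by the similarity between the original statement
    and the candidate's back-translation (informalization)."""
    e_orig = embed(original)
    return [cosine(e_orig, embed(t)) for t in informalized_candidates]

def select_best(original, informalized_candidates):
    """Return the index of the candidate whose informalization best
    preserves the meaning of the original statement."""
    scores = semantic_consistency(original, informalized_candidates)
    return max(range(len(scores)), key=scores.__getitem__)
```

In the full framework this score would be combined with the symbolic-equivalence signal from the theorem prover before selecting among the k candidates.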
NAS-Bench-x11 and the Power of Learning Curves. Colin White
While early research in neural architecture search (NAS) required extreme computational resources, the recent releases of tabular and surrogate benchmarks have greatly increased the speed and reproducibility of NAS research. However, two of the most popular benchmarks do not provide the full training information for each architecture. As a result, on these benchmarks it is not possible to run many types of multi-fidelity techniques, such as learning curve extrapolation, that require evaluating architectures at arbitrary epochs. In this work, we present a method using singular value decomposition and noise modeling to create surrogate benchmarks, NAS-Bench-111, NAS-Bench-311, and NAS-Bench-NLP11, that output the full training information for each architecture, rather than just the final validation accuracy. We demonstrate the power of using the full training information by introducing a learning curve extrapolation framework to modify single-fidelity algorithms, showing that it leads to improvements over popular single-fidelity algorithms that were considered state-of-the-art upon release.
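The SVD-plus-noise idea behind these surrogates can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the benchmark code: the learning-curve matrix here is synthetic, and where the real surrogate predicts the low-rank coefficients from an architecture encoding, this sketch simply projects and reconstructs observed curves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy learning-curve matrix: rows = architectures, cols = epochs.
# Real benchmarks fit the SVD basis on thousands of observed curves.
n_archs, n_epochs, k = 200, 50, 4
epochs = np.arange(1, n_epochs + 1)
curves = 1.0 - np.exp(-epochs / rng.uniform(5, 20, (n_archs, 1)))
curves += rng.normal(0, 0.01, curves.shape)  # observation noise

# Truncated SVD yields a low-rank basis capturing the curve shapes.
U, S, Vt = np.linalg.svd(curves, full_matrices=False)
basis = Vt[:k]                # (k, n_epochs) principal curve components
coeffs = curves @ basis.T     # (n_archs, k) per-architecture coefficients

# A surrogate model would predict `coeffs` from an architecture encoding;
# here we reconstruct directly, then add a noise model fit to the residuals
# so sampled curves reproduce the benchmark's epoch-wise variability.
recon = coeffs @ basis
resid_std = (curves - recon).std(axis=0)
sampled = recon + rng.normal(0, resid_std, curves.shape)
```

Because the reconstruction returns a full curve, a multi-fidelity method can query the surrogate's accuracy at any epoch rather than only at the final one.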
A The Contract Bridge Game
The game of Contract Bridge is played with a standard 52-card deck (4 suits: ♠, ♥, ♦, and ♣, with 13 cards in each suit) and 4 players (North, East, South, West). North-South and East-West are two competing teams. Each player is dealt 13 cards. There are two phases during the game, namely bidding and playing. After the game, scoring is done based on the tricks won in the playing phase and whether they match the contract made in the bidding phase. An example of contract bridge bidding and playing is shown in Figure 1.
We thank the reviewers (R1, R2, R3, R5) for their insightful comments. We thank R5 for pointing out that the "decomposition challenges" in IIGs are critical for equilibrium construction; therefore, our paper could have stronger implications than we expected. Contrary to R2's concern, the tabular form of JPS does have theoretical guarantees, as appreciated by the other reviewers. Full-game AI is future work. R5 makes a great point that a similarity exists between our policy-change density (Eqn.
Estimating Training Data Influence by Tracing Gradient Descent
We introduce a method called TracIn that computes the influence of a training example on a prediction made by the model. The idea is to trace how the loss on the test point changes during the training process whenever the training example of interest was utilized. We provide a scalable implementation of TracIn via: (a) a first-order gradient approximation to the exact computation, (b) saved checkpoints of standard training procedures, and (c) cherry-picking layers of a deep neural network. In contrast with previously proposed methods, TracIn is simple to implement; all it needs is the ability to work with gradients, checkpoints, and loss functions.
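The checkpoint-based first-order approximation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the model is a toy linear regressor with hand-written gradients, and `checkpoints` and `lrs` stand for the saved weight snapshots and their learning rates from a standard training run.

```python
import numpy as np

def loss_grad(w, x, y):
    """Gradient of the squared loss 0.5 * (w.x - y)^2 w.r.t. w
    for a linear model (toy stand-in for a network's gradient)."""
    return (w @ x - y) * x

def tracin_score(checkpoints, lrs, z_train, z_test):
    """First-order TracIn approximation: sum over saved checkpoints of
    the learning rate times the dot product of the training example's
    and the test example's loss gradients."""
    x_tr, y_tr = z_train
    x_te, y_te = z_test
    return sum(
        lr * (loss_grad(w, x_tr, y_tr) @ loss_grad(w, x_te, y_te))
        for w, lr in zip(checkpoints, lrs)
    )
```

A positive score marks the training example as a proponent of the test prediction (its gradient steps reduced the test loss), and a negative score marks it as an opponent.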