Edge-exchangeable graphs and sparsity
Diana Cai, Trevor Campbell, Tamara Broderick
Many popular network models rely on the assumption of (vertex) exchangeability, in which the distribution of the graph is invariant to relabelings of the vertices. However, the Aldous-Hoover theorem guarantees that these graphs are dense or empty with probability one, whereas many real-world graphs are sparse. We present an alternative notion of exchangeability for random graphs, which we call edge exchangeability, in which the distribution of a graph sequence is invariant to the order of the edges. We demonstrate that edge-exchangeable models, unlike models that are traditionally vertex exchangeable, can exhibit sparsity. To do so, we outline a general framework for graph generative models; by contrast to the pioneering work of Caron and Fox [12], models within our framework are stationary across steps of the graph sequence. In particular, our model grows the graph by instantiating more latent atoms of a single random measure as the dataset size increases, rather than adding new atoms to the measure.
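As a toy illustration of edge exchangeability (our sketch, not the authors' construction), one can generate a multigraph as a sequence of edges whose endpoints are drawn from a Chinese restaurant process: the edge sequence is exchangeable, and new vertices keep being instantiated as more edges arrive, so the vertex set grows slowly relative to the edge set.

```python
import random

def crp_draw(counts, alpha, rng):
    """Draw a vertex from a Chinese restaurant process with concentration alpha.

    An existing vertex v is chosen with probability counts[v] / (n + alpha);
    a brand-new vertex with probability alpha / (n + alpha).
    """
    n = sum(counts.values())
    if rng.random() < alpha / (n + alpha):
        v = len(counts)          # instantiate a new latent vertex
        counts[v] = 1
        return v
    # pick an existing vertex proportionally to its degree so far
    r = rng.random() * n
    for v, c in counts.items():
        r -= c
        if r < 0:
            counts[v] += 1
            return v

def edge_exchangeable_graph(num_edges, alpha=2.0, seed=0):
    """Generate edges one at a time; the resulting edge sequence is exchangeable."""
    rng = random.Random(seed)
    counts = {}
    return [(crp_draw(counts, alpha, rng), crp_draw(counts, alpha, rng))
            for _ in range(num_edges)]

edges = edge_exchangeable_graph(1000)
vertices = {v for e in edges for v in e}
```

Because the CRP reuses high-degree vertices while occasionally creating new ones, the number of distinct vertices grows only logarithmically in the number of edges here, giving a sparse multigraph.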
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing the accuracy, efficiency, and robustness of large language models (LLMs). Specifically, we propose a meta-buffer to store a series of informative high-level thoughts, namely thought-templates, distilled from the problem-solving processes across various tasks. Then, for each problem, we retrieve a relevant thought-template and adaptively instantiate it with specific reasoning structures to conduct efficient reasoning. To guarantee scalability and stability, we further propose a buffer-manager to dynamically update the meta-buffer, thus enhancing its capacity as more tasks are solved. We conduct extensive experiments on 10 challenging reasoning-intensive tasks, and achieve significant performance improvements over previous SOTA methods: 11% on Game of 24, 20% on Geometric Shapes, and 51% on Checkmate-in-One. Further analysis demonstrates the superior generalization ability and robustness of our BoT, while requiring only 12% of the cost of multi-query prompting methods (e.g., tree/graph of thoughts) on average. Notably, we find that our Llama3-8B + BoT has the potential to surpass the Llama3-70B model.
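The retrieve-and-instantiate loop can be caricatured in a few lines. This is a hedged sketch with a bag-of-words retriever and hypothetical template contents of our own devising; the paper's meta-buffer, distillation, and buffer-manager are far richer.

```python
# Toy meta-buffer: task descriptions mapped to high-level thought-templates.
META_BUFFER = {
    "arithmetic puzzle combining numbers to reach a target":
        "Enumerate operand orderings, apply +,-,*,/ recursively, prune dead branches.",
    "board-game position requiring a forced win":
        "Generate legal moves, search for a line where every reply still loses.",
}

def retrieve_template(problem, buffer):
    """Pick the template whose key shares the most words with the problem."""
    words = set(problem.lower().split())
    return max(buffer.items(),
               key=lambda kv: len(words & set(kv[0].split())))[1]

def update_buffer(buffer, task_description, distilled_template):
    """Buffer-manager step: store a newly distilled thought-template."""
    buffer[task_description] = distilled_template

template = retrieve_template("combining four numbers to reach the target 24",
                             META_BUFFER)
```

The retrieved template would then be instantiated by the LLM with the concrete numbers of the instance; the buffer-manager call shows how the meta-buffer's capacity grows as more tasks are solved.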
Bridging the Gap between Low-rank and Orthogonal Adaptation via Householder Reflection Adaptation
Shen Yuan
While following different technical routes, both low-rank and orthogonal adaptation techniques can efficiently adapt large-scale pre-trained models to specific tasks or domains using a small set of trainable parameters. In this study, we bridge the gap between these two techniques, proposing a simple but effective adaptation method based on Householder reflections. Given a pre-trained model, our method fine-tunes its layers by multiplying each frozen weight matrix with an orthogonal matrix constructed from a chain of learnable Householder reflections (HRs). This HR-based orthogonal fine-tuning is equivalent to an adaptive low-rank adaptation. Moreover, we show that the orthogonality of the reflection planes corresponding to the HRs impacts the model capacity and regularity. The analysis motivates us to regularize the orthogonality of the HRs, leading to different implementations of the proposed Householder reflection adaptation (HRA) method. Compared with state-of-the-art methods, HRA achieves superior performance with fewer learnable parameters when adapting large language models and conditional image generators. The code for the experiments is available at https://github.com/DaShenZi721/HRA, and the method has been merged into the PEFT package.
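The core operation is easy to state: multiply a frozen weight matrix by a product of Householder reflections, each of which is orthogonal, so the product is orthogonal too. A minimal NumPy sketch under our own naming (not the HRA codebase):

```python
import numpy as np

def householder(v):
    """Orthogonal reflection H = I - 2 vv^T / ||v||^2 across the plane normal to v."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(v, v)

def hra_adapt(W_frozen, reflections):
    """Adapt a frozen weight matrix with a chain of r Householder reflections."""
    Q = np.eye(W_frozen.shape[1])
    for v in reflections:        # r learnable vectors, r << d
        Q = Q @ householder(v)
    return W_frozen @ Q          # orthogonal fine-tuning of the frozen weights

rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((4, d))
vs = [rng.standard_normal(d) for _ in range(3)]   # r = 3 trainable vectors
W_adapted = hra_adapt(W, vs)
```

Each reflection adds only d trainable parameters, and because Q is orthogonal, right-multiplication preserves the singular values (and hence the Frobenius norm) of the frozen weights; the rank of the update W_adapted - W is controlled by the number of reflections, which is the sense in which the method behaves like an adaptive low-rank adaptation.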
Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning
Harsh Gupta, R. Srikant, Lei Ying
We study two time-scale linear stochastic approximation algorithms, which can be used to model well-known reinforcement learning algorithms such as GTD, GTD2, and TDC. We present finite-time performance bounds for the case where the learning rate is fixed. The key idea in obtaining these bounds is to use a Lyapunov function motivated by singular perturbation theory for linear differential equations. We use the bound to design an adaptive learning rate scheme which significantly improves the convergence rate over the known optimal polynomial decay rule in our experiments, and can be used to potentially improve the performance of any other schedule where the learning rate is changed at pre-determined time instants.
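The two time-scale pattern can be illustrated with a toy example of our own (not GTD itself): a fast iterate y with a larger step size tracks noisy observations, while a slow iterate x with a much smaller step size chases the fast iterate.

```python
import random

def two_time_scale(num_steps=20000, alpha=0.05, beta=0.005, seed=1):
    """Fast iterate y (step size alpha) tracks noisy observations of a target;
    slow iterate x (step size beta << alpha) chases the fast iterate."""
    rng = random.Random(seed)
    target = 3.0
    x, y = 0.0, 0.0
    for _ in range(num_steps):
        obs = target + rng.gauss(0.0, 1.0)   # noisy sample of the target
        y += alpha * (obs - y)               # fast time scale
        x += beta * (y - x)                  # slow time scale
    return x, y

x, y = two_time_scale()
```

With fixed step sizes, as studied in the paper, both iterates converge to a neighborhood of the target rather than to it exactly; the separation alpha >> beta is what lets the slow iterate treat the fast one as nearly equilibrated, the intuition formalized by the singular-perturbation Lyapunov analysis.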
Author Feedback
We thank all the reviewers for their encouraging comments. In both these cases, τ is effectively zero. Liu et al. show how GTD-class algorithms can be formally derived from a primal-dual saddle-point formulation. Sutton et al. present a (single time-scale) variant of linear TD learning, which they call emphatic TD. They also provide an asymptotic convergence analysis to the set of local optima. If the paper is accepted, we will work further on improving the clarity of the work.
Algorithms and matching lower bounds for approximately-convex optimization
In recent years, a rapidly increasing number of practical applications require optimizing non-convex objectives, such as training neural networks, learning graphical models, and maximum likelihood estimation. Though simple heuristics such as gradient descent with very few modifications tend to work well, theoretical understanding is very weak. We consider possibly the most natural class of non-convex functions where one could hope to obtain provable guarantees: functions that are "approximately convex", i.e. functions f : R^d → R that are close to a convex function.
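To make "approximately convex" concrete: take a convex quadratic plus a small bounded perturbation; gradient descent with a smoothed finite-difference gradient still lands near the true minimizer. The following one-dimensional sketch is ours, for intuition only, and is not the paper's algorithm (which handles the high-dimensional case with matching lower bounds).

```python
import math

def f(x):
    """Approximately convex: convex quadratic plus a bounded wiggle of size eps."""
    eps = 0.01
    return (x - 2.0) ** 2 + eps * math.sin(40.0 * x)

def approx_gradient_descent(x0=10.0, lr=0.1, steps=200, h=0.05):
    """Gradient descent on a finite-difference gradient; the wide stencil h
    averages out most of the small non-convex perturbation."""
    x = x0
    for _ in range(steps):
        g = (f(x + h) - f(x - h)) / (2.0 * h)
        x -= lr * g
    return x

x_star = approx_gradient_descent()
```

The perturbation contributes at most eps·|sin(40h)|/h to the smoothed gradient, so the iterates settle within a small neighborhood of the true minimizer x = 2; the size of that neighborhood, as a function of eps and the dimension, is exactly what the paper's bounds characterize.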
Table caption (residue): shown are percentages of "successful" solutions x̂, and 75th percentiles of the total number of gradient descent steps used (across all networks G).

We thank the reviewers for carefully reading our paper and providing insightful and constructive comments. We will update Table 1 of the original manuscript to display this new comparison. S(x, θ, τ) is just the set of neurons that are close to zero before ReLU thresholding; these are the neurons for which the signs could change after a small change of the network input x. This case is not covered by Theorem 3.1. Please see our comment starting on line 157.
Exclusively Penalized Q-learning for Offline Reinforcement Learning
Yonghyeon Jo, Jungmo Kim, Sanghyeon Lee, Seungyul Han
Constraint-based offline reinforcement learning (RL) imposes policy constraints or penalties on the value function to mitigate overestimation errors caused by distributional shift. This paper focuses on a limitation of existing offline RL methods with penalized value functions: unnecessary penalties can introduce underestimation bias into the value function. To address this concern, we propose Exclusively Penalized Q-learning (EPQ), which reduces estimation bias in the value function by selectively penalizing states that are prone to inducing estimation errors. Numerical results show that our method significantly reduces underestimation bias and improves performance in various offline control tasks compared to other offline RL methods.
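The idea of selective penalization can be caricatured in a few lines. This is our sketch, not the EPQ algorithm: subtract a penalty from the Bellman target only for states flagged as error-prone (here, proxied by a hypothetical dataset-visitation count), leaving well-covered states unpenalized so they incur no underestimation bias.

```python
def penalized_target(reward, next_q, coverage, gamma=0.99,
                     penalty=1.0, threshold=5):
    """Bellman target with a penalty applied only when the state's dataset
    coverage (e.g. a visitation count) falls below a threshold."""
    p = penalty if coverage < threshold else 0.0   # selective, not uniform
    return reward + gamma * (next_q - p)

well_covered = penalized_target(1.0, 10.0, coverage=50)    # no penalty applied
poorly_covered = penalized_target(1.0, 10.0, coverage=2)   # penalized target
```

A uniform penalty would shrink both targets; the selective rule only shrinks the poorly covered one, which is the behavior motivating EPQ's bias reduction.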
A Supplementary Material
A.1 Data Cards
Table 2 shows the extracted documentation parameters from Kaggle and HuggingFace, which we categorized according to Datasheets [40]. On HuggingFace, we find information about the annotation creators (e.g., crowdsource, experts, ml-generated) and specific task categories (e.g., image-classification, image-to-text, text-to-image). Such parameters can be used to filter results when searching on HuggingFace, potentially enabling systematic analysis of a specific task or tag. On Kaggle, we notice that some important parameters shown on the dataset website, such as temporal and geospatial coverage, data collection methodology, provenance, DOI citation, and update frequency, cannot be automatically extracted with their API, so we manually included them. Kaggle automatically computes a usability score, which is associated with the tag "well-documented" and is used for ranking results when searching for a dataset.