Rethinking Optimal Transport in Offline Reinforcement Learning
We propose a novel algorithm for offline reinforcement learning using optimal transport. Typically, in offline reinforcement learning, the data is provided by various experts, some of whom can be sub-optimal. To extract an efficient policy, it is necessary to stitch together the best behaviors from the dataset. To address this problem, we rethink offline reinforcement learning as an optimal transport problem. Based on this view, we present an algorithm that aims to find a policy mapping each state to a partial distribution over the best expert actions for that state. We evaluate our algorithm on continuous control problems from the D4RL suite and demonstrate improvements over existing methods.
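As an illustrative sketch only (not the paper's algorithm): couplings of this kind are often computed with an entropy-regularized Sinkhorn solver. The snippet below matches a batch of policy proposals to dataset actions under a squared-distance cost; the cost choice, regularization strength, and how the resulting plan would enter a policy loss are all assumptions.

```python
# Generic entropy-regularized Sinkhorn solver; names, cost, and epsilon are
# illustrative assumptions, not the paper's method.
import numpy as np

def sinkhorn_plan(cost, eps=0.05, n_iters=200):
    """Compute an entropy-regularized OT plan for an (n x m) cost matrix."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass on policy proposals
    b = np.full(m, 1.0 / m)          # uniform mass on dataset actions
    K = np.exp(-cost / eps)          # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):         # alternating Sinkhorn projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan

# Toy usage: cost = squared distance between proposed and dataset actions.
proposed = np.random.randn(8, 3)
dataset = np.random.randn(16, 3)
cost = ((proposed[:, None, :] - dataset[None, :, :]) ** 2).sum(-1)
plan = sinkhorn_plan(cost)           # rows sum to ~1/8, columns to ~1/16
```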
Robust Distributed Learning: Tight Error Bounds and Breakdown Point under Data Heterogeneity
The theory underlying robust distributed learning algorithms, designed to resist adversarial machines, matches empirical observations when data is homogeneous. Under data heterogeneity, however, which is the norm in practical scenarios, established lower bounds on the learning error are essentially vacuous and greatly mismatch empirical observations. This is because the heterogeneity model considered is too restrictive and does not cover basic learning tasks such as least-squares regression. In this paper, we consider a more realistic heterogeneity model, namely (G, B)-gradient dissimilarity, and show that it covers a larger class of learning problems than existing theory.
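For reference, the standard form of the (G, B)-gradient dissimilarity condition in the distributed optimization literature is the following (the paper's exact constants and quantifiers may differ):

$$\frac{1}{n}\sum_{i=1}^{n}\bigl\|\nabla f_i(\theta)-\nabla f(\theta)\bigr\|^2 \;\le\; G^2 + B^2\,\bigl\|\nabla f(\theta)\bigr\|^2 \quad \text{for all } \theta,$$

where $f_i$ is the local loss of machine $i$ and $f=\tfrac{1}{n}\sum_i f_i$. Setting $B=0$ recovers the classical bounded-heterogeneity model, which excludes cases such as least-squares regression where gradient disagreement grows with $\|\nabla f(\theta)\|$.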
Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning
Text-to-image generative models have recently attracted considerable interest, enabling the synthesis of high-quality images from textual prompts. However, these models often lack the capability to generate specific subjects from given reference images or to synthesize novel renditions under varying conditions. Methods like DreamBooth and Subject-driven Text-to-Image (SuTI) have made significant progress in this area. Yet, both approaches primarily focus on enhancing similarity to reference images and require expensive setups, often overlooking the need for efficient training and for avoiding overfitting to the reference images. In this work, we present the λ-Harmonic reward function, which provides a reliable reward signal and enables early stopping for faster training and effective regularization.
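Purely as an illustration (the abstract does not define the reward, so the functional form, the lambda weighting, and the score names below are assumptions, not the authors' λ-Harmonic definition): one way to combine a text-alignment score and a subject-fidelity score into a single signal is a weighted harmonic mean, which stays small whenever either component is small.

```python
# Hypothetical sketch: weighted harmonic mean of two preference scores.
# This is NOT the paper's lambda-Harmonic reward; names and weighting are
# illustrative assumptions.
def harmonic_reward(text_score: float, subject_score: float, lam: float = 0.5,
                    eps: float = 1e-8) -> float:
    """Weighted harmonic mean of two scores in (0, 1]; low if either is low."""
    return 1.0 / (lam / (text_score + eps) + (1.0 - lam) / (subject_score + eps))

# Example: a sample that matches the prompt but loses the subject is penalized.
print(harmonic_reward(0.9, 0.2))   # ~0.33, dominated by the weak component
print(harmonic_reward(0.9, 0.9))   # 0.9
```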
CEDe: Supplementary material
Rodrigo Hormazabal
Was there a specific task in mind? Was there a specific gap that needed to be filled? AI-based materials design is a rapidly growing area of research in chemistry. However, experimental data is scarce, and obtaining and indexing data still constitute a major bottleneck in the materials discovery process. Researchers mainly access information by extracting data from scientific documents, such as papers and patents.[2] Molecular images have been, and currently are, the preferred format for publishing discoveries and detailing structural information about new compounds. More recently, machine learning-based approaches have been explored for the same task.[4] However, until now, even state-of-the-art models have struggled to perform on par with traditional approaches due to their sample inefficiency.
Towards Understanding Extrapolation: a Causal Lens
Lingjing Kong, Guangyi Chen, Haoxuan Li
However, practical scenarios often involve only a handful of target samples, potentially lying outside the training support, which requires the capability of extrapolation. In this work, we aim to provide a theoretical understanding of when extrapolation is possible and offer principled methods to achieve it without requiring an on-support target distribution. To this end, we formulate the extrapolation problem with a latent-variable model that embodies the minimal change principle in causal mechanisms. Under this formulation, we cast the extrapolation problem into a latent-variable identification problem. We provide realistic conditions on shift properties and the estimation objectives that lead to identification even when only one off-support target sample is available, tackling the most challenging scenarios. Our theory reveals the intricate interplay between the underlying manifold's smoothness and the shift properties. We showcase how our theoretical results inform the design of practical adaptation algorithms.
Pgx: Hardware-Accelerated Parallel Game Simulators for Reinforcement Learning
We propose Pgx, a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and parallelization, Pgx can efficiently scale to thousands of simultaneous simulations on accelerators. In our experiments on a DGX-A100 workstation, we found that Pgx simulates RL environments 10-100x faster than existing implementations available in Python. Pgx includes RL environments commonly used as benchmarks in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx offers miniature game sets and baseline models to facilitate rapid research cycles. We demonstrate the efficient training of the Gumbel AlphaZero algorithm with Pgx environments. Overall, Pgx provides high-performance environment simulators for researchers to accelerate their RL experiments. Pgx is available at https://github.com/sotetsuk/pgx.
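A minimal sketch of the usage pattern the abstract describes, batching environment steps with jax.vmap. The specific calls (pgx.make, env.init, env.step) follow the pattern documented in the repository, so treat the details as approximate rather than authoritative.

```python
# Sketch of vectorized simulation with JAX, following the pattern Pgx describes.
# Exact API names and fields may differ across Pgx versions.
import jax
import pgx

env = pgx.make("go_9x9")
batch_size = 1024

init = jax.jit(jax.vmap(env.init))   # vectorize initialization over RNG keys
step = jax.jit(jax.vmap(env.step))   # vectorize stepping over a batch of states

keys = jax.random.split(jax.random.PRNGKey(0), batch_size)
state = init(keys)                   # 1024 independent games, initialized at once

while not state.terminated.all():
    # Placeholder policy: pick the first legal action in every game.
    action = state.legal_action_mask.argmax(axis=-1)
    state = step(state, action)
```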
Deep learning sometimes appears to work in unexpected ways. In pursuit of a deeper understanding of its surprising behaviors, we investigate a simple yet accurate model of a trained neural network: a sequence of first-order approximations, telescoped across training, that together form an empirically operational tool for practical analysis. Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena in the literature - including double descent, grokking, linear mode connectivity, and the challenges of applying deep learning to tabular data - highlighting that this model allows us to construct and extract metrics that help predict and understand the a priori unexpected performance of neural networks. We also demonstrate that this model offers a pedagogical formalism that lets us isolate components of the training process even in complex contemporary settings, provides a lens for reasoning about the effects of design choices such as architecture and optimization strategy, and reveals surprising parallels between neural network learning and gradient boosting.
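One plausible reading of the telescoping construction (our notation, not necessarily the paper's): write the trained network as its initialization plus a telescoping sum of per-step changes, and approximate each change to first order in the parameter update,

$$f_{\theta_T}(x) \;=\; f_{\theta_0}(x) + \sum_{t=0}^{T-1}\bigl[f_{\theta_{t+1}}(x) - f_{\theta_t}(x)\bigr] \;\approx\; f_{\theta_0}(x) + \sum_{t=0}^{T-1} \nabla_\theta f_{\theta_t}(x)^{\top}\,(\theta_{t+1} - \theta_t),$$

so the prediction decomposes into a sum of linearized contributions from individual training steps, which is also the sense in which such a model resembles an additive, boosting-style ensemble.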
A Experiments Supplement
Since most loss values fall within the range of [0.1, 10], we evaluate how the model's accuracy and fairness change w.r.t. the cutoff value. Figure 1 shows the change of fairness (equalized odds) under different cutoff values.
A.2 Sensitivity of Validation Size
We show the effect of validation size on accuracy and equalized odds in Fig. As shown in the figures, when the validation size is larger than 10% of the training size, the model's performance becomes stable in terms of both accuracy and fairness. During validation, we freeze the contrastive encoder and train a downstream linear classifier g with parameters ω for the classification task.
Figure 4: Change of accuracy as validation size varies.
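A minimal sketch of the linear-probe validation step described above, freezing the encoder and fitting only the linear classifier g with parameters ω; the dimensions, optimizer, and training loop are placeholders, not the paper's configuration.

```python
# Illustrative linear-probe evaluation: freeze the contrastive encoder, train only
# a linear head. Dimensions, optimizer, and loop structure are assumptions.
import torch
import torch.nn as nn

def linear_probe(encoder: nn.Module, loader, feat_dim: int, n_classes: int,
                 epochs: int = 10, lr: float = 1e-3, device: str = "cpu"):
    encoder.eval()                                    # freeze the encoder
    for p in encoder.parameters():
        p.requires_grad_(False)
    g = nn.Linear(feat_dim, n_classes).to(device)     # downstream classifier g_omega
    opt = torch.optim.Adam(g.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                z = encoder(x)                        # frozen representation
            loss = loss_fn(g(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return g
```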
Self-Supervised Fair Representation Learning without Demographics
Fairness has become an important topic in machine learning. Generally, most literature on fairness assumes that sensitive information, such as gender or race, is present in the training set, and uses this information to mitigate bias. However, due to practical concerns like privacy and regulation, applications of these methods are restricted. Also, although much of the literature studies supervised learning, in many real-world scenarios we want to utilize large unlabelled datasets to improve the model's accuracy. Can we improve fair classification without sensitive information and without labels?