Nearly Minimax Optimal Regret for Multinomial Logistic Bandit
In this paper, we study the contextual multinomial logistic (MNL) bandit problem, in which a learning agent sequentially selects an assortment based on contextual information and user feedback follows an MNL choice model. There has been a significant discrepancy between the lower and upper regret bounds, particularly regarding the maximum assortment size K. Additionally, the variation in reward structures between these bounds complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of Ω(d√(T/K)) and propose a constant-time algorithm, OFU-MNL+, that achieves a matching upper bound of Õ(d√(T/K)). We also provide instance-dependent minimax regret bounds under uniform rewards. Under non-uniform rewards, we prove a lower bound of Ω(d√T) and an upper bound of Õ(d√T), also achievable by OFU-MNL+. Our empirical studies support these theoretical findings. To the best of our knowledge, this is the first work in the contextual MNL bandit literature to prove minimax optimality -- for either the uniform or non-uniform reward setting -- and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
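As a minimal sketch of the feedback model this abstract assumes, the MNL choice model assigns softmax-style probabilities over the offered assortment plus a no-purchase "outside option" with utility fixed at zero. The function name and array shapes below are illustrative, not from the paper:

```python
import numpy as np

def mnl_choice_probs(x, theta):
    """Choice probabilities for an assortment under the MNL model.

    x: (K, d) array of feature vectors for the K offered items.
    theta: (d,) utility parameter (unknown to the learner, estimated online).
    Returns a length-(K+1) probability vector: the K items, then the
    outside (no-purchase) option, whose utility is 0 by convention.
    """
    utilities = x @ theta                 # (K,) linear utilities
    expu = np.exp(utilities)
    denom = 1.0 + expu.sum()              # the "+1" is the outside option
    return np.append(expu, 1.0) / denom   # item probs, then outside option
```

With theta = 0 every item and the outside option are equally likely, which is a quick sanity check on the normalization.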
Discretely Beyond 1/e: Guided Combinatorial Algorithms for Submodular Maximization
For constrained, not necessarily monotone submodular maximization, all known approximation algorithms with ratio greater than 1/e require continuous ideas, such as queries to the multilinear extension of a submodular function and its gradient, which are typically expensive to simulate with the original set function. For combinatorial algorithms, the best known approximation ratios for both size and matroid constraints are obtained by a simple randomized greedy algorithm of Buchbinder et al. [9]: 1/e ≈ 0.367 for a size constraint and 0.281 for a matroid constraint, in O(kn) queries, where k is the rank of the matroid. In this work, we develop the first combinatorial algorithms to break the 1/e barrier: we obtain an approximation ratio of 0.385 in O(kn) queries to the submodular set function for a size constraint, and 0.305 for a general matroid constraint. These ratios are achieved by guiding the randomized greedy algorithm with a fast local search algorithm. Further, we develop deterministic versions of these algorithms, maintaining the same ratios and asymptotic time complexity. Finally, we develop a deterministic, nearly linear time algorithm with ratio 0.377.
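The randomized greedy baseline of Buchbinder et al. that this work guides can be sketched as follows for a size constraint. This is a simplified illustration, not the paper's algorithm: names are ours, and the "dummy element" padding of the original analysis is modeled by possibly adding nothing in a round:

```python
import random

def randomized_greedy(f, ground, k, seed=0):
    """Randomized greedy sketch for max f(S) subject to |S| <= k.

    f: set function on Python sets, assumed submodular (possibly non-monotone).
    ground: list of hashable elements.
    Each of the k rounds picks uniformly among the (up to) k elements with
    the largest marginal gain; zero-gain dummy slots pad the choice set,
    and drawing a dummy or a nonpositive gain adds nothing that round.
    """
    rng = random.Random(seed)
    S = set()
    for _ in range(k):
        gains = sorted(
            ((f(S | {e}) - f(S), e) for e in ground if e not in S),
            key=lambda t: t[0],
            reverse=True,
        )
        top = gains[:k]
        choices = top + [(0.0, None)] * (k - len(top))  # dummy padding
        gain, e = rng.choice(choices)
        if e is not None and gain > 0:
            S.add(e)
    return S
```

Each round makes O(n) marginal-value queries, giving the O(kn) query count quoted above.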
Learning Superconductivity from Ordered and Disordered Material Structures
Superconductivity is a fascinating phenomenon observed in certain materials under certain conditions. However, some critical aspects of it, such as the relationship between superconductivity and materials' chemical/structural features, are still not well understood. Recent successes of data-driven approaches in materials science strongly inspire researchers to study this relationship with such methods, but a corresponding dataset is still lacking.
Dear Reviewers R1, R2, and R3: Thank you for your comments and suggestions to improve our paper.

Note on Batch Size [R1]: ON never requires batching; in contrast, BRN's statistical estimates are based on batches. After tuning its hyperparameters, we observed that it performs worse (Figure 2). We will include this figure in the paper's appendix.

ON Hyperparameters [R1, R3]: ON removes the batch size parameter and introduces two decay rate parameters.
Rule Based Rewards for Language Model Safety
Reinforcement learning based fine-tuning of large language models (LLMs) on human preferences has been shown to enhance both their capabilities and safety behavior. However, in cases related to safety, without precise instructions to human annotators, the data collected may cause the model to become overly cautious, or to respond in an undesirable style, such as being judgmental. Additionally, as model capabilities and usage patterns evolve, there may be a costly need to add or relabel data to modify safety behavior. We propose a novel preference modeling approach that utilizes AI feedback and only requires a small amount of human data. Our method, Rule Based Rewards (RBR), uses a collection of rules for desired or undesired behaviors.
GenRec: Unifying Video Generation and Recognition with Diffusion Models
Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we aim to investigate whether such priors derived from a generative process are suitable for video recognition, and eventually for joint optimization of generation and recognition. Building upon Stable Video Diffusion, we introduce GenRec, the first unified framework trained with a random-frame conditioning process so as to learn generalized spatial-temporal representations. The resulting framework naturally supports generation and recognition, and more importantly is robust even when visual inputs contain limited information. Extensive experiments demonstrate the efficacy of GenRec for both recognition and generation. In particular, GenRec achieves competitive recognition performance, offering 75.8% and 87.2% accuracy on SSV2 and K400, respectively. GenRec also performs best on class-conditioned image-to-video generation, achieving FVD scores of 46.5 and 49.3 on the SSV2 and EK-100 datasets. Furthermore, GenRec demonstrates extraordinary robustness in scenarios where only limited frames can be observed.
Learning Nonsymmetric Determinantal Point Processes
Mike Gartrell, Victor-Emmanuel Brunel, Elvis Dohmatob, Syrine Krichene
Determinantal point processes (DPPs) have attracted substantial attention as an elegant probabilistic model that captures the balance between quality and diversity within sets. DPPs are conventionally parameterized by a positive semi-definite kernel matrix, and this symmetric kernel encodes only repulsive interactions between items. These so-called symmetric DPPs have significant expressive power, and have been successfully applied to a variety of machine learning tasks, including recommendation systems, information retrieval, and automatic summarization, among many others. Efficient algorithms for learning symmetric DPPs and sampling from these models have been reasonably well studied. However, relatively little attention has been given to nonsymmetric DPPs, which relax the symmetry constraint on the kernel. Nonsymmetric DPPs allow for both repulsive and attractive item interactions, which can significantly improve modeling power, resulting in a model that may be a better fit for some applications. We present a method that enables a tractable algorithm, based on maximum likelihood estimation, for learning nonsymmetric DPPs from data composed of observed subsets. Our method imposes a particular decomposition of the nonsymmetric kernel that enables such tractable learning algorithms, which we analyze both theoretically and experimentally. We evaluate our model on synthetic and real-world datasets, demonstrating improved predictive performance compared to symmetric DPPs, which have previously shown strong performance on modeling tasks associated with these datasets.
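To make the maximum-likelihood objective concrete, here is a sketch of the subset log-likelihood under one common low-rank decomposition of a nonsymmetric kernel, L = B Bᵀ + (C Dᵀ - D Cᵀ), where the first term is the PSD symmetric part and the second is skew-symmetric. The decomposition shown and all names are illustrative assumptions, not necessarily the exact parameterization of this paper:

```python
import numpy as np

def ndpp_log_likelihood(B, C, D, subsets):
    """Log-likelihood of observed subsets under a nonsymmetric DPP sketch.

    Kernel: L = B B^T + (C D^T - D C^T), n x n, with low-rank factors.
    A DPP assigns P(Y) = det(L_Y) / det(L + I), where L_Y is the
    principal submatrix of L indexed by the subset Y.
    """
    n = B.shape[0]
    L = B @ B.T + (C @ D.T - D @ C.T)
    # Normalizer det(L + I), computed in log space for stability.
    log_norm = np.linalg.slogdet(L + np.eye(n))[1]
    ll = 0.0
    for Y in subsets:
        idx = np.asarray(Y)
        sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
        # Assumes det(L_Y) > 0 for every observed subset; a real learner
        # must handle or avoid nonpositive principal minors.
        ll += logdet - log_norm
    return ll
```

With the skew-symmetric factors set to zero this reduces to the symmetric (PSD) DPP likelihood, which is a useful consistency check when implementing the learning algorithm.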
We thank the reviewers for their insightful comments and detailed analysis of our work. Regarding the time complexity of the low-rank representation, the relevant cost follows from the first term on the right side of Eq. 12. We will add some text to the camera-ready version of our paper to make this point clear.