Learning Partitions from Context
In this paper, we study the problem of learning the structure of a discrete set of N tokens based on their interactions with other tokens. We focus on a setting where the tokens can be partitioned into a small number of classes, and there exists a real-valued function f defined on certain sets of tokens. This function, which captures the interactions between tokens, depends only on the class memberships of its arguments. The goal is to recover the class memberships of all tokens from a finite number of samples of f. We begin by analyzing this problem from both complexity-theoretic and information-theoretic viewpoints. We prove that it is NP-complete in general, and for random instances, we show that samples on the order of N ln(N) suffice to identify the partition, implying that very sparse interactions are enough.
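As a toy illustration of this setting (a hypothetical instance with a naive recovery heuristic, not the paper's algorithm; it uses dense observations of a pairwise f rather than the sparse, order N ln(N) samples the abstract refers to):

```python
# Hypothetical toy instance: N tokens with hidden classes, and a pairwise
# interaction f that depends only on the classes of its arguments.
import numpy as np

rng = np.random.default_rng(0)
N, K = 12, 3                        # N tokens, K latent classes (illustrative sizes)
classes = rng.integers(0, K, N)     # hidden class membership of each token
F = rng.normal(size=(K, K))         # class-level interaction table

def f(i, j):
    """Interaction value for tokens i, j: a function of their classes only."""
    return F[classes[i], classes[j]]

# With dense observations, tokens in the same class have identical
# "interaction signatures", so grouping by signature recovers the partition.
signatures = {i: tuple(np.round(f(i, j), 6) for j in range(N)) for i in range(N)}
groups = {}
for i, sig in signatures.items():
    groups.setdefault(sig, []).append(i)

print(list(groups.values()))        # recovered classes, up to relabeling
```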
Geometric Analysis of Nonlinear Manifold Clustering
Tianjiao Ding
Manifold clustering is an important problem in motion and video segmentation, natural image clustering, and other applications where high-dimensional data lie on multiple low-dimensional, nonlinear manifolds. While current state-of-the-art methods achieve good empirical performance on large-scale datasets such as CIFAR, they come with no proof of theoretical correctness. In this work, we propose a method that clusters data belonging to a union of nonlinear manifolds.
Learning Supervised PageRank with Gradient-Based and Gradient-Free Optimization Methods
Lev Bogolubsky, Pavel Dvurechenskii, Alexander Gasnikov, Gleb Gusev, Yurii Nesterov, Andrei M. Raigorodskii, Aleksey Tikhonov, Maksim Zhukovskii
In this paper, we consider a non-convex loss-minimization problem of learning Supervised PageRank models, which can account for features of nodes and edges. We propose gradient-based and random gradient-free methods to solve this problem. Our algorithms are based on the concept of an inexact oracle, and unlike the state-of-the-art gradient-based method, we provide theoretical convergence-rate guarantees for both of them. Finally, we compare the performance of the proposed optimization methods with the state of the art applied to a ranking task.
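To give a flavor of what a random gradient-free method looks like, here is a generic two-point (directional finite-difference) descent sketch; the placeholder loss, smoothing parameter, and step size are illustrative assumptions and not the authors' scheme or tuning:

```python
# Generic random gradient-free descent sketch (two-point estimator).
import numpy as np

def loss(theta):
    # Placeholder non-convex loss standing in for the Supervised PageRank objective.
    return np.sum(np.sin(theta) ** 2) + 0.1 * np.sum(theta ** 2)

def gradient_free_step(theta, mu=1e-3, step=1e-2, rng=np.random.default_rng(0)):
    u = rng.normal(size=theta.shape)                     # random search direction
    g = (loss(theta + mu * u) - loss(theta)) / mu * u    # finite-difference estimate
    return theta - step * g

theta = np.ones(5)
for _ in range(200):
    theta = gradient_free_step(theta)
print(loss(theta))
```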
Nearly Minimax Optimal Regret for Multinomial Logistic Bandit
In this paper, we study the contextual multinomial logistic (MNL) bandit problem, in which a learning agent sequentially selects an assortment based on contextual information and user feedback follows an MNL choice model. There has been a significant discrepancy between the known lower and upper regret bounds, particularly regarding the maximum assortment size K. Additionally, the variation in reward structures between these bounds complicates the quest for optimality. Under uniform rewards, where all items have the same expected reward, we establish a regret lower bound of $\Omega(d\sqrt{T/K})$ and propose a constant-time algorithm, OFU-MNL+, that achieves a matching upper bound of $\tilde{O}(d\sqrt{T/K})$. We also provide instance-dependent minimax regret bounds under uniform rewards. Under non-uniform rewards, we prove a lower bound of $\Omega(d\sqrt{T})$ and an upper bound of $\tilde{O}(d\sqrt{T})$, also achievable by OFU-MNL+. Our empirical studies support these theoretical findings. To the best of our knowledge, this is the first work in the contextual MNL bandit literature to prove minimax optimality, for either the uniform or the non-uniform reward setting, and to propose a computationally efficient algorithm that achieves this optimality up to logarithmic factors.
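For concreteness, the MNL choice model referenced above assigns the user's pick probabilities over an assortment (plus an outside "no purchase" option) as in the sketch below; the feature dimension, parameter vector, and assortment are illustrative assumptions:

```python
# Sketch of MNL choice probabilities under a linear utility model.
import numpy as np

d = 4
theta = np.array([0.5, -0.2, 0.1, 0.3])           # unknown parameter the agent learns
X = np.random.default_rng(1).normal(size=(6, d))  # context features of 6 items
S = [0, 2, 5]                                     # selected assortment (|S| <= K)

utilities = X[S] @ theta
denom = 1.0 + np.exp(utilities).sum()             # the "1 +" term is the outside option
probs_items = np.exp(utilities) / denom           # P(user picks item i in S)
prob_no_purchase = 1.0 / denom

print(probs_items, prob_no_purchase)
```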
Paths to Equilibrium in Games
In multi-agent reinforcement learning (MARL) and game theory, agents repeatedly interact and revise their strategies as new data arrives, producing a sequence of strategy profiles. This paper studies sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning, where an agent who is best responding in one period does not switch its strategy in the next period. This constraint merely requires that optimizing agents do not switch strategies, but does not constrain the non-optimizing agents in any way, and thus allows for exploration. Sequences with this property are called satisficing paths, and arise naturally in many MARL algorithms. A fundamental question about strategic dynamics is this: for a given game and initial strategy profile, is it always possible to construct a satisficing path that terminates at an equilibrium? The resolution of this question has implications for the capabilities and limitations of a class of MARL algorithms. We answer this question in the affirmative for normal-form games. Our analysis reveals a counterintuitive insight: reward-deteriorating strategic updates are key to driving play to equilibrium along a satisficing path.
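A minimal sketch of the pairwise constraint itself, checked on a two-player normal-form game with pure strategies; the payoff matrices and the strategy sequence are made-up illustrative values, not taken from the paper:

```python
# Check whether a sequence of pure-strategy profiles is a satisficing path:
# an agent that best-responds at time t must keep its strategy at time t+1,
# while non-best-responding agents are unconstrained (free to explore).
import numpy as np

A = np.array([[3, 0], [5, 1]])   # row player's payoffs (Prisoner's-Dilemma-like)
B = np.array([[3, 5], [0, 1]])   # column player's payoffs, indexed [row, col]

def is_best_response(payoff, own, other, player):
    if player == 0:              # row player, given the column player's action
        return payoff[own, other] >= payoff[:, other].max()
    return payoff[other, own] >= payoff[other, :].max()

def is_satisficing_step(profile_t, profile_next):
    for player, payoff in enumerate((A, B)):
        own, other = profile_t[player], profile_t[1 - player]
        if is_best_response(payoff, own, other, player) and \
           profile_next[player] != profile_t[player]:
            return False
    return True

path = [(0, 0), (1, 0), (1, 1)]  # ends at the game's Nash equilibrium
print(all(is_satisficing_step(path[t], path[t + 1]) for t in range(len(path) - 1)))
```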
We thank the reviewers for taking the time to carefully read the paper and for their constructive comments. We think this might be feasible. To Reviewer #1: Thank you for your detailed comments. Please also see the revision plan in the response to Reviewer #2. [...] NAG, TMM, and G-TM (optimal tuning), and provide the guarantee of TMM (Eq. (11) in [7]) in Section 3.1; (ii) we will [...]. Regarding the flawed guarantee, thank you for pointing out the intermediate inequality.
Balancing Context Length and Mixing Times for Reinforcement Learning at Scale
Due to recent remarkable advances in artificial intelligence, researchers have begun to consider challenging learning problems such as learning to generalize behavior from large offline datasets or learning online in non-Markovian environments. Meanwhile, recent advances in both of these areas have increasingly relied on conditioning policies on large context lengths. A natural question is whether there is a limit to the performance benefits of increasing the context length, even when the required computation is available. In this work, we establish a novel theoretical result that links the context length of a policy to the time needed to reliably evaluate its performance (i.e., its mixing time) in large-scale partially observable reinforcement learning environments that exhibit latent sub-task structure. This analysis underscores a key tradeoff: when we extend the context length, the policy can more effectively model non-Markovian dependencies, but this comes at the cost of potentially slower policy evaluation and, as a result, slower downstream learning. Moreover, our empirical results highlight the relevance of this analysis when leveraging Transformer-based neural networks. This perspective will become increasingly pertinent as the field scales towards larger and more realistic environments, opening up a number of potential future directions for improving the way we design learning agents.
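As a small illustration of the mixing-time quantity referenced here (a generic spectral estimate for a finite Markov chain induced by a fixed policy; the transition matrix and the crude bound are illustrative assumptions, not the paper's construction):

```python
# Rough mixing-time estimate for a small policy-induced Markov chain.
import numpy as np

P = np.array([[0.90, 0.10, 0.00],    # slowly-mixing 3-state chain under some policy
              [0.05, 0.90, 0.05],
              [0.00, 0.10, 0.90]])

eigvals = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
spectral_gap = 1.0 - eigvals[1]                     # 1 minus second-largest eigenvalue
t_mix_estimate = np.log(1.0 / 0.25) / spectral_gap  # crude O(1/gap)-style estimate

print(spectral_gap, t_mix_estimate)
```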
Supplementary Material: Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning
A Proof of Theorem 1 ($r \in \mathbb{R}^{n}_{\ge 0}$, $c \in \mathbb{R}^{m}_{\ge 0}$)
In this section, we present the formal proof of Theorem 1. To this end, we interpret DARP as a coordinate ascent algorithm on the Lagrangian dual of its original objective (1), and discuss the necessary and sufficient condition for correct convergence of DARP, i.e., convergence to the optimal solution of (1). We now show that DARP is indeed a coordinate ascent algorithm for the dual of the above optimization. To this end, we formulate the Lagrangian dual of (3). In addition, the optimal objective value of (3) is equal to that of (4), i.e., strong duality holds.
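For readers less familiar with this proof strategy, a generic (not DARP-specific) version of the construction is sketched below; the symbols are placeholders rather than the paper's notation for (1), (3), and (4):

```latex
% Generic Lagrangian dual / coordinate-ascent sketch (placeholder notation).
\[
  \text{primal:}\quad \min_{x}\ f(x)\quad \text{s.t.}\quad g_i(x) = 0,\ i = 1,\dots,m,
\]
\[
  \text{dual:}\quad \max_{\lambda}\ d(\lambda), \qquad
  d(\lambda) \;=\; \min_{x}\Big( f(x) + \sum_{i=1}^{m} \lambda_i\, g_i(x) \Big).
\]
% Coordinate ascent maximizes d by updating one dual variable at a time,
%   \lambda_i \leftarrow \arg\max_{\lambda_i} d(\lambda_1,\dots,\lambda_i,\dots,\lambda_m),
% and strong duality ensures the dual optimum matches the primal optimum.
```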
Supplementary Material of Rational neural networks
Thus, $x\,r(x)$ is a rational approximant to $|x|$ of type at most $(k+1, k)$. Let $0 < \ell < 1$ be a real number and consider the sign function on the domain $[-1,-\ell] \cup [\ell, 1]$, i.e.,
$$\operatorname{sign}(x) = \begin{cases} -1, & x \in [-1,-\ell],\\ \phantom{-}1, & x \in [\ell,1].\end{cases}$$
We refer to such $r(x)$ as the Zolotarev sign function. Moreover, since $x\,r(x) \ge 0$ for $x \in [-1,1]$ (see [2, Equation (12)]), we have
$$\max_{x \in [-\ell,\ell]} \big|\,|x| - x\,r(x)\,\big| \;\le\; \max_{x \in [-\ell,\ell]} |x| \;\le\; \ell.$$
One finds that $\ell = 4\exp(-\pi\sqrt{k/2})$ and the result follows immediately. The proof of Lemma 1 is a direct consequence of the previous lemma and the properties of Zolotarev sign functions.
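A sketch of how the two error contributions are balanced under the choice of $\ell$ stated above (a reconstruction of the standard splitting argument, not a verbatim excerpt from the supplement):

```latex
% Splitting the error of approximating |x| by x r(x) over the two regions.
\[
  \max_{x \in [-1,1]} \big|\,|x| - x\,r(x)\,\big|
  \;\le\; \max\Big(
      \underbrace{\max_{|x| \le \ell} \big|\,|x| - x\,r(x)\,\big|}_{\le\, \ell}
      ,\;
      \underbrace{\max_{\ell \le |x| \le 1} |x|\,\big|\operatorname{sign}(x) - r(x)\big|}_{\text{Zolotarev error}}
  \Big).
\]
% Choosing \ell = 4 exp(-\pi \sqrt{k/2}) makes both terms decay at the same
% rate exp(-\pi \sqrt{k/2}), up to constants, giving the claimed bound.
```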