
Measuring Mutual Policy Divergence for Multi-Agent Sequential Exploration (Badong Chen)

Neural Information Processing Systems

A sequential updating scheme has thus been proposed, which naturally diversifies agents by encouraging each agent to learn from the preceding ones. However, the exploration strategy in the sequential scheme has not been investigated. Because agents update one by one, each agent has access to information from the preceding agents. In this work, we therefore propose to exploit this preceding information to enhance exploration and heterogeneity sequentially. We present Multi-Agent Divergence Policy Optimization (MADPO), equipped with a mutual policy divergence maximization framework. We quantify the discrepancies between episodes to enhance exploration and between agents to heterogenize agents, termed intra-agent divergence and inter-agent divergence, respectively. To address the issue that traditional divergence measures lack stability and directionality, we propose to employ the conditional Cauchy-Schwarz divergence to provide entropy-guided exploration incentives. Extensive experiments show that the proposed method outperforms state-of-the-art sequential updating approaches on two challenging multi-agent tasks with various heterogeneous scenarios. Source code is available at https://github.com/hwdou6677/MADPO.
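As a rough illustration of the quantity being maximized (a minimal sketch, not the MADPO implementation; the Gaussian kernel, bandwidth, and sample sizes are assumptions), the Cauchy-Schwarz divergence between two sets of samples, e.g. actions from two agents or from two episodes, can be estimated from kernel Gram matrices:

```python
# Minimal sketch (not the authors' code): empirical Cauchy-Schwarz divergence
# between two batches of samples using a Gaussian kernel.
import numpy as np

def gaussian_gram(x, y, sigma=1.0):
    """Gram matrix of a Gaussian kernel between sample sets x and y."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def cs_divergence(x, y, sigma=1.0):
    """D_CS(p, q) = -log( <p,q>^2 / (<p,p><q,q>) ), estimated from samples."""
    pq = gaussian_gram(x, y, sigma).mean()
    pp = gaussian_gram(x, x, sigma).mean()
    qq = gaussian_gram(y, y, sigma).mean()
    return -np.log(pq ** 2 / (pp * qq + 1e-12) + 1e-12)

# Example: action samples from two agents; a larger value indicates more
# heterogeneous behavior, while identical sample sets give roughly zero.
rng = np.random.default_rng(0)
a1 = rng.normal(0.0, 1.0, size=(256, 2))
a2 = rng.normal(1.5, 1.0, size=(256, 2))
print(cs_divergence(a1, a2), cs_divergence(a1, a1))
```

By the Cauchy-Schwarz inequality the estimate is nonnegative, which is one reason this divergence is attractive as a bounded, symmetric exploration bonus.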


Hyperbolic Graph Neural Networks

Neural Information Processing Systems

Learning from graph-structured data is an important task in machine learning and artificial intelligence, for which Graph Neural Networks (GNNs) have shown great promise. Motivated by recent advances in geometric representation learning, we propose a novel GNN architecture for learning representations on Riemannian manifolds with differentiable exponential and logarithmic maps. We develop a scalable algorithm for modeling the structural properties of graphs, comparing Euclidean and hyperbolic geometry. In our experiments, we show that hyperbolic GNNs can lead to substantial improvements on various benchmark datasets.
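As a hedged sketch of the manifold operations involved (not the paper's code; the curvature value, the single linear layer, and the toy graph are illustrative assumptions), the exponential and logarithmic maps at the origin of the Poincaré ball let a GNN aggregate in tangent space and map the result back onto the manifold:

```python
# Minimal sketch: exp/log maps at the origin of the Poincaré ball (curvature -c)
# and one toy message-passing step that aggregates in tangent space.
import torch

def expmap0(v, c=1.0, eps=1e-7):
    """Tangent space at the origin -> Poincaré ball."""
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def logmap0(x, c=1.0, eps=1e-7):
    """Poincaré ball -> tangent space at the origin."""
    sqrt_c = c ** 0.5
    norm = x.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.atanh((sqrt_c * norm).clamp(max=1 - 1e-5)) * x / (sqrt_c * norm)

# Toy step: map node embeddings to the tangent space, aggregate over a toy
# adjacency matrix, apply a Euclidean linear layer, map back onto the ball.
n, d = 5, 8
x = expmap0(torch.randn(n, d) * 0.1)      # node embeddings on the ball
adj = (torch.rand(n, n) < 0.5).float()    # toy adjacency matrix
lin = torch.nn.Linear(d, d)
h = expmap0(lin(adj @ logmap0(x)))
print(h.shape)                            # torch.Size([5, 8])
```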


103303dd56a731e377d01f6a37badae3-AuthorFeedback.pdf

Neural Information Processing Systems

Reviewer 1: We thank the reviewer for the insightful and encouraging comments. Regarding the choice of logm/expm vs. any other pair of invertible functions: logarithmic and exponential maps are ... Regarding the difference to hyperbolic NNs: the work of Ganea et al. is indeed related to our work, and we tried to ...

Reviewer 2: We thank the reviewer for the insightful and encouraging comments. Regarding code release: thank you for raising this issue; we do indeed plan to fully open-source our code on GitHub. Regarding the time complexity of our method: the computational demand of our method is similar to that of standard GCNs.




The Label Complexity of Active Learning from Observational Data

Neural Information Processing Systems

Counterfactual learning from observational data involves learning a classifier on an entire population based on data that is observed conditioned on a selection policy. This work considers this problem in an active setting, where the learner additionally has access to unlabeled examples and can choose to get a subset of these labeled by an oracle. Prior work on this problem uses disagreement-based active learning, along with an importance weighted loss estimator to account for counterfactuals, which leads to a high label complexity. We show how to instead incorporate a more efficient counterfactual risk minimizer into the active learning algorithm. This requires us to modify both the counterfactual risk, to make it amenable to active learning, and the active learning process, to make it amenable to the risk. We prove that the resulting algorithm is statistically consistent as well as more label-efficient than prior work.
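For reference, the importance-weighted loss estimator mentioned above can be sketched as follows (an illustration on synthetic data, not the paper's modified risk or active-learning algorithm; all names and distributions are assumptions):

```python
# Hedged sketch: inverse-propensity-weighted estimate of population 0-1 risk
# when labels are observed only where a known selection policy queried them.
import numpy as np

def iw_risk(pred, label, observed, propensity):
    """Importance-weighted estimate of the population 0-1 risk."""
    loss = (pred != label).astype(float)               # 0-1 loss
    return np.mean(np.where(observed, loss / propensity, 0.0))

rng = np.random.default_rng(1)
n = 10_000
label = rng.integers(0, 2, n)                          # true labels
pred = rng.integers(0, 2, n)                           # classifier predictions
propensity = rng.uniform(0.2, 0.9, n)                  # P(label observed | x), known
observed = rng.random(n) < propensity                  # selection policy's choices
print(iw_risk(pred, label, observed, propensity))      # roughly 0.5 here
```

The weights make the estimate unbiased, but their variance grows as propensities shrink, which is one source of the high label complexity the paper aims to avoid.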


Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations (Alexander Hägele)

Neural Information Processing Systems

Scale has become a main ingredient in obtaining strong machine learning models. As a result, understanding a model's scaling properties is key to effectively designing both the right training setup as well as future generations of architectures. In this work, we argue that scale and training research has been needlessly complex due to reliance on the cosine schedule, which prevents training across different lengths for the same model size. We investigate the training behavior of a direct alternative -- constant learning rate and cooldowns -- and find that it scales predictably and reliably, similarly to cosine. Additionally, we show that stochastic weight averaging yields improved performance along the training trajectory, without additional training cost, across different scales. Importantly, these findings demonstrate that scaling experiments can be performed with significantly reduced compute and GPU hours by utilizing fewer but reusable training runs.
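A minimal sketch of the alternative schedule discussed (the warmup length, cooldown fraction, and peak learning rate below are assumptions, not the paper's settings):

```python
# Hedged sketch: constant learning rate with linear warmup and a linear
# cooldown over the final fraction of training, instead of a cosine schedule.
def constant_with_cooldown(step, total_steps, peak_lr=3e-4,
                           warmup_steps=100, cooldown_frac=0.2):
    cooldown_start = int(total_steps * (1 - cooldown_frac))
    if step < warmup_steps:                        # linear warmup
        return peak_lr * step / max(warmup_steps, 1)
    if step < cooldown_start:                      # constant phase
        return peak_lr
    remaining = total_steps - cooldown_start
    return peak_lr * (total_steps - step) / max(remaining, 1)  # linear cooldown

# Because the constant phase does not depend on the total length, one long run
# can be cooled down at several intermediate points and reused across budgets.
lrs = [constant_with_cooldown(s, total_steps=1000) for s in range(1000)]
print(min(lrs), max(lrs))
```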


... data points changes the norms of all vectors, while the norms are very important quantities in the ...

Neural Information Processing Systems

Re assumption 1: Shifting the data points is a good idea, but it might cause problems. In our current work, we focus on theory and datasets satisfying assumption 1. We will rephrase the sentence as follows: "In these scenarios, ..."

In the present work, we aim to improve the efficiency of the MIPS problem from an algorithmic perspective. A GPU can process multiple queries in parallel. ... (Algorithm 1), and the indices of visited vertices can be arbitrarily large.
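For context, the batched-query processing the response alludes to can be sketched as a single matrix product (this is the illustrative brute-force MIPS baseline, not the authors' graph-based Algorithm 1; sizes are arbitrary):

```python
# Hedged sketch: exact maximum inner product search for a batch of queries,
# computed as one matrix multiplication so many queries run in parallel.
import numpy as np

rng = np.random.default_rng(0)
database = rng.normal(size=(100_000, 64))   # candidate vectors
queries = rng.normal(size=(256, 64))        # a batch of queries

scores = queries @ database.T               # all inner products at once
top1 = scores.argmax(axis=1)                # index of the best match per query
print(top1[:5])
```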


Practical Private Mean and Covariance Estimation

Neural Information Processing Systems

We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data that are accurate at small sample sizes. We demonstrate the effectiveness of our algorithms both theoretically and empirically using synthetic and real-world datasets, showing that their asymptotic error rates match the state-of-the-art theoretical bounds, and that they concretely outperform all previous methods. Specifically, previous estimators either have weak empirical accuracy at small sample sizes, perform poorly for multivariate data, or require the user to provide strong a priori estimates for the parameters.
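For contrast with the abstract's point about a priori parameter estimates, here is a hedged sketch of the textbook Gaussian-mechanism mean estimator, which needs a user-supplied norm bound R (this is the kind of baseline the paper improves on, not the paper's estimator; the privacy parameters below are illustrative):

```python
# Hedged sketch: differentially private mean via clipping to an a-priori norm
# bound R and adding Gaussian noise calibrated to the resulting sensitivity.
import numpy as np

def dp_mean(x, R, epsilon, delta, rng=np.random.default_rng(0)):
    n, d = x.shape
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    clipped = x * np.minimum(1.0, R / np.maximum(norms, 1e-12))
    sensitivity = 2.0 * R / n                 # L2 sensitivity of the clipped mean
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)

x = np.random.default_rng(1).normal(1.0, 1.0, size=(5000, 3))
print(dp_mean(x, R=5.0, epsilon=1.0, delta=1e-5))   # close to [1, 1, 1] here
```

Choosing R badly either clips away signal (R too small) or inflates the noise (R too large), which is exactly the kind of a priori tuning the abstract criticizes.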


Factor Group-Sparse Regularization for Efficient Low-Rank Matrix Recovery

Neural Information Processing Systems

This paper develops a new class of nonconvex regularizers for low-rank matrix recovery. Many regularizers are motivated as convex relaxations of the matrix rank function. Our new factor group-sparse regularizers are motivated as a relaxation of the number of nonzero columns in a factorization of the matrix. These nonconvex regularizers are sharper than the nuclear norm; indeed, we show they are related to Schatten-p norms with arbitrarily small 0 < p <= 1. Moreover, these factor group-sparse regularizers can be written in a factored form that enables efficient and effective nonconvex optimization; notably, the method does not use singular value decomposition. We provide generalization error bounds for low-rank matrix completion which show improved upper bounds for Schatten-p norm regularization as p decreases. Compared to the max norm and the factored formulation of the nuclear norm, factor group-sparse regularizers are more efficient, accurate, and robust to the initial guess of rank. Experiments show promising performance of factor group-sparse regularization for low-rank matrix completion and robust principal component analysis.
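A hedged sketch of one simple variant of the idea (plain gradient descent on a column-group-sparse penalty over the factors; the exact regularizer, step sizes, and problem sizes are assumptions, not the paper's formulation or solver):

```python
# Minimal sketch: matrix completion with X ≈ U @ V.T, penalizing the column-wise
# L2 norms of U and V so unneeded columns shrink toward zero and the effective
# rank is selected automatically. No singular value decomposition is used.
import numpy as np

rng = np.random.default_rng(0)
m, n, true_rank, k = 60, 50, 3, 10                 # k deliberately over-estimates the rank
M = rng.normal(size=(m, true_rank)) @ rng.normal(size=(true_rank, n))
mask = rng.random((m, n)) < 0.5                    # observed entries

U = 0.1 * rng.normal(size=(m, k))
V = 0.1 * rng.normal(size=(n, k))
lam, lr = 0.5, 5e-3
for _ in range(3000):
    R = mask * (U @ V.T - M)                       # residual on observed entries
    gU = R @ V + lam * U / (np.linalg.norm(U, axis=0, keepdims=True) + 1e-8)
    gV = R.T @ U + lam * V / (np.linalg.norm(V, axis=0, keepdims=True) + 1e-8)
    U, V = U - lr * gU, V - lr * gV

err = np.linalg.norm((1 - mask) * (U @ V.T - M)) / np.linalg.norm((1 - mask) * M)
print("relative error on held-out entries:", round(err, 3))
print("column norms of U:", np.round(np.linalg.norm(U, axis=0), 2))
```

The printed column norms illustrate the mechanism: columns that do not help fit the observed entries are driven toward zero by the group penalty, which is the factored analogue of shrinking small singular values without ever computing them.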