elaborate on the algorithm description accordingly

Neural Information Processing Systems

We thank all reviewers for their valuable feedback and comments. Please find our responses below. Reviewer 1 - Explanation in the introduction: we strive for clarity and appreciate this comment; we thank the reviewer for pointing this out. This can be done in many ways, as discussed in Appendix C. The theoretical value used for the bounds is, however, rather conservative.


Fair and Welfare-Efficient Constrained Multi-matchings under Uncertainty

Neural Information Processing Systems

We study fair allocation of constrained resources, where a market designer optimizes overall welfare while maintaining group fairness. In many large-scale settings, utilities are not known in advance but are instead observed only after the allocation is realized, so we estimate agent utilities using machine learning. Optimizing over estimates requires trading off mean utilities against their predictive variances. We discuss these trade-offs under two paradigms for preference modeling: in the stochastic optimization regime, the market designer has access to a probability distribution over utilities, while in the robust optimization regime they have access to an uncertainty set containing the true utilities with high probability. We consider utilitarian and egalitarian welfare objectives and explore how to optimize for them under both the stochastic and robust paradigms. We demonstrate the efficacy of our approaches on three publicly available conference reviewer assignment datasets. The approaches presented enable scalable constrained resource allocation under uncertainty for many combinations of objectives and preference models.
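To make the stochastic-versus-robust trade-off concrete, the following toy sketch maximizes utilitarian welfare once over predicted mean utilities and once over a variance-penalized lower confidence bound. It is not the paper's constrained multi-matching solver: a one-to-one assignment via scipy's linear_sum_assignment stands in for the allocation step, and mu, sigma, and the confidence multiplier z are illustrative assumptions.

# Toy sketch only: a one-to-one assignment stands in for the paper's constrained
# multi-matching, and mu, sigma, and z are made-up inputs.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
mu = rng.uniform(0.0, 1.0, size=(5, 5))        # predicted mean utilities
sigma = rng.uniform(0.05, 0.3, size=(5, 5))    # predictive standard deviations
z = 1.64                                       # confidence multiplier (assumed)

# Stochastic paradigm: maximize expected (mean) utilitarian welfare.
rows, cols = linear_sum_assignment(mu, maximize=True)
print("expected welfare:", mu[rows, cols].sum())

# Robust paradigm: maximize a lower confidence bound, penalizing uncertain estimates.
lcb = mu - z * sigma
rows_r, cols_r = linear_sum_assignment(lcb, maximize=True)
print("guaranteed welfare bound:", lcb[rows_r, cols_r].sum())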



A Proof of Theorem 1: Let f be a layer of SMP, f(U, Y, A)[i, :, :] = u(U, U_j, y, ...)

Neural Information Processing Systems

We first present the formal version of the theorem, Theorem 3 (Representation power, formal). The fact that embeddings produced by isomorphic graphs are permutations of one another is a consequence of equivariance, so we are left to prove the first point. To do so, we first ignore the features and prove that there is an SMP that maps the initial one-hot encoding of each node to an embedding from which the adjacency matrix can be reconstructed. The case of attributed graphs and the statement of the theorem then follow easily. Consider a simple connected graph G = (V, E).
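As a one-step illustration of the idea (not the paper's exact construction): if each node starts from its one-hot encoding U_i^(0) = e_i, then a single sum aggregation over neighbours gives U_i^(1) = sum_{j in N(i)} e_j = A[i, :], i.e. the i-th row of the adjacency matrix, so the collection of embeddings already determines A and hence the graph up to node ordering.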


Small steps no more: Global convergence of stochastic gradient bandits for arbitrary learning rates

Neural Information Processing Systems

We provide a new understanding of the stochastic gradient bandit algorithm by showing that it converges to a globally optimal policy almost surely using any constant learning rate. This result demonstrates that the stochastic gradient algorithm continues to balance exploration and exploitation appropriately even in scenarios where standard smoothness and noise control assumptions break down. The proofs are based on novel findings about action sampling rates and the relationship between cumulative progress and noise, and extend the current understanding of how simple stochastic gradient methods behave in bandit settings.
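For reference, below is a minimal sketch of the stochastic gradient bandit update being analyzed, assuming the standard softmax parameterization; the arm means, the constant learning rate eta, and the horizon are illustrative assumptions, not the paper's experimental setup.

# Minimal sketch of a softmax stochastic gradient bandit with a constant step size.
# true_means, eta, and the horizon are assumed for the demo.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.9])       # unknown Bernoulli arm means (assumed)
theta = np.zeros(len(true_means))            # softmax policy parameters
eta = 0.5                                    # constant learning rate

for t in range(20000):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()                           # pi = softmax(theta)
    a = rng.choice(len(pi), p=pi)            # sample an arm from the policy
    r = rng.binomial(1, true_means[a])       # observe a Bernoulli reward
    grad_log_pi = -pi
    grad_log_pi[a] += 1.0                    # gradient of log pi(a) w.r.t. theta
    theta += eta * r * grad_log_pi           # unbiased stochastic gradient step

print("final policy:", np.round(pi, 3))      # mass concentrates on the best arm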


Revisiting Non-Parametric Matching Cost Volumes for Robust and Generalizable Stereo Matching (Kelvin Cheng and Christopher Healey)

Neural Information Processing Systems

Stereo matching is a classic and challenging problem in computer vision that has recently witnessed remarkable progress driven by Deep Neural Networks (DNNs). This paradigm shift raises two interesting and entangled questions that have not been well addressed. First, it is unclear whether stereo matching DNNs that are trained from scratch really learn to perform matching well.
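For context, here is a generic sketch of a non-parametric (correlation-based) matching cost volume of the kind the paper revisits; the feature shapes and the winner-take-all readout are assumptions for illustration, not the authors' architecture.

# Generic correlation cost volume between left/right feature maps (illustrative only).
import numpy as np

def correlation_cost_volume(feat_left, feat_right, max_disp):
    """feat_*: (C, H, W) feature maps; returns (max_disp, H, W) matching scores."""
    C, H, W = feat_left.shape
    volume = np.zeros((max_disp, H, W), dtype=feat_left.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (feat_left * feat_right).mean(axis=0)
        else:
            # Correlate left pixel (y, x) with right pixel (y, x - d); columns x < d stay 0.
            volume[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, :-d]).mean(axis=0)
    return volume

feat_l = np.random.rand(32, 24, 48).astype(np.float32)
feat_r = np.random.rand(32, 24, 48).astype(np.float32)
cost = correlation_cost_volume(feat_l, feat_r, max_disp=16)
disparity = cost.argmax(axis=0)              # winner-take-all disparity estimate
print(cost.shape, disparity.shape)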


Decoupled Kullback-Leibler Divergence Loss (Xiaojuan Qi)

Neural Information Processing Systems

In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss, which consists of 1) a weighted Mean Square Error (wMSE) loss and 2) a Cross-Entropy loss incorporating soft labels. Thanks to the decomposed formulation of the DKL loss, we identify two areas for improvement. First, we address the limitation of KL/DKL in scenarios such as knowledge distillation by breaking its asymmetric optimization property; this modification ensures that the wMSE component is always effective during training, providing extra constructive cues. Second, we introduce class-wise global information into KL/DKL to mitigate bias from individual samples. With these two enhancements, we derive the Improved Kullback-Leibler (IKL) Divergence loss and evaluate its effectiveness through experiments on the CIFAR-10/100 and ImageNet datasets, focusing on adversarial training and knowledge distillation tasks. The proposed approach achieves new state-of-the-art adversarial robustness on the public leaderboard, RobustBench, and competitive performance on knowledge distillation, demonstrating its substantial practical merits. Our code is available at https://github.com/jiequancui/DKL.
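For reference, the quantity being decomposed is the familiar KL loss with soft (teacher) labels; a minimal PyTorch sketch of that baseline is below. It is not the authors' DKL/IKL implementation (see the linked repository), and the temperature T follows the usual distillation convention.

# Vanilla KL-with-soft-labels loss (the baseline the DKL/IKL losses build on).
# Not the authors' implementation; T is an assumed distillation temperature.
import torch
import torch.nn.functional as F

def kl_distillation_loss(student_logits, teacher_logits, T=4.0):
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # KL(teacher || student), scaled by T^2 as is standard in knowledge distillation.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)

student = torch.randn(8, 100)                # e.g. CIFAR-100 logits from the student
teacher = torch.randn(8, 100)                # logits from a frozen teacher
print(kl_distillation_loss(student, teacher).item())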


060fd70a06ead2e1079d27612b84aff4-AuthorFeedback.pdf

Neural Information Processing Systems

Results are presented in Fig. a; full details will be provided. Experiments: please allow us to first justify the use of the HIL experiment. All of the following points will be clarified in the revised manuscript (V2): 'gridding' continuous state/action spaces in order to apply DP-based methods, citing the relevant literature. Re: the approximations in Section 4.1, we attempted to discuss each approximation. Re: the outliers in Fig. 2a, this is an interesting question; this is why the costs of greedy and RRL differ at the first epoch.


Appendix A (Reminders about integral probability metrics): d_F(P, Q) = sup_{f in F} | E_P[f] - E_Q[f] |

Neural Information Processing Systems

Let (X, Σ) be a measurable space. In the context of Section 4.1, we have (at least) the following instantiations of Assumption 4.2: (i) assume the reward is bounded by r_max. We provide a proof of Lemma 4.1 for completeness; the proof is essentially the same as that for [44, Lemma 4.3]. We then prove Theorem 4.2, first noting that a two-sided bound on the difference in returns η between the true and learned MDPs follows from Lemma 4.1. We outline the practical MOPO algorithm in Algorithm 2. To answer question (3), we conduct a thorough ablation study on MOPO.
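As a pointer to what the practical algorithm optimizes, the sketch below computes MOPO's uncertainty-penalized reward r_tilde = r_hat - lambda * u(s, a), with u instantiated by an ensemble-disagreement heuristic; the ensemble outputs and lambda are assumed inputs, not the paper's exact estimator.

# Sketch of MOPO's penalized reward r_tilde = r_hat - lambda * u(s, a).
# The ensemble outputs and lambda are assumed inputs; u uses a common
# disagreement heuristic rather than the paper's exact estimator.
import numpy as np

def penalized_reward(ensemble_rewards, ensemble_next_state_stds, lam):
    """ensemble_rewards: (E,) reward predictions from E dynamics models.
    ensemble_next_state_stds: (E, S) predicted next-state standard deviations."""
    r_hat = ensemble_rewards.mean()
    # Uncertainty heuristic: largest predicted-std norm across the ensemble.
    u = np.max(np.linalg.norm(ensemble_next_state_stds, axis=1))
    return r_hat - lam * u

rewards = np.array([1.0, 0.9, 1.1, 0.95])             # per-model reward predictions
stds = np.abs(np.random.randn(4, 6)) * 0.1            # per-model next-state stds
print(penalized_reward(rewards, stds, lam=1.0))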


submission, we have tested MOPO on a non-MuJoCo environment: an HIV treatment simulator slightly modified

Neural Information Processing Systems

We thank all the reviewers for the constructive feedback. R1: "fairly limited in terms of applicability... the ability to extend this work to more general settings?" The task simulates sequential decision making in HIV treatment. We show results in Table 1 (HIV treatment results, averaged over 3 random seeds), where MOPO outperforms BEAR and nearly matches the buffer-max score.