
SURF: Steering the Scalarization Weight to Uniformly Traverse the Pareto Front

arXiv.org Machine Learning

Scalarization is widely used in multi-objective optimization owing to its simplicity and scalability. In many applications, the goal is to generate solutions that represent diverse user preferences, ideally with uniform coverage of the Pareto front (PF). However, uniformly sampling scalarization weights usually induces non-uniform coverage of the PF. We explain this mismatch through a geometric analysis of the scalarization path. As the scalarization weight varies, the corresponding solutions trace the PF with a generally non-uniform traversal speed. This speed induces an arc-length cumulative distribution function (CDF); inverting this CDF map yields a principled rule for selecting weights that produce uniform PF coverage. Building on this insight, we propose SURF (Sampling Uniformly along the PaReto Front). For structured problems, including bi-objective bandits, we derive closed-form expressions for this CDF map and the resulting PF-aware weight sampling rule. For general problems, SURF alternates between CDF reconstruction and weight sampling. Theoretically, we show that under provable conditions, SURF converges linearly to an unavoidable finite-sampling floor. Empirically, experiments on bandits, multi-objective-gymnasium, and multi-objective LLM alignment demonstrate that SURF efficiently achieves more uniform PF coverage than baselines.
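The inverse-CDF idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a dense grid of weights whose Pareto-front solutions are already known, approximates arc length by a piecewise-linear path, and uses a hypothetical helper name `uniform_front_weights` and a toy front chosen only for illustration.

```python
import numpy as np

def uniform_front_weights(weights, front_points, n_samples):
    """Pick weights whose solutions are evenly spaced in arc length (illustrative sketch)."""
    weights = np.asarray(weights, dtype=float)
    pts = np.asarray(front_points, dtype=float)
    # Arc length accumulated along the piecewise-linear path through the solutions.
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    cdf = arc / arc[-1]
    # Invert the CDF: equally spaced quantiles are mapped back to weights.
    quantiles = np.linspace(0.0, 1.0, n_samples)
    return np.interp(quantiles, cdf, weights)

# Toy front (an assumption): weight w yields the point (w^3, 1 - w^3),
# so the traversal speed is far from uniform in w.
grid = np.linspace(0.0, 1.0, 201)
front = np.stack([grid**3, 1.0 - grid**3], axis=1)
picked = uniform_front_weights(grid, front, n_samples=3)
```

On this toy front the arc-length CDF is w^3, so the weight covering half of the front is close to 0.5^(1/3) rather than 0.5, which is the kind of mismatch the abstract describes.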


A Simple Decentralized Cross-Entropy Method

Neural Information Processing Systems

The Cross-Entropy Method (CEM) is commonly used for planning in model-based reinforcement learning (MBRL), where a centralized approach is typically used to update the sampling distribution based only on the results of the top-k operation on samples. In this paper, we show that such a centralized approach makes CEM vulnerable to local optima, thus impairing its sample efficiency. To tackle this issue, we propose Decentralized CEM (DecentCEM), a simple but effective improvement over classical CEM that uses an ensemble of CEM instances running independently from one another, each performing a local improvement of its own sampling distribution. We provide both theoretical and empirical analysis to demonstrate the effectiveness of this simple decentralized approach. We empirically show that, compared to the classical centralized approach using either a single Gaussian or even a mixture of Gaussian distributions, DecentCEM finds the global optimum much more consistently, thus improving sample efficiency. Furthermore, we plug DecentCEM into the planning problem of MBRL and evaluate our approach in several continuous control environments, comparing it with state-of-the-art CEM-based MBRL approaches (PETS and POPLIN). Results show improved sample efficiency from simply replacing the classical CEM module with our DecentCEM module, at the cost of a reasonable amount of additional computation. Lastly, we conduct ablation studies for more in-depth analysis.
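A minimal sketch of the decentralized idea, under stated assumptions: each instance is a diagonal-Gaussian CEM that refits its own mean and standard deviation to its own top-k samples, and the final answer is taken from the instance whose mean scores best. The function names, hyperparameters, the final selection rule, and the toy objective are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cem_step(mean, std, objective, rng, n_samples=64, top_k=8):
    """One CEM update: sample, keep the top-k samples, refit the Gaussian."""
    samples = rng.normal(mean, std, size=(n_samples, mean.size))
    scores = objective(samples)
    elite = samples[np.argsort(scores)[-top_k:]]
    # A small constant keeps the standard deviation strictly positive.
    return elite.mean(axis=0), elite.std(axis=0) + 1e-6

def decent_cem(objective, init_means, init_std, n_iters=20, seed=0):
    """Run independent CEM instances; return the best final mean (assumed rule)."""
    rng = np.random.default_rng(seed)
    instances = [
        (np.asarray(m, dtype=float), np.full(len(m), float(init_std)))
        for m in init_means
    ]
    for _ in range(n_iters):
        # Each instance is updated using only its own samples.
        instances = [cem_step(m, s, objective, rng) for m, s in instances]
    means = np.stack([m for m, _ in instances])
    return means[np.argmax(objective(means))]

def two_peaks(x):
    """Toy objective (an assumption): global peak at x = 3, local peak at x = -3."""
    x = x[:, 0]
    return np.exp(-(x - 3.0) ** 2) + 0.5 * np.exp(-(x + 3.0) ** 2)

best = decent_cem(two_peaks, init_means=[[-3.0], [0.0], [3.0]], init_std=1.0)
```

An instance started near the local peak may stay there, but because the instances never share elites, at least one instance started elsewhere can still settle on the global peak.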


A MatNet variants

Neural Information Processing Systems

A.1 Multiple data matrices. A combinatorial optimization problem can be presented with multiple (f) relationship features between two groups of items. In FFSP, for example, the production cost could differ for each process, and one has to take it into account for scheduling in addition to the processing time for each pair of job and machine. When there are f matrices that need to be encoded (D1, D2, ..., Df), MatNet can easily be extended to accommodate such problems by using the mixed-score attention shown in Figure A.1 instead of the one in Figure 2(b). The "Trainable element-wise function" block in Figure A.1 is now an MLP with f + 1 input nodes and 1 output node.

A.2 Alternative encoding sequences. Equation (2) in the main text describes the application of FA and FB in the graph attentional layer of MatNet, which happens in parallel.
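The multi-matrix mixed-score attention of A.1 can be sketched roughly as follows. Only the f + 1 inputs and the single output of the element-wise MLP come from the text; the single hidden layer, the ReLU activation, the scaled dot-product score, and all names and shapes below are assumptions made for illustration.

```python
import numpy as np

def mixed_score_attention(q, k, data_mats, w1, b1, w2, b2):
    """Mix the attention score with f data matrices, element-wise, then softmax."""
    scores = q @ k.T / np.sqrt(q.shape[1])           # (n, m) dot-product scores
    feats = np.stack([scores, *data_mats], axis=-1)  # (n, m, f + 1)
    hidden = np.maximum(feats @ w1 + b1, 0.0)        # (n, m, h), assumed ReLU layer
    mixed = (hidden @ w2 + b2)[..., 0]               # (n, m), one output per pair
    mixed = mixed - mixed.max(axis=1, keepdims=True) # stabilize the softmax
    weights = np.exp(mixed)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, m, d, f, h = 3, 4, 5, 2, 8                        # hypothetical sizes
q, k = rng.normal(size=(n, d)), rng.normal(size=(m, d))
data_mats = [rng.normal(size=(n, m)) for _ in range(f)]
w1, b1 = rng.normal(size=(f + 1, h)), np.zeros(h)
w2, b2 = rng.normal(size=(h, 1)), np.zeros(1)
attn = mixed_score_attention(q, k, data_mats, w1, b1, w2, b2)
```

The same small MLP is applied independently to every (row, column) pair, which is what makes the function element-wise.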


0d5bd023a3ee11c7abca5b42a93c4866-Supplemental.pdf

Neural Information Processing Systems

To compute the discrepancy term d_st, we add a per-location domain classifier, which predicts whether each location of the semantic map corresponds to the source or the target domain. On the other hand, ĥ predicts the bird's-eye-view binary segmentation map. Figure 9.1 shows the Lift-Splat Adapt diagram. Our training strategy requires little modification to the original architecture, e.g.







is an unbiased stochastic gradient descent update rule for the following empirical risk: R(θ) = X

Neural Information Processing Systems

This section contains the theoretical analysis of the loss functions of offline experience replay (Proposition 2), augmented experience replay (Proposition 3), and online experience replay with reservoir sampling (Proposition 1). For all experiments, we use a learning rate of 0.1, following the same setting as in Aljundi et al. [2019] and Shim et al. [2021]. This paper uses RandAugment [Cubuk et al., 2020], which is an auto-augmentation method. It randomly selects P augmentation operators from a set of 14 operators and applies them to the images. To apply BPG in the OCL environment, we propose to determine the better/worse action set based on feedback in the form of the current memory batch accuracy A_M, which reflects the memory overfitting level of the CL agent.
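For readers unfamiliar with the reservoir sampling mentioned above, a minimal sketch of the standard algorithm (Algorithm R) for maintaining a replay memory over a stream is shown here. The function name, the list-based memory, and the toy stream are assumptions for illustration and are not taken from the paper's code.

```python
import random

def reservoir_update(memory, item, n_seen, capacity, rng):
    """Insert `item` so that every stream item is kept with equal probability.

    n_seen is the number of stream items observed before this one.
    """
    if len(memory) < capacity:
        # The memory is not full yet: always keep the item.
        memory.append(item)
    else:
        # Replace a random slot with probability capacity / (n_seen + 1).
        j = rng.randrange(n_seen + 1)
        if j < capacity:
            memory[j] = item

rng = random.Random(0)
memory = []
for i, sample in enumerate(range(1000)):   # toy stream of 1000 items
    reservoir_update(memory, sample, n_seen=i, capacity=50, rng=rng)
```

After the loop the memory holds exactly 50 items, each drawn from the stream with equal probability, without the stream length being known in advance.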