Plotting

 Information Technology


Direct Unlearning Optimization for Robust and Safe Text-to-Image Models

Neural Information Processing Systems

Recent advancements in text-to-image (T2I) models have unlocked a wide range of applications but also present significant risks, particularly in their potential to generate unsafe content. To mitigate this issue, researchers have developed unlearning techniques to remove the model's ability to generate potentially harmful content. However, these methods are easily bypassed by adversarial attacks, making them unreliable for ensuring the safety of generated images. In this paper, we propose Direct Unlearning Optimization (DUO), a novel framework for removing Not Safe For Work (NSFW) content from T2I models while preserving their performance on unrelated topics. DUO employs a preference optimization approach using curated paired image data, ensuring that the model learns to remove unsafe visual concepts while retaining unrelated features. Furthermore, we introduce an output-preserving regularization term to maintain the model's generative capabilities on safe content. Extensive experiments demonstrate that DUO can robustly defend against various state-of-the-art red teaming methods without significant performance degradation on unrelated topics, as measured by FID and CLIP scores. Our work contributes to the development of safer and more reliable T2I models, paving the way for their responsible deployment in both closed-source and open-source scenarios. CAUTION: This paper includes model-generated content that may contain offensive or distressing material.
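The abstract does not spell out DUO's objective, so the following is only a hedged sketch of what a generic DPO-style preference loss for a diffusion model can look like when a safe image is preferred over its paired unsafe image. The scalar inputs (per-sample denoising errors of the trained model and of a frozen reference model), the function names, and `beta` are all illustrative assumptions, not the paper's notation, and the output-preserving regularization term is not modeled here.

```python
import math


def neg_log_sigmoid(z):
    """Numerically stable -log(sigmoid(z)), i.e. softplus(-z)."""
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def preference_loss(err_safe, err_unsafe, ref_err_safe, ref_err_unsafe, beta=1.0):
    """Generic DPO-style loss on scalar denoising errors (hypothetical interface).

    err_*     : denoising error of the model being trained
    ref_err_* : denoising error of a frozen reference model
    The margin is positive when, relative to the reference, the trained model
    fits the safe image better than it fits the unsafe image.
    """
    margin = (ref_err_safe - err_safe) - (ref_err_unsafe - err_unsafe)
    return neg_log_sigmoid(beta * margin)


# At initialization the model equals the reference, so the margin is zero.
loss_at_init = preference_loss(1.0, 1.0, 1.0, 1.0)
# Lower error on the safe image and higher error on the unsafe one lowers the loss.
loss_after = preference_loss(0.5, 2.0, 1.0, 1.0)
```

In this sketch the loss only decreases when the gap between safe and unsafe errors grows relative to the reference, which is the usual way a preference objective avoids simply degrading all outputs.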


GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping

Neural Information Processing Systems

Generating novel views from a single image remains a challenging task due to the complexity of 3D scenes and the limited diversity in the existing multi-view datasets to train a model on. Recent research combining large-scale text-to-image (T2I) models with monocular depth estimation (MDE) has shown promise in handling in-the-wild images. In these methods, an input view is geometrically warped to novel views with estimated depth maps, then the warped image is inpainted by T2I models.
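The geometric warping step used by the warp-then-inpaint methods described above can be sketched for a single pixel as follows. This is a minimal illustration under assumed conventions (pinhole intrinsics `fx, fy, cx, cy`, and a target-camera pose applied as `X_target = R @ X_source + t`); the function name and parameters are hypothetical and do not come from the paper.

```python
def warp_pixel(u, v, depth, intrinsics, R, t):
    """Reproject pixel (u, v) with known depth from a source view to a target view.

    intrinsics : (fx, fy, cx, cy) of a pinhole camera shared by both views
    R, t       : 3x3 rotation (nested lists) and 3-vector taking source-camera
                 coordinates to target-camera coordinates
    Returns the target pixel (u', v'), or None if the point is behind the camera.
    """
    fx, fy, cx, cy = intrinsics
    # Back-project the pixel to a 3D point in the source camera frame.
    point = [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]
    # Move the point into the target camera frame.
    moved = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if moved[2] <= 0.0:
        return None
    # Project back to pixel coordinates.
    return (fx * moved[0] / moved[2] + cx, fy * moved[1] / moved[2] + cy)


K = (500.0, 500.0, 320.0, 240.0)
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

same_view = warp_pixel(100.0, 80.0, 2.0, K, IDENTITY, [0.0, 0.0, 0.0])
shifted = warp_pixel(100.0, 80.0, 2.0, K, IDENTITY, [0.1, 0.0, 0.0])
```

Applying this to every pixel leaves holes wherever the target view sees surfaces that were occluded in the input, which is the region the inpainting stage has to fill.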


Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates

Neural Information Processing Systems

Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities and SGD's practical performance heavily relies on the choice of this step-size. We propose to use line-search techniques to automatically set the step-size when training models that can interpolate the data. In the interpolation setting, we prove that SGD with a stochastic variant of the classic Armijo line-search attains the deterministic convergence rates for both convex and strongly-convex functions. Under additional assumptions, SGD with Armijo line-search is shown to achieve fast convergence for non-convex functions. Furthermore, we show that stochastic extra-gradient with a Lipschitz line-search attains linear convergence for an important class of non-convex functions and saddle-point problems satisfying interpolation. To improve the proposed methods' practical performance, we give heuristics to use larger step-sizes and acceleration. We compare the proposed algorithms against numerous optimization methods on standard classification tasks using both kernel methods and deep networks. The proposed methods result in competitive performance across all models and datasets, while being robust to the precise choices of hyper-parameters.
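To make the line-search idea concrete, here is a minimal sketch of SGD with a backtracking Armijo condition evaluated on the sampled example only, run on a small over-parameterized least-squares problem where interpolation holds. It is a simplified illustration, not the authors' implementation: the constants `eta_max`, `c`, and `beta`, the toy data, and the choice to restart every search from `eta_max` are assumptions, and the paper's step-size reset heuristics and acceleration are omitted.

```python
import random


def loss_and_grad(w, x, y):
    """Squared loss 0.5 * (w.x - y)^2 on one example, and its gradient in w."""
    r = sum(wi * xi for wi, xi in zip(w, x)) - y
    return 0.5 * r * r, [r * xi for xi in x]


def armijo_step(w, x, y, eta_max=1.0, c=0.1, beta=0.5, max_backtracks=50):
    """One SGD step whose step size satisfies the Armijo condition on (x, y):
    f_i(w - eta * g) <= f_i(w) - c * eta * ||g||^2, found by backtracking."""
    f, g = loss_and_grad(w, x, y)
    g_norm_sq = sum(gi * gi for gi in g)
    eta = eta_max
    for _ in range(max_backtracks):
        w_new = [wi - eta * gi for wi, gi in zip(w, g)]
        f_new, _ = loss_and_grad(w_new, x, y)
        if f_new <= f - c * eta * g_norm_sq:
            break
        eta *= beta  # shrink the step and try again
    return w_new, eta


def sgd_armijo(data, dim, epochs=200, seed=0):
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # shuffled pass
            x, y = data[i]
            w, _ = armijo_step(w, x, y)
    return w


def mean_loss(w, data):
    return sum(loss_and_grad(w, x, y)[0] for x, y in data) / len(data)


# Three examples in five dimensions: more parameters than data points,
# so some w fits every example exactly (the interpolation setting).
DATA = [
    ([1.0, 0.0, 0.0, 0.5, 0.0], 1.0),
    ([0.0, 1.0, 0.0, 0.0, 0.5], -2.0),
    ([0.0, 0.0, 1.0, 0.5, 0.5], 0.5),
]
w_fit = sgd_armijo(DATA, dim=5)
final_loss = mean_loss(w_fit, DATA)
```

No step size is tuned by hand here: each update only needs loss and gradient evaluations on the sampled example, which is what makes the approach practical in the stochastic setting.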


A Further discussion

Neural Information Processing Systems

A.1 Cost for incentivization. We justify the way in which LIO accounts for the cost of incentivization as follows. Recall that this cost is incurred in the objective for LIO's incentive function (see (5) and (6)), instead of being accounted for in the total reward (1) that is maximized by LIO's policy. Fundamentally, the reason is that the cost should be incurred only by the part of the agent that is directly responsible for incentivization. In LIO, the policy and incentive function are separate modules: while the former takes regular actions to maximize external rewards, only the latter produces incentives that directly and actively shape the behavior of other agents. The policy is decoupled from incentivization, and it would be incorrect to penalize it for the behavior of the incentive function. Instead, we need to attribute the cost directly to the incentive function parameters via (6).


ad7ed5d47b9baceb12045a929e7e2f66-AuthorFeedback.pdf

Neural Information Processing Systems

We appreciate R1's recognition of the novelty of our contribution to MARL and the potential impact on a We address R1's two concerns below. "give-reward" actions are direct applications of conventional RL (which have been applied to multi-agent incentivization M = 3 agents are incentivized to cooperate despite penalties of 1 each. Reviewer 2. We appreciate R2's positive feedback on our quantitative results and we are glad that our behavioral Figure 6b where the agent gives nonzero reward for "fire cleaning beam but miss" after 40k steps, one reason is that the Figure 6a), so it may have "forgotten" the difference between successful and unsuccessful usage of the cleaning beam. As demonstrated more clearly in the Escape Room results (e.g. Reviewer 3. We thank R3 for recognizing our contribution to the general class of opponent-shaping algorithms.


Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis. Pengyi Jiao, Xingchao Yang

Neural Information Processing Systems

Volumetric rendering-based methods, like NeRF, excel in HDR view synthesis from RAW images, especially for nighttime scenes. However, they suffer from long training times and cannot perform real-time rendering due to dense sampling requirements. The advent of 3D Gaussian Splatting (3DGS) enables real-time rendering and faster training. However, implementing RAW image-based view synthesis directly using 3DGS is challenging due to its inherent drawbacks: 1) in nighttime scenes, extremely low SNR leads to poor structure-from-motion (SfM) estimation in distant views; 2) the limited representation capacity of the spherical harmonics (SH) function is unsuitable for RAW linear color space; and 3) inaccurate scene structure hampers downstream tasks such as refocusing. To address these issues, we propose LE3D (Lighting Every darkness with 3DGS). Our method uses Cone Scatter Initialization to enrich the estimation of SfM and replaces SH with a Color MLP to represent the RAW linear color space. Additionally, we introduce depth distortion and near-far regularizations to improve the accuracy of scene structure for downstream tasks. These designs enable LE3D to perform real-time novel view synthesis, HDR rendering, refocusing, and tone-mapping changes. Compared to previous volumetric rendering-based methods, LE3D reduces training time to 1% and improves rendering speed by up to 4,000 times for 2K resolution images in terms of FPS.


Tetrahedron Splatting for 3D Generation. Chun Gu, Zeyu Yang, Zijie Pan

Neural Information Processing Systems

As a flexible representation, NeRF was first adopted for 3D representation. However, with density-based volumetric rendering, it suffers from both intensive computational overhead and inaccurate mesh extraction. Using a signed distance field and Marching Tetrahedra, DMTet allows for precise mesh extraction and real-time rendering but is limited in handling large topological changes in meshes, leading to optimization challenges. Alternatively, 3D Gaussian Splatting (3DGS) is favored for its training and rendering efficiency but falls short in mesh extraction. In this work, we introduce a novel 3D representation, Tetrahedron Splatting (TeT-Splatting), that supports easy convergence during optimization, precise mesh extraction, and real-time rendering simultaneously. This is achieved by integrating surface-based volumetric rendering within a structured tetrahedral grid while preserving the desired ability of precise mesh extraction, and a tile-based differentiable tetrahedron rasterizer.


253f7b5d921338af34da817c00f42753-AuthorFeedback.pdf

Neural Information Processing Systems

Summary We would like to thank the entire review team for their efforts and insightful comments. DZPS18] ([DZPS18] refers to arXiv:1810.02054) approach zero (i.e., 0) as the sample size n . ImageNet dataset has 14 million images. For those applications, a non-diminishing convergence rate is more desirable. By Eq. (4), we know ŷ Response to the concern on fixed second layer.


Appendices Table 1: Explanation of the notations tV uq the value (Q-) function at the beginning of the k-th episode; tV

Neural Information Processing Systems

For any fixed n, we apply Lemma 9 with y = iε and x = (2y log( In the case α = 1, it holds that ∑ B.1 Proof of Proposition 4. We prove Q Firstly, the conclusion holds when k = 1. Let (s, a, h) be fixed. We apply Azuma's inequality again to obtain that with probability at least (1 − p), it holds that The proof is then completed by (37). T n + 2Hι n. (42) We now bound ∑ For the second case, by Hoeffding's inequality, with probability (1 − p) it holds that Q B.2 Proof of Lemma 5. First, by Hoeffding's inequality, for every k and h, we have that Therefore, we only need to prove (48), and the rest of the proof is devoted to establishing (48). We now bound the first term of (53).


Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition

Neural Information Processing Systems

We study the reinforcement learning problem in the setting of finite-horizon episodic Markov Decision Processes (MDPs) with S states, A actions, and episode length H.