DI-MaskDINO: A Joint Object Detection and Instance Segmentation Model

Neural Information Processing Systems

This paper is motivated by an interesting phenomenon: when investigating the intermediate results from the first transformer decoder layer of MaskDINO (i.e., the SOTA model for joint detection and segmentation), the performance of object detection lags behind that of instance segmentation (i.e., performance imbalance). This phenomenon inspires us to ask a question: will the performance imbalance at the first layer of the transformer decoder constrain the upper bound of the final performance?


Supplementary file for MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures

Neural Information Processing Systems

Organization. The supplementary file is organized as follows. In Section A, we show additional results and analysis of the robustness and calibration experiments. In Section B, we visualize what the perturbations look like in the latent feature space. In Section C, we provide the details of the datasets, network architectures, and experimental setups. We use an EoT [3] + PGD attack of 200 steps over a range of ɛ, with the inner learning rate set to 0.025ɛ for l. We also compare with adversarial training baselines, which take 30 projected gradient descent steps during training. The ɛ value used for adversarial training on each dataset is given in Figure 1 and Figure 6 of the paper.
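As a concrete illustration of the evaluation protocol described above, here is a minimal sketch of an EoT + PGD attack in PyTorch, assuming a stochastically perturbed model (the source of the expectation over transformations). The 200 steps and the 0.025ɛ inner learning rate follow the text; the function names, the number of EoT samples, and the l-infinity projection are illustrative assumptions.

import torch
import torch.nn.functional as F

def eot_pgd_attack(model, x, y, eps, steps=200, eot_samples=10):
    # Inner learning rate of 0.025 * eps, as stated in the text.
    alpha = 0.025 * eps
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.zeros_like(x_adv)
        for _ in range(eot_samples):
            # EoT: average gradients over the model's stochastic perturbations.
            loss = F.cross_entropy(model(x_adv), y)
            grad = grad + torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # signed ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project onto the eps-ball (l_inf assumed)
            x_adv = x_adv.clamp(0.0, 1.0)             # keep valid pixel range
    return x_adv.detach()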


MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures

Neural Information Processing Systems

Regularization and transfer learning are two popular techniques for enhancing model generalization on unseen data, which is a fundamental problem in machine learning. Regularization techniques are versatile, as they are task- and architecture-agnostic, but they do not exploit the large amounts of data available. Transfer learning methods learn to transfer knowledge from one domain to another, but may not generalize across tasks and architectures, and may introduce new training costs for adapting to the target task. To bridge the gap between the two, we propose a transferable perturbation, MetaPerturb, which is meta-learned to improve generalization performance on unseen data. MetaPerturb is implemented as a set-based lightweight network that is agnostic to the size and the order of the input and is shared across layers. We then propose a meta-learning framework to jointly train the perturbation function over heterogeneous tasks in parallel. As MetaPerturb is a set function trained over diverse distributions across layers and tasks, it can generalize to heterogeneous tasks and architectures. We validate the efficacy and generality of MetaPerturb trained on a specific source domain and architecture by applying it to the training of diverse neural architectures on heterogeneous target datasets, against various regularizers and fine-tuning. The results show that networks trained with MetaPerturb significantly outperform the baselines on most tasks and architectures, with a negligible increase in parameter size and no hyperparameters to tune.
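To make the architecture description concrete, below is a minimal sketch of a set-based perturbation module in the spirit of the abstract: it pools per-channel statistics over the batch (set) dimension, so it is agnostic to input size and order and can be shared across layers. The layer widths, the pooled statistics, and the multiplicative-noise form are assumptions for illustration, not the authors' exact design.

import torch
import torch.nn as nn

class SetPerturb(nn.Module):
    def __init__(self, hidden=8):
        super().__init__()
        # Tiny set network applied per channel; the parameter count is negligible.
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, h):  # h: (batch, channels, height, width)
        # Permutation-invariant pooling over the set (batch) and spatial dimensions.
        stats = torch.stack([h.mean(dim=(0, 2, 3)), h.std(dim=(0, 2, 3))], dim=-1)
        scale = torch.sigmoid(self.net(stats)).view(1, -1, 1, 1)
        return h + scale * torch.randn_like(h)  # stochastic perturbation of the features

Because the pooling runs over the batch and spatial dimensions and the set network acts per channel, the same module can in principle be inserted after any convolutional layer of any architecture, which is what layer- and architecture-agnosticism requires.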


ColJailBreak: Collaborative Generation and Editing for Jailbreaking Text-to-Image Deep Generation

Neural Information Processing Systems

Commercial text-to-image deep generation models (e.g., DALL·E) can produce high-quality images based on input language descriptions. These models incorporate a black-box safety filter to prevent the generation of unsafe or unethical content, such as violent, criminal, or hateful imagery. Recent jailbreaking methods generate adversarial prompts capable of bypassing safety filters and producing unsafe content, exposing vulnerabilities in influential commercial models. However, once these adversarial prompts are identified, the safety filter can be updated to prevent the generation of unsafe images. In this work, we propose an effective, simple, and difficult-to-detect jailbreaking solution: generating safe content initially with normal text prompts and then editing the generations to embed unsafe content.


A Proof of Theorem

Neural Information Processing Systems

First, we prepare some lemmas. From Eq. (25), the dynamics in Eq. (26) are equivalent to Eq. (22) and Eq. In the following, we prove Theorem 1 using the above lemmas.

B.1 Neural likelihood example

We perform an experiment with a complex posterior, wherein the likelihood is defined with a randomly initialized neural network f. The results are shown in Figure 5. The left three columns show density visualizations of the ground-truth and approximate posteriors of the VI methods; the right two columns show 2D histograms and samples obtained using ALD.
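Since the excerpt does not spell out ALD, here is a minimal generic sketch of the unadjusted Langevin update that underlies Langevin-type samplers of this kind; the step size, step count, and any annealing schedule (omitted here) are assumptions.

import numpy as np

def langevin_sample(grad_log_p, x0, step=1e-2, n_steps=1000, seed=0):
    # Unadjusted Langevin dynamics: x <- x + (step / 2) * grad log p(x) + sqrt(step) * noise.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + 0.5 * step * grad_log_p(x) + np.sqrt(step) * rng.standard_normal(x.shape)
    return x

# Example: for a standard 2D Gaussian, grad log p(x) = -x.
sample = langevin_sample(lambda x: -x, x0=np.zeros(2))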


84c578f202616448a2f80e6f56d5f16d-AuthorFeedback.pdf

Neural Information Processing Systems

Yes, the total complexity is proportional to the number of aggregated tasks. On adding experiments comparing ANIL and MAML w.r.t. the sample size B: thanks for the suggestion! On why the sample size in the inner loop is not taken into the analysis, as in Fallah et al. [4]: great question! This setting has also been considered in Rajeswaran et al. [24] and Ji et al. [13]. We will clarify it in the revision.


Training Code Language Models with Comprehensive Semantics Reasoning

Neural Information Processing Systems

Code Large Language Models (Code LLMs) have excelled at tasks like code completion but often miss deeper semantics such as execution effects and dynamic states. This paper aims to bridge the gap between Code LLMs' reliance on static text data and the need for semantic understanding in complex tasks like debugging and program repair. We introduce a novel strategy, monologue reasoning, to train Code LLMs to reason about comprehensive semantics, encompassing high-level functional descriptions, local execution effects of individual statements, and overall input/output behavior, thereby linking static code text with dynamic execution states.
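To make "monologue reasoning" concrete, here is a hypothetical training example of the kind the description suggests (not the authors' actual data format): the code is annotated, statement by statement, with its local execution effect for a concrete input, together with the overall input/output behavior.

def count_evens(xs):
    # High-level function: count how many elements of xs are even.
    count = 0               # monologue: count starts at 0
    for x in xs:            # monologue: iterate over xs = [3, 4, 7, 8]
        if x % 2 == 0:      # monologue: 3 is odd, 4 is even, 7 is odd, 8 is even
            count += 1      # monologue: count becomes 1, then 2
    return count            # monologue: overall behavior: [3, 4, 7, 8] -> 2

assert count_evens([3, 4, 7, 8]) == 2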


ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses

Neural Information Processing Systems

Recent developments have led to the emergence of transformer-based approaches for local feature matching, resulting in enhanced match accuracy. However, the time required for transformer-based feature enhancement is excessively long, which limits their practical application. In this paper, we propose methods to reduce the computational load of transformers during both the coarse matching and refinement stages. During the coarse matching phase, we organize multiple homography hypotheses to approximate continuous matches. Each hypothesis encompasses several features to be matched, significantly reducing the number of features that require enhancement via transformers. In the refinement stage, we reduce the bidirectional self-attention and cross-attention mechanisms to unidirectional cross-attention, thereby substantially decreasing the computational cost. Overall, our method is at least 4 times faster than other transformer-based feature matching algorithms. Comprehensive evaluations on open datasets such as MegaDepth, YFCC100M, ScanNet, and HPatches demonstrate our method's efficacy, highlighting its potential to significantly enhance a wide array of downstream applications.
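To illustrate the coarse-matching idea, the sketch below shows how a single homography hypothesis transports a whole group of keypoints from one image to approximate match locations in the other, so only a few hypotheses, rather than every individual feature, need transformer enhancement. The matrix and points are illustrative, not taken from the paper.

import numpy as np

def apply_homography(H, pts):
    # Map Nx2 keypoints through a 3x3 homography via homogeneous coordinates.
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    proj = pts_h @ H.T
    return proj[:, :2] / proj[:, 2:3]

# One hypothesis approximates the coarse matches of every keypoint it covers.
H = np.array([[1.0, 0.02, 5.0],
              [-0.01, 1.0, -3.0],
              [0.0, 0.0, 1.0]])
pts_a = np.array([[10.0, 20.0], [15.0, 25.0], [40.0, 40.0]])
coarse_matches_b = apply_homography(H, pts_a)  # approximate locations in image B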


Adapting to Misspecification in Contextual Bandits

Neural Information Processing Systems

A major research direction in contextual bandits is to develop algorithms that are computationally efficient yet support flexible, general-purpose function approximation. Algorithms based on modeling rewards have shown strong empirical performance, yet typically require a well-specified model and can fail when this assumption does not hold. Can we design algorithms that are efficient and flexible, yet degrade gracefully in the face of model misspecification? We introduce a new family of oracle-efficient algorithms for ε-misspecified contextual bandits that adapt to unknown model misspecification, in both finite and infinite action settings. Given access to an online oracle for square loss regression, our algorithm attains optimal regret and, in particular, optimal dependence on the misspecification level, with no prior knowledge. Specializing to linear contextual bandits with infinite actions in d dimensions, we obtain the first algorithm that achieves the optimal Õ(d√T + ε√d·T) regret bound for unknown ε. On a conceptual level, our results are enabled by a new optimization-based perspective on the regression oracle reduction framework of Foster and Rakhlin [21], which we believe will be useful more broadly.
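The regression-oracle reduction referenced here builds on inverse-gap weighting (SquareCB, Foster and Rakhlin [21]); a minimal sketch of that action distribution for a finite action set follows, with the exploration parameter gamma left as an assumption (in the analysis it is tuned to the horizon and, for the adaptive algorithms, to the misspecification level).

import numpy as np

def inverse_gap_weighting(preds, gamma):
    # preds: oracle reward predictions for K actions; gamma: exploration parameter.
    K = len(preds)
    best = int(np.argmax(preds))
    p = 1.0 / (K + gamma * (preds[best] - preds))  # inverse-gap weights
    p[best] = 0.0
    p[best] = 1.0 - p.sum()  # put the remaining probability mass on the greedy action
    return p

# Example: three actions; larger prediction gaps receive less exploration.
probs = inverse_gap_weighting(np.array([0.9, 0.5, 0.2]), gamma=10.0)
action = np.random.default_rng(0).choice(3, p=probs)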

