AdvAD: Exploring Non-Parametric Diffusion for Imperceptible Adversarial Attacks
Imperceptible adversarial attacks aim to fool DNNs by adding imperceptible perturbation to the input data. Previous methods typically improve the imperceptibility of attacks by integrating common attack paradigms with specifically designed perception-based losses or the capabilities of generative models. In this paper, we propose Adversarial Attacks in Diffusion (AdvAD), a novel modeling framework distinct from existing attack paradigms.
A Appendix
A.1 Acetylacetone Dataset: Additional Experiments
We ran additional experiments with the acetylacetone dataset introduced in [3] to further investigate the generalization capabilities of MACE [3]. Figure 4 shows the energy predictions of BOTNet [3], NequIP [5], MACE, and (linear) ACE [33] for two trajectories on the potential energy surface (PES) of acetylacetone. The left panel shows the energy profile for a rotation around an O-C-C-C dihedral angle. Since the training set only contains dihedral angles below 30° (see lower panel), accurate predictions for angles up to 180° require significant extrapolation capabilities. In addition, the energy barrier of the rotation, at 1 eV, lies well outside the energy range of the training set, which was sampled at 300 K. It can be seen that all models solve this task surprisingly well.
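To give a rough sense of why a 1 eV barrier lies far outside what sampling at 300 K explores, the sketch below compares the barrier with the thermal energy k_B T. It assumes configurations are weighted by a simple Boltzmann factor along a single coordinate, which is an illustrative simplification and not a statement about how the dataset in [3] was actually generated; the function names are hypothetical.

```python
import math

# Boltzmann constant in eV/K (CODATA value).
K_B_EV_PER_K = 8.617333262e-5


def thermal_energy_ev(temperature_k):
    """Return k_B * T in eV for a temperature given in kelvin."""
    return K_B_EV_PER_K * temperature_k


def boltzmann_factor(barrier_ev, temperature_k):
    """Return exp(-E / (k_B T)), the relative weight of a state E above the minimum.

    Assumption: a single-coordinate Boltzmann picture, used only for an
    order-of-magnitude estimate.
    """
    return math.exp(-barrier_ev / thermal_energy_ev(temperature_k))


kT = thermal_energy_ev(300.0)          # thermal energy at 300 K
ratio = 1.0 / kT                       # barrier height in units of k_B T
factor = boltzmann_factor(1.0, 300.0)  # relative weight of the barrier top

print(f"k_B T at 300 K: {kT:.4f} eV")
print(f"1 eV barrier in units of k_B T: {ratio:.1f}")
print(f"Boltzmann factor for 1 eV at 300 K: {factor:.2e}")
```

Under this simplified picture the barrier is several tens of k_B T high, so configurations near its top would essentially never appear in a 300 K training set, which is why predicting the full rotation profile requires extrapolation.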
E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection
Multimodal image fusion and object detection are crucial for autonomous driving. While current methods have advanced the fusion of texture details and semantic information, their complex training processes hinder broader applications. Addressing this challenge, we introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection.
Field-wise Learning for Multi-field Categorical Data Zhibin Li
We propose a new method for learning with multi-field categorical data. Multi-field categorical data are usually collected over many heterogeneous groups. These groups can be reflected in the categories under a field. The existing methods try to learn a universal model that fits all data, which is challenging and inevitably results in learning a complex model. In contrast, we propose a field-wise learning method leveraging the natural structure of data to learn simple yet efficient one-to-one field-focused models with appropriate constraints.
7078971350bcefbc6ec2779c9b84a9bd-AuthorFeedback.pdf
We appreciate all reviewers' valuable comments and are greatly encouraged by the positive comments, e.g. the problem To Reviewer 1 & 2 on missing recent related methods. We would include these results in supplementary materials of the final version. To Reviewer 1: Q1 - The improvement of your method is not impressive. Our model improves Logloss by 0.002 and produces more Some columns contain site_id and ad_id so the dimensionality is very large. To Reviewer 2: Q1 - The proposed method lacks technique contributions.
DeiSAM: Segment Anything with Deictic Prompting Manuel Brack 1,2
Large-scale, pre-trained neural networks have demonstrated strong capabilities in various tasks, including zero-shot image segmentation. To identify concrete objects in complex scenes, humans instinctively rely on deictic descriptions in natural language, i.e., referring to something depending on the context, such as "The object that is on the desk and behind the cup". However, deep learning approaches cannot reliably interpret such deictic representations as they have limited reasoning capabilities, particularly in complex scenarios. Therefore, we propose DeiSAM--a combination of large pre-trained neural networks with differentiable logic reasoners--for deictic promptable segmentation. Given a complex, textual segmentation description, DeiSAM leverages Large Language Models (LLMs) to generate first-order logic rules and performs differentiable forward reasoning on generated scene graphs. Subsequently, DeiSAM segments objects by matching them to the logically inferred image regions. As part of our evaluation, we propose the Deictic Visual Genome (DeiVG) dataset, containing paired visual input and complex, deictic textual prompts. Our empirical results demonstrate that DeiSAM is a substantial improvement over purely data-driven baselines for deictic promptable segmentation.
Identification, Amplification and Measurement: A bridge to Gaussian Differential Privacy
Gaussian differential privacy (GDP) is a single-parameter family of privacy notions that provides coherent guarantees to avoid the exposure of sensitive individual information. Despite the extra interpretability and tighter bounds under composition GDP provides, many widely used mechanisms (e.g., the Laplace mechanism) inherently provide GDP guarantees but often fail to take advantage of this new framework because their privacy guarantees were derived under a different background. In this paper, we study the asymptotic properties of privacy profiles and develop a simple criterion to identify algorithms with GDP properties. We propose an efficient method for GDP algorithms to narrow down possible values of an optimal privacy measurement, µ, with an arbitrarily small and quantifiable margin of error. For non-GDP algorithms, we provide a post-processing procedure that can amplify existing privacy guarantees to meet the GDP condition. As applications, we compare two single-parameter families of privacy notions, ϵ-DP and µ-GDP, and show that all ϵ-DP algorithms are intrinsically also GDP. Lastly, we show that the combination of our measurement process and the composition theorem of GDP is a powerful and convenient tool to handle compositions compared to the traditional standard and advanced composition theorems.
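The relationship between ϵ-DP and µ-GDP mentioned above can be illustrated numerically. The sketch below is not taken from the abstract; it assumes the standard trade-off-function definitions from the GDP literature, namely G_µ(α) = Φ(Φ⁻¹(1 − α) − µ) for µ-GDP and f_ϵ(α) = max(0, 1 − e^ϵ α, e^(−ϵ)(1 − α)) for ϵ-DP. Matching the two curves at the symmetric point α = 1/(1 + e^ϵ) suggests µ = −2Φ⁻¹(1/(1 + e^ϵ)) as a candidate value, and the code checks on a grid that G_µ stays below f_ϵ for that µ. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, provides cdf and inv_cdf


def gdp_tradeoff(alpha, mu):
    """Trade-off function of mu-GDP: type II error at type I error alpha."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


def eps_dp_tradeoff(alpha, eps):
    """Trade-off function of pure eps-DP (delta = 0)."""
    return max(0.0, 1.0 - math.exp(eps) * alpha, math.exp(-eps) * (1.0 - alpha))


def mu_from_eps(eps):
    """Candidate mu obtained by matching both curves at alpha = 1 / (1 + e^eps).

    Assumption: this is an illustrative derivation under the definitions
    above, not the measurement procedure proposed in the paper.
    """
    return -2.0 * _STD_NORMAL.inv_cdf(1.0 / (1.0 + math.exp(eps)))


def compose_gdp(mus):
    """GDP composition: mu_1-, ..., mu_k-GDP steps compose to sqrt(sum mu_i^2)-GDP."""
    return math.sqrt(sum(m * m for m in mus))


eps = 1.0
mu = mu_from_eps(eps)
grid = [i / 1000.0 for i in range(1, 1000)]
# The GDP curve should never exceed the eps-DP curve (small numerical slack).
dominated = all(gdp_tradeoff(a, mu) <= eps_dp_tradeoff(a, eps) + 1e-9 for a in grid)

print(f"eps = {eps}: candidate mu = {mu:.4f}, G_mu <= f_eps on grid: {dominated}")
print(f"10 compositions of that mechanism: mu = {compose_gdp([mu] * 10):.4f}")
```

The composition helper reflects why a single parameter is convenient: composing k mechanisms only requires a root-sum-of-squares of their µ values, whereas tracking ϵ under standard or advanced composition involves looser bounds.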
AudioMarkBench: Benchmarking Robustness of Audio Watermarking
The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audios. However, the robustness of audio watermarking against common/adversarial perturbations remains understudied.
Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation
In speech separation, time-domain approaches have successfully replaced the time-frequency domain with latent sequence feature from a learnable encoder. Conventionally, the feature is separated into speaker-specific ones at the final stage of the network. Instead, we propose a more intuitive strategy that separates features earlier by expanding the feature sequence to the number of speakers as an extra dimension. To achieve this, an asymmetric strategy is presented in which the encoder and decoder are partitioned to perform distinct processing in separation tasks. The encoder analyzes features, and the output of the encoder is split into the number of speakers to be separated.
NeurIPS22_data_benchmarks
This means that shorter time horizons train for more episodes. Regardless of the training setup, we evaluate on the random weather setting. When evaluating trained policies on test-time, test-location and test-horizon we use 20 repetitions. We report the performance on these generalization tasks for the final policy obtained at the end of training.