
AudioMarkBench: Benchmarking Robustness of Audio Watermarking

Neural Information Processing Systems

The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution by embedding human-imperceptible watermarks into AI-generated audio. However, the robustness of audio watermarking against common and adversarial perturbations remains understudied.
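As a concrete illustration of the kind of common perturbation such a benchmark applies, the hedged sketch below adds white Gaussian noise at a target SNR to a watermarked waveform and measures how many payload bits a decoder still recovers. The `detect_fn` decoder and the SNR levels are placeholder assumptions, not AudioMarkBench's actual API.

```python
# Hedged sketch: robustness of a watermark decoder to additive Gaussian noise.
# `detect_fn` stands in for any watermark decoder returning recovered payload
# bits; it is an assumption, not the benchmark's interface.
import numpy as np

def add_noise_snr(x: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white Gaussian noise so the result has the requested SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(scale=np.sqrt(noise_power), size=x.shape)
    return x + noise

def bit_accuracy(true_bits: np.ndarray, decoded_bits: np.ndarray) -> float:
    """Fraction of watermark payload bits recovered correctly."""
    return float(np.mean(true_bits == decoded_bits))

def robustness_curve(watermarked: np.ndarray, payload: np.ndarray,
                     detect_fn, snr_levels=(40, 30, 20, 10)):
    """Bit accuracy of the (hypothetical) detector under increasing noise."""
    return {snr: bit_accuracy(payload, detect_fn(add_noise_snr(watermarked, snr)))
            for snr in snr_levels}
```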






e287f0b2e730059c55d97fa92649f4f2-AuthorFeedback.pdf

Neural Information Processing Systems

The execution time for inference is not provided in the paper; we will state it in the next revision. The advantage of the proposed algorithm is clear for discrete tasks but not for continuous tasks. The results are competitive with the state of the art, so we have elected to include them for completeness. I think the authors used seven of the eight datasets described in [7].



A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints

Neural Information Processing Systems

Neuro-symbolic learning often requires maximizing the likelihood of a symbolic constraint w.r.t. the neural network's output distribution. Such output distributions are typically assumed to be fully factorized. This limits the applicability of neuro-symbolic learning to more expressive autoregressive distributions, e.g., transformers. Under such distributions, computing the likelihood of even simple constraints is #P-hard. Instead of attempting to enforce the constraint on the entire output distribution, we propose to do so on a random, local approximation thereof.
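A minimal sketch of the idea, under simplifying assumptions: draw one sample from a toy autoregressive model, keep the per-step conditionals used along that sample, and treat their product as a fully-factorized local approximation under which a simple constraint ("token FORBIDDEN never appears") can be evaluated exactly. The toy model and the constraint are illustrative; the paper's compilation-based machinery is not reproduced here.

```python
# Hedged sketch: evaluate a constraint under a fully-factorized LOCAL
# approximation of an autoregressive model, built from the conditionals along
# one sampled sequence. Toy model and constraint are assumptions.
import numpy as np

VOCAB, LENGTH, FORBIDDEN = 5, 8, 3
rng = np.random.default_rng(0)

def next_token_probs(prefix):
    """Stand-in for an autoregressive model's conditional p(y_t | y_<t)."""
    logits = rng.standard_normal(VOCAB) + 0.1 * len(prefix)
    p = np.exp(logits - logits.max())
    return p / p.sum()

# 1) Sample a sequence and record the conditional used at each position.
prefix, conditionals = [], []
for _ in range(LENGTH):
    p = next_token_probs(prefix)
    conditionals.append(p)
    prefix.append(int(rng.choice(VOCAB, p=p)))

# 2) Under the product of these conditionals (a local, fully-factorized
#    approximation), the constraint "FORBIDDEN never occurs" factorizes exactly.
constraint_prob = np.prod([1.0 - p[FORBIDDEN] for p in conditionals])
loss = -np.log(constraint_prob + 1e-12)
print(f"constraint prob {constraint_prob:.4f}, loss {loss:.4f}")
```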


Dual-Free Stochastic Decentralized Optimization with Variance Reduction

Neural Information Processing Systems

We consider the problem of training machine learning models on distributed data in a decentralized way. For finite-sum problems, fast single-machine algorithms for large datasets rely on stochastic updates combined with variance reduction. Yet, existing decentralized stochastic algorithms either do not obtain the full speedup allowed by stochastic updates, or require oracles that are more expensive than regular gradients. In this work, we introduce a Decentralized stochastic algorithm with Variance Reduction called DVR. DVR only requires computing stochastic gradients of the local functions, and is computationally as fast as a standard stochastic variance-reduced algorithm run on a 1/n fraction of the dataset, where n is the number of nodes. To derive DVR, we use Bregman coordinate descent on a well-chosen dual problem, and obtain a dual-free algorithm using a specific Bregman divergence.
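For intuition only, the sketch below combines the two ingredients the abstract mentions, local variance-reduced (SVRG-style) stochastic gradients and gossip averaging across nodes, on a toy least-squares problem. It is not the DVR algorithm itself (which is derived via Bregman coordinate descent on a dual problem); the gossip matrix, step size, and problem are assumptions.

```python
# Hedged sketch: local SVRG-style updates plus gossip averaging on a toy
# decentralized least-squares problem. Generic illustration, NOT DVR.
import numpy as np

rng = np.random.default_rng(1)
n, m, d = 4, 50, 10                       # nodes, samples per node, dimension
A = rng.standard_normal((n, m, d))
b = rng.standard_normal((n, m))
W = np.full((n, n), 1.0 / n)              # complete-graph gossip matrix (assumption)

def full_grad(i, x):
    """Full gradient of node i's local least-squares objective."""
    return A[i].T @ (A[i] @ x - b[i]) / m

xs = np.zeros((n, d))                     # one iterate per node
lr = 0.05
for epoch in range(20):
    x_ref = xs.copy()                     # per-node snapshot (reference) points
    grads_ref = np.stack([full_grad(i, x_ref[i]) for i in range(n)])
    for _ in range(m):
        for i in range(n):
            j = rng.integers(m)
            g = A[i, j] * (A[i, j] @ xs[i] - b[i, j])          # stochastic grad
            g_ref = A[i, j] * (A[i, j] @ x_ref[i] - b[i, j])   # same sample at snapshot
            xs[i] = xs[i] - lr * (g - g_ref + grads_ref[i])    # variance-reduced step
        xs = W @ xs                                            # gossip averaging

obj = sum(0.5 * np.mean((A[i] @ xs[i] - b[i]) ** 2) for i in range(n)) / n
print(f"average local objective after training: {obj:.4f}")
```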


Untangling tradeoffs between recurrence and self-attention in neural networks
Kyle Goyette

Neural Information Processing Systems

Attention and self-attention mechanisms are now central to state-of-the-art deep learning on sequential tasks. However, most recent progress hinges on heuristic approaches with limited understanding of attention's role in model optimization and computation, and relies on considerable memory and computational resources that scale poorly. In this work, we present a formal analysis of how self-attention affects gradient propagation in recurrent networks, and prove that it mitigates the problem of vanishing gradients when trying to capture long-term dependencies by establishing concrete bounds for gradient norms. Building on these results, we propose a relevancy screening mechanism, inspired by the cognitive process of memory consolidation, that allows for a scalable use of sparse self-attention with recurrence. While our approach provides guarantees against vanishing gradients, we use simple numerical experiments to demonstrate the tradeoffs in performance and computational resources obtained by balancing attention and recurrence. Based on our results, we propose a concrete direction of research to improve the scalability of attentive networks.
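A hedged sketch of the general recipe, not the paper's mechanism: a vanilla tanh RNN whose update also attends over a small, "screened" subset of stored hidden states, chosen here as the top-k by dot-product relevance to the current state. The screening rule, dimensions, and weight initializations are placeholder assumptions.

```python
# Hedged sketch: recurrence combined with sparse self-attention over a screened
# memory of past hidden states. The top-k screening rule is a placeholder, not
# the paper's relevancy-screening mechanism.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, k = 8, 16, 4
Wx = rng.standard_normal((d_h, d_in)) * 0.1
Wh = rng.standard_normal((d_h, d_h)) * 0.1
Wa = rng.standard_normal((d_h, d_h)) * 0.1   # maps the attention readout into the update

def step(x_t, h, memory):
    """One recurrent step with sparse attention over at most k stored states."""
    if memory:
        M = np.stack(memory)                  # (num_stored, d_h)
        scores = M @ h                        # relevance of each stored state
        keep = np.argsort(scores)[-k:]        # sparse: keep only the top-k
        weights = np.exp(scores[keep] - scores[keep].max())
        weights /= weights.sum()
        readout = weights @ M[keep]           # attention over the screened states
    else:
        readout = np.zeros(d_h)
    h_new = np.tanh(Wx @ x_t + Wh @ h + Wa @ readout)
    memory.append(h_new)                      # store the new state for later screening
    return h_new

h, memory = np.zeros(d_h), []
for t in range(20):
    h = step(rng.standard_normal(d_in), h, memory)
```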