Plotting

What are Germany's Taurus missiles that Ukraine wants?

Al Jazeera

Ukrainian President Volodymyr Zelenskyy has held talks with Germany's Friedrich Merz in Berlin, days after the newly installed chancellor said Kyiv's Western allies had lifted range restrictions on their missiles and would allow Ukraine to use them to strike deep inside Russian territory. Merz made the announcement on Monday as Russia carried out heavy aerial bombardments on Ukraine and both sides launched tit-for-tat drone attacks. That comment sparked hope in Kyiv and put renewed attention on the possibility of Germany supplying Ukraine with Taurus missiles, which the war-wracked country has long requested. However, Merz, in a joint appearance with Zelenskyy on Wednesday, promised the Ukrainian leader that Germany would help his country develop long-range missiles on its territory. He did not make any commitments regarding the Taurus.



Degraded Polygons Raise Fundamental Questions of Neural Network Perception

Dan Ley, Department of Mathematics, School of Engineering and Applied Sciences, Harvard University

Neural Information Processing Systems

It is well-known that modern computer vision systems often exhibit behaviors misaligned with those of humans: from adversarial attacks to image corruptions, deep learning vision models suffer in a variety of settings that humans capably handle. In light of these phenomena, here we introduce another, orthogonal perspective studying the human-machine vision gap. We revisit the task of recovering images under degradation, first introduced over 30 years ago in the Recognition-by-Components theory of human vision. Specifically, we study the performance and behavior of neural networks on the seemingly simple task of classifying regular polygons at varying orders of degradation along their perimeters.
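To make the task concrete, here is a minimal sketch of how a degraded polygon could be generated. The degradation scheme (trimming an equal fraction from both ends of every edge, so corners vanish first) and the names `regular_polygon` and `degrade_edges` are assumptions for illustration; the abstract does not specify how the perimeter is degraded.

```python
import math


def regular_polygon(n_sides, radius=1.0):
    """Return the vertices of a regular polygon centered at the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / n_sides),
         radius * math.sin(2 * math.pi * k / n_sides))
        for k in range(n_sides)
    ]


def degrade_edges(vertices, removed_fraction):
    """Keep only the central part of each edge (assumed degradation scheme).

    removed_fraction in [0, 1) is split equally between the two ends of
    every edge, so the corners are erased before the edge midpoints.
    """
    if not 0.0 <= removed_fraction < 1.0:
        raise ValueError("removed_fraction must be in [0, 1)")
    t0 = removed_fraction / 2.0
    t1 = 1.0 - t0
    segments = []
    n = len(vertices)
    for i in range(n):
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        start = (x0 + t0 * (x1 - x0), y0 + t0 * (y1 - y0))
        end = (x0 + t1 * (x1 - x0), y0 + t1 * (y1 - y0))
        segments.append((start, end))
    return segments


# Example: a square with half of every edge removed.
square_segments = degrade_edges(regular_polygon(4), 0.5)
```

Rendering these segments at several values of `removed_fraction` would give one possible family of inputs of the kind the abstract describes, with the polygon's number of sides as the class label.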


On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models

Melissa Hall

Neural Information Processing Systems

Large-scale training of latent diffusion models (LDMs) has enabled unprecedented quality in image generation. However, the key components of the best-performing LDM training recipes are often not available to the research community, preventing apples-to-apples comparisons and hindering the validation of progress in the field. In this work, we perform an in-depth study of LDM training recipes, focusing on the performance of models and their training efficiency. To ensure apples-to-apples comparisons, we re-implement five previously published models with their corresponding recipes. Through our study, we explore the effects of (i) the mechanisms used to condition the generative model on semantic information (e.g., text prompt) and control metadata (e.g., crop size, random flip flag, etc.) on the model performance, and (ii) the transfer of the representations learned on smaller and lower-resolution datasets to larger ones on the training efficiency and model performance. We then propose a novel conditioning mechanism that disentangles semantic and control metadata conditionings and sets a new state of the art in class-conditional generation on the ImageNet-1k dataset, with FID improvements of 7% at 256 and 8% at 512 resolution, as well as in text-to-image generation on the CC12M dataset, with FID improvements of 8% at 256 and 23% at 512 resolution.


Towards Comprehensive Detection of Chinese Harmful Memes

Junyu Lu

Neural Information Processing Systems

Harmful memes have proliferated on the Chinese Internet, while research on detecting Chinese harmful memes significantly lags behind due to the absence of reliable datasets and effective detectors. To this end, we focus on the comprehensive detection of Chinese harmful memes.



Federated Compositional Deep AUC Maximization

Neural Information Processing Systems

Federated learning has attracted increasing attention due to the promise of balancing privacy and large-scale learning, and numerous approaches have been proposed. However, most existing approaches focus on problems with balanced data, and prediction performance is far from satisfactory for many real-world applications where the number of samples in different classes is highly imbalanced. To address this challenging problem, we developed a novel federated learning method for imbalanced data by directly optimizing the area under the curve (AUC) score. In particular, we formulate the AUC maximization problem as a federated compositional minimax optimization problem, develop a local stochastic compositional gradient descent ascent with momentum algorithm, and provide bounds on the computational and communication complexities of our algorithm. To the best of our knowledge, this is the first work to achieve such favorable theoretical results. Finally, extensive experimental results confirm the efficacy of our method.
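As background on why AUC is a natural objective for imbalanced data, the sketch below computes AUC directly from its pairwise definition: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. This is only the evaluation metric, not the federated compositional minimax algorithm the abstract describes, and the function name `auc_score` is hypothetical.

```python
def auc_score(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 class labels.
    scores: iterable of real-valued model scores, same length as labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# One positive among nine negatives: a constant scorer is 90% "accurate"
# when thresholded to predict all-negative, yet its AUC shows no ranking skill.
imbalanced_labels = [1] + [0] * 9
constant_scores = [0.0] * 10
constant_auc = auc_score(imbalanced_labels, constant_scores)
```

Because the score is normalized by the number of positive-negative pairs, it does not reward a model for simply predicting the majority class, which is the failure mode accuracy has under heavy class imbalance.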


Zero-Shot Event-Intensity Asymmetric Stereo via Visual Prompting from Image Domain

Neural Information Processing Systems

Event-intensity asymmetric stereo systems have emerged as a promising approach for robust 3D perception in dynamic and challenging environments by integrating event cameras with frame-based sensors in different views. However, existing methods often suffer from overfitting and poor generalization due to limited dataset sizes and lack of scene diversity in the event domain. To address these issues, we propose a zero-shot framework that utilizes monocular depth estimation and stereo matching models pretrained on diverse image datasets. Our approach introduces a visual prompting technique to align the representations of frames and events, allowing the use of off-the-shelf stereo models without additional training. Furthermore, we introduce a monocular cue-guided disparity refinement module to improve robustness across static and dynamic regions by incorporating monocular depth information from foundation models. Extensive experiments on real-world datasets demonstrate the superior zero-shot evaluation performance and enhanced generalization ability of our method compared to existing approaches.


Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

Tianyu He, Aritra Das

Neural Information Processing Systems

Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks.
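For readers unfamiliar with this kind of setup, the following sketch shows one way an in-context modular arithmetic prompt could be built. The task family used here, z = (a*x + b*y) mod p with hidden coefficients (a, b), is an assumption chosen for illustration; the abstract does not state the form of the tasks. The names `sample_task` and `in_context_examples` are hypothetical.

```python
import random


def sample_task(p, rng):
    """Draw hidden coefficients (a, b) defining one task (assumed task form)."""
    return rng.randrange(1, p), rng.randrange(1, p)


def in_context_examples(a, b, p, n_examples, rng):
    """Return n_examples triples (x, y, z) with z = (a*x + b*y) mod p.

    All triples share the same hidden (a, b), so a model can only predict
    the final z by inferring the task from the earlier triples.
    """
    examples = []
    for _ in range(n_examples):
        x = rng.randrange(p)
        y = rng.randrange(p)
        examples.append((x, y, (a * x + b * y) % p))
    return examples


rng = random.Random(0)           # fixed seed for reproducibility
modulus = 7                      # illustrative prime modulus
a, b = sample_task(modulus, rng)
sequence = in_context_examples(a, b, modulus, n_examples=5, rng=rng)
context, query = sequence[:-1], sequence[-1]   # hide query[2] from the model
```

Holding out some (a, b) pairs at training time and testing on them would then probe whether a model solves unseen tasks from context rather than memorizing the tasks it was trained on.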