Mind-the-Glitch: Visual Correspondence for Detecting Inconsistencies in Subject-Driven Generation
We propose a novel approach for disentangling visual and semantic features from the backbones of pre-trained diffusion models, enabling visual correspondence in a manner analogous to the well-established semantic correspondence. While diffusion model backbones are known to encode semantically rich features, they must also contain visual features to support their image synthesis capabilities. However, isolating these visual features is challenging due to the absence of annotated datasets. To address this, we introduce an automated pipeline that constructs image pairs with annotated semantic and visual correspondences based on existing subject-driven image generation datasets, and design a contrastive architecture to separate the two feature types. Leveraging the disentangled representations, we propose a new metric, Visual Semantic Matching (VSM), that quantifies visual inconsistencies in subject-driven image generation. Empirical results show that our approach outperforms global feature-based metrics such as CLIP, DINO, and vision--language models in quantifying visual inconsistencies while also enabling spatial localization of inconsistent regions. To our knowledge, this is the first method that supports both quantification and localization of inconsistencies in subject-driven generation, offering a valuable tool for advancing this task.
Odd-shaped vessel hints at alchemy in medieval German castle
The tall container was almost certainly used for distillation experiments. The ceramic container is over 1.5 feet tall.
Sex jumpstarted Earth's animal biodiversity
Many species didn't have much sex for millions of years. Evolution is responsible for Earth's stunningly diverse spectrum of life, but that wasn't always the case.
700,000-year-old squirrel poop helps scientist recreate an ancient world
Descendants of these rodents are still alive today and are 'like tiny Arctic pack rats.' Researchers document a cluster of ancient Arctic ground squirrel faecal pellets preserved in permafrost at Hunker Creek, Yukon, in August 2022. These coprolites contain remarkably intact ancient DNA, offering rare glimpses into ice age ecosystems.
SALS: Sparse Attention in Latent Space for KV Cache Compression
Large Language Models (LLMs) capable of handling extended contexts are in high demand, yet their inference remains challenging due to the substantial Key-Value (KV) cache size and high memory bandwidth requirements. Previous research has demonstrated that the KV cache exhibits low-rank characteristics within the hidden dimension, suggesting the potential for effective compression. However, due to the widely adopted Rotary Position Embedding (RoPE) mechanism in modern LLMs, naive low-rank compression either suffers severe accuracy degradation or creates a new speed bottleneck, as the low-rank cache must first be reconstructed in order to apply RoPE. In this paper, we introduce two key insights: first, the application of RoPE to the key vectors increases their variance, which in turn results in a higher rank; second, after the key vectors are transformed into the latent space, they largely maintain their representation across most layers. Based on these insights, we propose the Sparse Attention in Latent Space (SALS) framework.
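To make the RoPE constraint concrete, here is a minimal sketch of rotary position embedding in plain Python. It is not taken from the SALS paper: the function names `rope` and `dot`, the interleaved pairing of dimensions, and the default `base` of 10000 are illustrative assumptions (real implementations differ in how they pair dimensions). It shows that each key is rotated by a position-dependent angle over its full dimension, which is why a naively compressed key has to be expanded before the rotation can be applied.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    Assumption: interleaved pairing (x0, x1), (x2, x3), ... and the
    common frequency schedule base ** (-2j / d) for pair index j.
    """
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, d, 2):
        # i = 2j, so base ** (-i / d) equals base ** (-2j / d)
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors, chosen only for illustration.
q = [0.3, -1.2, 0.7, 2.0]
k = [1.5, 0.4, -0.6, 0.9]

# The attention score depends only on the relative offset (here 2),
# not on the absolute positions.
score_a = dot(rope(q, 5), rope(k, 3))
score_b = dot(rope(q, 12), rope(k, 10))
print(abs(score_a - score_b) < 1e-9)
```

Because the rotation angle differs per position and per dimension pair, it cannot be folded into a single fixed projection shared by all cached keys, which is one way to read the bottleneck the abstract describes.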
Vulnerable Data-Aware Adversarial Training
Fast adversarial training (FAT) is considered one of the most effective alternatives to computationally intensive adversarial training. Generally, FAT methods pay equal attention to each sample of the target task. However, samples differ in their distance to the decision boundary, and learning from samples that are far from the decision boundary (i.e., less important to adversarial robustness) brings additional training cost and leads to sub-optimal results. To tackle this issue, we present vulnerable data-aware adversarial training (VDAT) in this study. Specifically, we first propose a margin-based vulnerability calculation method to measure the vulnerability of data samples. Moreover, we propose a vulnerability-aware data filtering method that reduces the training data for adversarial training and thus improves training efficiency. The experiments are conducted in terms of adversarial training and robust neural architecture search on CIFAR-10, CIFAR-100, and ImageNet-1K. The results demonstrate that VDAT is up to 76% more efficient than state-of-the-art FAT methods, while achieving improvements regarding the natural accuracy and adversarial accuracy in both scenarios. Furthermore, the visualizations and ablation studies show the effectiveness of both core components designed in VDAT.
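The idea of scoring samples by their margin and filtering out the least vulnerable ones can be sketched as follows. This is an illustration under stated assumptions, not VDAT's actual definition: the abstract does not give the formula, so the margin used here (true-class logit minus the largest other logit), the function names `margin` and `filter_vulnerable`, and the `keep_fraction` parameter are all hypothetical.

```python
def margin(logits, label):
    """Assumed margin: true-class logit minus the best competing logit.

    A small or negative value means the sample sits near (or on the
    wrong side of) the decision boundary, i.e. it is more vulnerable.
    """
    true_score = logits[label]
    best_other = max(s for i, s in enumerate(logits) if i != label)
    return true_score - best_other

def filter_vulnerable(batch_logits, labels, keep_fraction=0.5):
    """Return sorted indices of the most vulnerable samples.

    Keeps the `keep_fraction` of samples with the smallest margins;
    the rest would be skipped during adversarial training.
    """
    margins = [margin(l, y) for l, y in zip(batch_logits, labels)]
    order = sorted(range(len(margins)), key=lambda i: margins[i])
    n_keep = max(1, round(keep_fraction * len(order)))
    return sorted(order[:n_keep])

# Hypothetical logits for four samples over three classes.
batch_logits = [
    [5.0, 0.0, 0.0],  # confidently correct, far from the boundary
    [1.0, 0.9, 0.0],  # barely correct
    [0.2, 0.5, 0.1],  # misclassified
    [0.0, 0.0, 4.0],  # confidently correct
]
labels = [0, 0, 0, 2]
print(filter_vulnerable(batch_logits, labels))  # -> [1, 2]
```

Dropping the two confidently classified samples halves the adversarial-example generation cost for this toy batch, which is the kind of saving the filtering step is meant to deliver.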
Diffusion Feature Field for Text-based 3D Editing with Gaussian Splatting
Recent advances in text-based image editing have motivated the extension of these techniques into the 3D domain. However, existing methods typically apply 2D diffusion models independently to multiple viewpoints, resulting in significant artifacts, most notably the Janus problem, due to inconsistencies across edited views. To address this, we propose a novel approach termed DFFSplat, which integrates a 3D-consistent diffusion feature field into the editing pipeline. By rendering and injecting these 3D-consistent structural features into intermediate layers of a 2D diffusion model, our method effectively enforces geometric alignment and semantic coherence across views. However, averaging 3D features during the feature field learning process can lead to the loss of fine texture details. To overcome this, we introduce a dual-encoder architecture to disentangle view-independent structural information from view-dependent appearance details. By encoding only the disentangled structure into the 3D field and injecting it during 2D editing, our method produces semantically and multi-view coherent edited images while maintaining high text fidelity. Additionally, we employ a time-invariance objective to ensure consistency across diffusion timesteps, enhancing the stability of learned representations. Experimental results demonstrate that our method achieves state-of-the-art performance in terms of text fidelity and better preserves structural and semantic consistency compared to existing approaches.
Interview with AAAI Fellow Sanmay Das: multiagent systems
Each year the AAAI recognizes a group of individuals who have made significant, sustained contributions to the field of artificial intelligence by appointing them as Fellows. We're talking to some of the 2026 AAAI Fellows to find out more about their work. In this interview, we chat to Sanmay Das, who was elected as a Fellow. Could you start with a quick introduction, where you work, and your general area of research? Broadly speaking, I work in multiagent systems. I've done a lot of work at the intersection of AI and economics, and over the last decade or so I've thought a lot about projects in the AI for social impact and social good space. In particular, my interest has been in the allocation of scarce societal resources, thinking about how AI can be integrated, and what it tells us about systems where we don't necessarily want full free market resource allocation.
Design tweaks promote responsible AI use for environmental protection, research shows
Artificial intelligence systems that ask users to pause to consider AI's energy consumption and environmental impacts are likely to reduce unnecessary AI use, new research by Oregon State University suggests. The findings, published in Science Communication, are important as AI is already using electricity on scales that can be meaningfully compared to households, factories and towns. For example, the electricity needed to train a large language model would power 120 homes for a year, the researchers note; one AI-generated image has roughly the same energy cost as charging a smartphone. With about 85% of the world's energy still coming from fossil fuels, every megawatt-hour that can be carved from AI's electricity profile is significant, says the study's leader, Cheng "Chris" Chen of the OSU College of Liberal Arts. "Despite AI's substantial environmental impacts, information about those impacts is rarely disclosed or effectively communicated to everyday users of AI systems," said Chen, assistant professor in the School of Communication.