Technology
ForensicHub: A Unified Benchmark & Codebase for All-Domain Fake Image Detection and Localization
The field of Fake Image Detection and Localization (FIDL) is highly fragmented, encompassing four domains: deepfake detection (Deepfake), image manipulation detection and localization (IMDL), artificial intelligence-generated image detection (AIGC), and document image manipulation localization (Doc). Although individual benchmarks exist in some domains, a unified benchmark covering all FIDL domains is still missing.
Purity Law for Neural Routing Problem Solvers with Enhanced Generalizability
Achieving generalization in neural approaches across different scales and distributions remains a significant challenge for routing problems. A key obstacle is that neural networks often fail to learn robust principles for identifying universal patterns and deriving optimal solutions from diverse instances. In this paper, we first uncover Purity Law, a fundamental structural principle of optimal solutions to routing problems, which states that edge prevalence grows exponentially with the sparsity of the surrounding vertices. Validated both statistically and theoretically across diverse instances, Purity Law reveals a consistent bias toward local sparsity in global optima. Building on this insight, we propose Purity Policy Optimization (PUPO), a novel training paradigm that explicitly aligns the characteristics of neural solutions with Purity Law during the solution construction process to enhance generalization. Extensive experiments demonstrate that PUPO can be seamlessly integrated with popular neural solvers, significantly enhancing their generalization performance without incurring additional computational overhead during inference. The code is available at https://github.com/Kejun0627/PUPO.
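The abstract does not say how the "sparsity of surrounding vertices" is measured. Purely as an illustration, the sketch below assumes a hypothetical proxy: for each edge (u, v) of a tour, count the other vertices lying inside the disk whose diameter is that edge, then tally those counts over all tour edges. If Purity Law holds, optimal tours should concentrate their edges on low counts. The proxy and every function name here are assumptions, not PUPO's actual definitions.

```python
from collections import Counter


def vertices_in_edge_disk(points, u, v):
    """Count vertices (other than u, v) inside the closed disk with diameter (u, v).

    By Thales' theorem, w lies in that disk iff (u - w) . (v - w) <= 0.
    This density proxy is a hypothetical stand-in, not the paper's definition.
    """
    (ux, uy), (vx, vy) = points[u], points[v]
    count = 0
    for w, (wx, wy) in enumerate(points):
        if w in (u, v):
            continue
        if (ux - wx) * (vx - wx) + (uy - wy) * (vy - wy) <= 0:
            count += 1
    return count


def edge_density_histogram(points, tour):
    """Histogram {density: number of edges} over the edges of a closed tour."""
    hist = Counter()
    for i in range(len(tour)):
        u, v = tour[i], tour[(i + 1) % len(tour)]  # wrap around to close the tour
        hist[vertices_in_edge_disk(points, u, v)] += 1
    return dict(sorted(hist.items()))
```

Comparing this histogram for optimal tours against tours produced by a neural solver is one way to probe the "local sparsity" bias, under the stated assumption about the proxy.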
5d7e8991f75f3e5af14edf7aebb5be5e-Paper-Conference.pdf
Theoretical efforts to prove advantages of Transformers over classical architectures such as feedforward and recurrent neural networks have mostly focused on representational power. In this work, we take an alternative perspective and prove that even with infinite compute, feedforward and recurrent networks may suffer from larger sample complexity than Transformers, as the latter can adapt to a form of dynamic sparsity. Specifically, we consider a sequence-to-sequence data generating model on sequences of length N, where the output at each position depends only on q ≪ N relevant tokens, and the positions of these tokens are described in the input prompt. We prove that a single-layer Transformer can learn this model if and only if its number of attention heads is at least q, in which case it achieves a sample complexity almost independent of N, while recurrent networks require N^{Ω(1)} samples on the same problem. If we simplify this model, recurrent networks may achieve a complexity almost independent of N, while feedforward networks still require N samples. Our proposed sparse retrieval model illustrates a natural hierarchy in sample complexity across these architectures.
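To make the data model concrete, here is a minimal sketch of a sparse retrieval generator. Assumptions: tokens are ±1, the q relevant positions for each output are drawn uniformly without replacement, and the output is a placeholder `link` function (sum by default) of the retrieved tokens. The abstract specifies none of these details.

```python
import random


def sample_sparse_retrieval(n, q, rng, link=sum):
    """Draw one sequence-to-sequence example of length n.

    tokens[i]   : value at position i (+/-1 here, an assumption)
    pointers[i] : the q positions output i depends on; they are part of the
                  input prompt, so a model can read them directly
    targets[i]  : link(tokens at those q positions); `link` is a placeholder
    """
    if not 0 < q <= n:
        raise ValueError("need 0 < q <= n")
    tokens = [rng.choice((-1, 1)) for _ in range(n)]
    pointers = [sorted(rng.sample(range(n), q)) for _ in range(n)]
    targets = [link(tokens[j] for j in ptrs) for ptrs in pointers]
    return tokens, pointers, targets
```

Because the pointers are given explicitly, each of q attention heads could in principle attend to one relevant position. That intuition matches the "at least q heads" condition, but it is not a proof.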
Nvidia's AI squadmate is finally dropping into PUBG
Nvidia ACE, a new AI technology for creating realistic gaming NPCs, is now available in PUBG's special Ally Duo Mode until June 30th. PCWorld reports that ACE uses small language models to enable dynamic NPC interactions and real-time speech synthesis without pre-recorded dialogue. The technology requires an Nvidia graphics card with 8GB or more of memory and could revolutionize gaming by making NPCs more lifelike and interactive. One of the frustrating things about the proliferation of "AI", in the large language model sense, is how it muddles previously serviceable terms. It's neither artificial intelligence in the sci-fi robot sense nor artificial intelligence in the video game sense, i.e., deliberate programmed behavior not controlled by the player.
HEROFILTER: Adaptive Spectral Graph Filter for Varying Heterophilic Relations
Graph heterophily, where connected nodes have different labels, has attracted significant interest recently. Most existing works adopt a simplified approach, using low-pass filters for homophilic graphs and high-pass filters for heterophilic graphs. However, we find that the relationship between graph heterophily and spectral filters is more complex: the optimal filter response varies across frequency components and does not follow a strict monotonic correlation with the heterophily degree. This finding challenges conventional fixed filter designs and suggests the need for adaptive filtering to preserve expressiveness in graph embeddings. Formally, two natural questions arise: given a heterophilic graph G, how and to what extent does the varying heterophily degree of G affect the performance of GNNs? And how can we design adaptive filters that fit these varying heterophilic connections? Our theoretical analysis reveals that the average frequency response of GNNs and the graph heterophily degree do not follow a strict monotonic correlation, which necessitates adaptive graph filters to guarantee good generalization performance. Hence, we propose HEROFILTER, a simple yet powerful GNN that extracts information across the heterophily spectrum and combines salient representations through adaptive mixing. HEROFILTER improves accuracy by up to 9.2% over leading baselines across homophilic and heterophilic graphs.
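The sketch below illustrates the vocabulary used above: an edge homophily ratio, a low-pass filter, a high-pass filter, and a mix of the two. It uses the symmetric normalized adjacency A_hat = D^-1/2 A D^-1/2. With a single fixed weight `alpha`, it is a generic textbook construction and not HEROFILTER itself, which the abstract says mixes representations adaptively. All names are illustrative.

```python
import math


def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label (1.0 = fully homophilic)."""
    same = sum(labels[u] == labels[v] for u, v in edges)
    return same / len(edges)


def normalized_adjacency_apply(edges, n, x):
    """Compute (D^-1/2 A D^-1/2) x for an undirected graph without self-loops."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    out = [0.0] * n
    for u, v in edges:
        w = 1.0 / math.sqrt(deg[u] * deg[v])
        out[u] += w * x[v]
        out[v] += w * x[u]
    return out


def mixed_filter(edges, n, x, alpha):
    """alpha * low-pass + (1 - alpha) * high-pass, applied to node signal x.

    Low-pass  (I + A_hat) / 2 has frequency response 1 - lam / 2;
    high-pass (I - A_hat) / 2 has response lam / 2, where lam in [0, 2] is
    an eigenvalue of the normalized Laplacian I - A_hat.
    """
    ax = normalized_adjacency_apply(edges, n, x)
    low = [(xi + ai) / 2 for xi, ai in zip(x, ax)]
    high = [(xi - ai) / 2 for xi, ai in zip(x, ax)]
    return [alpha * lo + (1 - alpha) * hi for lo, hi in zip(low, high)]
```

On a single edge, the signal [1, -1] is the highest-frequency component (lam = 2): alpha = 1 removes it entirely, and alpha = 0 passes it unchanged. An adaptive model would learn such mixing weights, possibly per frequency band or per node, instead of fixing them.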
Epistemic Uncertainty Estimation in Regression Ensemble Models with Pairwise Epistemic Estimators
Lucas Berry, David Meger (Department of Computer Science, McGill University; lucas.berry@mail.mcgill.ca)
This work introduces a novel approach, Pairwise Epistemic Estimators (PairEpEsts), for epistemic uncertainty estimation in ensemble models for regression tasks using pairwise-distance estimators (PaiDEs). By utilizing the pairwise distances between model components, PaiDEs establish bounds on entropy. We leverage this capability to enhance the performance of Bayesian Active Learning by Disagreement (BALD). Notably, unlike sample-based Monte Carlo estimators, PairEpEsts can estimate epistemic uncertainty up to 100 times faster and demonstrate superior performance in higher dimensions. To validate our approach, we conducted a varied series of regression experiments on commonly used benchmarks: 1D sinusoidal data, Pendulum, Hopper, Ant, and Humanoid, demonstrating PairEpEsts' advantage over baselines in high-dimensional regression active learning.
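For an ensemble whose members output Gaussians p_i with weights c_i, epistemic uncertainty can be written as the mutual information H(mixture) - sum_i c_i H(p_i). The standard pairwise-distance estimator of mixture entropy turns this into -sum_i c_i ln sum_j c_j exp(-D(p_i || p_j)). With D set to the Bhattacharyya distance the result is a lower bound, and with D set to the KL divergence it is an upper bound. The sketch below is a minimal version of that general estimator. It assumes equal member weights and diagonal-Gaussian outputs, and it is not necessarily the authors' exact PairEpEsts implementation.

```python
import math


def kl_gauss(mu1, var1, mu2, var2):
    """KL(N(mu1, var1) || N(mu2, var2)) for diagonal Gaussians (per-dimension lists)."""
    return sum(
        0.5 * math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / (2 * v2) - 0.5
        for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2)
    )


def bhattacharyya_gauss(mu1, var1, mu2, var2):
    """Bhattacharyya distance between diagonal Gaussians."""
    total = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance
        total += (m1 - m2) ** 2 / (8 * v) + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return total


def pairwise_epistemic(components, distance):
    """-sum_i c_i log sum_j c_j exp(-D(p_i || p_j)) with uniform weights c_i = 1/M.

    components: list of (mean, variance) pairs, one per ensemble member.
    """
    c = 1.0 / len(components)
    est = 0.0
    for mu_i, var_i in components:
        inner = sum(
            c * math.exp(-distance(mu_i, var_i, mu_j, var_j))
            for mu_j, var_j in components
        )
        est -= c * math.log(inner)
    return est
```

Each estimate needs only M^2 closed-form distances and no sampling, which is consistent with the speed advantage the abstract reports over Monte Carlo estimators. The estimates range from 0, when all members agree, to ln M, when the members are fully separated.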
scSplit: Bringing Severity Cognizance to Image Decomposition in Fluorescence Microscopy
Fluorescence microscopy, while a key driver of progress in the life sciences, is also subject to technical limitations. To overcome them, computational multiplexing techniques have recently been proposed, which allow multiple cellular structures to be captured in a single image and later unmixed. Existing image decomposition methods are trained on a set of superimposed input images and the respective unmixed target images. Crucially, the relative strength (mixing ratio) of the superimposed images for a given input is unknown a priori. However, existing methods are trained on a fixed intensity ratio of superimposed inputs, so they are not cognizant of the range of relative intensities that can occur in fluorescence microscopy.
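The contrast between a fixed mixing ratio and a varying one can be sketched as a data-synthesis step. Instead of always combining two channel images at the same ratio, each training input draws its own ratio, and that ratio is kept so that a model could be made aware of it. Images are flat lists of floats for brevity. The range `alpha_range` and both function names are hypothetical, not values or APIs from scSplit.

```python
import random


def superimpose(img_a, img_b, alpha):
    """Pixel-wise mix alpha * a + (1 - alpha) * b of two equally sized images."""
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same size")
    return [alpha * a + (1 - alpha) * b for a, b in zip(img_a, img_b)]


def make_training_pair(img_a, img_b, rng, alpha_range=(0.1, 0.9)):
    """Draw a random mixing ratio per sample instead of a fixed one.

    Returns (mixed input, unmixed targets, alpha). alpha_range is an assumed
    hyperparameter chosen for illustration only.
    """
    alpha = rng.uniform(*alpha_range)
    mixed = superimpose(img_a, img_b, alpha)
    return mixed, (img_a, img_b), alpha
```

A method trained only on `alpha = 0.5` sees a single point of this range, which is the limitation the paragraph above describes.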
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
This paper presents a computational model for universal video temporal grounding, which accurately localizes temporal moments in videos based on natural language queries (e.g., questions or descriptions). Unlike existing methods that are often limited to specific video domains or durations, we propose UniTime, a robust and universal video grounding model leveraging the strong vision-language understanding capabilities of generative Multi-modal Large Language Models (MLLMs). Our model effectively handles videos of diverse views, genres, and lengths while comprehending complex language queries. The key contributions include: (i) We consider steering strong MLLMs for temporal grounding in videos. To enable precise timestamp outputs, we incorporate temporal information by interleaving timestamp tokens with video tokens.
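Interleaving timestamp tokens with video tokens can be pictured as placing a textual time marker in front of each frame's visual tokens. The language model can then name exact times when it answers a grounding query. The `<t=...s>` token format and the function name below are hypothetical placeholders. The abstract does not specify UniTime's actual token scheme.

```python
def interleave_timestamps(frame_tokens, timestamps):
    """Place a timestamp token before each frame's visual tokens.

    frame_tokens: list of per-frame token lists; timestamps: seconds per frame.
    The "<t=...s>" format is a made-up placeholder.
    """
    if len(frame_tokens) != len(timestamps):
        raise ValueError("one timestamp per frame is required")
    sequence = []
    for tokens, t in zip(frame_tokens, timestamps):
        sequence.append(f"<t={t:.1f}s>")  # time marker for this frame
        sequence.extend(tokens)           # followed by the frame's visual tokens
    return sequence
```

For example, two frames sampled at 0.0 s and 2.5 s become `["<t=0.0s>", "v0a", "v0b", "<t=2.5s>", "v1a"]`.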
Windows Media Player update still can't beat the old version
PCWorld examines the latest Windows Media Player update, comparing its functionality and performance against the classic legacy version that many users still prefer. The updated media player remains inferior to its predecessor, lacking key features and polish that made the original version reliable for audio and video playback. Despite Microsoft's efforts to modernize the application, users may find better value sticking with the older Windows Media Player for their multimedia needs. Windows Insider members have been given access to a new version of Windows Media Player. In taking a closer look at the new version, Windows Latest notes that it offers a number of improvements, not least in terms of stability and the handling of subtitles.
AI is making journalistic language more repetitive and predictable – and it's a problem for all of us
What happens to language when a growing amount of text published in the press, online and on social media is written by machines? This question is not just important for the profession of journalism - it also has an impact on the richness of the language we all use to comprehend, describe and discuss reality itself. Historically, the press has been a space where public language grows and becomes richer. It is not, of course, the only driver of linguistic change, but it is one of the fields where new or emerging words, turns of phrase and ways of describing facts begin to circulate within society. Studies on journalistic language and neologisms clearly demonstrate that newspapers are platforms for the creation and dissemination of new vocabulary, especially when it is needed to report on events, technology and social changes for a broad audience.