 patcher



Paxion: Patching Action Knowledge in Video-Language Foundation Models

Neural Information Processing Systems

Action knowledge involves the understanding of textual, visual, and temporal aspects of actions. We introduce the Action Dynamics Benchmark (ActionBench) containing two carefully designed probing tasks: Action Antonym and Video Reversal, which target the model's multimodal alignment capabilities and temporal understanding skills, respectively. Despite the impressive performance of recent video-language models (VidLMs) on various benchmark tasks, our diagnostic tasks reveal their surprising deficiency (near-random performance) in action knowledge, suggesting that current models rely on object recognition abilities as a shortcut for action understanding. To remedy this, we propose a novel framework, Paxion, along with a new Discriminative Video Dynamics Modeling (DVDM) objective. The Paxion framework utilizes a Knowledge Patcher network to encode new action knowledge and a Knowledge Fuser component to integrate the Patcher into frozen VidLMs without compromising their existing capabilities.
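The two probing tasks can be pictured as binary choices between an original sample and a perturbed one. The sketch below is only an illustration under stated assumptions: the abstract does not say how the probes are built, so the antonym table, the function names, and the `score` callable (standing in for a model's video-text similarity) are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical, tiny antonym table; a real benchmark would need a curated one.
ANTONYMS: Dict[str, str] = {
    "opens": "closes", "closes": "opens",
    "lifts": "drops", "drops": "lifts",
}

def action_antonym_pair(caption: str) -> Tuple[str, str]:
    """Return (original caption, caption with the first known verb flipped)."""
    words = caption.split()
    for i, word in enumerate(words):
        if word.lower() in ANTONYMS:
            flipped = words[:i] + [ANTONYMS[word.lower()]] + words[i + 1:]
            return caption, " ".join(flipped)
    raise ValueError("no verb from the antonym table found in caption")

def video_reversal_pair(frames: Sequence) -> Tuple[list, list]:
    """Return (frames in original order, frames in reversed order)."""
    return list(frames), list(reversed(frames))

# One probe example: ((video, text) original, (video, text) perturbed).
Example = Tuple[Tuple[Sequence, str], Tuple[Sequence, str]]

def probe_accuracy(score: Callable[[Sequence, str], float],
                   examples: List[Example]) -> float:
    """Fraction of examples where the original pair outscores the perturbed one.

    With two choices per example, chance level is 0.5.
    """
    hits = sum(score(*original) > score(*perturbed)
               for original, perturbed in examples)
    return hits / len(examples)
```

Under this framing, a scorer that keys only on the objects in a scene assigns nearly the same score to both members of each pair, which would produce the near-random accuracy the abstract reports.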


Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement

Chang, Zhiyuan, Li, Mingyang, Wang, Junjie, Liu, Yi, Wang, Qing, Liu, Yang

arXiv.org Artificial Intelligence

Text-to-Image Diffusion Models (T2I DMs) have garnered significant attention for their ability to generate high-quality images from textual descriptions. However, these models often produce images that do not fully align with the input prompts, resulting in semantic inconsistencies. The most prominent issue among these semantic inconsistencies is catastrophic-neglect, where the images generated by T2I DMs miss key objects mentioned in the prompt. We first conduct an empirical study on this issue, exploring the prevalence of catastrophic-neglect, potential mitigation strategies with feature enhancement, and the insights gained. Guided by the empirical findings, we propose an automated repair approach named Patcher to address catastrophic-neglect in T2I DMs. Specifically, Patcher first determines whether there are any neglected objects in the prompt, and then applies attention-guided feature enhancement to these neglected objects, resulting in a repaired prompt. Experimental results on three versions of Stable Diffusion demonstrate that Patcher effectively repairs the issue of catastrophic-neglect, achieving 10.1%-16.3% higher Correct Rate in image generation compared to baselines.


FECoM: A Step towards Fine-Grained Energy Measurement for Deep Learning

Rajput, Saurabhsingh, Widmayer, Tim, Shang, Ziyuan, Kechagia, Maria, Sarro, Federica, Sharma, Tushar

arXiv.org Artificial Intelligence

With the increasing usage, scale, and complexity of Deep Learning (DL) models, their rapidly growing energy consumption has become a critical concern. Promoting green development and energy awareness at different granularities is the need of the hour to limit carbon emissions of DL systems. However, the lack of standard and repeatable tools to accurately measure and optimize energy consumption at a fine granularity (e.g., at method level) hinders progress in this area. In this paper, we introduce FECoM (Fine-grained Energy Consumption Meter), a framework for fine-grained DL energy consumption measurement. Specifically, FECoM provides researchers and developers with a mechanism to profile DL APIs. FECoM addresses the challenges of measuring energy consumption at a fine-grained level by using static instrumentation and considering various factors, including computational load and temperature stability. We assess FECoM's capability to measure fine-grained energy consumption for one of the most popular open-source DL frameworks, namely TensorFlow. Using FECoM, we also investigate the impact of parameter size and execution time on energy consumption, enriching our understanding of TensorFlow APIs' energy profiles. Furthermore, we elaborate on the considerations, issues, and challenges that one must address when designing and implementing a fine-grained energy consumption measurement tool. We hope this work will facilitate further advances in DL energy measurement and the development of energy-aware practices for DL systems.
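As a rough picture of what method-level static instrumentation can look like, the sketch below rewrites selected calls in a Python source string so that each one runs through a measuring wrapper. It is not FECoM's implementation: `make_meter`, `measured_call`, `CallWrapper`, and `instrument` are hypothetical names, and the energy reader is a caller-supplied function, since reading a real hardware counter is platform-specific. The load and temperature-stability checks mentioned in the abstract are left out.

```python
import ast
from typing import Callable, List, Set, Tuple

def make_meter(read_energy_joules: Callable[[], float]):
    """Build a wrapper that logs (label, energy delta) for every wrapped call.

    `read_energy_joules` is an assumed, caller-supplied cumulative counter.
    """
    log: List[Tuple[str, float]] = []

    def measured_call(label, fn, /, *args, **kwargs):
        before = read_energy_joules()
        result = fn(*args, **kwargs)
        log.append((label, read_energy_joules() - before))
        return result

    return measured_call, log

class CallWrapper(ast.NodeTransformer):
    """Rewrite `f(x)` into `measured_call("f", f, x)` for targeted names."""

    def __init__(self, targets: Set[str]):
        self.targets = targets

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # instrument nested calls first
        name = ast.unparse(node.func)  # requires Python 3.9+
        if name not in self.targets:
            return node
        return ast.Call(
            func=ast.Name(id="measured_call", ctx=ast.Load()),
            args=[ast.Constant(value=name), node.func, *node.args],
            keywords=node.keywords,
        )

def instrument(source: str, targets: Set[str]):
    """Parse, patch, and compile `source` without executing it."""
    tree = CallWrapper(targets).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return compile(tree, "<instrumented>", "exec")
```

To run the patched code, execute the compiled object in a namespace that contains `measured_call` alongside the functions the script uses. Because the rewriting happens on the syntax tree before execution, the original script needs no manual edits, which is the main appeal of the static approach.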


Fans resurrect 'Tomb Raider' in your web browser

Engadget

If you need a reminder of how far video games have come since the mid-90s, look no further than OpenTomb. Over the past four years, a handful of devoted developers have been rebuilding the original five Tomb Raider games from scratch, and the City of Vilcabamba level is available in your browser right now (heads up, game audio auto-plays from that link). The OpenTomb team wasn't able to retrieve the original Tomb Raider source code from Square Enix, so the developers simply decided to remake the game from the ground up. They built their own engine and wrote their own code designed to take advantage of modern CPUs, graphics cards and gameplay features. "The older [an] engine gets, less chance it'll become compatible with further systems; but in [the] case of OpenTomb, you can port it to any platform you wish," the developers write on GitHub.