Goto

Collaborating Authors

 hawk


HAWK: Learning to Understand Open-World Video Anomalies

Neural Information Processing Systems

Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs. However, current VAD systems are often limited by their superficial semantic understanding of scenes and minimal user interaction. Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios.In this paper, we introduce HAWK, a novel framework that leverages interactive large Visual Language Models (VLM) to interpret video anomalies precisely. Recognizing the difference in motion information between abnormal and normal videos, HAWK explicitly integrates motion modality to enhance anomaly identification. To reinforce motion attention, we construct an auxiliary consistency loss within the motion and video space, guiding the video branch to focus on the motion modality. Moreover, to improve the interpretation of motion-to-language, we establish a clear supervisory relationship between motion and its linguistic representation. Furthermore, we have annotated over 8,000 anomaly videos with language descriptions, enabling effective training across diverse open-world scenarios, and also created 8,000 question-answering pairs for users' open-world questions. The final results demonstrate that HAWK achieves SOTA performance, surpassing existing baselines in both video description generation and question-answering. Our codes/dataset/demo will be released at https://github.com/jqtangust/hawk.


Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation

Chen, Zhi-Kai, Jiang, Jun-Peng, Ye, Han-Jia, Zhan, De-Chuan

arXiv.org Artificial Intelligence

Autoregressive (AR) image generation models are capable of producing high-fidelity images but often suffer from slow inference due to their inherently sequential, token-by-token decoding process. Speculative decoding, which employs a lightweight draft model to approximate the output of a larger AR model, has shown promise in accelerating text generation without compromising quality. However, its application to image generation remains largely underexplored. The challenges stem from a significantly larger sampling space, which complicates the alignment between the draft and target model outputs, coupled with the inadequate use of the two-dimensional spatial structure inherent in images, thereby limiting the modeling of local dependencies. To overcome these challenges, we introduce Hawk, a new approach that harnesses the spatial structure of images to guide the speculative model toward more accurate and efficient predictions. Experimental results on multiple text-to-image benchmarks demonstrate a 1.71x speedup over standard AR models, while preserving both image fidelity and diversity.


HAWK: Learning to Understand Open-World Video Anomalies

Neural Information Processing Systems

Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs. However, current VAD systems are often limited by their superficial semantic understanding of scenes and minimal user interaction. Additionally, the prevalent data scarcity in existing datasets restricts their applicability in open-world scenarios.In this paper, we introduce HAWK, a novel framework that leverages interactive large Visual Language Models (VLM) to interpret video anomalies precisely. Recognizing the difference in motion information between abnormal and normal videos, HAWK explicitly integrates motion modality to enhance anomaly identification. To reinforce motion attention, we construct an auxiliary consistency loss within the motion and video space, guiding the video branch to focus on the motion modality.


Serious Games: Human-AI Interaction, Evolution, and Coevolution

Doreswamy, Nandini, Horstmanshof, Louise

arXiv.org Artificial Intelligence

The serious games between humans and AI have only just begun. Evolutionary Game Theory (EGT) models the competitive and cooperative strategies of biological entities. EGT could help predict the potential evolutionary equilibrium of humans and AI. The objective of this work was to examine some of the EGT models relevant to human-AI interaction, evolution, and coevolution. Of thirteen EGT models considered, three were examined: the Hawk-Dove Game, Iterated Prisoner's Dilemma, and the War of Attrition. This selection was based on the widespread acceptance and clear relevance of these models to potential human-AI evolutionary dynamics and coevolutionary trajectories. The Hawk-Dove Game predicts balanced mixed-strategy equilibria based on the costs of conflict. It also shows the potential for balanced coevolution rather than dominance. Iterated Prisoner's Dilemma suggests that repeated interaction may lead to cognitive coevolution. It demonstrates how memory and reciprocity can lead to cooperation. The War of Attrition suggests that competition for resources may result in strategic coevolution, asymmetric equilibria, and conventions on sharing resources. Therefore, EGT may provide a suitable framework to understand and predict the human-AI evolutionary dynamic. However, future research could extend beyond EGT and explore additional frameworks, empirical validation methods, and interdisciplinary perspectives. AI is being shaped by human input and is evolving in response to it. So too, neuroplasticity allows the human brain to grow and evolve in response to stimuli. If humans and AI converge in future, what might be the result of human neuroplasticity combined with an ever-evolving AI? Future research should be mindful of the ethical and cognitive implications of human-AI interaction, evolution, and coevolution.


Leveraging State Space Models in Long Range Genomics

Popov, Matvei, Kallala, Aymen, Ramesh, Anirudha, Hennouni, Narimane, Khaitan, Shivesh, Gentry, Rick, Cohen, Alain-Sam

arXiv.org Artificial Intelligence

Long-range dependencies are critical for understanding genomic structure and function, yet most conventional methods struggle with them. Widely adopted transformer-based models, while excelling at short-context tasks, are limited by the attention module's quadratic computational complexity and inability to extrapolate to sequences longer than those seen in training. In this work, we explore State Space Models (SSMs) as a promising alternative by benchmarking two SSM-inspired architectures, Caduceus and Hawk, on long-range genomics modeling tasks under conditions parallel to a 50M parameter transformer baseline. We discover that SSMs match transformer performance and exhibit impressive zero-shot extrapolation across multiple tasks, handling contexts 10 to 100 times longer than those seen during training, indicating more generalizable representations better suited for modeling the long and complex human genome. Moreover, we demonstrate that these models can efficiently process sequences of 1M tokens on a single GPU, allowing for modeling entire genomic regions at once, even in labs with limited compute. Our findings establish SSMs as efficient and scalable for long-context genomic analysis.


Table-to-Text Generation with Pretrained Diffusion Models

Krylov, Aleksei S., Somov, Oleg D.

arXiv.org Artificial Intelligence

Diffusion models have demonstrated significant potential in achieving state-of-the-art performance across various text generation tasks. In this systematic study, we investigate their application to the table-to-text problem by adapting the diffusion model to the task and conducting an in-depth analysis. Our experiments cover multiple aspects of diffusion models training. We explore sampling strategy influence by inducing recent diffusion model accelerator DPM-Solver++ into our core model. We have tested different prediction aggregation methods, like ROVER and Minimum Bayes-Risk (MBR). Our studies cover the impact of the pre-training phase in diffusion models and the generation length constraints influence. We also have compared diffusion model generation with auto-regressive text-to-text models with different temperature settings for diversity evaluation. Our key observation is that diffusion models demonstrate the balance between quality and diversity while auto-regressive text-to-text models are not successful at handling both at the same time. Furthermore, we found out that to achieve the highest quality possible, it is preferable to use a regular sampler with the strictest length constraint to create multiple samples, and then use MBR to aggregate the predictions. However, if you are prepared to give up high level of diversity and to accelerate the process, you can also utilize a fast sampler DPM-Solver++. Our findings reveal that diffusion models achieve comparable results in the table-to-text domain, highlighting their viability in the table-to-text challenge as a promising research direction.


Multiple-input, multiple-output modal testing of a Hawk T1A aircraft: A new full-scale dataset for structural health monitoring

Wilson, James, Champneys, Max D., Tipuric, Matt, Mills, Robin, Wagg, David J., Rogers, Timothy J.

arXiv.org Artificial Intelligence

The use of measured vibration data from structures has a long history of enabling the development of methods for inference and monitoring. In particular, applications based on system identification and structural health monitoring have risen to prominence over recent decades and promise significant benefits when implemented in practice. However, significant challenges remain in the development of these methods. The introduction of realistic, full-scale datasets will be an important contribution to overcoming these challenges. This paper presents a new benchmark dataset capturing the dynamic response of a decommissioned BAE Systems Hawk T1A. The dataset reflects the behaviour of a complex structure with a history of service that can still be tested in controlled laboratory conditions, using a variety of known loading and damage simulation conditions. As such, it provides a key stepping stone between simple laboratory test structures and in-service structures. In this paper, the Hawk structure is described in detail, alongside a comprehensive summary of the experimental work undertaken. Following this, key descriptive highlights of the dataset are presented, before a discussion of the research challenges that the data present. Using the dataset, non-linearity in the structure is demonstrated, as well as the sensitivity of the structure to damage of different types. The dataset is highly applicable to many academic enquiries and additional analysis techniques which will enable further advancement of vibration-based engineering techniques.


Can Tech Stop Animal Poachers in Their Tracks?

Mother Jones

This story was originally published by Slate's Future Tense partnership and is reproduced here as part of the Climate Desk collaboration. In August 2021, forest range officer Remya Raghavan caught three people carrying wild boar meat in the Wayanad forest of Kerala, a state in southern India. Possessing wild animal meat is a crime under the country's 1972 Wildlife Protection Act, so Raghavan entered all the details of the crime--location, witnesses, names of the accused, items seized, and section of the forest--in a mobile application. Just like that, the case was officially registered in the app-based system, which signaled that it needed to be taken to court. The app Raghavan used is called HAWK, or Hostile Activity Watch Kernel, and it appears to be the first such digital intelligence gathering system for wildlife crime in India.


In the Shadowy, Hard-to-Track Poaching Industry, Governments Hope a New Tool Can Solve an Old Problem

Slate

In August 2021, forest range officer Remya Raghavan caught three people carrying wild boar meat in the Wayanad forest of Kerala, a state in southern India. Possessing wild animal meat is a crime under the country's 1972 Wildlife Protection Act, so Raghavan entered all the details of the crime--location, witnesses, names of the accused, items seized, and section of the forest--in a mobile application. Just like that, the case was officially registered in the app-based system, which signaled that it needed to be taken to court. The app Raghavan used is called HAWK, or Hostile Activity Watch Kernel, and it appears to be the first such digital intelligence gathering system for wildlife crime in India. It helps officers like Raghavan centralize and share information on forest and wildlife crimes in real time.


Interview with Safa Alver: Scalable and robust planning in lifelong reinforcement learning

AIHub

In their paper Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning, Safa Alver and Doina Precup introduced special kinds of models that allow for performing scalable and robust planning in lifelong reinforcement learning scenarios. In this interview, Safa Alver tells us more about this work. It has long been argued that in order for reinforcement learning (RL) agents to perform well in lifelong RL (LRL) scenarios (which are scenarios like the ones we, biological agents, encounter in real life), they should be able to learn a model of their environment, which allows for advanced computational abilities such as counterfactual reasoning and fast re-planning. Even though this is a widely accepted view in the community, the question of what kinds of models would be better suited for performing LRL still remains unanswered. As LRL scenarios involve large environments with lots of irrelevant aspects and environments with unexpected distribution shifts, directly applying the ideas developed in the classical model-based RL literature to these scenarios is likely to lead to catastrophic results in building scalable and robust lifelong learning agents.