ERetinex: Event Camera Meets Retinex Theory for Low-Light Image Enhancement
Guo, Xuejian, Tian, Zhiqiang, Wang, Yuehang, Li, Siqi, Jiang, Yu, Du, Shaoyi, Gao, Yue
Low-light image enhancement aims to restore under-exposed images captured in dark scenarios. In such scenarios, traditional frame-based cameras may fail to capture structure and color information due to exposure time limitations. Event cameras are bio-inspired vision sensors that respond to pixel-wise brightness changes asynchronously. Their high dynamic range is pivotal for visual perception in extreme low-light scenarios, surpassing traditional cameras and enabling applications in challenging dark environments. In this paper, inspired by the success of Retinex theory for traditional frame-based low-light image restoration, we introduce the first method that combines Retinex theory with event cameras and propose a novel Retinex-based low-light image restoration framework named ERetinex. Our first contribution is a new approach that leverages the high temporal resolution of event data together with traditional image information to estimate scene illumination accurately; it outperforms image-only techniques, especially in low-light environments, by providing more precise lighting information. Additionally, we propose an effective fusion strategy that combines the high dynamic range data from event cameras with the color information of traditional images to enhance image quality. Through this fusion, we generate clearer and more detail-rich images, maintaining the integrity of visual information even under extreme lighting conditions. Experimental results indicate that our proposed method outperforms state-of-the-art (SOTA) methods, achieving a gain of 1.0613 dB in PSNR while reducing FLOPs by 84.28%.
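Retinex theory, for reference, decomposes an observed image into reflectance and illumination, $I = R \odot L$, and enhances the image by correcting the estimated illumination while preserving reflectance. Below is a minimal sketch of that classical decomposition, not ERetinex's learned, event-guided estimator: the Gaussian-smoothed max-channel illumination prior and the gamma curve are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def retinex_enhance(img, sigma=25.0, gamma=0.4, eps=1e-6):
    """Classical Retinex decomposition I = R * L (element-wise).

    Estimates illumination L by Gaussian-smoothing the max color
    channel, recovers reflectance R = I / L, then recombines with a
    gamma-brightened illumination. `img` is float32 in [0, 1], HxWx3.
    """
    # Illumination prior: per-pixel max over color channels, smoothed.
    lum = img.max(axis=2)
    L = gaussian_filter(lum, sigma=sigma) + eps
    # Reflectance holds color and structure; clip to stay plausible.
    R = np.clip(img / L[..., None], 0.0, 1.0)
    # Brighten illumination with a gamma curve and recompose.
    L_adj = L ** gamma
    return np.clip(R * L_adj[..., None], 0.0, 1.0)

# Toy usage: enhance a synthetic under-exposed image.
dark = (0.1 * np.random.rand(64, 64, 3)).astype(np.float32)
bright = retinex_enhance(dark)
```

ERetinex's contribution is to replace a hand-crafted illumination estimate like the one above with one computed jointly from high-temporal-resolution event data and the frame.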
Hypergraph Foundation Model
Feng, Yifan, Liu, Shiquan, Han, Xiangmin, Du, Shaoyi, Wu, Zongze, Hu, Han, Gao, Yue
Hypergraph neural networks (HGNNs) effectively model complex high-order relationships in domains like protein interactions and social networks by connecting multiple vertices through hyperedges, enhancing modeling capabilities and reducing information loss. Developing foundation models for hypergraphs is challenging because of their distinct data, which includes both vertex features and intricate structural information. We present Hyper-FM, a Hypergraph Foundation Model for multi-domain knowledge extraction, featuring Hierarchical High-Order Neighbor Guided Vertex Knowledge Embedding for vertex feature representation and Hierarchical Multi-Hypergraph Guided Structural Knowledge Extraction for structural information. Additionally, we curate 10 text-attributed hypergraph datasets to advance research at the intersection of HGNNs and LLMs. Experiments on these datasets show that Hyper-FM outperforms baseline methods by approximately 13.3%, validating our approach. Furthermore, we propose the first scaling law for hypergraph foundation models, demonstrating that increasing domain diversity significantly enhances performance, unlike merely augmenting vertex and hyperedge counts. This underscores the critical role of domain diversity in scaling hypergraph models.
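For background, the hypergraph convolution that standard HGNNs build on propagates vertex features through the incidence structure as $X' = \sigma(D_v^{-1/2} H W D_e^{-1} H^\top D_v^{-1/2} X \Theta)$. The NumPy sketch below implements one such layer under that standard formulation; it is not Hyper-FM's architecture, whose hierarchical embedding and extraction modules are described above.

```python
import numpy as np

def hgnn_layer(X, H, Theta, edge_w=None):
    """One hypergraph convolution layer (standard HGNN form):

        X' = ReLU( Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta )

    X: (n_vertices, d_in) features, H: (n_vertices, n_edges)
    incidence matrix, Theta: (d_in, d_out) learnable weights.
    """
    n, m = H.shape
    w = np.ones(m) if edge_w is None else edge_w     # hyperedge weights W
    Dv = (H * w).sum(axis=1)                         # vertex degrees
    De = H.sum(axis=0)                               # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(Dv, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(De, 1e-12))
    A = Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt
    return np.maximum(A @ X @ Theta, 0.0)            # ReLU

# Toy hypergraph: 4 vertices, 2 hyperedges {0,1,2} and {2,3}.
H = np.array([[1, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
X = np.random.randn(4, 8)
out = hgnn_layer(X, H, np.random.randn(8, 16))
```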
Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?
Feng, Yifan, Yang, Chengwu, Hou, Xingliang, Du, Shaoyi, Ying, Shihui, Wu, Zongze, Gao, Yue
Existing benchmarks like NLGraph and GraphQA evaluate LLMs on graphs by focusing mainly on pairwise relationships, overlooking the high-order correlations found in real-world data. Hypergraphs, which can model complex beyond-pairwise relationships, offer a more robust framework but are still underexplored in the context of LLMs. To address this gap, we introduce LLM4Hypergraph, the first comprehensive benchmark comprising 21,500 problems across eight low-order, five high-order, and two isomorphism tasks, utilizing both synthetic and real-world hypergraphs from citation networks and protein structures. We evaluate six prominent LLMs, including GPT-4o, demonstrating our benchmark's effectiveness in identifying model strengths and weaknesses. Our specialized prompting framework incorporates seven hypergraph languages and introduces two novel techniques, Hyper-BAG and Hyper-COT, which enhance high-order reasoning and achieve an average 4% (up to 9%) performance improvement on structure classification tasks. This work establishes a foundational testbed for integrating hypergraph computational capabilities into LLMs, advancing their comprehension.

Large Language Models (LLMs) (Vaswani, 2017; Devlin, 2018; Brown, 2020; Ouyang et al., 2022) have made significant strides in domains such as dialogue systems (Bubeck et al., 2023) and image understanding (Zhao et al., 2023). However, they often produce untruthful or unsupported content, known as hallucinations (Wang et al., 2023). To mitigate this, Retrieval-Augmented Generation (RAG) (Vu et al., 2023) enhances prompts with relevant, factual, and up-to-date information (Khandelwal et al., 2019), thereby grounding outputs more effectively. RAG typically retrieves structured data with complex relational dependencies (Guu et al., 2020), such as social networks or molecular structures, which are efficiently represented as graphs. Graph representations capture intricate interdependencies and provide a concise encapsulation of data relationships. This has spurred research into improving LLMs' understanding of graph-structured data (Guo et al., 2023), leading to benchmarks like NLGraph (Wang et al., 2024), GraphQA (Fatemi et al., 2023), and LLM4DyG (Zhang et al., 2023). These benchmarks evaluate and enhance LLMs' capabilities in handling graph-related tasks, promoting the integration of graph-based representations in large language models. However, real-world data often involve complex correlations beyond simple pairwise relationships (Zhou et al., 2006). For example, sentences within a document that share common keywords may exhibit high-order correlations that traditional graph models fail to capture (PM et al., 2017). In multimodal scenarios (Kim et al., 2020; Feng et al., 2023), interactions across different data types further increase correlation complexity, exceeding the capabilities of conventional graphs, which are limited to pairwise correlations.
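To make the prompting setting concrete, the sketch below serializes a small hypergraph into plain text for an LLM query. The encoding is illustrative only: it is not one of the benchmark's seven hypergraph languages verbatim, and the question wording is a made-up example.

```python
def hypergraph_to_prompt(n_vertices, hyperedges, question):
    """Serialize a hypergraph as plain text for an LLM prompt.

    Illustrative encoding (not a verbatim language from the
    benchmark): each hyperedge is listed as the set of vertices
    it connects, followed by the task question.
    """
    lines = [f"There is a hypergraph with {n_vertices} vertices, "
             f"numbered 0 to {n_vertices - 1}."]
    for i, e in enumerate(hyperedges):
        members = ", ".join(str(v) for v in sorted(e))
        lines.append(f"Hyperedge e{i} connects vertices {{{members}}}.")
    lines.append(question)
    return "\n".join(lines)

# Hypothetical low-order task: vertex connectivity via a shared hyperedge.
prompt = hypergraph_to_prompt(
    5, [{0, 1, 2}, {2, 3}, {1, 3, 4}],
    "Question: do vertices 0 and 4 share a hyperedge? Answer yes or no.")
print(prompt)
```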
Hypergraph-based Multi-View Action Recognition using Event Cameras
Gao, Yue, Lu, Jiaxuan, Li, Siqi, Li, Yipeng, Du, Shaoyi
Action recognition from video data forms a cornerstone with wide-ranging applications. Single-view action recognition faces limitations due to its reliance on a single viewpoint. In contrast, multi-view approaches capture complementary information from various viewpoints for improved accuracy. Recently, event cameras have emerged as innovative bio-inspired sensors, leading to advancements in event-based action recognition. However, existing works predominantly focus on single-view scenarios, leaving a gap in the exploitation of multi-view event data, particularly regarding challenges like information deficit and semantic misalignment. To bridge this gap, we introduce HyperMV, a multi-view event-based action recognition framework. HyperMV converts discrete event data into frame-like representations and extracts view-related features using a shared convolutional network. By treating segments as vertices and constructing hyperedges using rule-based and KNN-based strategies, a multi-view hypergraph neural network that captures relationships across viewpoint and temporal features is established. Vertex attention hypergraph propagation is also introduced for enhanced feature fusion. To promote research in this area, we present the largest multi-view event-based action dataset, $\text{THU}^{\text{MV-EACT}}\text{-50}$, comprising 50 actions from 6 viewpoints, which surpasses existing datasets by over tenfold. Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios, and also exceeds the state of the art in frame-based multi-view action recognition.
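As background for the frame-like representation step, a common event representation bins the asynchronous stream into fixed time windows and counts events per pixel and polarity. A minimal sketch under that assumption follows; HyperMV's exact formatting may differ.

```python
import numpy as np

def events_to_frames(events, height, width, n_segments):
    """Accumulate an event stream into frame-like segment tensors.

    `events` is an (N, 4) array of (t, x, y, p) with polarity p in
    {0, 1}. Events are split into `n_segments` equal time windows
    and counted per pixel and polarity.
    """
    t = events[:, 0]
    t0, t1 = t.min(), t.max()
    # Assign each event to a time segment; clamp the last timestamp.
    seg = np.minimum(((t - t0) / (t1 - t0 + 1e-9) * n_segments).astype(int),
                     n_segments - 1)
    frames = np.zeros((n_segments, 2, height, width), dtype=np.float32)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3].astype(int)
    np.add.at(frames, (seg, p, y, x), 1.0)   # per-pixel event counts
    return frames

# Toy stream: 1000 random events over a 32x32 sensor, 8 segments.
ev = np.column_stack([np.sort(np.random.rand(1000)),
                      np.random.randint(0, 32, 1000),
                      np.random.randint(0, 32, 1000),
                      np.random.randint(0, 2, 1000)]).astype(np.float32)
frames = events_to_frames(ev, 32, 32, 8)
```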
ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition
Xie, Weidong, Luo, Lun, Ye, Nanfei, Ren, Yi, Du, Shaoyi, Wang, Minhang, Xu, Jintao, Ai, Rui, Gu, Weihao, Chen, Xieyuanli
Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition, which retrieves images from a point-cloud database, remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation for modality conversion, which is usually computationally intensive and requires expensive labeled data for depth supervision. In this work, we introduce a fast and lightweight framework to encode images and point clouds into place-distinctive descriptors. We propose an effective Field of View (FoV) transformation module to convert point clouds into a modality analogous to images. This module eliminates the necessity for depth estimation and helps subsequent modules achieve real-time performance. We further design a non-negative factorization-based encoder to extract mutually consistent semantic features between point clouds and images. This encoder yields more distinctive global descriptors for retrieval. Experimental results on the KITTI dataset show that our proposed methods achieve state-of-the-art performance while running in real time. An additional evaluation on the HAOMO dataset, covering a 17 km trajectory, further demonstrates the method's practical generalization capabilities. We have released the implementation of our methods as open source at: https://github.com/haomo-ai/ModaLink.git.
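To illustrate what an FoV transformation can look like, the sketch below projects a point cloud into a camera-style depth image with a pinhole model, making both modalities image-like without any depth estimation. The intrinsics and resolution are placeholder assumptions, not ModaLink's actual configuration.

```python
import numpy as np

def points_to_depth_image(points, K, height, width):
    """Project a LiDAR point cloud into a camera-like depth image.

    `points` is (N, 3) in the camera frame (z forward); `K` is a
    3x3 pinhole intrinsic matrix. Points outside the field of view
    are dropped; the nearest point wins per pixel.
    """
    pts = points[points[:, 2] > 0.1]                # keep points in front
    uvw = (K @ pts.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts[:, 2]
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[ok], v[ok], z[ok]
    depth = np.full((height, width), np.inf, dtype=np.float32)
    np.minimum.at(depth, (v, u), z)                 # z-buffer: keep nearest
    depth[np.isinf(depth)] = 0.0
    return depth

# Placeholder intrinsics and a synthetic scene 15 m ahead of the camera.
K = np.array([[350.0, 0.0, 320.0], [0.0, 350.0, 120.0], [0.0, 0.0, 1.0]])
cloud = np.random.randn(5000, 3) * [10, 2, 0] + [0, 0, 15]
img = points_to_depth_image(cloud, K, 240, 640)
```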
Tolerating Annotation Displacement in Dense Object Counting via Point Annotation Probability Map
Chen, Yuehai, Yang, Jing, Chen, Badong, Hua, Gang, Du, Shaoyi
Counting objects in crowded scenes remains a challenge in computer vision. Current deep learning based approaches often formulate it as a Gaussian density regression problem. Such brute-force regression, though effective, may not properly account for the annotation displacement that arises from the human annotation process and can lead to different distributions. We conjecture that it is beneficial to consider annotation displacement in the dense object counting task. To obtain strong robustness against annotation displacement, a generalized Gaussian distribution (GGD) function with tunable bandwidth and shape parameters is exploited to form the learning target, the point annotation probability map (PAPM). Specifically, we first present a hand-designed PAPM method (HD-PAPM), in which we design a function based on GGD to tolerate annotation displacement. A hand-designed PAPM, however, may not be optimal for a particular network and dataset, so we further propose an adaptively learned PAPM method (AL-PAPM) trained end-to-end. To improve robustness to annotation displacement, we design an effective transport cost function based on GGD. The proposed PAPM can also be integrated with other methods: we combine PAPM with P2PNet by modifying the matching cost matrix, forming P2P-PAPM, which improves P2PNet's robustness to annotation displacement. Extensive experiments show the superiority of our proposed methods.
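The GGD kernel at the heart of PAPM has the form $k(d) \propto \exp(-(d/\alpha)^{\beta})$, where $\alpha$ is the bandwidth and $\beta$ the shape parameter; $\beta = 2$ recovers a Gaussian, while smaller $\beta$ widens the tails and thus tolerates larger annotation displacement. A minimal sketch that renders such a map from point annotations, with illustrative parameter values (HD-PAPM designs, and AL-PAPM learns, its own):

```python
import numpy as np

def ggd_papm(points, height, width, alpha=8.0, beta=1.5):
    """Render a point annotation probability map with a generalized
    Gaussian kernel  k(d) = exp(-(d / alpha) ** beta).

    Each annotated point contributes a normalized kernel, so the
    map integrates to the object count.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    papm = np.zeros((height, width), dtype=np.float32)
    for (px, py) in points:
        d = np.sqrt((xs - px) ** 2 + (ys - py) ** 2)
        k = np.exp(-(d / alpha) ** beta)
        papm += k / (k.sum() + 1e-12)   # each annotation has unit mass
    return papm

# Three annotated heads on a 64x96 map.
pm = ggd_papm([(20, 30), (50, 30), (70, 40)], 64, 96)
```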
CoBigICP: Robust and Precise Point Set Registration using Correntropy Metrics and Bidirectional Correspondence
Yin, Pengyu, Wang, Di, Du, Shaoyi, Ying, Shihui, Gao, Yue, Zheng, Nanning
In this paper, we propose a novel probabilistic variant of the iterative closest point (ICP) algorithm dubbed CoBigICP. The method leverages both local geometric information and global noise characteristics. Locally, the 3D structures of both the target and source clouds are incorporated into the objective function through bidirectional correspondence. Globally, the correntropy error metric is introduced as the noise model to resist outliers. Importantly, a close resemblance between the normal-distributions transform (NDT) and correntropy is revealed. To ease the minimization step, an on-manifold parameterization of the special Euclidean group is proposed. Extensive experiments validate that CoBigICP outperforms several well-known and state-of-the-art methods.
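For intuition, correntropy-based registration behaves like iteratively reweighted least squares in which each residual is weighted by a Gaussian kernel, so gross outliers contribute almost nothing to the pose update. The sketch below applies that weighting to a single closed-form rigid alignment step for given correspondences; CoBigICP's bidirectional correspondence, NDT connection, and on-manifold solver are omitted.

```python
import numpy as np

def correntropy_icp_step(src, tgt, sigma=0.5):
    """One rigid update with correntropy (Welsch) weighting.

    Given matched points src[i] <-> tgt[i], residuals are weighted
    by exp(-||e||^2 / (2 sigma^2)), then a weighted Kabsch solve
    returns the rotation R and translation t.
    """
    e = np.linalg.norm(src - tgt, axis=1)
    w = np.exp(-e ** 2 / (2 * sigma ** 2))          # correntropy weights
    w /= w.sum()
    mu_s = (w[:, None] * src).sum(axis=0)           # weighted centroids
    mu_t = (w[:, None] * tgt).sum(axis=0)
    S = ((src - mu_s) * w[:, None]).T @ (tgt - mu_t)
    U, _, Vt = np.linalg.svd(S)
    D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = mu_t - R @ mu_s
    return R, t

# Toy check: recover a known rigid motion despite a few outliers.
rng = np.random.default_rng(0)
P = rng.standard_normal((200, 3))
Rg = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rg.T + np.array([0.3, -0.2, 0.1])
Q[:10] += 5.0                                       # corrupt 10 matches
R, t = correntropy_icp_step(P, Q)
```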