AITopics | fusionnet

Collaborating Authors

fusionnet

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Modality-Aware Infrared and Visible Image Fusion with Target-Aware Supervision

Sun, Tianyao, Xiang, Dawei, Ding, Tianqi, Fang, Xiang, Qi, Yijiashun, Zhao, Zunduo

arXiv.org Artificial IntelligenceSep-16-2025

Infrared and visible image fusion (IVIF) is a fundamental task in multi-modal perception that aims to integrate complementary structural and textural cues from different spectral domains. In this paper, we propose FusionNet, a novel end-to-end fusion framework that explicitly models inter-modality interaction and enhances task-critical regions. FusionNet introduces a modality-aware attention mechanism that dynamically adjusts the contribution of infrared and visible features based on their discriminative capacity. To achieve fine-grained, interpretable fusion, we further incorporate a pixel-wise alpha blending module, which learns spatially-varying fusion weights in an adaptive and content-aware manner. Moreover, we formulate a target-aware loss that leverages weak ROI supervision to preserve semantic consistency in regions containing important objects (e.g., pedestrians, vehicles). Experiments on the public M3FD dataset demonstrate that FusionNet generates fused images with enhanced semantic preservation, high perceptual quality, and clear interpretability. Our framework provides a general and extensible solution for semantic-aware multi-modal image fusion, with benefits for downstream tasks such as object detection and scene understanding.

artificial intelligence, image understanding, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.11476

Country:

North America > United States > Michigan (0.28)
North America > United States > Connecticut (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.93)

Add feedback

Multi-modal Iterative and Deep Fusion Frameworks for Enhanced Passive DOA Sensing via a Green Massive H2AD MIMO Receiver

Bai, Jiatong, Chen, Minghao, Tang, Wankai, Li, Yifan, Pan, Cunhua, Wu, Yongpeng, Shu, Feng

arXiv.org Artificial IntelligenceNov-11-2024

Most existing DOA estimation methods assume ideal source incident angles with minimal noise. Moreover, directly using pre-estimated angles to calculate weighted coefficients can lead to performance loss. Thus, a green multi-modal (MM) fusion DOA framework is proposed to realize a more practical, low-cost and high time-efficiency DOA estimation for a H$^2$AD array. Firstly, two more efficient clustering methods, global maximum cos\_similarity clustering (GMaxCS) and global minimum distance clustering (GMinD), are presented to infer more precise true solutions from the candidate solution sets. Based on this, an iteration weighted fusion (IWF)-based method is introduced to iteratively update weighted fusion coefficients and the clustering center of the true solution classes by using the estimated values. Particularly, the coarse DOA calculated by fully digital (FD) subarray, serves as the initial cluster center. The above process yields two methods called MM-IWF-GMaxCS and MM-IWF-GMinD. To further provide a higher-accuracy DOA estimation, a fusion network (fusionNet) is proposed to aggregate the inferred two-part true angles and thus generates two effective approaches called MM-fusionNet-GMaxCS and MM-fusionNet-GMinD. The simulation outcomes show the proposed four approaches can achieve the ideal DOA performance and the CRLB. Meanwhile, proposed MM-fusionNet-GMaxCS and MM-fusionNet-GMinD exhibit superior DOA performance compared to MM-IWF-GMaxCS and MM-IWF-GMinD, especially in extremely-low SNR range.

artificial intelligence, estimation, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2411.06927

Country:

Asia > China > Jiangsu Province > Nanjing (0.05)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hainan Province (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)

Add feedback

Multimodal Action Quality Assessment

Zeng, Ling-An, Zheng, Wei-Shi

arXiv.org Artificial IntelligenceFeb-20-2024

Action quality assessment (AQA) is to assess how well an action is performed. Previous works perform modelling by only the use of visual information, ignoring audio information. We argue that although AQA is highly dependent on visual information, the audio is useful complementary information for improving the score regression accuracy, especially for sports with background music, such as figure skating and rhythmic gymnastics. To leverage multimodal information for AQA, i.e., RGB, optical flow and audio information, we propose a Progressive Adaptive Multimodal Fusion Network (PAMFN) that separately models modality-specific information and mixed-modality information. Our model consists of with three modality-specific branches that independently explore modality-specific information and a mixed-modality branch that progressively aggregates the modality-specific information from the modality-specific branches. To build the bridge between modality-specific branches and the mixed-modality branch, three novel modules are proposed. First, a Modality-specific Feature Decoder module is designed to selectively transfer modality-specific information to the mixed-modality branch. Second, when exploring the interaction between modality-specific information, we argue that using an invariant multimodal fusion policy may lead to suboptimal results, so as to take the potential diversity in different parts of an action into consideration. Therefore, an Adaptive Fusion Module is proposed to learn adaptive multimodal fusion policies in different parts of an action. This module consists of several FusionNets for exploring different multimodal fusion strategies and a PolicyNet for deciding which FusionNets are enabled. Third, a module called Cross-modal Feature Decoder is designed to transfer cross-modal features generated by Adaptive Fusion Module to the mixed-modality branch.

fusionnet, information, mixed-modality branch, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TIP.2024.3362135

2402.09444

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > China > Guangdong Province > Zhuhai (0.04)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports (0.48)
Health & Medicine (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Spatial-temporal Vehicle Re-identification

Kim, Hye-Geun, Na, YouKyoung, Joe, Hae-Won, Moon, Yong-Hyuk, Cho, Yeong-Jun

arXiv.org Artificial IntelligenceSep-3-2023

Vehicle re-identification (ReID) in a large-scale camera network is important in public safety, traffic control, and security. However, due to the appearance ambiguities of vehicle, the previous appearance-based ReID methods often fail to track vehicle across multiple cameras. To overcome the challenge, we propose a spatial-temporal vehicle ReID framework that estimates reliable camera network topology based on the adaptive Parzen window method and optimally combines the appearance and spatial-temporal similarities through the fusion network. Based on the proposed methods, we performed superior performance on the public dataset (VeRi776) by 99.64% of rank-1 accuracy. The experimental results support that utilizing spatial and temporal information for ReID can leverage the accuracy of appearance-based methods and effectively deal with appearance ambiguities.

fusionnet, information, similarity, (14 more...)

arXiv.org Artificial Intelligence

2309.01166

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report (0.82)

Industry: Commercial Services & Supplies > Security & Alarm Services (0.58)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.87)

Add feedback

Do Question Answering Modeling Improvements Hold Across Benchmarks?

Liu, Nelson F., Lee, Tony, Jia, Robin, Liang, Percy

arXiv.org Artificial IntelligenceMay-30-2023

Do question answering (QA) modeling improvements (e.g., choice of architecture and training procedure) hold consistently across the diverse landscape of QA benchmarks? To study this question, we introduce the notion of concurrence -- two benchmarks have high concurrence on a set of modeling approaches if they rank the modeling approaches similarly. We measure the concurrence between 32 QA benchmarks on a set of 20 diverse modeling approaches and find that human-constructed benchmarks have high concurrence amongst themselves, even if their passage and question distributions are very different. Surprisingly, even downsampled human-constructed benchmarks (i.e., collecting less data) and programmatically-generated benchmarks (e.g., cloze-formatted examples) have high concurrence with human-constructed benchmarks. These results indicate that, despite years of intense community focus on a small number of benchmarks, the modeling improvements studied hold broadly.

benchmark, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2102.01065

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Texas > McLennan County > Waco (0.04)
North America > United States > Texas > Falls County (0.04)
(7 more...)

Genre: Research Report (0.64)

Industry:

Government > Regional Government > North America Government > United States Government (0.67)
Leisure & Entertainment > Sports > Football (0.46)
Leisure & Entertainment > Sports > Basketball (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

CenterLineDet: CenterLine Graph Detection for Road Lanes with Vehicle-mounted Sensors by Transformer for HD Map Generation

Xu, Zhenhua, Liu, Yuxuan, Sun, Yuxiang, Liu, Ming, Wang, Lujia

arXiv.org Artificial IntelligenceFeb-28-2023

With the fast development of autonomous driving technologies, there is an increasing demand for high-definition (HD) maps, which provide reliable and robust prior information about the static part of the traffic environments. As one of the important elements in HD maps, road lane centerline is critical for downstream tasks, such as prediction and planning. Manually annotating centerlines for road lanes in HD maps is labor-intensive, expensive and inefficient, severely restricting the wide applications of autonomous driving systems. Previous work seldom explores the lane centerline detection problem due to the complicated topology and severe overlapping issues of lane centerlines. In this paper, we propose a novel method named CenterLineDet to detect lane centerlines for automatic HD map generation. Our CenterLineDet is trained by imitation learning and can effectively detect the graph of centerlines with vehicle-mounted sensors (i.e., six cameras and one LiDAR) through iterations. Due to the use of the DETR-like transformer network, CenterLineDet can handle complicated graph topology, such as lane intersections. The proposed approach is evaluated on the large-scale public dataset NuScenes. The superiority of our CenterLineDet is demonstrated by the comparative results. Our code, supplementary materials, and video demonstrations are available at \href{https://tonyxuqaq.github.io/projects/CenterLineDet/}{https://tonyxuqaq.github.io/projects/CenterLineDet/}.

artificial intelligence, centerlinedet, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2209.07734

Country:

Asia > China > Hong Kong > Kowloon (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Transportation > Ground > Road (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.55)

Add feedback

Pose Invariant Person Re-Identification using Robust Pose-transformation GAN

Karmakar, Arnab, Mishra, Deepak

arXiv.org Artificial IntelligenceApr-11-2021

Person re-identification (re-ID) aims to retrieve a person's images from an image gallery, given a single instance of the person of interest. Despite several advancements, learning discriminative identity-sensitive and viewpoint invariant features for robust Person Re-identification is a major challenge owing to large pose variation of humans. This paper proposes a re-ID pipeline that utilizes the image generation capability of Generative Adversarial Networks combined with pose regression and feature fusion to achieve pose invariant feature learning. The objective is to model a given person under different viewpoints and large pose changes and extract the most discriminative features from all the appearances. The pose transformational GAN (pt-GAN) module is trained to generate a person's image in any given pose. In order to identify the most significant poses for discriminative feature extraction, a Pose Regression module is proposed. The given instance of the person is modelled in varying poses and these features are effectively combined through the Feature Fusion Network. The final re-ID model consisting of these 3 sub-blocks, alleviates the pose dependence in person re-ID and outperforms the state-of-the-art GAN based models for re-ID in 4 benchmark datasets. The proposed model is robust to occlusion, scale and illumination, thereby outperforms the state-of-the-art models in terms of improvement over baseline.

dataset, person re-identification, proceedings, (8 more...)

arXiv.org Artificial Intelligence

2105.0093

Country: Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback