AITopics

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.67)

Neural Information Processing Systems, Oct-10-2024, 19:08:01 GMT

Kalman Filter, Sensor Fusion, and Constrained Regression: Equivalences and Insights

The Kalman filter (KF) is one of the most widely used tools for data assimilation and sequential estimation. In this work, we show that the state estimates from the KF in a standard linear dynamical system setting are equivalent to those given by the KF in a transformed system, with infinite process noise (i.e., a "flat prior") and an augmented measurement space. This reformulation, which we refer to as augmented measurement sensor fusion (SF), is conceptually interesting because the transformed system is seemingly static (there is effectively no process model), yet it still captures the state dynamics inherent to the KF by folding the process model into the measurement space. Further, this reformulation of the KF turns out to be useful in settings in which past states are observed eventually (at some lag). Here, when the measurement noise covariance is estimated by the empirical covariance, we show that the state predictions from SF are equivalent to those from a regression of past states on past measurements, subject to particular linear constraints (reflecting the relationships encoded in the measurement map). This allows us to port standard ideas (say, regularization methods) in regression over to dynamical systems.
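The abstract does not spell out the paper's construction, so the following is only a minimal sketch of one standard identity in the same spirit: the usual KF update gives the same state estimate as a generalized least squares fit with no prior, provided the process-model prediction is stacked onto the measurement vector as an extra "sensor". The function names, dimensions, and random test values are all assumptions made for illustration.

```python
import numpy as np

def kf_update(x_pred, P_pred, z, H, R):
    """Standard Kalman filter measurement update (gain form)."""
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    return x_pred + K @ (z - H @ x_pred)

def augmented_flat_prior_estimate(x_pred, P_pred, z, H, R):
    """Flat-prior estimate with the prediction treated as an extra measurement.

    The augmented measurement is [z; x_pred], the augmented map is [H; I],
    and the augmented noise covariance is blockdiag(R, P_pred).
    """
    n, m = x_pred.shape[0], z.shape[0]
    H_aug = np.vstack([H, np.eye(n)])
    z_aug = np.concatenate([z, x_pred])
    R_aug = np.block([[R, np.zeros((m, n))],
                      [np.zeros((n, m)), P_pred]])
    W = np.linalg.inv(R_aug)              # weight matrix for generalized least squares
    return np.linalg.solve(H_aug.T @ W @ H_aug, H_aug.T @ W @ z_aug)

# Hypothetical small system (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, m = 3, 2
F = rng.normal(size=(n, n))   # process model
H = rng.normal(size=(m, n))   # measurement map
Q = 0.1 * np.eye(n)           # process noise covariance
R = 0.5 * np.eye(m)           # measurement noise covariance
x_prev, P_prev = rng.normal(size=n), np.eye(n)
z = rng.normal(size=m)

# Predict step, then compare the two update routes.
x_pred = F @ x_prev
P_pred = F @ P_prev @ F.T + Q
x_kf = kf_update(x_pred, P_pred, z, H, R)
x_sf = augmented_flat_prior_estimate(x_pred, P_pred, z, H, R)
agree = np.allclose(x_kf, x_sf)
```

Here `agree` checks that the two routes coincide numerically for this one example; it is a sanity check of the identity, not a reproduction of the paper's results.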

constrained regression, equivalence and insight, sensor fusion, (6 more...)

Country: North America > United States (0.08)

Industry: Health & Medicine > Therapeutic Area (0.36)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Data Science > Data Integration (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.64)

Neural Information Processing Systems, Oct-10-2024, 10:07:53 GMT

On Single Source Robustness in Deep Fusion Models

Algorithms that fuse multiple input sources benefit from both complementary and shared information. Shared information may provide robustness against faulty or noisy inputs, which is indispensable for safety-critical applications like self-driving cars. We investigate learning fusion algorithms that are robust against noise added to a single source. We first demonstrate that robustness against single source noise is not guaranteed in a linear fusion model. Motivated by this discovery, two possible approaches are proposed to increase robustness: a carefully designed loss with corresponding training algorithms for deep fusion models, and a simple convolutional fusion layer that has a structural advantage in dealing with noise.

deep fusion model, noise, single source robustness, (3 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)

arXiv.org Artificial Intelligence, Oct-10-2024

A Visual Cooperative Localization Method for Airborne Magnetic Surveying Based on a Manifold Sensor Fusion Algorithm Using Lie Groups

Liu, Liang, Hu, Xiao, Jiang, Wei, Meng, Guanglei, Wang, Zhujun, Zhang, Taining

Recent advancements in UAV technology have spurred interest in developing multi-UAV aerial surveying systems for use in confined environments where GNSS signals are blocked or jammed. This paper focuses on airborne magnetic surveying scenarios. To obtain clean magnetic measurements reflecting the Earth's magnetic field, the magnetic sensor must be isolated from other electronic devices, creating a significant localization challenge. We propose a visual cooperative localization solution. The solution incorporates a visual processing module and an improved manifold-based sensor fusion algorithm, delivering reliable and accurate positioning information. Real flight experiments validate the approach, demonstrating single-axis centimeter-level accuracy and decimeter-level overall 3D positioning accuracy.

arc, detection, ellipse, (15 more...)

2410.077

Country:

Asia > China > Liaoning Province > Shenyang (0.05)
Asia > China > Beijing > Beijing (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report (0.64)

Industry:

Transportation > Air (0.47)
Aerospace & Defense > Aircraft (0.47)
Energy > Power Industry > Utilities > Nuclear (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)

Neural Information Processing Systems, Oct-9-2024, 16:54:45 GMT

BayesIMP: Uncertainty Quantification for Causal Data Fusion

While causal models are becoming one of the mainstays of machine learning, the problem of uncertainty quantification in causal inference remains challenging. In this paper, we study the causal data fusion problem, where data arising from multiple causal graphs are combined to estimate the average treatment effect of a target variable. As data arises from multiple sources and can vary in quality and sample size, principled uncertainty quantification becomes essential. To that end, we introduce Bayesian Causal Mean Processes, a framework that combines ideas from probabilistic integration and kernel mean embeddings to represent interventional distributions in the reproducing kernel Hilbert space, while taking into account the uncertainty within each causal graph. To demonstrate the informativeness of our uncertainty estimation, we apply our method to the Causal Bayesian Optimisation task and show improvements over state-of-the-art methods.

bayesimp, causal data fusion, uncertainty quantification, (1 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)

arXiv.org Artificial Intelligence, Oct-8-2024

Liver Cancer Knowledge Graph Construction based on dynamic entity replacement and masking strategies RoBERTa-BiLSTM-CRF model

Zhang, YiChi, Wang, HaiLing, Gao, YongBin, Hu, XiaoJun, Fan, YingFang, Fang, ZhiJun

Background: Liver cancer ranks as the fifth most common malignant tumor and the second most fatal in our country. Early diagnosis is crucial, necessitating that physicians identify liver cancer in patients at the earliest possible stage. However, the diagnostic process is complex and demanding. Physicians must analyze a broad spectrum of patient data, encompassing physical condition, symptoms, medical history, and results from various examinations and tests, recorded in both structured and unstructured medical formats. This results in a significant workload for healthcare professionals. In response, integrating knowledge graph technology to develop a liver cancer knowledge graph-assisted diagnosis and treatment system aligns with national efforts toward smart healthcare. Such a system promises to mitigate the challenges faced by physicians in diagnosing and treating liver cancer. Methods: This paper addresses the major challenges in building a knowledge graph for hepatocellular carcinoma diagnosis, such as the discrepancy between public data sources and real electronic medical records, the effective integration of which remains a key issue. The knowledge graph construction process consists of six steps: conceptual layer design, data preprocessing, entity identification, entity normalization, knowledge fusion, and graph visualization. A novel Dynamic Entity Replacement and Masking Strategy (DERM) for named entity recognition is proposed. Results: A knowledge graph for liver cancer was established, including 7 entity types such as disease, symptom, and constitution, containing 1495 entities. The recognition accuracy of the model was 93.23%, the recall was 94.69%, and the F1 score was 93.96%.
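As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall. The sketch below assumes that the "recognition accuracy" of 93.23% is the model's precision, which the abstract does not state explicitly; the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

precision = 93.23  # assumption: the reported "accuracy" is read as precision (%)
recall = 94.69     # reported recall (%)
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}%")
```

Under that assumption the result is roughly 93.95%, which matches the reported 93.96% up to rounding of the inputs.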

artificial intelligence, machine learning, natural language, (19 more...)

2410.1809

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (0.69)

Industry: Health & Medicine > Therapeutic Area > Oncology > Liver Cancer (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Oct-8-2024

Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation

Li, Yuxin, Li, Yiheng, Yang, Xulei, Yu, Mengying, Huang, Zihang, Wu, Xiaojun, Yeo, Chai Kiat

In the landscape of autonomous driving, Bird's-Eye-View (BEV) representation has recently garnered substantial academic attention, serving as a transformative framework for the fusion of multi-modal sensor inputs. This BEV paradigm effectively shifts the sensor fusion challenge from a rule-based methodology to a data-centric approach, thereby facilitating more nuanced feature extraction from an array of heterogeneous sensors. Notwithstanding its evident merits, the computational overhead associated with BEV-based techniques often mandates high-capacity hardware infrastructures, thus posing challenges for practical, real-world implementations. To mitigate this limitation, we introduce a novel content-aware multi-modal joint input pruning technique. Our method leverages BEV as a shared anchor to algorithmically identify and eliminate non-essential sensor regions prior to their introduction into the perception model's backbone. We validate the efficacy of our approach through extensive experiments on the NuScenes dataset, demonstrating substantial computational efficiency without sacrificing perception accuracy. To the best of our knowledge, this work represents the first attempt to alleviate the computational burden from the input pruning point.

computer vision, detection, predictor, (15 more...)

2410.07268

Country:

Asia > Singapore (0.05)
North America > United States (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.82)

Industry:

Education (0.40)
Information Technology (0.35)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.67)

arXiv.org Artificial Intelligence, Oct-8-2024

STNet: Deep Audio-Visual Fusion Network for Robust Speaker Tracking

Li, Yidi, Liu, Hong, Yang, Bing

Audio-visual speaker tracking aims to determine the location of human targets in a scene using signals captured by a multi-sensor platform, whose accuracy and robustness can be improved by multi-modal fusion methods. Recently, several fusion methods have been proposed to model the correlation in multiple modalities. However, for the speaker tracking problem, the cross-modal interaction between audio and visual signals hasn't been well exploited. To this end, we present a novel Speaker Tracking Network (STNet) with a deep audio-visual fusion model in this work. We design a visual-guided acoustic measurement method to fuse heterogeneous cues in a unified localization space, which employs visual observations via a camera model to construct the enhanced acoustic map. For feature fusion, a cross-modal attention module is adopted to jointly model multi-modal contexts and interactions. The correlated information between audio and visual features is further interacted in the fusion model. Moreover, the STNet-based tracker is applied to multi-speaker cases by a quality-aware module, which evaluates the reliability of multi-modal observations to achieve robust tracking in complex scenarios. Experiments on the AV16.3 and CAV3D datasets show that the proposed STNet-based tracker outperforms uni-modal methods and state-of-the-art audio-visual speaker trackers.

Speaker tracking is a fundamental task in human-computer interaction that determines the position of the speaker in each time step by analyzing data from sensors such as microphones and cameras [1]. It has wide applications in intelligent surveillance [2], multimedia systems [3], and robot navigation [4]. In general, the basic approaches for solving the tracking problem include computer vision-based face or body tracking methods [5-7] and auditory-based Sound Source Localization (SSL) methods [8, 9]. However, it is difficult for uni-modal methods to adapt to complex dynamic environments. For example, visual trackers are susceptible to object occlusion and changes in illumination and appearance. Besides, acoustic tracking is not subject to visual interference, but the intermittent nature of speech signals, background noise, and room reverberation constrain the performance of SSL-based trackers.

This work is supported by National Natural Science Foundation of China (No. 62403345).

international conference, module, tracking, (15 more...)

2410.05964

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States (0.04)
Europe > Portugal > Braga > Braga (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Oct-5-2024

Channel-Aware Throughput Maximization for Cooperative Data Fusion in CAV

An, Haonan, Fang, Zhengru, Zhang, Yuang, Hu, Senkang, Chen, Xianhao, Xu, Guowen, Fang, Yuguang

Connected and autonomous vehicles (CAVs) have garnered significant attention due to their extended perception range and enhanced sensing coverage. To address challenges such as blind spots and obstructions, CAVs employ vehicle-to-vehicle (V2V) communications to aggregate sensory data from surrounding vehicles. However, cooperative perception is often constrained by the limitations of achievable network throughput and channel quality. In this paper, we propose a channel-aware throughput maximization approach to facilitate CAV data fusion, leveraging a self-supervised autoencoder for adaptive data compression. We formulate the problem as a mixed integer programming (MIP) model, which we decompose into two sub-problems to derive optimal data rate and compression ratio solutions under given link conditions. An autoencoder is then trained to minimize bitrate with the determined compression ratio, and a fine-tuning strategy is employed to further reduce spectrum resource consumption. Experimental evaluation on the OpenCOOD platform demonstrates the effectiveness of our proposed algorithm, showing more than 20.19% improvement in network throughput and a 9.38% increase in average precision (AP@IoU) compared to state-of-the-art methods, with an optimal latency of 19.99 ms.

Index Terms: Cooperative perception, throughput optimization, connected and autonomous driving (CAV).

Recently, autonomous driving has emerged as a promising technology for smart cities. By leveraging communication and artificial intelligence (AI) technologies, autonomous driving can significantly enhance the performance of a city's transportation system. This improvement is achieved through real-time perception of road conditions and precise object detection from onboard sensors (such as radars, LiDARs, and cameras), thereby improving road safety without human intervention [1]. Moreover, the ability of autonomous vehicles to adapt to dynamic environments and communicate with surrounding infrastructure and vehicles is crucial for maintaining the timeliness and accuracy of collected data, thereby enhancing the overall system performance [2]-[9]. Joint perception among connected and autonomous vehicles (CAVs) is a key enabler to overcome the limitations of individual agent sensing capabilities [10].

international conference, perception, vehicle, (16 more...)

2410.0432

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(13 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

arXiv.org Artificial Intelligence, Oct-3-2024

LEGO: Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion

Ding, Dexuan, Wang, Lei, Zhu, Liyun, Gedeon, Tom, Koniusz, Piotr

In computer vision tasks, features often come from diverse representations, domains, and modalities, such as text, images, and videos. Effectively fusing these features is essential for robust performance, especially with the availability of powerful pre-trained models like vision-language models. However, common fusion methods, such as concatenation, element-wise operations, and non-linear techniques, often fail to capture structural relationships, deep feature interactions, and suffer from inefficiency or misalignment of features across domains. In this paper, we shift from high-dimensional feature space to a lower-dimensional, interpretable graph space by constructing similarity graphs that encode feature relationships at different levels, e.g., clip, frame, patch, token, etc. To capture deeper interactions, we use graph power expansions and introduce a learnable graph fusion operator to combine these graph powers for more effective fusion. Our approach is relationship-centric, operates in a homogeneous space, and is mathematically principled, resembling element-wise similarity score aggregation via multilinear polynomials. We demonstrate the effectiveness of our graph-based fusion method on video anomaly detection, showing strong performance across multi-representational, multi-modal, and multi-domain feature fusion tasks.
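The abstract does not give the exact form of the fusion operator, so the following is only a rough sketch of the general idea: build one similarity graph per feature source, expand each graph into a few matrix powers, and combine the powers with weights. Cosine similarity, the fixed weight values, and all function names are assumptions chosen for illustration; in the paper the combination is described as learnable, which this sketch does not implement.

```python
import numpy as np

def similarity_graph(features: np.ndarray) -> np.ndarray:
    """Cosine-similarity graph over the rows (e.g. clips or frames) of a feature matrix."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    return unit @ unit.T

def fuse_graph_powers(graphs, weights) -> np.ndarray:
    """Weighted sum of graph powers A, A^2, ..., A^K over several graphs.

    graphs:  list of (n, n) similarity matrices, one per feature source
    weights: array of shape (len(graphs), K); fixed here, learnable in principle
    """
    n = graphs[0].shape[0]
    fused = np.zeros((n, n))
    for graph, graph_weights in zip(graphs, weights):
        power = np.eye(n)
        for w_k in graph_weights:
            power = power @ graph      # next power of this graph
            fused += w_k * power
    return fused

# Two hypothetical feature sources describing the same 5 items.
rng = np.random.default_rng(0)
visual = rng.normal(size=(5, 16))
text = rng.normal(size=(5, 8))        # different dimension, same number of items
graphs = [similarity_graph(visual), similarity_graph(text)]
weights = np.array([[0.5, 0.3, 0.2],
                    [0.6, 0.3, 0.1]])  # illustrative values, not learned
fused = fuse_graph_powers(graphs, weights)
```

One point the sketch does illustrate: the two sources have different feature dimensions (16 and 8), yet their graphs share the same 5 x 5 shape, so they can be combined directly in graph space.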

fusion, graph, modality, (15 more...)

2410.01506

Country:

North America > United States (0.14)
Oceania > Australia (0.14)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)