Information Fusion
NEDS-SLAM: A Novel Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting
Ji, Yiming, Liu, Yang, Xie, Guanghu, Ma, Boyu, Xie, Zongwu
We propose NEDS-SLAM, an Explicit Dense semantic SLAM system based on 3D Gaussian representation, that enables robust 3D semantic mapping, accurate camera tracking, and high-quality rendering in real-time. In the system, we propose a Spatially Consistent Feature Fusion model to reduce the effect of erroneous estimates from pre-trained segmentation head on semantic reconstruction, achieving robust 3D semantic Gaussian mapping. Additionally, we employ a lightweight encoder-decoder to compress the high-dimensional semantic features into a compact 3D Gaussian representation, mitigating the burden of excessive memory consumption. Furthermore, we leverage the advantage of 3D Gaussian splatting, which enables efficient and differentiable novel view rendering, and propose a Virtual Camera View Pruning method to eliminate outlier GS points, thereby effectively enhancing the quality of scene representations. Our NEDS-SLAM method demonstrates competitive performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in 3D dense semantic mapping.
Categorical semiotics: Foundations for Knowledge Integration
The integration of knowledge extracted from diverse models, whether described by domain experts or generated by machine learning algorithms, has historically been challenged by the absence of a suitable framework for specifying and integrating structures, learning processes, data transformations, and data models or rules. In this work, we extend algebraic specification methods to address these challenges within such a framework. In our work, we tackle the challenging task of developing a comprehensive framework for defining and analyzing deep learning architectures. We believe that previous efforts have fallen short by failing to establish a clear connection between the constraints a model must adhere to and its actual implementation. Our methodology employs graphical structures that resemble Ehresmann's sketches, interpreted within a universe of fuzzy sets. This approach offers a unified theory that elegantly encompasses both deterministic and non-deterministic neural network designs. Furthermore, we highlight how this theory naturally incorporates fundamental concepts from computer science and automata theory. Our extended algebraic specification framework, grounded in graphical structures akin to Ehresmann's sketches, offers a promising solution for integrating knowledge across disparate models and domains. By bridging the gap between domain-specific expertise and machine-generated insights, we pave the way for more comprehensive, collaborative, and effective approaches to knowledge integration and modeling.
Advancing Multimodal Data Fusion in Pain Recognition: A Strategy Leveraging Statistical Correlation and Human-Centered Perspectives
Gu, Xingrui, Wang, Zhixuan, Jin, Irisa, Wu, Zekun
This research tackles the challenge of integrating heterogeneous data for specific behavior recognition within the domain of Pain Recognition, presenting a novel methodology that harmonizes statistical correlations with a human-centered approach. By leveraging a diverse range of deep learning architectures, we highlight the adaptability and efficacy of our approach in improving model performance across various complex scenarios. The novelty of our methodology is the strategic incorporation of statistical relevance weights and the segmentation of modalities from a human-centric perspective, enhancing model precision and providing a explainable analysis of multimodal data. This study surpasses traditional modality fusion techniques by underscoring the role of data diversity and customized modality segmentation in enhancing pain behavior analysis. Introducing a framework that matches each modality with an suited classifier, based on the statistical significance, signals a move towards customized and accurate multimodal fusion strategies. Our contributions extend beyond the field of Pain Recognition by delivering new insights into modality fusion and human-centered computing applications, contributing towards explainable AI and bolstering patient-centric healthcare interventions. Thus, we bridge a significant void in the effective and interpretable fusion of multimodal data, establishing a novel standard for forthcoming inquiries in pain behavior recognition and allied fields.
Self-Corrective Sensor Fusion for Drone Positioning in Indoor Facilities
Gonzรกlez-Castaรฑo, Francisco Javier, Gil-Castiรฑeira, Felipe, Rodrรญguez-Pereira, David, Regueiro-Janeiro, Josรฉ รngel, Garcรญa-Mรฉndez, Silvia, Candal-Ventureira, David
Drones may be more advantageous than fixed cameras for quality control applications in industrial facilities, since they can be redeployed dynamically and adjusted to production planning. The practical scenario that has motivated this paper, image acquisition with drones in a car manufacturing plant, requires drone positioning accuracy in the order of 5 cm. During repetitive manufacturing processes, it is assumed that quality control imaging drones will follow highly deterministic periodic paths, stop at predefined points to take images and send them to image recognition servers. Therefore, by relying on prior knowledge about production chain schedules, it is possible to optimize the positioning technologies for the drones to stay at all times within the boundaries of their flight plans, which will be composed of stopping points and the paths in between. This involves mitigating issues such as temporary blocking of line-of-sight between the drone and any existing radio beacons; sensor data noise; and the loss of visual references. We present a self-corrective solution for this purpose. It corrects visual odometer readings based on filtered and clustered Ultra-Wide Band (UWB) data, as an alternative to direct Kalman fusion. The approach combines the advantages of these technologies when at least one of them works properly at any measurement spot. It has three method components: independent Kalman filtering, data association by means of stream clustering and mutual correction of sensor readings based on the generation of cumulative correction vectors. The approach is inspired by the observation that UWB positioning works reasonably well at static spots whereas visual odometer measurements reflect straight displacements correctly but can underestimate their length. Our experimental results demonstrate the advantages of the approach in the application scenario over Kalman fusion.
Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models
Park, Hyunbyung, Lee, Sukyung, Gim, Gyoungjin, Kim, Yungi, Kim, Dahyun, Park, Chanjun
To address the challenges associated with data processing at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with a user-friendly design at its core. Easy addition of custom processors with block-based interface in Dataverse allows users to readily and efficiently use Dataverse to build their own ETL pipeline. We hope that Dataverse will serve as a vital tool for LLM development and open source the entire library to welcome community contribution. Additionally, we provide a concise, two-minute video demonstration of our system, illustrating its capabilities and implementation.
Enhancing Trust and Privacy in Distributed Networks: A Comprehensive Survey on Blockchain-based Federated Learning
Liu, Ji, Chen, Chunlu, Li, Yu, Sun, Lin, Song, Yulun, Zhou, Jingbo, Jing, Bo, Dou, Dejing
While centralized servers pose a risk of being a single point of failure, decentralized approaches like blockchain offer a compelling solution by implementing a consensus mechanism among multiple entities. Merging distributed computing with cryptographic techniques, decentralized technologies introduce a novel computing paradigm. Blockchain ensures secure, transparent, and tamper-proof data management by validating and recording transactions via consensus across network nodes. Federated Learning (FL), as a distributed machine learning framework, enables participants to collaboratively train models while safeguarding data privacy by avoiding direct raw data exchange. Despite the growing interest in decentralized methods, their application in FL remains underexplored. This paper presents a thorough investigation into Blockchain-based FL (BCFL), spotlighting the synergy between blockchain's security features and FL's privacy-preserving model training capabilities. First, we present the taxonomy of BCFL from three aspects, including decentralized, separate networks, and reputation-based architectures. Then, we summarize the general architecture of BCFL systems, providing a comprehensive perspective on FL architectures informed by blockchain. Afterward, we analyze the application of BCFL in healthcare, IoT, and other privacy-sensitive areas. Finally, we identify future research directions of BCFL.
The Artificial Neural Twin -- Process Optimization and Continual Learning in Distributed Process Chains
Emmert, Johannes, Mendez, Ronald, Dastjerdi, Houman Mirzaalian, Syben, Christopher, Maier, Andreas
Industrial process optimization and control is crucial to increase economic and ecologic efficiency. However, data sovereignty, differing goals, or the required expert knowledge for implementation impede holistic implementation. Further, the increasing use of data-driven AI-methods in process models and industrial sensory often requires regular fine-tuning to accommodate distribution drifts. We propose the Artificial Neural Twin, which combines concepts from model predictive control, deep learning, and sensor networks to address these issues. Our approach introduces differentiable data fusion to estimate the state of distributed process steps and their dependence on input data. By treating the interconnected process steps as a quasi neural-network, we can backpropagate loss gradients for process optimization or model fine-tuning to process parameters or AI models respectively. The concept is demonstrated on a virtual machine park simulated in Unity, consisting of bulk material processes in plastic recycling.
Supervised Multiple Kernel Learning approaches for multi-omics data integration
Briscik, Mitja, Tazza, Gabriele, Dillies, Marie-Agnes, Vidรกcs, Lรกszlรณ, Dejean, Sรฉbastien
Advances in high-throughput technologies have originated an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently an issue for biology and bioinformatics. Multiple kernel learning (MKL) has shown to be a flexible and valid approach to consider the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining.We provide novel MKL approaches based on different kernel fusion strategies.To learn from the meta-kernel of input kernels, we adaptedunsupervised integration algorithms for supervised tasks with support vector machines.We also tested deep learning architectures for kernel fusion and classification.The results show that MKL-based models can compete with more complex, state-of-the-art, supervised multi-omics integrative approaches. Multiple kernel learning offers a natural framework for predictive models in multi-omics genomic data. Our results offer a direction for bio-data mining research and further development of methods for heterogeneous data integration.
An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models
Hu, Zizhao, Jia, Shaochong, Rostami, Mohammad
Diffusion models have been widely used for conditional data cross-modal generation tasks such as text-to-image and text-to-video. However, state-of-the-art models still fail to align the generated visual concepts with high-level semantics in a language such as object count, spatial relationship, etc. We approach this problem from a multimodal data fusion perspective and investigate how different fusion strategies can affect vision-language alignment. We discover that compared to the widely used early fusion of conditioning text in a pretrained image feature space, a specially designed intermediate fusion can: (i) boost textto-image alignment with improved generation quality and (ii) improve training and inference efficiency by reducing low-rank text-to-image attention calculations. We perform experiments using a text-to-image generation task on the MS-COCO dataset. We compare our intermediate fusion mechanism with the classic early fusion mechanism on two common conditioning methods on a U-shaped ViT backbone. Our intermediate fusion model achieves higher CLIP Score and lower FID, with 20% reduced FLOPs, and 50% increased training speed compared against a strong U-ViT baseline with an early fusion.
Multiple and Gyro-Free Inertial Datasets
Yampolsky, Zeev, Stolero, Yair, Pri-Hadash, Nitzan, Solodar, Dan, Massas, Shira, Savin, Itai, Klein, Itzik
An inertial navigation system (INS) utilizes three orthogonal accelerometers and gyroscopes to determine platform position, velocity, and orientation. There are countless applications for INS, including robotics, autonomous platforms, and the internet of things. Recent research explores the integration of data-driven methods with INS, highlighting significant innovations, improving accuracy and efficiency. Despite the growing interest in this field and the availability of INS datasets, no datasets are available for gyro-free INS (GFINS) and multiple inertial measurement unit (MIMU) architectures. To fill this gap and to stimulate further research in this field, we designed and recorded GFINS and MIMU datasets using 54 inertial sensors grouped in nine inertial measurement units. These sensors can be used to define and evaluate different types of MIMU and GFINS architectures. The inertial sensors were arranged in three different sensor configurations and mounted on a mobile robot and a passenger car. In total, the dataset contains 35 hours of inertial data and corresponding ground truth trajectories. The data and code are freely accessible through our GitHub repository.