Information Fusion
Novel Architecture for Distributed Travel Data Integration and Service Provision Using Microservices
Barua, Biman, Kaiser, M. Shamim
This paper introduces a microservices architecture designed to enhance the flexibility and performance of an airline reservation system. The design incorporates Redis caching, two messaging systems (Kafka and RabbitMQ), and two storage systems (MongoDB and PostgreSQL). It also introduces authorization techniques, including secure communication through OAuth2 and JWT, which are essential for managing high-demand travel services. On the selected indicators, the architecture achieves 99.5% data consistency and a data-propagation latency of less than 75 ms, enabling rapid and reliable communication between microservices. A throughput of 1050 events per second was achieved, so acceptable performance was maintained even at peak times. Redis caching achieved a 92% cache hit ratio, reducing the load on the database and improving response times. Scalability was further improved through Docker and Kubernetes, which allow services to scale horizontally as demand changes. The error rate was very low, at 0.2%, further underscoring the system's efficiency in handling real-time data integration. The approach is designed to meet the specific needs of an airline reservation system: it is secure, fast, and scalable, all of which improve both the user experience and operational efficiency. The low latency, high level of data integration, and efficient resource usage demonstrate the architecture's ability to keep supporting ever-growing, high-demand workloads.
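A cache hit ratio such as the 92% reported above is usually measured on a cache-aside read path: look in the cache first, and only on a miss read the database and repopulate the cache. The sketch below assumes that pattern; it is not the authors' implementation. A plain dict with expiry timestamps stands in for Redis (a real deployment would use Redis GET and SET with an expiry), and `CacheAside`, `fake_db`, and the flight identifier are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Cache-aside (lazy-loading) reads; a plain dict stands in for Redis in this sketch."""

    def __init__(self, load_from_db: Callable[[str], dict], ttl_seconds: float = 60.0):
        self._store: Dict[str, Tuple[float, dict]] = {}  # key -> (expiry time, cached value)
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> dict:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1  # served from cache, no database round trip
            return entry[1]
        self.misses += 1  # absent or expired: read the database and repopulate
        value = self._load_from_db(key)
        self._store[key] = (now + self._ttl, value)
        return value

    def hit_ratio(self) -> float:
        lookups = self.hits + self.misses
        return self.hits / lookups if lookups else 0.0


# Usage with a hypothetical database loader and flight identifier.
db_reads = []


def fake_db(flight_id: str) -> dict:
    db_reads.append(flight_id)
    return {"flight": flight_id, "seats_left": 42}


cache = CacheAside(fake_db, ttl_seconds=300)
for _ in range(25):
    cache.get("BA117")
print(cache.hit_ratio())  # 24 hits out of 25 lookups
```

Every hit is a database read avoided, which is how a high hit ratio translates into the reduced database load described in the abstract.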
Graph Integration for Diffusion-Based Manifold Alignment
Rhodes, Jake S., Rustad, Adam G.
Data from individual observations can originate from various sources or modalities but are often intrinsically linked. Multimodal data integration can enrich information content compared to single-source data. Manifold alignment is a form of data integration that seeks a shared, underlying low-dimensional representation of multiple data sources that emphasizes similarities between alternative representations of the same entities. Semi-supervised manifold alignment relies on partially known correspondences between domains, either through shared features or through other known associations. In this paper, we introduce two semi-supervised manifold alignment methods. The first method, Shortest Paths on the Union of Domains (SPUD), forms a unified graph structure using known correspondences to establish graph edges. By learning inter-domain geodesic distances, SPUD creates a global, multi-domain structure. The second method, MASH (Manifold Alignment via Stochastic Hopping), learns local geometry within each domain and forms a joint diffusion operator using known correspondences to iteratively learn new inter-domain correspondences through a random-walk approach. Through the diffusion process, MASH forms a coupling matrix that links heterogeneous domains into a unified structure. We compare SPUD and MASH with existing semi-supervised manifold alignment methods and show that they outperform competing methods in aligning true correspondences and cross-domain classification. In addition, we show how these methods can be applied to transfer label information between domains.
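The SPUD idea of joining per-domain graphs through known correspondences and then measuring geodesic distances across the union can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each domain gets a k-nearest-neighbour graph with Euclidean edge weights, every known correspondence becomes a zero-weight inter-domain edge (`anchor_weight` is a hypothetical parameter), and Dijkstra's algorithm supplies all-pairs shortest paths. Any normalization the paper applies is omitted.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_edges(points: Sequence[Point], k: int, offset: int) -> List[Tuple[int, int, float]]:
    """k-nearest-neighbour edges inside one domain, weighted by Euclidean distance."""
    edges = []
    for i, p in enumerate(points):
        nearest = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        edges.extend((i + offset, j + offset, d) for d, j in nearest)
    return edges


def union_graph_distances(xa, xb, anchors, k=2, anchor_weight=0.0):
    """All-pairs shortest paths on the union of two domain graphs.

    Nodes 0..len(xa)-1 are domain A, the rest are domain B. Each known
    correspondence (i, j) becomes an inter-domain edge of weight `anchor_weight`.
    """
    n_a = len(xa)
    n = n_a + len(xb)
    adj: Dict[int, List[Tuple[int, float]]] = {v: [] for v in range(n)}
    edges = knn_edges(xa, k, 0) + knn_edges(xb, k, n_a)
    edges += [(i, n_a + j, anchor_weight) for i, j in anchors]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))  # treat the graph as undirected
    dist = [[math.inf] * n for _ in range(n)]
    for s in range(n):  # Dijkstra from every node
        dist[s][s] = 0.0
        heap = [(0.0, s)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[s][u]:
                continue  # stale heap entry
            for v, w in adj[u]:
                if d + w < dist[s][v]:
                    dist[s][v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return dist


# Two toy domains describing the same four entities at different scales.
xa = [(0.0,), (1.0,), (2.0,), (3.0,)]
xb = [(0.0, 0.0), (0.0, 2.0), (0.0, 4.0), (0.0, 6.0)]
D = union_graph_distances(xa, xb, anchors=[(0, 0), (3, 3)], k=1)
cross = [row[len(xa):] for row in D[: len(xa)]]  # A-to-B geodesic block
```

The `cross` block is the inter-domain distance matrix that an embedding step (for example, multidimensional scaling) could turn into a shared low-dimensional representation.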
Unsupervised Multimodal Fusion of In-process Sensor Data for Advanced Manufacturing Process Monitoring
McKinney, Matthew, Garland, Anthony, Cillessen, Dale, Adamczyk, Jesse, Bolintineanu, Dan, Heiden, Michael, Fowler, Elliott, Boyce, Brad L.
Effective monitoring of manufacturing processes is crucial for maintaining product quality and operational efficiency. Modern manufacturing environments generate vast amounts of multimodal data, including visual imagery from various perspectives and resolutions, hyperspectral data, and machine health monitoring information such as actuator positions, accelerometer readings, and temperature measurements. However, interpreting this complex, high-dimensional data presents significant challenges, particularly when labeled datasets are unavailable. This paper presents a novel approach to multimodal sensor data fusion in manufacturing processes, inspired by the Contrastive Language-Image Pre-training (CLIP) model. We leverage contrastive learning techniques to correlate different data modalities without the need for labeled data, developing encoders for five distinct modalities: visual imagery, audio signals, laser x-position, laser y-position, and laser power measurements. By compressing these high-dimensional datasets into low-dimensional representational spaces, our approach facilitates downstream tasks such as process control, anomaly detection, and quality assurance. We evaluate the effectiveness of our approach through experiments, demonstrating its potential to enhance process monitoring capabilities in advanced manufacturing systems. This research contributes to smart manufacturing by providing a flexible, scalable framework for multimodal data fusion that can adapt to diverse manufacturing environments and sensor configurations.
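The CLIP-style objective behind this approach can be illustrated with the symmetric InfoNCE loss. In this loss, embeddings of the same time window from two modalities form a positive pair, and all other pairings in the batch serve as negatives. The sketch below assumes the encoders have already produced embeddings. It fixes the temperature at 0.07, which is CLIP's initial value (CLIP itself learns it). The helper names and toy vectors are hypothetical. With five modalities, one option is to sum this loss over modality pairs, but the paper's exact pairing scheme is not stated here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # log-sum-exp with max subtraction for stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_loss(emb_a: Sequence[Vector], emb_b: Sequence[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: sample i of modality A should match sample i of modality B."""
    a = [_normalize(v) for v in emb_a]
    b = [_normalize(v) for v in emb_b]
    # Cosine similarities scaled by the temperature; rows index A, columns index B.
    logits = [[sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b] for ai in a]
    logits_t = [list(col) for col in zip(*logits)]  # the B -> A direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


# Hypothetical embeddings, e.g. from an image encoder and a laser-power encoder.
emb_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = clip_loss(emb_a, [[0.9, 0.1], [0.1, 0.9]])
swapped = clip_loss(emb_a, [[0.1, 0.9], [0.9, 0.1]])
```

Because no labels appear anywhere in the loss, only the co-occurrence of measurements in time, the same objective works for the unlabeled setting described in the abstract.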
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering
Zhang, Zhilin, Wang, Jie, Zhu, Ruiqi, Gong, Xiaoliang
Medical Visual Question Answering (MedVQA) has gained increasing attention at the intersection of computer vision and natural language processing. Its capability to interpret radiological images and deliver precise answers to clinical inquiries positions MedVQA as a valuable tool for supporting diagnostic decision-making for physicians and alleviating the workload on radiologists. While recent approaches focus on using unified pre-trained large models for multi-modal fusion like cross-modal Transformers, research on more efficient fusion methods remains relatively scarce within this discipline. In this paper, we introduce a novel fusion model that integrates Orthogonality loss, Multi-head attention and Bilinear Attention Network (OMniBAN) to achieve high computational efficiency and strong performance without the need for pre-training. We conduct comprehensive experiments and clarify aspects of how to enhance bilinear attention fusion to achieve performance comparable to that of large models. Experimental results show that OMniBAN outperforms traditional models on key MedVQA benchmarks while maintaining a lower computational cost, which indicates its potential for efficient clinical application in radiology and pathology image question answering.
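The bilinear attention component at the core of OMniBAN can be illustrated with the low-rank bilinear attention map of the original Bilinear Attention Network. Each image region and question token is projected into a shared rank-k space, and their elementwise product is pooled by a vector `p` to score every region-token pair. A single softmax then runs over all pairs. This is a sketch under assumptions: ReLU on both projections, toy identity weights in place of learned `U`, `V`, and `p`, and hypothetical helper names. The orthogonality loss and multi-head attention that OMniBAN adds are not shown.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def project(w: Matrix, v: Sequence[float]) -> List[float]:
    """ReLU(w @ v), where w has shape (rank, input_dim)."""
    return [max(0.0, sum(wk * vk for wk, vk in zip(row, v))) for row in w]


def bilinear_attention_map(img_feats, q_feats, U: Matrix, V: Matrix, p: Sequence[float]) -> Matrix:
    """Low-rank bilinear attention over every (image region i, question token j) pair.

    logit[i][j] = sum_k p[k] * relu(U x_i)[k] * relu(V y_j)[k], then a softmax over all pairs.
    """
    xs = [project(U, x) for x in img_feats]
    ys = [project(V, y) for y in q_feats]
    logits = [[sum(pk * xk * yk for pk, xk, yk in zip(p, x, y)) for y in ys] for x in xs]
    top = max(max(row) for row in logits)  # subtract the max for numerical stability
    exps = [[math.exp(v - top) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


# Toy inputs: 3 image regions and 2 question tokens, 2-d features, rank-2 projections.
img = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = [[1.0, 0.0], [0.5, 0.5]]
identity = [[1.0, 0.0], [0.0, 1.0]]
att = bilinear_attention_map(img, q, U=identity, V=identity, p=[1.0, 1.0])
```

The cost is one small projection per region and token plus a rank-k product per pair, which is the source of the efficiency argument relative to large cross-modal Transformers.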
Combining Incomplete Observational and Randomized Data for Heterogeneous Treatment Effects
Yao, Dong, Tang, Caizhi, Cui, Qing, Li, Longfei
Data from observational studies (OSs) is widely available and readily obtainable, yet frequently contains confounding biases. Data from randomized controlled trials (RCTs), on the other hand, helps reduce these biases but is expensive to gather, so randomized datasets tend to be small. For this reason, effectively fusing observational and randomized data to better estimate heterogeneous treatment effects (HTEs) has gained increasing attention. However, existing methods for integrating observational data with randomized data require complete observational data, meaning that both treated and untreated subjects must be included in the OSs. This prerequisite confines such methods to very specific situations, since including all subjects, whether treated or untreated, in observational studies is not always achievable. In this paper, we propose a resilient approach to Combine Incomplete Observational data and randomized data for HTE estimation, which we abbreviate as CIO. CIO estimates HTEs efficiently regardless of whether the observational data are complete or partial. Concretely, a confounding bias function is first derived via an effect estimation procedure that uses the pseudo-experimental group from the OSs together with the pseudo-control group from the RCTs. This function is then used as a corrective residual to rectify the observed outcomes of the observational data during HTE estimation, which combines the available observational data with all of the randomized data. To validate our approach, we conducted experiments on one synthetic dataset and two semi-synthetic datasets.
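The correction step can be illustrated in a deliberately simplified form. Observational data contain treated units only. Observational and randomized treated units are compared to estimate the confounding bias, which is subtracted from the observational outcomes as a corrective residual, and the corrected data are pooled with the RCT treated units. This is a loose sketch, not the CIO estimator: it assumes one covariate, linear outcome models, and a bias modelled as a constant offset (CIO's bias function depends on covariates). All function names and toy numbers are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

Pairs = Sequence[Tuple[float, float]]  # (covariate x, outcome y)


def fit_line(pairs: Pairs) -> Callable[[float], float]:
    """Ordinary least squares for y = a + b * x (a single covariate keeps the sketch short)."""
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def corrected_hte(os_treated: Pairs, rct_treated: Pairs, rct_control: Pairs):
    # 1. Outcome model for the observational treated group (the only OS data available).
    mu_os1 = fit_line(os_treated)
    # 2. Confounding bias, modelled here as a constant offset (an assumption of this sketch):
    #    how far the observational model sits from the randomized treated outcomes.
    bias = mean(mu_os1(x) - y for x, y in rct_treated)
    # 3. Use the bias as a corrective residual on the observational outcomes.
    corrected_os = [(x, y - bias) for x, y in os_treated]
    # 4. Pool corrected observational and randomized treated data; controls come from the RCT.
    mu1 = fit_line(corrected_os + list(rct_treated))
    mu0 = fit_line(rct_control)
    return lambda x: mu1(x) - mu0(x)


# Toy data: true treated outcome 1.5 + 3x, control outcome 1 + 2x, so the true effect is 0.5 + x.
os_treated = [(float(x), 4.5 + 3 * x) for x in range(10)]  # observational, confounded by +3
rct_treated = [(0.5, 3.0), (2.5, 9.0)]
rct_control = [(1.0, 3.0), (3.0, 7.0)]
tau = corrected_hte(os_treated, rct_treated, rct_control)
```

Pooling the uncorrected observational outcomes would inflate the estimated effect by the +3 offset. The correction removes it while still letting the larger observational sample shape the treated-outcome model.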
Smart ETL and LLM-based contents classification: the European Smart Tourism Tools Observatory experience
Cosme, Diogo, Galvão, António, Abreu, Fernando Brito e
Purpose: Our research project focuses on improving how content is updated in the online European Smart Tourism Tools (STTs) Observatory by incorporating and categorizing STTs. The categorization follows the STT taxonomy and facilitates the end user's search process. Central to this endeavor is a Smart ETL (Extract, Transform, and Load) process, where "Smart" indicates the use of Artificial Intelligence (AI). Methods: The contents describing STTs are derived from PDF catalogs, from which PDF-scraping techniques extract QR codes, images, links, and text. Duplicate STTs across catalogs are removed, and the remaining ones are classified from their text using Large Language Models (LLMs). Finally, the data is transformed to comply with the Dublin Core metadata structure (the observatory's metadata structure), chosen for its wide acceptance and flexibility. Results: The Smart ETL process for importing STTs into the observatory combines PDF-scraping techniques with LLMs for text content-based classification. Our preliminary results demonstrate the potential of LLMs for text content-based classification. Conclusion: The feasibility of the proposed approach is a step towards efficient content-based classification in Smart Tourism, and the approach is adaptable to other fields. Future work will mainly focus on refining this classification process.
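The deduplication and transformation stages of such a pipeline might look like the following sketch. It assumes scraped records arrive as dictionaries with hypothetical keys (`name`, `text`, `link`, `catalog`), treats names that differ only in case, spacing, or punctuation as duplicates, and maps each record onto Dublin Core element names (`title`, `description`, `subject`, `identifier`, `source`). A keyword rule with made-up category names stands in for the LLM classifier. The observatory's actual taxonomy and matching rules are not given in the abstract.

```python
import re
from typing import Callable, Dict, List

Record = Dict[str, str]


def dedup_key(name: str) -> str:
    """Keep only lower-case letters and digits, so 'Visit-App ' and 'VisitApp' collide."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def deduplicate(records: List[Record]) -> List[Record]:
    seen, unique = set(), []
    for rec in records:
        key = dedup_key(rec["name"])
        if key not in seen:  # the first occurrence across all catalogs wins
            seen.add(key)
            unique.append(rec)
    return unique


def to_dublin_core(rec: Record, classify: Callable[[str], str]) -> Record:
    """Map a scraped record onto Dublin Core element names."""
    return {
        "title": rec["name"],
        "description": rec["text"],
        "subject": classify(rec["text"]),  # taxonomy label; an LLM call in the real pipeline
        "identifier": rec.get("link", ""),
        "source": rec["catalog"],          # PDF catalog the record was scraped from
    }


def keyword_classifier(text: str) -> str:
    """Hypothetical stand-in for the LLM classifier, with made-up category names."""
    return "Visitor information" if "guide" in text.lower() else "Other"


records = [
    {"name": "VisitApp", "text": "Mobile guide for visitors",
     "link": "https://example.org/visitapp", "catalog": "catalog_a.pdf"},
    {"name": "Visit-App ", "text": "Mobile guide for visitors",
     "link": "https://example.org/visitapp", "catalog": "catalog_b.pdf"},
    {"name": "CrowdMeter", "text": "Real-time crowd monitoring dashboard",
     "catalog": "catalog_b.pdf"},
]
observatory_rows = [to_dublin_core(r, keyword_classifier) for r in deduplicate(records)]
```

Passing the classifier in as a function keeps the LLM call swappable, so the same transformation code can be reused when the classification step is refined.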
Precision Soil Quality Analysis Using Transformer-based Data Fusion Strategies: A Systematic Review
Saki, Mahdi, Keshavarz, Rasool, Franklin, Daniel, Abolhasan, Mehran, Lipman, Justin, Shariati, Negin
The transformer-based data fusion techniques in agricultural remote sensing (RS), with a particular focus on soil analysis. Utilizing a systematic, data-driven approach, we demonstrate that transformers have significantly outperformed conventional deep learning and machine learning methods since 2022, achieving prediction performance between 92% and 97%. The review is specifically focused on soil analysis, due to the importance of soil condition in optimizing crop productivity and ensuring sustainable farming practices. Transformer-based models have shown remarkable capabilities in handling complex multivariate soil data, improving the accuracy of soil moisture prediction, soil element analysis, and other soil-related applications. This systematic review primarily focuses on 1) analysing research trends and patterns in the literature, both chronologically and technically, and 2)
implementation of PA, also known as smart farming, relies on the ability to collect, process, and analyse spatial and temporal data to optimize field management practices (Cisternas et al., 2020; Pyingkodi et al., 2022). Despite its enormous potential, the adoption of PA remains below expectations due to factors such as high initial investment costs, the complexity of IT, and the need for specialized knowledge (Cisternas et al., 2020). Remote sensing (RS) has seen rapid advancements and widespread adoption in PA, offering high-resolution data for applications ranging from crop monitoring to irrigation management (Sishodia et al., 2020). Remote sensing has proven to be an effective tool for capturing and monitoring the spectral and temporal properties of the land surface influenced by human activities at different spatial and temporal scales (Bégué et al., 2018).
Captions Speak Louder than Images (CASLIE): Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data
Ling, Xinyi, Peng, Bo, Du, Hanwen, Zhu, Zhihui, Ning, Xia
Leveraging multimodal data to drive breakthroughs in e-commerce applications through Multimodal Foundation Models (MFMs) is gaining increasing attention from the research community. However, there are significant challenges that hinder the optimal use of multimodal e-commerce data by foundation models: (1) the scarcity of large-scale, high-quality multimodal benchmark datasets; and (2) the lack of effective multimodal information integration methods. To address these challenges, in this paper, we introduce MMECInstruct, the first-ever, large-scale, and high-quality multimodal instruction dataset for e-commerce. We also develop CASLIE, a simple, lightweight, yet effective framework for integrating multimodal information for e-commerce. Leveraging MMECInstruct, we fine-tune a series of e-commerce MFMs within CASLIE, denoted as CASLIE models. Our comprehensive evaluation demonstrates that CASLIE models substantially outperform 5 categories of advanced baseline models in the in-domain evaluation. Moreover, CASLIE models show strong generalizability to out-of-domain settings. MMECInstruct and CASLIE models are publicly accessible through https://ninglab.github.io/CASLIE/.
Multi-Sensor Fusion for UAV Classification Based on Feature Maps of Image and Radar Data
Sakellariou, Nikos, Lalas, Antonios, Votis, Konstantinos, Tzovaras, Dimitrios
Unmanned Aerial Vehicles (UAVs) have successfully permeated modern society, with various civil and military applications. The oil and gas, construction, and metals and mining industries already incorporate UAVs in their processes. Furthermore, UAVs are employed for commercial purposes, such as monitoring public places, cartography, surveying wildlife, search and rescue (SAR), first aid, and delivery of goods. Big technology companies continuously challenge the status quo by announcing breakthrough services. Moreover, progress in UAV regulation has driven investment since 2019, further increasing the popularity and use of UAVs in sectors that present significant potential but still see minimal use, such as agriculture, healthcare, infrastructure, property management, and insurance.
Continuous Wave (FMCW) radars represent the most attractive and cost-efficient solutions [2]. For the verification and classification task, various methods exist in the literature employing machine learning techniques such as SVM [3], Random Forests [4], Nearest Neighbor [5], and Deep Neural Networks [6][7][8]. More recent DNN approaches based on convolutional neural networks are introduced in Samaras et al. [9]. The authors presented a deep learning classification method, based on data from an X-band FMCW surveillance 2D radar, that reaches a classification accuracy of up to 95.0% using a custom CNN-based architecture. A similar approach is presented in [10], where the authors proposed Res-Net-SP, a compressed architecture of ResNet-18 that is based on convolutional
Kaninfradet3D: A Road-side Camera-LiDAR Fusion 3D Perception Model based on Nonlinear Feature Extraction and Intrinsic Correlation
Liu, Pei, Zheng, Nanfang, Li, Yiqun, Chen, Junlan, Pu, Ziyuan
With the development of AI-assisted driving, numerous methods have emerged for ego-vehicle 3D perception tasks, but there has been limited research on roadside perception. With its ability to provide a global view and a broader sensing range, the roadside perspective is worth developing. LiDAR provides precise three-dimensional spatial information, while cameras offer semantic information, so the two modalities are complementary in 3D detection. However, in some studies adding camera data does not increase accuracy, because the information extraction and fusion procedures are not sufficiently reliable. Recently, Kolmogorov-Arnold Networks (KANs) have been proposed as replacements for MLPs and are better suited to high-dimensional, complex data. Both the camera and the LiDAR provide high-dimensional information, so employing KANs should enhance the extraction of valuable features and produce better fusion outcomes. This paper proposes Kaninfradet3D, which optimizes the feature extraction and fusion modules. To extract features from complex high-dimensional data, the model's encoder and fuser modules were improved using KAN layers. Cross-attention was applied to enhance feature fusion, and visual comparisons verified that camera features were more evenly integrated. This addressed the issue of camera features being abnormally concentrated, which had negatively impacted fusion. Compared to the benchmark, our approach shows improvements of +9.87 mAP and +10.64 mAP in the two viewpoints of the TUMTraf Intersection Dataset and an improvement of +1.40 mAP at the roadside end of the TUMTraf V2X Cooperative Perception Dataset. The results indicate that Kaninfradet3D can effectively fuse features, demonstrating the potential of applying KANs in roadside perception tasks.
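Cross-attention fusion of the kind described above lets each LiDAR feature act as a query that attends over all camera features. The attended camera context is then added back to the LiDAR feature, so camera information is spread according to relevance rather than concentrated in a few locations. The sketch below is an assumption-laden illustration, not Kaninfradet3D's fuser. It uses plain scaled dot-product attention with identity query, key, and value projections (learned projections omitted for brevity), a residual addition, toy 2-d features, and hypothetical function names. The KAN layers are not shown.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention_fuse(lidar: Sequence[Vec], camera: Sequence[Vec]) -> List[List[float]]:
    """Each LiDAR feature (query) attends over all camera features (used as keys and values).

    Learned Q/K/V projections are omitted (identity) to keep the sketch short; the attended
    camera context is added residually to the LiDAR feature.
    """
    d = len(lidar[0])
    fused = []
    for q in lidar:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in camera]
        weights = softmax(scores)
        context = [sum(w * v[j] for w, v in zip(weights, camera)) for j in range(d)]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


# Toy features: 2 LiDAR cells and 3 camera features, all 2-dimensional.
lidar = [[1.0, 0.0], [0.0, 1.0]]
camera = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused = cross_attention_fuse(lidar, camera)
```

Because the softmax weights for every query sum to one, each LiDAR cell receives a weighted mix of all camera features rather than a hard assignment. This is one way to obtain the more evenly integrated camera features reported in the abstract.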