Information Fusion
MM-Lego: Modular Biomedical Multimodal Models with Minimal Fine-Tuning
Hemker, Konstantin, Simidjievski, Nikola, Jamnik, Mateja
Learning holistic computational representations in physical, chemical or biological systems requires the ability to process information from different distributions and modalities within the same model. Thus, the demand for multimodal machine learning models has sharply risen for modalities that go beyond vision and language, such as sequences, graphs, time series, or tabular data. While there are many available multimodal fusion and alignment approaches, most of them require end-to-end training, scale quadratically with the number of modalities, cannot handle cases of high modality imbalance in the training set, or are highly topology-specific, making them too restrictive for many biomedical learning tasks. This paper presents Multimodal Lego (MM-Lego), a modular and general-purpose fusion and model merging framework to turn any set of encoders into a competitive multimodal model with no or minimal fine-tuning. We achieve this by introducing a wrapper for unimodal encoders that enforces lightweight dimensionality assumptions between modalities and harmonises their representations by learning features in the frequency domain to enable model merging with little signal interference. We show that MM-Lego 1) can be used as a model merging method which achieves competitive performance with end-to-end fusion models without any fine-tuning, 2) can operate on any unimodal encoder, and 3) is a model fusion method that, with minimal fine-tuning, achieves state-of-the-art results on six benchmarked multimodal biomedical tasks.
The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition
Kong, Lingdong, Xie, Shaoyuan, Hu, Hanjiang, Niu, Yaru, Ooi, Wei Tsang, Cottereau, Benoit R., Ng, Lai Xing, Ma, Yuexin, Zhang, Wenwei, Pan, Liang, Chen, Kai, Liu, Ziwei, Qiu, Weichao, Zhang, Wei, Cao, Xu, Lu, Hao, Chen, Ying-Cong, Kang, Caixin, Zhou, Xinning, Ying, Chengyang, Shang, Wentao, Wei, Xingxing, Dong, Yinpeng, Yang, Bo, Jiang, Shengyin, Ma, Zeliang, Ji, Dengyi, Li, Haiwen, Huang, Xingliang, Tian, Yu, Kou, Genghua, Jia, Fan, Liu, Yingfei, Wang, Tiancai, Li, Ying, Hao, Xiaoshuai, Yang, Yifan, Zhang, Hui, Wei, Mengchuan, Zhou, Yi, Zhao, Haimei, Zhang, Jing, Li, Jinke, He, Xiao, Cheng, Xiaoqiang, Zhang, Bingyang, Zhao, Lirong, Ding, Dianlei, Liu, Fangsheng, Yan, Yixiang, Wang, Hongming, Ye, Nanfei, Luo, Lun, Tian, Yubo, Zuo, Yiwei, Cao, Zhe, Ren, Yi, Li, Yunfan, Liu, Wenjie, Wu, Xun, Mao, Yifan, Li, Ming, Liu, Jian, Liu, Jiayang, Qin, Zihan, Chu, Cunxi, Xu, Jialei, Zhao, Wenbo, Jiang, Junjun, Liu, Xianming, Wang, Ziyan, Li, Chiwei, Li, Shilong, Yuan, Chendong, Yang, Songyue, Liu, Wentao, Chen, Peng, Zhou, Bin, Wang, Yubo, Zhang, Chi, Sun, Jianhang, Chen, Hai, Yang, Xiao, Wang, Lizhong, Fu, Dongyi, Lin, Yongchun, Yang, Huitong, Li, Haoang, Luo, Yadan, Cheng, Xianjing, Xu, Yong
In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research.
Experimental Evaluation of Road-Crossing Decisions by Autonomous Wheelchairs against Environmental Factors
Corradini, Franca, Grigioni, Carlo, Antonucci, Alessandro, Guzzi, Jérôme, Flammini, Francesco
Safe road crossing by autonomous wheelchairs can be affected by several environmental factors, such as adverse weather conditions influencing the accuracy of artificial vision. Previous studies have addressed the experimental evaluation of multi-sensor information fusion to support road-crossing decisions in autonomous wheelchairs. In this study, we focus on fine-tuning tracking performance and on its experimental evaluation against outdoor environmental factors such as fog, rain, and darkness. It is rather intuitive that those factors can negatively affect tracking performance; therefore, our aim is to provide an approach to quantify their effects in the reference scenario in order to detect conditions of unacceptable accuracy. In those cases, warnings can be issued and the system can possibly be reconfigured to reduce the reputation of less accurate sensors, thus improving overall safety. Critical situations can be detected by the main sensors or by additional sensors, e.g., light sensors and rain sensors. Results have been achieved by using an available laboratory dataset and by applying appropriate software filters; they show that the approach can be adopted to evaluate video tracking and event detection robustness against outdoor environmental factors in relevant operational scenarios.
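The abstract mentions reconfiguring the system to lower the "reputation" of less accurate sensors. As a minimal sketch of that idea, assuming reputations act as weights in a weighted average of sensor estimates (the sensor names, reputation values, and the 0.25 penalty below are hypothetical, not taken from the study):

```python
def fuse_estimates(estimates, reputations):
    """Return the reputation-weighted mean of scalar sensor estimates."""
    total = sum(reputations[name] for name in estimates)
    if total <= 0:
        raise ValueError("at least one sensor needs a positive reputation")
    return sum(reputations[name] * value for name, value in estimates.items()) / total


def degrade(reputations, sensor, factor):
    """Return a copy of `reputations` with one sensor's weight scaled by `factor`."""
    updated = dict(reputations)
    updated[sensor] *= factor
    return updated


# Hypothetical distance (m) to an approaching vehicle, as reported by two sensors.
estimates = {"camera": 12.0, "lidar": 10.0}
reputations = {"camera": 1.0, "lidar": 1.0}

print(fuse_estimates(estimates, reputations))  # 11.0

# Fog detected (e.g. by an extra sensor): trust the camera less.
foggy = degrade(reputations, "camera", 0.25)
print(fuse_estimates(estimates, foggy))  # 10.4
```

With equal reputations the fused distance is the plain mean; after the penalty it moves towards the LiDAR reading, which is the intended effect of reconfiguration.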
Uncertainty Management in the Construction of Knowledge Graphs: a Survey
Jarnac, Lucas, Chabot, Yoan, Couceiro, Miguel
Knowledge Graphs (KGs) are a major asset for companies thanks to their great flexibility in data representation and their numerous applications, e.g., vocabulary sharing, Q/A or recommendation systems. To build a KG, it is common practice to rely on automatic methods for extracting knowledge from various heterogeneous sources. But in a noisy and uncertain world, knowledge may not be reliable and conflicts between data sources may occur. Integrating unreliable data would directly impact the use of the KG; therefore, such conflicts must be resolved. This could be done manually by selecting the best data to integrate. This first approach is highly accurate, but costly and time-consuming. That is why recent efforts focus on automatic approaches, although automation is challenging because it requires handling the uncertainty of extracted knowledge throughout its integration into the KG. We survey state-of-the-art approaches in this direction and present constructions of both open and enterprise KGs and how their quality is maintained. We then describe different knowledge extraction methods, which introduce additional uncertainty. We also discuss downstream tasks after knowledge acquisition, including KG completion using embedding models, knowledge alignment, and knowledge fusion, in order to address the problem of knowledge uncertainty in KG construction. We conclude with a discussion of the remaining challenges and perspectives when constructing a KG taking uncertainty into account.
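Knowledge fusion often comes down to choosing among conflicting values for the same fact. A minimal sketch, assuming each source has a known reliability weight and conflicts are settled by weighted voting (a simple truth-discovery baseline, not a method attributed to any surveyed paper; the source names and weights are invented):

```python
from collections import defaultdict


def fuse_claims(claims, reliability):
    """Resolve conflicting (subject, predicate) values by reliability-weighted voting.

    claims: iterable of (source, subject, predicate, value) tuples.
    reliability: dict mapping source -> non-negative weight (assumed known).
    Returns {(subject, predicate): (winning value, share of total weight)}.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for source, subject, predicate, value in claims:
        votes[(subject, predicate)][value] += reliability.get(source, 0.0)

    fused = {}
    for key, candidates in votes.items():
        total = sum(candidates.values())
        best = max(candidates, key=candidates.get)
        fused[key] = (best, candidates[best] / total if total > 0 else 0.0)
    return fused


# One reliable source disagrees with two weaker ones (weights are made up).
claims = [
    ("encyclopedia", "Marie Curie", "birthYear", 1867),
    ("forum_a", "Marie Curie", "birthYear", 1868),
    ("forum_b", "Marie Curie", "birthYear", 1868),
]
reliability = {"encyclopedia": 0.75, "forum_a": 0.25, "forum_b": 0.25}

print(fuse_claims(claims, reliability))
```

Here the single reliable source outweighs the majority, and the returned weight share (0.6) can serve as a crude confidence score to store alongside the integrated triple.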
A Systematic Review of Low-Rank and Local Low-Rank Matrix Approximation in Big Data Medical Imaging
Hamlomo, Sisipho, Atemkeng, Marcellin, Brima, Yusuf, Nunhokee, Chuneeta, Baxter, Jeremy
The large volume and complexity of medical imaging datasets are bottlenecks for storage, transmission, and processing. To tackle these challenges, low-rank matrix approximation (LRMA) and its derivative, local LRMA (LLRMA), have demonstrated potential. A detailed analysis of the literature identifies LRMA and LLRMA methods applied to various imaging modalities and addresses the challenges and limitations associated with existing LRMA and LLRMA methods. We note a significant shift towards a preference for LLRMA in the medical imaging field since 2015, demonstrating its potential and effectiveness in capturing complex structures in medical data compared to LRMA. Acknowledging the limitations of shallow similarity methods used with LLRMA, we suggest advanced semantic image segmentation as a similarity measure, explaining in detail how it can be used to identify similar patches, as well as its feasibility. We note that LRMA and LLRMA are mainly applied to unstructured medical data, and we propose extending their application to other medical data types, including structured and semi-structured data. This paper also discusses how LRMA and LLRMA can be applied to regular data with missing entries, and the effects of inaccuracies in predicting the missing values. We discuss the impact of patch size and propose the use of random search (RS) to determine the optimal patch size. To enhance feasibility, a hybrid approach using Bayesian optimization and RS is proposed, which could improve the application of LRMA and LLRMA in medical imaging.
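To make LRMA and LLRMA concrete, the sketch below uses truncated SVD, which gives the optimal rank-k approximation in the Frobenius norm (Eckart-Young). The local variant simply processes non-overlapping square patches; LLRMA methods in the literature typically group similar patches instead, so treat `local_low_rank_approx` and its fixed patch grid as a simplifying, hypothetical assumption:

```python
import numpy as np


def low_rank_approx(matrix, rank):
    """Best rank-`rank` approximation in the Frobenius norm via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Scale the first `rank` left singular vectors by their singular values.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def local_low_rank_approx(image, patch, rank):
    """Hypothetical LLRMA variant: apply LRMA to each non-overlapping
    patch x patch block. Requires image sides divisible by `patch`."""
    h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")
    out = np.empty_like(image, dtype=float)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            block = image[i:i + patch, j:j + patch]
            out[i:i + patch, j:j + patch] = low_rank_approx(block, rank)
    return out


rng = np.random.default_rng(0)

# Global LRMA: denoise a rank-2 matrix corrupted by Gaussian noise.
clean = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))
noisy = clean + 0.1 * rng.standard_normal((64, 64))
denoised = low_rank_approx(noisy, 2)
print(np.linalg.norm(noisy - clean), np.linalg.norm(denoised - clean))

# Local LRMA: a piecewise-constant image has rank-1 blocks but a higher global rank.
blocky = np.kron(rng.standard_normal((4, 4)), np.ones((8, 8)))
print(np.linalg.matrix_rank(blocky))
print(np.allclose(local_low_rank_approx(blocky, 8, 1), blocky))
```

The `patch` argument plays the role of the patch size discussed above; a random search would evaluate this function over sampled patch sizes and keep the one with the lowest validation error.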
Conformalized Late Fusion Multi-View Learning
Rivera, Eduardo Ochoa, Patel, Yash, Tewari, Ambuj
Uncertainty quantification for multi-view learning is motivated by the increasing use of multi-view data in scientific problems. A common variant of multi-view learning is late fusion: train separate predictors on individual views and combine them after single-view predictions are available. Existing methods for uncertainty quantification for late fusion often rely on undesirable distributional assumptions for validity. Conformal prediction is one approach that avoids such distributional assumptions. However, naively applying conformal prediction to late-stage fusion pipelines often produces overly conservative and uninformative prediction regions, limiting its downstream utility. We propose a novel methodology, Multi-View Conformal Prediction (MVCP), where conformal prediction is instead performed separately on the single-view predictors and only fused subsequently. Our framework extends the standard scalar formulation of a score function to a multivariate score that produces more efficient downstream prediction regions in both classification and regression settings. We then demonstrate that such improvements can be realized in methods built atop conformalized regressors, specifically in robust predict-then-optimize pipelines.
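The building block that MVCP applies per view is split conformal prediction. The sketch below conformalizes each view's regressor separately with absolute-residual scores and, as a simple stand-in for the fusion step, intersects the per-view intervals using a Bonferroni split of alpha across views. This baseline is an assumption for illustration; the abstract does not specify MVCP's multivariate score, so this is not that method, and the synthetic data are invented:

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points for this alpha
    return np.sort(scores)[k - 1]


def intersected_view_intervals(cal_preds, cal_y, test_preds, alpha):
    """Conformalize each view on its own, then intersect the intervals.

    cal_preds, test_preds: arrays of shape (n_views, n_points).
    Each view uses level alpha / n_views (Bonferroni), so by the union bound
    the intersection covers the label with probability >= 1 - alpha.
    """
    n_views = cal_preds.shape[0]
    lo = np.full(test_preds.shape[1], -np.inf)
    hi = np.full(test_preds.shape[1], np.inf)
    for v in range(n_views):
        q = conformal_quantile(np.abs(cal_y - cal_preds[v]), alpha / n_views)
        lo = np.maximum(lo, test_preds[v] - q)
        hi = np.minimum(hi, test_preds[v] + q)
    return lo, hi


def simulate(n, rng):
    """Synthetic data: two views predict y from noisy copies of the same signal."""
    x = rng.standard_normal(n)
    y = x + 0.5 * rng.standard_normal(n)
    preds = np.stack([x, x + 0.5 * rng.standard_normal(n)])
    return preds, y


rng = np.random.default_rng(1)
cal_preds, cal_y = simulate(2000, rng)
test_preds, test_y = simulate(2000, rng)
lo, hi = intersected_view_intervals(cal_preds, cal_y, test_preds, alpha=0.1)
coverage = np.mean((test_y >= lo) & (test_y <= hi))
print(f"empirical coverage: {coverage:.3f}")
```

The union bound guarantees coverage of at least 1 - alpha under exchangeability, but the Bonferroni split makes the intervals conservative, plausibly the kind of inefficiency a joint multivariate score is meant to reduce.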
A Systematic Review on Custom Data Gloves
Belcamino, Valerio, Carfì, Alessandro, Mastrogiovanni, Fulvio
Hands are a fundamental tool humans use to interact with the environment and objects. Through hand motions, we can obtain information about the shape and materials of the surfaces we touch, modify our surroundings by interacting with objects, manipulate objects and tools, or communicate with other people by leveraging the power of gestures. For these reasons, sensorized gloves, which can collect information about hand motions and interactions, have been of interest since the 1980s in various fields, such as Human-Machine Interaction (HMI) and the analysis and control of human motions. Over the last 40 years, research in this field explored different technological approaches and contributed to the popularity of wearable custom and commercial products targeting hand sensorization. Despite a positive research trend, these instruments are not yet widespread outside research environments, and devices aimed at research are often ad hoc solutions with a low chance of being reused. This paper aims to provide a systematic literature review for custom gloves to analyze their main characteristics and critical issues, from the type and number of sensors to the limitations due to device encumbrance. The collection of this information lays the foundation for a standardization process necessary for future breakthroughs in this research field.
Human hands are peculiar body parts where two senses, namely proprioception and touch, are closely affected by each other. In general, proprioception relates to estimating one's motion and posture. [...] Studies in hand motion analysis can be categorized into two classes based on the adopted sensing modality, i.e., image-based [...]. Approaches belonging to the first class rely on suitably located cameras to collect [...]. Instead, approaches from the second class usually leverage sensors [...]. For these reasons, we will refer to the two classes, respectively, with the more technology-oriented terms vision- and wearable-based.
SpotNet: An Image Centric, Lidar Anchored Approach To Long Range Perception
Foucard, Louis, Khanna, Samar, Shi, Yi, Liu, Chi-Kuei, Shen, Quinn Z, Ngo, Thuyen, Xia, Zi-Xiang
In this paper, we propose SpotNet: a fast, single-stage, image-centric but LiDAR-anchored approach for long-range 3D object detection. We demonstrate that our approach to LiDAR/image sensor fusion, combined with the joint learning of 2D and 3D detection tasks, can lead to accurate 3D object detection with very sparse LiDAR support. Unlike more recent bird's-eye-view (BEV) sensor-fusion methods, which scale with range $r$ as $O(r^2)$, SpotNet scales as $O(1)$ with range. We argue that such an architecture is ideally suited to leverage each sensor's strength, i.e., semantic understanding from images and accurate range finding from LiDAR data. Finally, we show that anchoring detections on LiDAR points removes the need to regress distances, so the architecture is able to transfer from 2MP to 8MP resolution images without re-training.
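The $O(r^2)$ term for BEV methods can be made concrete with a back-of-the-envelope count, assuming (for illustration only) a square BEV grid covering $[-r, r] \times [-r, r]$ with a fixed cell size $\delta$; neither the grid layout nor $\delta$ comes from the paper:

$$
N_{\mathrm{BEV}}(r) = \left(\frac{2r}{\delta}\right)^{2} = O(r^{2}),
\qquad
\frac{N_{\mathrm{BEV}}(2r)}{N_{\mathrm{BEV}}(r)} = 4 .
$$

For $\delta = 0.5$ m, $r = 100$ m gives $400^2 = 160{,}000$ cells and $r = 200$ m gives $800^2 = 640{,}000$, so doubling the range quadruples the grid, whereas the abstract states that SpotNet's cost stays $O(1)$ in $r$.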
AlabOS: A Python-based Reconfigurable Workflow Management Framework for Autonomous Laboratories
Fei, Yuxing, Rendy, Bernardus, Kumar, Rishi, Dartsi, Olympia, Sahasrabuddhe, Hrushikesh P., McDermott, Matthew J., Wang, Zheren, Szymanski, Nathan J., Walters, Lauren N., Milsted, David, Zeng, Yan, Jain, Anubhav, Ceder, Gerbrand
The recent advent of autonomous laboratories, coupled with algorithms for high-throughput screening and active learning, promises to accelerate materials discovery and innovation. As these autonomous systems grow in complexity, the demand for robust and efficient workflow management software becomes increasingly critical. In this paper, we introduce AlabOS, a general-purpose software framework for orchestrating experiments and managing resources, with an emphasis on automated laboratories for materials synthesis and characterization. We demonstrate the implementation of AlabOS in a prototype autonomous materials laboratory. AlabOS features a reconfigurable experiment workflow model, enabling the simultaneous execution of varied workflows composed of modular tasks. Therefore, AlabOS is well-suited to handle the rapidly changing experimental protocols defining the progress of self-driving laboratory development for materials research.
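AlabOS composes workflows from modular tasks and runs varied workflows simultaneously. The dependency-ordering idea behind such a model can be sketched with Python's standard `graphlib` module; the task names and the `execution_stages` helper are hypothetical and do not reflect AlabOS's actual API:

```python
from graphlib import TopologicalSorter

# Hypothetical synthesis workflow: each task maps to the set of tasks it depends on.
workflow = {
    "weigh_powders": set(),
    "mix": {"weigh_powders"},
    "heat": {"mix"},
    "grind": {"heat"},
    "xrd": {"grind"},
    "sem": {"grind"},
    "analyse": {"xrd", "sem"},
}


def execution_stages(graph):
    """Group tasks into stages; tasks in the same stage have no mutual
    dependencies and could run in parallel on free instruments."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


for i, stage in enumerate(execution_stages(workflow)):
    print(i, stage)
```

In this sketch, reconfiguring a protocol means editing the dependency dictionary (for example, adding another characterization step after `grind`) without touching the scheduling loop.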
Federated Learning with Incomplete Sensing Modalities
Orzikulova, Adiba, Kwak, Jaehyun, Shin, Jaemin, Lee, Sung-Ju
Many mobile sensing applications utilize data from various modalities, including motion and physiological sensors in mobile and wearable devices. Federated Learning (FL) is particularly suitable for these applications thanks to its privacy-preserving feature. However, challenges such as limited battery life, poor network conditions, and sensor malfunctions can restrict the use of all available modalities for local model training. Additionally, existing multimodal FL systems struggle with scalability and efficiency as the number of modality sources increases. To address these issues, we introduce FLISM, a framework designed to enable multimodal FL with incomplete modalities. FLISM leverages a simulation technique to learn robust representations that can handle missing modalities and transfers model knowledge across clients with varying sets of modalities. Evaluation results on three real-world datasets and in simulations demonstrate FLISM's effective balance between model performance and system efficiency. FLISM shows an average F1-score improvement of 0.067, while also reducing communication (2.69x faster) and computational (2.28x more efficient) overheads compared to existing methods addressing incomplete modalities. Moreover, in simulated scenarios involving tasks with a larger number of modalities, FLISM achieves a significant speedup of 3.23x to 85.10x in communication and 3.73x to 32.29x in computational efficiency.
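FLISM's own technique is not detailed in the abstract. For context, one straightforward way to aggregate models when clients hold different modality subsets is to run FedAvg separately for each modality encoder, averaging only over the clients that trained it. The sketch below assumes that scheme and plain NumPy weight vectors; it is not FLISM's algorithm, and the client data are invented:

```python
import numpy as np


def modality_aware_fedavg(client_updates):
    """Average each modality's encoder weights only over clients that have it.

    client_updates: list of (n_samples, {modality: weight_array}) pairs.
    Weighting by sample count follows standard FedAvg; restricting each
    average to the clients holding that modality is an illustrative assumption.
    """
    sums, counts = {}, {}
    for n_samples, weights in client_updates:
        for modality, w in weights.items():
            sums[modality] = sums.get(modality, 0.0) + n_samples * np.asarray(w, dtype=float)
            counts[modality] = counts.get(modality, 0) + n_samples
    return {m: sums[m] / counts[m] for m in sums}


# Client B lacks a PPG sensor (e.g. a malfunction), so it only sends motion weights.
client_updates = [
    (100, {"accel": np.array([1.0, 1.0]), "ppg": np.array([2.0, 2.0])}),
    (300, {"accel": np.array([3.0, 3.0])}),
]
global_weights = modality_aware_fedavg(client_updates)
print(global_weights)
```

The accelerometer encoder is averaged over both clients (weighted 100:300), while the PPG encoder comes from client A alone, so a missing modality never drags its encoder towards zero.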