fusion technique
- Information Technology > Artificial Intelligence > Vision (0.88)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.86)
- Information Technology > Artificial Intelligence > Machine Learning (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.51)
A review on data fusion in multimodal learning analytics and educational data mining
Chango, Wilson, Lara, Juan A., Cerezo, Rebeca, Romero, Cristóbal
Th e new educational models such as Smart Learning environments use of digita l and context - aware devices to facilitate the learning process . In this new educational scenario, a huge quantity of multimodal students' data from a variety of different sources can be captured, fused and analyze. It offers to researchers and educators a unique opportunity of being able to discover new knowledge to better understand the learning process and to intervene if necessary. However, it is necessary t o apply correctly d ata f usion approaches and techniques in order to combine various sources of Multimodal Learning Data (MLA) . The se sources or modalities in MLA include audio, video, electrodermal activity data, eye - tracking, user logs and click - stream data, but also learning artifacts and more natural human signals such as gestures, gaze, speech or writing. This survey introduces data fusion in Learning Analytics (LA) and Educational Data Mining (EDM) and how these data fusion techniques have been applied in Smart Learning. It shows the current state of the art by reviewing the main publications, the main type of fused educational data, and the data fusion approaches and techniques used in EDM/LA, as well as the main open problems, trends and challenges in th is specific research area.
- South America > Ecuador (0.04)
- South America > Brazil (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (2 more...)
- Research Report (1.00)
- Overview (1.00)
- Instructional Material > Course Syllabus & Notes (0.46)
- Education > Educational Setting > Online (1.00)
- Education > Educational Technology > Educational Software > Computer Based Training (0.94)
- Health & Medicine > Therapeutic Area (0.93)
- Education > Educational Setting > Higher Education (0.68)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Data Science > Data Integration (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.88)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.86)
- Information Technology > Artificial Intelligence > Machine Learning (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.51)
Latent Space Data Fusion Outperforms Early Fusion in Multimodal Mental Health Digital Phenotyping Data
Barkat, Youcef, Hamitouche, Dylan, Parekh, Deven, Guo, Ivy, Benrimoh, David
Background: Mental illnesses such as depression and anxiety require improved methods for early detection and personalized intervention. Traditional predictive models often rely on unimodal data or early fusion strategies that fail to capture the complex, multimodal nature of psychiatric data. Advanced integration techniques, such as intermediate (latent space) fusion, may offer better accuracy and clinical utility. Methods: Using data from the BRIGHTEN clinical trial, we evaluated intermediate (latent space) fusion for predicting daily depressive symptoms (PHQ-2 scores). We compared early fusion implemented with a Random Forest (RF) model and intermediate fusion implemented via a Combined Model (CM) using autoencoders and a neural network. The dataset included behavioral (smartphone-based), demographic, and clinical features. Experiments were conducted across multiple temporal splits and data stream combinations. Performance was evaluated using mean squared error (MSE) and coefficient of determination (R2). Results: The CM outperformed both RF and Linear Regression (LR) baselines across all setups, achieving lower MSE (0.4985 vs. 0.5305 with RF) and higher R2 (0.4695 vs. 0.4356). The RF model showed signs of overfitting, with a large gap between training and test performance, while the CM maintained consistent generalization. Performance was best when integrating all data modalities in the CM (in contradistinction to RF), underscoring the value of latent space fusion for capturing non-linear interactions in complex psychiatric datasets. Conclusion: Latent space fusion offers a robust alternative to traditional fusion methods for prediction with multimodal mental health data. Future work should explore model interpretability and individual-level prediction for clinical deployment.
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)
Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion
Tripathi, Kumud, Kumar, Chowdam Venkata, Wasnik, Pankaj
V oice Activity Detection (V AD) plays a key role in speech processing, often utilizing hand-crafted or neural features. This study examines the effectiveness of Mel-Frequency Cepstral Coefficients (MFCCs) and pre-trained model (PTM) features, including wav2vec 2.0, HuBERT, WavLM, UniSpeech, MMS, and Whisper. We propose FusionV AD, a unified framework that combines both feature types using three fusion strategies: concatenation, addition, and cross-attention (CA). Experimental results reveal that simple fusion techniques, particularly addition, outperform CA in both accuracy and efficiency. Fusion-based models consistently surpass single-feature models, highlighting the complementary nature of MFCCs and PTM features. Notably, our best-performing fusion model exceeds the state-of-the-art Pyannote across multiple datasets, achieving an absolute average improvement of 2.04%. These results confirm that simple feature fusion enhances V AD robustness while maintaining computational efficiency.
A systematic review of challenges and proposed solutions in modeling multimodal data
Farhadizadeh, Maryam, Weymann, Maria, Blaß, Michael, Kraus, Johann, Gundler, Christopher, Walter, Sebastian, Hempen, Noah, Binder, Harald, Binder, Nadine
Multimodal data modeling has emerged as a powerful approach in clinical research, enabling the integration of diverse data types such as imaging, genomics, wearable sensors, and electronic health records. Despite its potential to improve diagnostic accuracy and support personalized care, modeling such heterogeneous data presents significant technical challenges. This systematic review synthesizes findings from 69 studies to identify common obstacles, including missing modalities, limited sample sizes, dimensionality imbalance, interpretability issues, and finding the optimal fusion techniques. We highlight recent methodological advances, such as transfer learning, generative models, attention mechanisms, and neural architecture search that offer promising solutions. By mapping current trends and innovations, this review provides a comprehensive overview of the field and offers practical insights to guide future research and development in multimodal modeling for medical applications.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > Germany > Baden-Württemberg > Freiburg (0.05)
- (8 more...)
- Research Report > Experimental Study (1.00)
- Overview (1.00)
- Research Report > Promising Solution (0.87)
- Research Report > New Finding (0.87)
Feature Fusion Revisited: Multimodal CTR Prediction for MMCTR Challenge
With the rapid advancement of Multimodal Large Language Models (MLLMs), an increasing number of researchers are exploring their application in recommendation systems. However, the high latency associated with large models presents a significant challenge for such use cases. The EReL@MIR workshop provided a valuable opportunity to experiment with various approaches aimed at improving the efficiency of multimodal representation learning for information retrieval tasks. As part of the competition's requirements, participants were mandated to submit a technical report detailing their methodologies and findings. Our team was honored to receive the award for Task 2 - Winner (Multimodal CTR Prediction). In this technical report, we present our methods and key findings. Additionally, we propose several directions for future work, particularly focusing on how to effectively integrate recommendation signals into multimodal representations. The codebase for our implementation is publicly available at: https://github.com/Lattice-zjj/MMCTR_Code, and the trained model weights can be accessed at: https://huggingface.co/FireFlyCourageous/MMCTR_DIN_MicroLens_1M_x1.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.52)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)
Enhance Vision-based Tactile Sensors via Dynamic Illumination and Image Fusion
Redkin, Artemii, Dugonjic, Zdravko, Lambeta, Mike, Calandra, Roberto
Vision-based tactile sensors use structured light to measure deformation in their elastomeric interface. Until now, vision-based tactile sensors such as DIGIT and GelSight have been using a single, static pattern of structured light tuned to the specific form factor of the sensor. In this work, we investigate the effectiveness of dynamic illumination patterns, in conjunction with image fusion techniques, to improve the quality of sensing of vision-based tactile sensors. Specifically, we propose to capture multiple measurements, each with a different illumination pattern, and then fuse them together to obtain a single, higher-quality measurement. Experimental results demonstrate that this type of dynamic illumination yields significant improvements in image contrast, sharpness, and background difference. This discovery opens the possibility of retroactively improving the sensing quality of existing vision-based tactile sensors with a simple software update, and for new hardware designs capable of fully exploiting dynamic illumination.
- North America > United States > Massachusetts (0.04)
- North America > United States > Maryland (0.04)
- North America > United States > California > San Mateo County > Menlo Park (0.04)
- Europe > Germany > Saxony > Dresden (0.04)
Extending Dense Passage Retrieval with Temporal Information
Abdallah, Abdelrahman, Piryani, Bhawna, Wallat, Jonas, Anand, Avishek, Jatowt, Adam
Temporal awareness is crucial in many information retrieval tasks, particularly in scenarios where the relevance of documents depends on their alignment with the query's temporal context. Traditional retrieval methods such as BM25 and Dense Passage Retrieval (DPR) excel at capturing lexical and semantic relevance but fall short in addressing time-sensitive queries. To bridge this gap, we introduce the temporal retrieval model that integrates explicit temporal signals by incorporating query timestamps and document dates into the representation space. Our approach ensures that retrieved passages are not only topically relevant but also temporally aligned with user intent. We evaluate our approach on two large-scale benchmark datasets, ArchivalQA and ChroniclingAmericaQA, achieving substantial performance gains over standard retrieval baselines. In particular, our model improves Top-1 retrieval accuracy by 6.63% and NDCG@10 by 3.79% on ArchivalQA, while yielding a 9.56% boost in Top-1 retrieval accuracy and 4.68% in NDCG@10 on ChroniclingAmericaQA. Additionally, we introduce a time-sensitive negative sampling strategy, which refines the model's ability to distinguish between temporally relevant and irrelevant documents during training. Our findings highlight the importance of explicitly modeling time in retrieval systems and set a new standard for handling temporally grounded queries.
- Asia > China (0.28)
- Europe > Austria > Tyrol (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- (9 more...)
Large Multimodal Models for Low-Resource Languages: A Survey
Lupascu, Marian, Rogoz, Ana-Cristina, Stupariu, Mihai Sorin, Ionescu, Radu Tudor
In this survey, we systematically analyze techniques used to adapt large multimodal models (LMMs) for low-resource (LR) languages, examining approaches ranging from visual enhancement and data creation to cross-modal transfer and fusion strategies. Through a comprehensive analysis of 106 studies across 75 LR languages, we identify key patterns in how researchers tackle the challenges of limited data and computational resources. We find that visual information often serves as a crucial bridge for improving model performance in LR settings, though significant challenges remain in areas such as hallucination mitigation and computational efficiency. We aim to provide researchers with a clear understanding of current approaches and remaining challenges in making LMMs more accessible to speakers of LR (understudied) languages. We complement our survey with an open-source repository available at: https://github.com/marianlupascu/LMM4LRL-Survey.
- North America > United States (0.06)
- Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
- Research Report (0.82)
- Overview (0.68)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)