AITopics | fusion technique

fusion technique

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A review on data fusion in multimodal learning analytics and educational data mining

Chango, Wilson, Lara, Juan A., Cerezo, Rebeca, Romero, Cristóbal

arXiv.org Artificial Intelligence, Nov-27-2025

The new educational models such as Smart Learning environments use digital and context-aware devices to facilitate the learning process. In this new educational scenario, a huge quantity of multimodal student data from a variety of different sources can be captured, fused and analyzed. This offers researchers and educators a unique opportunity to discover new knowledge, to better understand the learning process, and to intervene if necessary. However, data fusion approaches and techniques must be applied correctly in order to combine the various sources of Multimodal Learning Data (MLA). These sources or modalities in MLA include audio, video, electrodermal activity data, eye-tracking, user logs and click-stream data, but also learning artifacts and more natural human signals such as gestures, gaze, speech or writing. This survey introduces data fusion in Learning Analytics (LA) and Educational Data Mining (EDM) and how these data fusion techniques have been applied in Smart Learning. It shows the current state of the art by reviewing the main publications, the main types of fused educational data, and the data fusion approaches and techniques used in EDM/LA, as well as the main open problems, trends and challenges in this specific research area.

artificial intelligence, data mining, information fusion, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1002/widm.1458

2511.20871

Country: Europe > Spain (0.28)

Genre:

Research Report (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.94)
Health & Medicine > Therapeutic Area (0.93)
Education > Educational Setting > Higher Education (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)

Latent Space Data Fusion Outperforms Early Fusion in Multimodal Mental Health Digital Phenotyping Data

Barkat, Youcef, Hamitouche, Dylan, Parekh, Deven, Guo, Ivy, Benrimoh, David

arXiv.org Artificial Intelligence, Jul-22-2025

Background: Mental illnesses such as depression and anxiety require improved methods for early detection and personalized intervention. Traditional predictive models often rely on unimodal data or early fusion strategies that fail to capture the complex, multimodal nature of psychiatric data. Advanced integration techniques, such as intermediate (latent space) fusion, may offer better accuracy and clinical utility. Methods: Using data from the BRIGHTEN clinical trial, we evaluated intermediate (latent space) fusion for predicting daily depressive symptoms (PHQ-2 scores). We compared early fusion implemented with a Random Forest (RF) model and intermediate fusion implemented via a Combined Model (CM) using autoencoders and a neural network. The dataset included behavioral (smartphone-based), demographic, and clinical features. Experiments were conducted across multiple temporal splits and data stream combinations. Performance was evaluated using mean squared error (MSE) and coefficient of determination (R²). Results: The CM outperformed both RF and Linear Regression (LR) baselines across all setups, achieving lower MSE (0.4985 vs. 0.5305 with RF) and higher R² (0.4695 vs. 0.4356). The RF model showed signs of overfitting, with a large gap between training and test performance, while the CM maintained consistent generalization. Performance was best when integrating all data modalities in the CM (in contradistinction to RF), underscoring the value of latent space fusion for capturing non-linear interactions in complex psychiatric datasets. Conclusion: Latent space fusion offers a robust alternative to traditional fusion methods for prediction with multimodal mental health data. Future work should explore model interpretability and individual-level prediction for clinical deployment.
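For readers unfamiliar with the two metrics named in this abstract, the following is a minimal sketch of how mean squared error and the coefficient of determination are usually computed. The function names and the sample scores are hypothetical and invented for illustration; they are not taken from the paper or from the BRIGHTEN data.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical target and predicted scores, made up for this example only.
y_true = [0.0, 2.0, 4.0, 6.0]
y_pred = [1.0, 2.0, 3.0, 6.0]

print(mse(y_true, y_pred))
print(r2(y_true, y_pred))
```

Lower MSE and higher R² both indicate a closer fit, which is the direction of the comparison reported in the abstract.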

artificial intelligence, information fusion, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2507.14175

Country: North America > Canada > Quebec (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.67)

Technology:

Information Technology > Data Science > Data Integration (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.36)

Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion

Tripathi, Kumud, Kumar, Chowdam Venkata, Wasnik, Pankaj

arXiv.org Artificial Intelligence, Jun-3-2025

Voice Activity Detection (VAD) plays a key role in speech processing, often utilizing hand-crafted or neural features. This study examines the effectiveness of Mel-Frequency Cepstral Coefficients (MFCCs) and pre-trained model (PTM) features, including wav2vec 2.0, HuBERT, WavLM, UniSpeech, MMS, and Whisper. We propose FusionVAD, a unified framework that combines both feature types using three fusion strategies: concatenation, addition, and cross-attention (CA). Experimental results reveal that simple fusion techniques, particularly addition, outperform CA in both accuracy and efficiency. Fusion-based models consistently surpass single-feature models, highlighting the complementary nature of MFCCs and PTM features. Notably, our best-performing fusion model exceeds the state-of-the-art Pyannote across multiple datasets, achieving an absolute average improvement of 2.04%. These results confirm that simple feature fusion enhances VAD robustness while maintaining computational efficiency.
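The two simpler strategies named in this abstract, concatenation and addition, can be illustrated with a minimal sketch on plain Python lists. This is not the FusionVAD implementation: the feature values, the dimensions, and the `project` helper with its weight matrix are all hypothetical, introduced only because element-wise addition requires both inputs to share one dimension.

```python
def fuse_concat(a, b):
    """Concatenation fusion: the output dimension is len(a) + len(b)."""
    return list(a) + list(b)


def project(vec, weights):
    """Linear projection; `weights` holds one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def fuse_add(a, b):
    """Addition fusion: element-wise sum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("addition fusion needs equal-length vectors")
    return [x + y for x, y in zip(a, b)]


# Hypothetical per-frame features, invented for illustration.
mfcc = [1.0, 2.0, 3.0]       # stands in for a hand-crafted feature vector
ptm = [0.5, 0.5]             # stands in for a pre-trained-model feature vector
to_2d = [[1.0, 0.0, 0.0],    # hypothetical 3 -> 2 projection matrix
         [0.0, 1.0, 1.0]]

concat = fuse_concat(mfcc, ptm)
added = fuse_add(project(mfcc, to_2d), ptm)
print(concat, added)
```

Concatenation grows the feature dimension, while addition keeps it fixed after projection, which is one plausible reason addition can be cheaper downstream.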

artificial intelligence, machine learning, representation, (16 more...)

arXiv.org Artificial Intelligence

2506.01365

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.73)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)

A systematic review of challenges and proposed solutions in modeling multimodal data

Farhadizadeh, Maryam, Weymann, Maria, Blaß, Michael, Kraus, Johann, Gundler, Christopher, Walter, Sebastian, Hempen, Noah, Binder, Harald, Binder, Nadine

arXiv.org Machine Learning, May-16-2025

Multimodal data modeling has emerged as a powerful approach in clinical research, enabling the integration of diverse data types such as imaging, genomics, wearable sensors, and electronic health records. Despite its potential to improve diagnostic accuracy and support personalized care, modeling such heterogeneous data presents significant technical challenges. This systematic review synthesizes findings from 69 studies to identify common obstacles, including missing modalities, limited sample sizes, dimensionality imbalance, interpretability issues, and finding the optimal fusion techniques. We highlight recent methodological advances, such as transfer learning, generative models, attention mechanisms, and neural architecture search that offer promising solutions. By mapping current trends and innovations, this review provides a comprehensive overview of the field and offers practical insights to guide future research and development in multimodal modeling for medical applications.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2505.06945

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Germany > Baden-Württemberg > Freiburg (0.05)
(8 more...)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > Promising Solution (0.87)
Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(3 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
(2 more...)

Feature Fusion Revisited: Multimodal CTR Prediction for MMCTR Challenge

Zhou, Junjie

arXiv.org Artificial Intelligence, Apr-29-2025

With the rapid advancement of Multimodal Large Language Models (MLLMs), an increasing number of researchers are exploring their application in recommendation systems. However, the high latency associated with large models presents a significant challenge for such use cases. The EReL@MIR workshop provided a valuable opportunity to experiment with various approaches aimed at improving the efficiency of multimodal representation learning for information retrieval tasks. As part of the competition's requirements, participants were mandated to submit a technical report detailing their methodologies and findings. Our team was honored to receive the award for Task 2 - Winner (Multimodal CTR Prediction). In this technical report, we present our methods and key findings. Additionally, we propose several directions for future work, particularly focusing on how to effectively integrate recommendation signals into multimodal representations. The codebase for our implementation is publicly available at: https://github.com/Lattice-zjj/MMCTR_Code, and the trained model weights can be accessed at: https://huggingface.co/FireFlyCourageous/MMCTR_DIN_MicroLens_1M_x1.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2504.18961

Country: Asia > China (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.35)

Enhance Vision-based Tactile Sensors via Dynamic Illumination and Image Fusion

Redkin, Artemii, Dugonjic, Zdravko, Lambeta, Mike, Calandra, Roberto

arXiv.org Artificial Intelligence, Mar-27-2025

Vision-based tactile sensors use structured light to measure deformation in their elastomeric interface. Until now, vision-based tactile sensors such as DIGIT and GelSight have been using a single, static pattern of structured light tuned to the specific form factor of the sensor. In this work, we investigate the effectiveness of dynamic illumination patterns, in conjunction with image fusion techniques, to improve the quality of sensing of vision-based tactile sensors. Specifically, we propose to capture multiple measurements, each with a different illumination pattern, and then fuse them together to obtain a single, higher-quality measurement. Experimental results demonstrate that this type of dynamic illumination yields significant improvements in image contrast, sharpness, and background difference. This discovery opens the possibility of retroactively improving the sensing quality of existing vision-based tactile sensors with a simple software update, and for new hardware designs capable of fully exploiting dynamic illumination.

artificial intelligence, image understanding, sensor, (16 more...)

arXiv.org Artificial Intelligence

2504.00017

Country:

North America > United States > Massachusetts (0.04)
North America > United States > Maryland (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Europe > Germany > Saxony > Dresden (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)

Extending Dense Passage Retrieval with Temporal Information

Abdallah, Abdelrahman, Piryani, Bhawna, Wallat, Jonas, Anand, Avishek, Jatowt, Adam

arXiv.org Artificial Intelligence, Feb-28-2025

Temporal awareness is crucial in many information retrieval tasks, particularly in scenarios where the relevance of documents depends on their alignment with the query's temporal context. Traditional retrieval methods such as BM25 and Dense Passage Retrieval (DPR) excel at capturing lexical and semantic relevance but fall short in addressing time-sensitive queries. To bridge this gap, we introduce the temporal retrieval model that integrates explicit temporal signals by incorporating query timestamps and document dates into the representation space. Our approach ensures that retrieved passages are not only topically relevant but also temporally aligned with user intent. We evaluate our approach on two large-scale benchmark datasets, ArchivalQA and ChroniclingAmericaQA, achieving substantial performance gains over standard retrieval baselines. In particular, our model improves Top-1 retrieval accuracy by 6.63% and NDCG@10 by 3.79% on ArchivalQA, while yielding a 9.56% boost in Top-1 retrieval accuracy and 4.68% in NDCG@10 on ChroniclingAmericaQA. Additionally, we introduce a time-sensitive negative sampling strategy, which refines the model's ability to distinguish between temporally relevant and irrelevant documents during training. Our findings highlight the importance of explicitly modeling time in retrieval systems and set a new standard for handling temporally grounded queries.
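As background for the NDCG@10 figures quoted in this abstract, here is a minimal sketch of one common formulation of NDCG@k, using linear gain and a log2 rank discount. Whether the paper uses this exact variant is an assumption, and the relevance lists are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    # enumerate() is 0-based, so the result at rank 1 is divided by log2(2) = 1.
    return sum(rel / math.log2(idx + 2) for idx, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical binary relevance labels in ranked order (1 = relevant).
perfect = [1, 0, 0]
second_place = [0, 1, 0]

print(ndcg_at_k(perfect, 10))
print(ndcg_at_k(second_place, 10))
```

A ranking that places the only relevant passage first scores 1.0, and pushing it down the list lowers the score, which is why the metric complements Top-1 accuracy.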

accuracy, dataset, tempdpr, (13 more...)

arXiv.org Artificial Intelligence

2502.21024

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.05)
Europe > Austria > Tyrol > Innsbruck (0.05)
(14 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Large Multimodal Models for Low-Resource Languages: A Survey

Lupascu, Marian, Rogoz, Ana-Cristina, Stupariu, Mihai Sorin, Ionescu, Radu Tudor

arXiv.org Artificial Intelligence, Feb-8-2025

In this survey, we systematically analyze techniques used to adapt large multimodal models (LMMs) for low-resource (LR) languages, examining approaches ranging from visual enhancement and data creation to cross-modal transfer and fusion strategies. Through a comprehensive analysis of 106 studies across 75 LR languages, we identify key patterns in how researchers tackle the challenges of limited data and computational resources. We find that visual information often serves as a crucial bridge for improving model performance in LR settings, though significant challenges remain in areas such as hallucination mitigation and computational efficiency. We aim to provide researchers with a clear understanding of current approaches and remaining challenges in making LMMs more accessible to speakers of LR (understudied) languages. We complement our survey with an open-source repository available at: https://github.com/marianlupascu/LMM4LRL-Survey.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.05568

Country:

North America > United States (0.06)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)

Genre:

Research Report (0.82)
Overview (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
