data integration
Neural decoding from stereotactic EEG: accounting for electrode variability across subjects
Deep learning based neural decoding from stereotactic electroencephalography (sEEG) would likely benefit from scaling up both dataset and model size. To achieve this, combining data across multiple subjects is crucial. However, in sEEG cohorts, each subject has a variable number of electrodes placed at distinct locations in their brain, solely based on clinical needs. Such heterogeneity in electrode number/placement poses a significant challenge for data integration, since there is no clear correspondence of the neural activity recorded at distinct sites between individuals. Here we introduce seegnificant: a training framework and architecture that can be used to decode behavior across subjects using sEEG data.
A Field Guide to Deploying AI Agents in Clinical Practice
Gallifant, Jack, Kellogg, Katherine C., Butler, Matt, Centi, Amanda, Chen, Shan, Doyle, Patrick F., Dutta, Sayon, Guo, Joyce, Hadfield, Matthew J., Kim, Esther H., Kozono, David E., Aerts, Hugo JWL, Landman, Adam B., Mak, Raymond H., Mishuris, Rebecca G., Nelson, Tanna L., Savova, Guergana K., Sharon, Elad, Silverman, Benjamin C., Topaloglu, Umit, Warner, Jeremy L., Bitterman, Danielle S.
Large language models (LLMs) integrated into agent-driven workflows hold immense promise for healthcare, yet a significant gap exists between their potential and practical implementation within clinical settings. To address this, we present a practitioner-oriented field manual for deploying generative agents that use electronic health record (EHR) data. This guide is informed by our experience deploying the "irAE-Agent", an automated system to detect immune-related adverse events from clinical notes at Mass General Brigham, and by structured interviews with 21 clinicians, engineers, and informatics leaders involved in the project. Our analysis reveals a critical misalignment in clinical AI development: less than 20% of our effort was dedicated to prompt engineering and model development, while over 80% was consumed by the sociotechnical work of implementation. We distill this effort into five "heavy lifts": data integration, model validation, ensuring economic value, managing system drift, and governance. By providing actionable solutions for each of these challenges, this field manual shifts the focus from algorithmic development to the essential infrastructure and implementation work required to bridge the "valley of death" and successfully translate generative AI from pilot projects into routine clinical care.
- North America > United States > Gulf of Mexico > Central GOM (0.44)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)
KGpipe: Generation and Evaluation of Pipelines for Data Integration into Knowledge Graphs
Building high-quality knowledge graphs (KGs) from diverse sources requires combining methods for information extraction, data transformation, ontology mapping, entity matching, and data fusion. Numerous methods and tools exist for each of these tasks, but support for combining them into reproducible and effective end-to-end pipelines is still lacking. We present a new framework, KGpipe for defining and executing integration pipelines that can combine existing tools or LLM (Large Language Model) functionality. To evaluate different pipelines and the resulting KGs, we propose a benchmark to integrate heterogeneous data of different formats (RDF, JSON, text) into a seed KG. We demonstrate the flexibility of KGpipe by running and comparatively evaluating several pipelines integrating sources of the same or different formats using selected performance and quality metrics.
- Europe > Germany > Saxony > Leipzig (1.00)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- (9 more...)
- Media > Film (0.93)
- Leisure & Entertainment (0.93)
A Multi-Agent System for Semantic Mapping of Relational Data to Knowledge Graphs
Trajanoska, Milena, Stojanov, Riste, Trajanov, Dimitar
Enterprises often maintain multiple databases for storing critical business data in siloed systems, resulting in inefficiencies and challenges with data interoperability. A key to overcoming these challenges lies in integrating disparate data sources, enabling businesses to unlock the full potential of their data. Our work presents a novel approach for integrating multiple databases using knowledge graphs, focusing on the application of large language models as semantic agents for mapping and connecting structured data across systems by leveraging existing vocabularies. The proposed methodology introduces a semantic layer above tables in relational databases, utilizing a system comprising multiple LLM agents that map tables and columns to Schema.org terms. Our approach achieves a mapping accuracy of over 90% in multiple domains.
A Multimodal Foundation Model to Enhance Generalizability and Data Efficiency for Pan-cancer Prognosis Prediction
Zhou, Huajun, Zhou, Fengtao, Ma, Jiabo, Xu, Yingxue, Wang, Xi, Zhang, Xiuming, Liang, Li, Li, Zhenhui, Chen, Hao
Multimodal data provides heterogeneous information for a holistic understanding of the tumor microenvironment. However, existing AI models often struggle to harness the rich information within multimodal data and extract poorly generalizable representations. Here we present MICE (Multimodal data Integration via Collaborative Experts), a multimodal foundation model that effectively integrates pathology images, clinical reports, and genomics data for precise pan-cancer prognosis prediction. Instead of conventional multi-expert modules, MICE employs multiple functionally diverse experts to comprehensively capture both cross-cancer and cancer-specific insights. Leveraging data from 11,799 patients across 30 cancer types, we enhanced MICE's generalizability by coupling contrastive and supervised learning. MICE outperformed both unimodal and state-of-the-art multi-expert-based multimodal models, demonstrating substantial improvements in C-index ranging from 3.8% to 11.2% on internal cohorts and 5.8% to 8.8% on independent cohorts, respectively. Moreover, it exhibited remarkable data efficiency across diverse clinical scenarios. With its enhanced generalizability and data efficiency, MICE establishes an effective and scalable foundation for pan-cancer prognosis prediction, holding strong potential to personalize tailored therapies and improve treatment outcomes.
- Asia > China > Hong Kong (0.05)
- Asia > China > Yunnan Province > Kunming (0.04)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.93)
- Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.67)
Empowering Tabular Data Preparation with Language Models: Why and How?
Chen, Mengshi, Sun, Yuxiang, Li, Tengchao, Wang, Jianwei, Wang, Kai, Lin, Xuemin, Zhang, Ying, Zhang, Wenjie
Data preparation is a critical step in enhancing the usability of tabular data and thus boosts downstream data-driven tasks. Traditional methods often face challenges in capturing the intricate relationships within tables and adapting to the tasks involved. Recent advances in Language Models (LMs), especially in Large Language Models (LLMs), offer new opportunities to automate and support tabular data preparation. However, why LMs suit tabular data preparation (i.e., how their capabilities match task demands) and how to use them effectively across phases still remain to be systematically explored. In this survey, we systematically analyze the role of LMs in enhancing tabular data preparation processes, focusing on four core phases: data acquisition, integration, cleaning, and transformation. For each phase, we present an integrated analysis of how LMs can be combined with other components for different preparation tasks, highlight key advancements, and outline prospective pipelines.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- (14 more...)
- Research Report (1.00)
- Overview (1.00)
Neural decoding from stereotactic EEG: accounting for electrode variability across subjects
Deep learning based neural decoding from stereotactic electroencephalography (sEEG) would likely benefit from scaling up both dataset and model size. To achieve this, combining data across multiple subjects is crucial. However, in sEEG cohorts, each subject has a variable number of electrodes placed at distinct locations in their brain, solely based on clinical needs. Such heterogeneity in electrode number/placement poses a significant challenge for data integration, since there is no clear correspondence of the neural activity recorded at distinct sites between individuals. Here we introduce seegnificant: a training framework and architecture that can be used to decode behavior across subjects using sEEG data.
Generalized probabilistic canonical correlation analysis for multi-modal data integration with full or partial observations
Yang, Tianjian, Li, Wei Vivian
Generalized Probabilistic Canonical Correlation Analysis for Multi-modal Data Integration with Full or Partial Observations Tianjian Y ang 1 and Wei Vivian Li 1,* 1 Department of Statistics, University of California, Riverside * T o whom correspondence should be addressed: weil@ucr.edu Abstract Background: The integration and analysis of multi-modal data are increasingly essential across various domains including bioinformatics. As the volume and complexity of such data grow, there is a pressing need for computational models that not only integrate diverse modalities but also leverage their complementary information to improve clustering accuracy and insights, especially when dealing with partial observations with missing data. Results: We propose Generalized Probabilistic Canonical Correlation Analysis (GPCCA), an unsupervised method for the integration and joint dimensionality reduction of multi-modal data. GPCCA addresses key challenges in multi-modal data analysis by handling missing values within the model, enabling the integration of more than two modalities, and identifying informative features while accounting for correlations within individual modalities. The model demonstrates robustness to various missing data patterns and provides low-dimensional embeddings that facilitate downstream clustering and analysis. In a range of simulation settings, GPCCA outperforms existing methods in capturing essential patterns across modalities. Additionally, we demonstrate its applicability to multi-omics data from TCGA cancer datasets and a multi-view image dataset. Conclusion: GPCCA offers a useful framework for multi-modal data integration, effectively handling missing data and providing informative low-dimensional embeddings. Its performance across cancer genomics and multi-view image data highlights its robustness and potential for broad application. T o make the method accessible to the wider research community, we have released an R package, GPCCA, which is available at https://github.com/Kaversoniano/ GPCCA . 1 Introduction Many real-world datasets can be described from multiple perspectives, where each perspective, typically represented as a matrix, corresponds to a data modality . A dataset that is consisted of multiple modalities collected from the same set of individuals is termed a multi-modal dataset [1]. Examples include medical imaging data combining computed tomography (CT) and magnetic 1 arXiv:2504.11610v1 T echnological advances have made the collection of multi-modal data increasingly prevalent, enabling integrative analyses that leverage information across modalities.
- North America > United States > California > Riverside County > Riverside (0.24)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.89)
GridMind: A Multi-Agent NLP Framework for Unified, Cross-Modal NFL Data Insights
Chipka, Jordan, Moyer, Chris, Troyer, Clay, Fuelling, Tyler, Hochstedler, Jeremy
The rapid growth of big data and advancements in computational techniques have significantly transformed sports analytics. However, the diverse range of data sources -- including structured statistics, semi-structured formats like sensor data, and unstructured media such as written articles, audio, and video -- creates substantial challenges in extracting actionable insights. These various formats, often referred to as multimodal data, require integration to fully leverage their potential. Conventional systems, which typically prioritize structured data, face limitations when processing and combining these diverse content types, reducing their effectiveness in real-time sports analysis. To address these challenges, recent research highlights the importance of multimodal data integration for capturing the complexity of real-world sports environments. Building on this foundation, this paper introduces GridMind, a multi-agent framework that unifies structured, semi-structured, and unstructured data through Retrieval-Augmented Generation (RAG) and large language models (LLMs) to facilitate natural language querying of NFL data. This approach aligns with the evolving field of multimodal representation learning, where unified models are increasingly essential for real-time, cross-modal interactions. GridMind's distributed architecture includes specialized agents that autonomously manage each stage of a prompt -- from interpretation and data retrieval to response synthesis. This modular design enables flexible, scalable handling of multimodal data, allowing users to pose complex, context-rich questions and receive comprehensive, intuitive responses via a conversational interface.
- North America > United States > Minnesota (0.04)
- North America > United States > Indiana > Marion County > Indianapolis (0.04)
- Asia > Middle East > Jordan (0.04)
Multimodal Data Integration for Sustainable Indoor Gardening: Tracking Anyplant with Time Series Foundation Model
Nabaei, Seyed Hamidreza, Zheng, Zeyang, Chen, Dong, Heydarian, Arsalan
Indoor gardening within sustainable buildings offers a transformative solution to urban food security and environmental sustainability. By 2030, urban farming, including Controlled Environment Agriculture (CEA) and vertical farming, is expected to grow at a compound annual growth rate (CAGR) of 13.2% from 2024 to 2030, according to market reports. This growth is fueled by advancements in Internet of Things (IoT) technologies, sustainable innovations such as smart growing systems, and the rising interest in green interior design. This paper presents a novel framework that integrates computer vision, machine learning (ML), and environmental sensing for the automated monitoring of plant health and growth. Unlike previous approaches, this framework combines RGB imagery, plant phenotyping data, and environmental factors such as temperature and humidity, to predict plant water stress in a controlled growth environment. The system utilizes high-resolution cameras to extract phenotypic features, such as RGB, plant area, height, and width while employing the Lag-Llama time series model to analyze and predict water stress. Experimental results demonstrate that integrating RGB, size ratios, and environmental data significantly enhances predictive accuracy, with the Fine-tuned model achieving the lowest errors (MSE = 0.420777, MAE = 0.595428) and reduced uncertainty. These findings highlight the potential of multimodal data and intelligent systems to automate plant care, optimize resource consumption, and align indoor gardening with sustainable building management practices, paving the way for resilient, green urban spaces.
- North America > United States > Virginia > Albemarle County > Charlottesville (0.15)
- North America > United States > Mississippi > Oktibbeha County > Starkville (0.04)
- Europe > Switzerland (0.04)
- Europe > Spain > Aragón (0.04)
- Research Report > New Finding (0.49)
- Research Report > Promising Solution (0.34)
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture (1.00)
- Information Technology (0.89)