AITopics | data integration

data integration

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Neural decoding from stereotactic EEG: accounting for electrode variability across subjects

Neural Information Processing Systems, Mar-22-2026, 10:12:28 GMT

Deep learning based neural decoding from stereotactic electroencephalography (sEEG) would likely benefit from scaling up both dataset and model size. To achieve this, combining data across multiple subjects is crucial. However, in sEEG cohorts, each subject has a variable number of electrodes placed at distinct locations in their brain, solely based on clinical needs. Such heterogeneity in electrode number/placement poses a significant challenge for data integration, since there is no clear correspondence of the neural activity recorded at distinct sites between individuals. Here we introduce seegnificant: a training framework and architecture that can be used to decode behavior across subjects using sEEG data.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

What was Doge? How Elon Musk tried to gamify government

The Guardian, Mar-17-2026, 05:00:39 GMT

In 2025, when Elon Musk joined the government as the de facto head of something called the "department of government efficiency", he declared that governments were poorly configured "big dumb machines". To the senator Ted Cruz, he explained that "the only way to reconcile the databases and get rid of waste and fraud is to actually look at the computers". Muskism came to Washington soaked in memes, adolescent boasts and sadistic victory dances over mass firings. Leading a team of teenage coders and mid-level managers drawn from his suite of companies, Musk aimed to enter the codebase and rewrite regulations and budget lines from within. He would drag the paper-pushing bureaucracy kicking and screaming into the digital 21st century, scanning the contents of cavernous rooms of filing cabinets and feeding the data into a single interoperable system. The undertaking combined features of private equity-led restructuring with startup management, shot through with the sensibility of gaming and rightwing culture war. To succeed, he would need "God mode", an overview of the whole. If the mandate of Doge was to "[modernise] federal technology and software to maximise governmental efficiency and productivity", in the words of the executive order that launched the initiative on 20 January 2025, the reality was a strengthening of the state's surveillance capacities. Over time, Musk had become convinced that the real bugs in the code were people, especially the non-white illegal immigrants whom he saw as pawns in a liberal scheme to corrupt democracy and beneficiaries of what he called "suicidal empathy". He understood empathy itself in coding terms.

artificial intelligence, musk, social media, (16 more...)

The Guardian

Country:

North America > United States > New York (0.04)
North America > United States > California (0.04)
Oceania > Australia (0.04)
(4 more...)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Immigration & Customs (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

A Field Guide to Deploying AI Agents in Clinical Practice

Gallifant, Jack, Kellogg, Katherine C., Butler, Matt, Centi, Amanda, Chen, Shan, Doyle, Patrick F., Dutta, Sayon, Guo, Joyce, Hadfield, Matthew J., Kim, Esther H., Kozono, David E., Aerts, Hugo JWL, Landman, Adam B., Mak, Raymond H., Mishuris, Rebecca G., Nelson, Tanna L., Savova, Guergana K., Sharon, Elad, Silverman, Benjamin C., Topaloglu, Umit, Warner, Jeremy L., Bitterman, Danielle S.

arXiv.org Artificial Intelligence, Dec-9-2025

Large language models (LLMs) integrated into agent-driven workflows hold immense promise for healthcare, yet a significant gap exists between their potential and practical implementation within clinical settings. To address this, we present a practitioner-oriented field manual for deploying generative agents that use electronic health record (EHR) data. This guide is informed by our experience deploying the "irAE-Agent", an automated system to detect immune-related adverse events from clinical notes at Mass General Brigham, and by structured interviews with 21 clinicians, engineers, and informatics leaders involved in the project. Our analysis reveals a critical misalignment in clinical AI development: less than 20% of our effort was dedicated to prompt engineering and model development, while over 80% was consumed by the sociotechnical work of implementation. We distill this effort into five "heavy lifts": data integration, model validation, ensuring economic value, managing system drift, and governance. By providing actionable solutions for each of these challenges, this field manual shifts the focus from algorithmic development to the essential infrastructure and implementation work required to bridge the "valley of death" and successfully translate generative AI from pilot projects into routine clinical care.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.26153

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

KGpipe: Generation and Evaluation of Pipelines for Data Integration into Knowledge Graphs

Hofer, Marvin, Rahm, Erhard

arXiv.org Artificial Intelligence, Nov-25-2025

Building high-quality knowledge graphs (KGs) from diverse sources requires combining methods for information extraction, data transformation, ontology mapping, entity matching, and data fusion. Numerous methods and tools exist for each of these tasks, but support for combining them into reproducible and effective end-to-end pipelines is still lacking. We present a new framework, KGpipe, for defining and executing integration pipelines that can combine existing tools or LLM (Large Language Model) functionality. To evaluate different pipelines and the resulting KGs, we propose a benchmark to integrate heterogeneous data of different formats (RDF, JSON, text) into a seed KG. We demonstrate the flexibility of KGpipe by running and comparatively evaluating several pipelines integrating sources of the same or different formats using selected performance and quality metrics.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.18364

Country:

North America > United States (0.46)
Europe > Germany (0.29)

Genre: Research Report (0.82)

Industry:

Media > Film (0.93)
Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

A Multi-Agent System for Semantic Mapping of Relational Data to Knowledge Graphs

Trajanoska, Milena, Stojanov, Riste, Trajanov, Dimitar

arXiv.org Artificial Intelligence, Nov-11-2025

Enterprises often maintain multiple databases for storing critical business data in siloed systems, resulting in inefficiencies and challenges with data interoperability. A key to overcoming these challenges lies in integrating disparate data sources, enabling businesses to unlock the full potential of their data. Our work presents a novel approach for integrating multiple databases using knowledge graphs, focusing on the application of large language models as semantic agents for mapping and connecting structured data across systems by leveraging existing vocabularies. The proposed methodology introduces a semantic layer above tables in relational databases, utilizing a system comprising multiple LLM agents that map tables and columns to Schema.org terms. Our approach achieves a mapping accuracy of over 90% in multiple domains.

agent, artificial intelligence, knowledge graph, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.5281/zenodo.16913321

2511.06455

Country: Europe > North Macedonia (0.14)

Genre: Research Report > New Finding (0.47)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

A Multimodal Foundation Model to Enhance Generalizability and Data Efficiency for Pan-cancer Prognosis Prediction

Zhou, Huajun, Zhou, Fengtao, Ma, Jiabo, Xu, Yingxue, Wang, Xi, Zhang, Xiuming, Liang, Li, Li, Zhenhui, Chen, Hao

arXiv.org Artificial Intelligence, Sep-17-2025

Multimodal data provides heterogeneous information for a holistic understanding of the tumor microenvironment. However, existing AI models often struggle to harness the rich information within multimodal data and extract poorly generalizable representations. Here we present MICE (Multimodal data Integration via Collaborative Experts), a multimodal foundation model that effectively integrates pathology images, clinical reports, and genomics data for precise pan-cancer prognosis prediction. Instead of conventional multi-expert modules, MICE employs multiple functionally diverse experts to comprehensively capture both cross-cancer and cancer-specific insights. Leveraging data from 11,799 patients across 30 cancer types, we enhanced MICE's generalizability by coupling contrastive and supervised learning. MICE outperformed both unimodal and state-of-the-art multi-expert-based multimodal models, demonstrating substantial improvements in C-index ranging from 3.8% to 11.2% on internal cohorts and 5.8% to 8.8% on independent cohorts, respectively. Moreover, it exhibited remarkable data efficiency across diverse clinical scenarios. With its enhanced generalizability and data efficiency, MICE establishes an effective and scalable foundation for pan-cancer prognosis prediction, holding strong potential to personalize tailored therapies and improve treatment outcomes.

artificial intelligence, information fusion, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.126

Country: Asia > China (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.93)
Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)

Empowering Tabular Data Preparation with Language Models: Why and How?

Chen, Mengshi, Sun, Yuxiang, Li, Tengchao, Wang, Jianwei, Wang, Kai, Lin, Xuemin, Zhang, Ying, Zhang, Wenjie

arXiv.org Artificial Intelligence, Aug-5-2025

Data preparation is a critical step in enhancing the usability of tabular data and thus boosts downstream data-driven tasks. Traditional methods often face challenges in capturing the intricate relationships within tables and adapting to the tasks involved. Recent advances in Language Models (LMs), especially in Large Language Models (LLMs), offer new opportunities to automate and support tabular data preparation. However, why LMs suit tabular data preparation (i.e., how their capabilities match task demands) and how to use them effectively across phases still remain to be systematically explored. In this survey, we systematically analyze the role of LMs in enhancing tabular data preparation processes, focusing on four core phases: data acquisition, integration, cleaning, and transformation. For each phase, we present an integrated analysis of how LMs can be combined with other components for different preparation tasks, highlight key advancements, and outline prospective pipelines.

data mining, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.01556

Country:

North America > United States (0.93)
Europe (0.93)
Asia > Middle East > UAE (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Neural decoding from stereotactic EEG: accounting for electrode variability across subjects

Neural Information Processing Systems, May-27-2025, 15:41:46 GMT

electrode variability, neural representation, stereotactic eeg, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Generalized probabilistic canonical correlation analysis for multi-modal data integration with full or partial observations

Yang, Tianjian, Li, Wei Vivian

arXiv.org Machine Learning, Apr-15-2025

Tianjian Yang and Wei Vivian Li, Department of Statistics, University of California, Riverside. Correspondence: weil@ucr.edu

Abstract. Background: The integration and analysis of multi-modal data are increasingly essential across various domains including bioinformatics. As the volume and complexity of such data grow, there is a pressing need for computational models that not only integrate diverse modalities but also leverage their complementary information to improve clustering accuracy and insights, especially when dealing with partial observations with missing data.

Results: We propose Generalized Probabilistic Canonical Correlation Analysis (GPCCA), an unsupervised method for the integration and joint dimensionality reduction of multi-modal data. GPCCA addresses key challenges in multi-modal data analysis by handling missing values within the model, enabling the integration of more than two modalities, and identifying informative features while accounting for correlations within individual modalities. The model demonstrates robustness to various missing data patterns and provides low-dimensional embeddings that facilitate downstream clustering and analysis. In a range of simulation settings, GPCCA outperforms existing methods in capturing essential patterns across modalities. Additionally, we demonstrate its applicability to multi-omics data from TCGA cancer datasets and a multi-view image dataset.

Conclusion: GPCCA offers a useful framework for multi-modal data integration, effectively handling missing data and providing informative low-dimensional embeddings. Its performance across cancer genomics and multi-view image data highlights its robustness and potential for broad application. To make the method accessible to the wider research community, we have released an R package, GPCCA, which is available at https://github.com/Kaversoniano/GPCCA.

1 Introduction. Many real-world datasets can be described from multiple perspectives, where each perspective, typically represented as a matrix, corresponds to a data modality. A dataset that consists of multiple modalities collected from the same set of individuals is termed a multi-modal dataset [1]. Examples include medical imaging data combining computed tomography (CT) and magnetic [...] Technological advances have made the collection of multi-modal data increasingly prevalent, enabling integrative analyses that leverage information across modalities.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2504.11610

Country:

North America > United States > California > Riverside County > Riverside (0.24)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.89)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

GridMind: A Multi-Agent NLP Framework for Unified, Cross-Modal NFL Data Insights

Chipka, Jordan, Moyer, Chris, Troyer, Clay, Fuelling, Tyler, Hochstedler, Jeremy

arXiv.org Artificial Intelligence, Apr-15-2025

The rapid growth of big data and advancements in computational techniques have significantly transformed sports analytics. However, the diverse range of data sources -- including structured statistics, semi-structured formats like sensor data, and unstructured media such as written articles, audio, and video -- creates substantial challenges in extracting actionable insights. These various formats, often referred to as multimodal data, require integration to fully leverage their potential. Conventional systems, which typically prioritize structured data, face limitations when processing and combining these diverse content types, reducing their effectiveness in real-time sports analysis. To address these challenges, recent research highlights the importance of multimodal data integration for capturing the complexity of real-world sports environments. Building on this foundation, this paper introduces GridMind, a multi-agent framework that unifies structured, semi-structured, and unstructured data through Retrieval-Augmented Generation (RAG) and large language models (LLMs) to facilitate natural language querying of NFL data. This approach aligns with the evolving field of multimodal representation learning, where unified models are increasingly essential for real-time, cross-modal interactions. GridMind's distributed architecture includes specialized agents that autonomously manage each stage of a prompt -- from interpretation and data retrieval to response synthesis. This modular design enables flexible, scalable handling of multimodal data, allowing users to pose complex, context-rich questions and receive comprehensive, intuitive responses via a conversational interface.

gridmind, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2504.08747

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
