AITopics | cluster 1

Collaborating Authors

cluster 1

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models

Neural Information Processing SystemsApr-30-2026, 05:56:49 GMT

A long standing goal in neuroscience has been to elucidate the functional organization of the brain. Within higher visual cortex, functional accounts have remained relatively coarse, focusing on regions of interest (ROIs) and taking the form of selectivity for broad categories such as faces, places, bodies, food, or words. Because the identification of such ROIs has typically relied on manually assembled stimulus sets consisting of isolated objects in non-ecological contexts, exploring functional organization without robust a priori hypotheses has been challenging. To overcome these limitations, we introduce a data-driven approach in which we synthesize images predicted to activate a given brain region using paired natural images and fMRI recordings, bypassing the need for category-specific stimuli. Our approach - Brain Diffusion for Visual Exploration ("BrainDiVE") - builds on recent generative methods by combining large-scale diffusion models with brain-guided image synthesis. Validating our method, we demonstrate the ability to synthesize preferred images with appropriate semantic specificity for well-characterized category-selective ROIs. We then show that BrainDiVE can characterize differences between ROIs selective for the same high-level category. Finally we identify novel functional subdivisions within these ROIs, validated with behavioral data. These results advance our understanding of the fine-grained functional organization of human visual cortex, and provide well-specified constraints for further examination of cortical organization using hypothesis-driven methods.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

LLmFPCA-detect: LLM-powered Multivariate Functional PCA for Anomaly Detection in Sparse Longitudinal Texts

Dubey, Prasanjit, Guha, Aritra, Zhou, Zhengyi, Wu, Qiong, Huo, Xiaoming, Dubey, Paromita

arXiv.org Machine LearningDec-17-2025

Sparse longitudinal (SL) textual data arises when individuals generate text repeatedly over time (e.g., customer reviews, occasional social media posts, electronic medical records across visits), but the frequency and timing of observations vary across individuals. These complex textual data sets have immense potential to inform future policy and targeted recommendations. However, because SL text data lack dedicated methods and are noisy, heterogeneous, and prone to anomalies, detecting and inferring key patterns is challenging. We introduce LLmFPCA-detect, a flexible framework that pairs LLM-based text embeddings with functional data analysis to detect clusters and infer anomalies in large SL text datasets. First, LLmFPCA-detect embeds each piece of text into an application-specific numeric space using LLM prompts. Sparse multivariate functional principal component analysis (mFPCA) conducted in the numeric space forms the workhorse to recover primary population characteristics, and produces subject-level scores which, together with baseline static covariates, facilitate data segmentation, unsupervised anomaly detection and inference, and enable other downstream tasks. In particular, we leverage LLMs to perform dynamic keyword profiling guided by the data segments and anomalies discovered by LLmFPCA-detect, and we show that cluster-specific functional PC scores from LLmFPCA-detect, used as features in existing pipelines, help boost prediction performance. We support the stability of LLmFPCA-detect with experiments and evaluate it on two different applications using public datasets, Amazon customer-review trajectories, and Wikipedia talk-page comment streams, demonstrating utility across domains and outperforming state-of-the-art baselines.

anomaly, llmfpca-detect, trajectory, (15 more...)

arXiv.org Machine Learning

2512.14604

Country:

North America > United States > California (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report > Experimental Study (0.68)

Industry:

Information Technology > Services (0.66)
Health & Medicine > Health Care Technology > Medical Record (0.54)
Health & Medicine > Therapeutic Area (0.46)
Education > Educational Setting (0.45)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Explainable AI for Curie Temperature Prediction in Magnetic Materials

Ajaib, M. Adeel, Nasir, Fariha, Rehman, Abdul

arXiv.org Artificial IntelligenceNov-20-2025

Traditional approaches based on quantum mechanical computations or empirical models are often limited in scalability and accuracy. In recent years, machine learning (ML) has emerged as a promising alternative for property prediction across materials science domains [1-9]. Building on this momentum, several recent studies have proposed the use of ML models trained on curated magnetic datasets. In particular, the recent study [10] introduced the NE-MAD database, which aggregates experimentally measured magnetic transition temperatures and compositions. Similarly, the study by [11] utilized two of the largest available datasets of experimental Curie temperatures--comprising over 2,500 materials for training and more than 3,000 entries for validation--to compare machine learning strategies for predicting Curie temperature solely from chemical composition. Our work is inspired by these prior efforts and aims to improve the predictive accuracy and gain insights into model in-terpretability. We develop a pipeline that starts from the NE-MAD dataset, augments it with compositional and elemental features, and evaluates several ML models. A key contribution of our work is the integration of explainable AI (XAI) through SHAP (SHapley Additive exPlanations) analysis, which allows us to quantify how each input feature contributes to the model's prediction. Moreover, we benchmark our models on external datasets from literature to demonstrate generalization.

artificial intelligence, curie temperature, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.06996

Country: North America > United States (0.29)

Genre: Research Report (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Add feedback

When AI Does Science: Evaluating the Autonomous AI Scientist KOSMOS in Radiation Biology

Nusrat, Humza, Nusrat, Omar

arXiv.org Artificial IntelligenceNov-19-2025

Agentic AI "scientists" now use language models to search the literature, run analyses, and generate hypotheses. We evaluate KOSMOS, an autonomous AI scientist, on three problems in radiation biology using simple random-gene null benchmarks. Hypothesis 1: baseline DNA damage response (DDR) capacity across cell lines predicts the p53 transcriptional response after irradiation (GSE30240). Hypothesis 2: baseline expression of OGT and CDO1 predicts the strength of repressed and induced radiation-response modules in breast cancer cells (GSE59732). Hypothesis 3: a 12-gene expression signature predicts biochemical recurrence-free survival after prostate radiotherapy plus androgen deprivation therapy (GSE116918). The DDR-p53 hypothesis was not supported: DDR score and p53 response were weakly negatively correlated (Spearman rho = -0.40, p = 0.76), indistinguishable from random five-gene scores. OGT showed only a weak association (r = 0.23, p = 0.34), whereas CDO1 was a clear outlier (r = 0.70, empirical p = 0.0039). The 12-gene signature achieved a concordance index of 0.61 (p = 0.017) but a non-unique effect size. Overall, KOSMOS produced one well-supported discovery, one plausible but uncertain result, and one false hypothesis, illustrating that AI scientists can generate useful ideas but require rigorous auditing against appropriate null models.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.13825

Country:

North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Nuclear Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Add feedback

AttentiveGRUAE: An Attention-Based GRU Autoencoder for Temporal Clustering and Behavioral Characterization of Depression from Wearable Data

Soley, Nidhi, Patel, Vishal M, Taylor, Casey O

arXiv.org Artificial IntelligenceNov-17-2025

In this study, we present AttentiveGRUAE, a novel attention-based gated recurrent unit (GRU) autoencoder designed for temporal clustering and prediction of outcome from longitudinal wearable data. Our model jointly optimizes three objectives: (1) learning a compact latent representation of daily behavioral features via sequence reconstruction, (2) predicting end-of-period depression rate through a binary classification head, and (3) identifying behavioral subtypes through Gaussian Mixture Model (GMM) based soft clustering of learned embeddings. We evaluate AttentiveGRUAE on longitudinal sleep data from 372 participants (GLOBEM 2018-2019), and it demonstrates superior performance over baseline clustering, domain-aligned self-supervised, and ablated models in both clustering quality (silhouette score = 0.70 vs 0.32-0.70) and depression classification (AUC = 0.74 vs 0.50-0.67). Additionally, external validation on cross-year cohorts from 332 participants (GLOBEM 2020-2021) confirms cluster reproducibility (silhouette score = 0.63, AUC = 0.61) and stability. We further perform subtype analysis and visualize temporal attention, which highlights sleep-related differences between clusters and identifies salient time windows that align with changes in sleep regularity, yielding clinically interpretable explanations of risk.

artificial intelligence, depression, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.02558

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.85)

Add feedback

User Profiles of Sleep Disorder Sufferers: Towards Explainable Clustering and Differential Variable Analysis

Sellami, Sifeddine, Agoun, Juba, Yessad, Lamia, Bounia, Louenas

arXiv.org Artificial IntelligenceOct-21-2025

Sleep disorders have a major impact on patients' health and quality of life, but their diagnosis remains complex due to the diversity of symptoms. Today, technological advances, combined with medical data analysis, are opening new perspectives for a better understanding of these disorders. In particular, explainable artificial intelligence (XAI) aims to make AI model decisions understandable and interpretable for users. In this study, we propose a clustering-based method to group patients according to different sleep disorder profiles. By integrating an explainable approach, we identify the key factors influencing these pathologies. An experiment on anonymized real data illustrates the effectiveness and relevance of our approach.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.15986

Country: Europe > France (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.55)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.34)

Add feedback

MMM: Clustering Multivariate Longitudinal Mixed-type Data

Amato, Francesco, Jacques, Julien

arXiv.org Machine LearningSep-16-2025

Multivariate longitudinal data of mixed-type are increasingly collected in many science domains. However, algorithms to cluster this kind of data remain scarce, due to the challenge to simultaneously model the within- and between-time dependence structures for multivariate data of mixed kind. We introduce the Mixture of Mixed-Matrices (MMM) model: reorganizing the data in a three-way structure and assuming that the non-continuous variables are observations of underlying latent continuous variables, the model relies on a mixture of matrix-variate normal distributions to perform clustering in the latent dimension. The MMM model is thus able to handle continuous, ordinal, binary, nominal and count data and to concurrently model the heterogeneity, the association among the responses and the temporal dependence structure in a parsimonious way and without assuming conditional independence. The inference is carried out through an MCMC-EM algorithm, which is detailed. An evaluation of the model through synthetic data shows its inference abilities. A real-world application on financial data is presented.

algorithm, journal, longitudinal data, (17 more...)

arXiv.org Machine Learning

2509.12166

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)

Add feedback

Revealing the empirical flexibility of gas units through deep clustering

Bassini, Chiara Fusar, Xu, Alice Lixuan, Canales, Jorge Sánchez, Hirth, Lion, Kaack, Lynn H.

arXiv.org Artificial IntelligenceSep-5-2025

The flexibility of a power generation unit determines how quickly and often it can ramp up or down. In energy models, it depends on assumptions on the technical characteristics of the unit, such as its installed capacity or turbine technology. In this paper, we learn the empirical flexibility of gas units from their electricity generation, revealing how real-world limitations can lead to substantial differences between units with similar technical characteristics. Using a novel deep clustering approach, we transform 5 years (2019-2023) of unit-level hourly generation data for 49 German units from 100 MWp of installed capacity into low-dimensional embeddings. Our unsupervised approach identifies two clusters of peaker units (high flexibility) and two clusters of non-peaker units (low flexibility). The estimated ramp rates of non-peakers, which constitute half of the sample, display a low empirical flexibility, comparable to coal units. Non-peakers, predominantly owned by industry and municipal utilities, show limited response to low residual load and negative prices, generating on average 1.3 GWh during those hours. As the transition to renewables increases market variability, regulatory changes will be needed to unlock this flexibility potential.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2504.16943

Country: Europe > Germany (0.29)

Genre: Research Report > New Finding (0.94)

Industry:

Energy > Power Industry (1.00)
Government > Regional Government > Europe Government (0.46)
Materials > Metals & Mining > Coal (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

MS-ConTab: Multi-Scale Contrastive Learning of Mutation Signatures for Pan Cancer Representation and Stratification

Dou, Yifan, Khadre, Adam, Petreaca, Ruben C, Mirzaei, Golrokh

arXiv.org Artificial IntelligenceAug-29-2025

Motivation. Understanding the pan-cancer mutational landscape offers critical insights into the molecular mechanisms underlying tumorigenesis. While patient-level machine learning techniques have been widely employed to identify tumor subtypes, cohort-level clustering, where entire cancer types are grouped based on shared molecular features, has largely relied on classical statistical methods. Results. In this study, we introduce a novel unsupervised contrastive learning framework to cluster 43 cancer types based on coding mutation data derived from the COSMIC database. For each cancer type, we construct two complementary mutation signatures: a gene-level profile capturing nucleotide substitution patterns across the most frequently mutated genes, and a chromosome-level profile representing normalized substitution frequencies across chromosomes. These dual views are encoded using TabNet encoders and optimized via a multi-scale contrastive learning objective (NT-Xent loss) to learn unified cancer-type embeddings. We demonstrate that the resulting latent representations yield biologically meaningful clusters of cancer types, aligning with known mutational processes and tissue origins. Our work represents the first application of contrastive learning to cohort-level cancer clustering, offering a scalable and interpretable framework for mutation-driven cancer subtyping.

artificial intelligence, cancer type, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2508.19424

Country: North America > United States > Ohio (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Contextual Phenotyping of Pediatric Sepsis Cohort Using Large Language Models

Nagori, Aditya, Gautam, Ayush, Wiens, Matthew O., Nguyen, Vuong, Mugisha, Nathan Kenya, Kabakyenga, Jerome, Kissoon, Niranjan, Ansermino, John Mark, Kamaleswaran, Rishikesan

arXiv.org Artificial IntelligenceAug-5-2025

Clustering patient subgroups is essential for personalized care and efficient resource use. Traditional clustering methods struggle with high-dimensional, heterogeneous healthcare data and lack contextual understanding. This study evaluates Large Language Model (LLM) based clustering against classical methods using a pediatric sepsis dataset from a low-income country (LIC), containing 2,686 records with 28 numerical and 119 categorical variables. Patient records were serialized into text with and without a clustering objective. Embeddings were generated using quantized LLAMA 3.1 8B, DeepSeek-R1-Distill-Llama-8B with low-rank adaptation(LoRA), and Stella-En-400M-V5 models. K-means clustering was applied to these embeddings. Classical comparisons included K-Medoids clustering on UMAP and FAMD-reduced mixed data. Silhouette scores and statistical tests evaluated cluster quality and distinctiveness. Stella-En-400M-V5 achieved the highest Silhouette Score (0.86). LLAMA 3.1 8B with the clustering objective performed better with higher number of clusters, identifying subgroups with distinct nutritional, clinical, and socioeconomic profiles. LLM-based methods outperformed classical techniques by capturing richer context and prioritizing key features. These results highlight potential of LLMs for contextual phenotyping and informed decision-making in resource-limited settings.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.09805

Country:

North America > United States (0.46)
Africa > Uganda (0.29)
North America > Canada > British Columbia (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback