AITopics

2506.04696

Country: Asia > Bangladesh (1.00)

Genre: Research Report (1.00)

Industry:

Food & Agriculture > Agriculture (0.68)
Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)

AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance

Patel, Dhaval, Lin, Shuxin, Rayfield, James, Zhou, Nianjun, Vaculin, Roman, Martinez, Natalia, O'Donncha, Fearghal, Kalagnanam, Jayant

AI for Industrial Asset Lifecycle Management aims to automate complex operational workflows -- such as condition monitoring, maintenance planning, and intervention scheduling -- to reduce human workload and minimize system downtime. Traditional AI/ML approaches have primarily tackled these problems in isolation, solving narrow tasks within the broader operational pipeline. In contrast, the emergence of AI agents and large language models (LLMs) introduces a next-generation opportunity: enabling end-to-end automation across the entire asset lifecycle. This paper envisions a future where AI agents autonomously manage tasks that previously required distinct expertise and manual coordination. To this end, we introduce AssetOpsBench -- a unified framework and environment designed to guide the development, orchestration, and evaluation of domain-specific agents tailored for Industry 4.0 applications. We outline the key requirements for such holistic systems and provide actionable insights into building agents that integrate perception, reasoning, and control for real-world industrial operations. The software is available at https://github.com/IBM/AssetOpsBench.

large language model, machine learning, natural language, (20 more...)

2506.03828

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Europe > Ireland (0.04)
Asia > Thailand > Bangkok > Bangkok (0.04)

Genre:

Research Report > New Finding (0.92)
Workflow (0.88)

Industry: Energy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

QQSUM: A Novel Task and Model of Quantitative Query-Focused Summarization for Review-based Product Question Answering

Tang, An Quang, Zhang, Xiuzhen, Dinh, Minh Ngoc, Li, Zhuang

Review-based Product Question Answering (PQA) allows e-commerce platforms to automatically address customer queries by leveraging insights from user reviews. However, existing PQA systems generate answers with only a single perspective, failing to capture the diversity of customer opinions. In this paper, we introduce a novel task, Quantitative Query-Focused Summarization (QQSUM), which aims to summarize diverse customer opinions into representative Key Points (KPs) and quantify their prevalence to effectively answer user queries. While Retrieval-Augmented Generation (RAG) shows promise for PQA, its generated answers still fall short of capturing the full diversity of viewpoints. To tackle this challenge, our model QQSUM-RAG, which extends RAG, employs few-shot learning to jointly train a KP-oriented retriever and a KP summary generator, enabling KP-based summaries that capture diverse and representative opinions. Experimental results demonstrate that QQSUM-RAG achieves superior performance compared to state-of-the-art RAG baselines in both textual quality and quantification accuracy of opinions. Our source code is available at: https://github.com/antangrocket1312/QQSUMM

computational linguistic, large language model, machine learning, (21 more...)

2506.0402

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > Dominican Republic (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(7 more...)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Optimizing FPGA and Wafer Test Coverage with Spatial Sampling and Machine Learning

WeiQuan, Wang, Mian, Riaz-ul-Haque

In semiconductor manufacturing, testing costs remain high, especially during wafer and FPGA testing. To reduce the number of required tests while maintaining predictive accuracy, this study investigates three baseline sampling strategies: Random Sampling, Stratified Sampling, and k-means Clustering Sampling. To further enhance these methods, this study proposes a novel algorithm that improves the sampling quality of each approach. This research is conducted using real industrial production data from wafer-level tests and silicon measurements from various FPGAs. This study introduces two hybrid strategies: Stratified with Short Distance Elimination (S-SDE) and k-means with Short Distance Elimination (K-SDE). Their performance is evaluated within the framework of Gaussian Process Regression (GPR) for predicting wafer and FPGA test data. At the core of our proposed approach is the Short Distance Elimination (SDE) algorithm, which excludes spatially proximate candidate points during sampling, thereby ensuring a more uniform distribution of training data across the physical domain. A parameter sweep was conducted over the (alpha, beta) thresholds, where alpha and beta are in the range {0, 1, 2, 3, 4} and not both zero, to identify the optimal combination that minimizes RMSD. Experimental results on a randomly selected wafer file reveal that (alpha, beta) = (2, 2) yields the lowest RMSD. Accordingly, all subsequent experiments adopt this parameter configuration. The results demonstrate that the proposed SDE-based strategies enhance predictive accuracy: K-SDE improves upon k-means sampling by 16.26 percent (wafer) and 13.07 percent (FPGA), while S-SDE improves upon stratified sampling by 16.49 percent (wafer) and 8.84 percent (FPGA).
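As an illustration of the idea behind Short Distance Elimination, the sketch below keeps a candidate point only if no already selected point lies inside a rectangular neighbourhood around it. The neighbourhood rule (|dx| <= alpha and |dy| <= beta), the shuffled greedy order, and the names `too_close`, `sde_sample`, and `is_well_spaced` are assumptions made for this example. The abstract does not say how alpha and beta are applied, so this should not be read as the authors' implementation.

```python
import random
from itertools import combinations


def too_close(p, q, alpha, beta):
    # Assumed rule: q falls inside a (2*alpha+1) x (2*beta+1) box centred on p.
    return abs(p[0] - q[0]) <= alpha and abs(p[1] - q[1]) <= beta


def sde_sample(candidates, n_samples, alpha=2, beta=2, seed=0):
    """Greedily select up to n_samples points, skipping spatially close ones."""
    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)  # stand-in for random, stratified, or k-means ordering
    selected = []
    for point in pool:
        if len(selected) == n_samples:
            break
        if not any(too_close(point, kept, alpha, beta) for kept in selected):
            selected.append(point)
    return selected


def is_well_spaced(points, alpha, beta):
    # True when no pair of points violates the assumed distance rule.
    return not any(too_close(p, q, alpha, beta) for p, q in combinations(points, 2))


# Hypothetical 20 x 20 grid of die coordinates.
grid = [(x, y) for x in range(20) for y in range(20)]
picked = sde_sample(grid, n_samples=10, alpha=2, beta=2)
print(picked, is_well_spaced(picked, 2, 2))
```

Under this assumed rule, alpha = beta = 0 would reject only identical coordinates, so it would not thin the sample at all.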

artificial intelligence, machine learning, short distance elimination, (15 more...)

2506.03556

Country:

Asia > Japan > Honshū > Chūgoku > Shimane Prefecture > Matsue (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > New Finding (0.49)

Industry: Semiconductors & Electronics (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.36)

Similarity-based fuzzy clustering scientific articles: potentials and challenges from mathematical and computational perspectives

Huong, Vu Thi, Litzel, Ida, Koch, Thorsten

Fuzzy clustering, which allows an article to belong to multiple clusters with soft membership degrees, plays a vital role in analyzing publication data. This problem can be formulated as a constrained optimization model, where the goal is to minimize the discrepancy between the similarity observed from data and the similarity derived from a predicted distribution. While this approach benefits from leveraging state-of-the-art optimization algorithms, tailoring them to work with real, massive databases like OpenAlex or Web of Science - containing about 70 million articles and a billion citations - poses significant challenges. We analyze potentials and challenges of the approach from both mathematical and computational perspectives. Among other things, second-order optimality conditions are established, providing new theoretical insights, and practical solution methods are proposed by exploiting the structure of the problem. Specifically, we accelerate the gradient projection method using GPU-based parallel computing to efficiently handle large-scale data.
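To make the gradient projection idea concrete, here is a minimal sketch of one projected gradient step for a single article's membership vector. It assumes that soft memberships are constrained to the probability simplex (non-negative entries summing to 1) and that the gradient of the discrepancy objective is already available. Both the constraint set and the names `project_to_simplex` and `projected_gradient_step` are assumptions for illustration, not the paper's formulation or its GPU code.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}.

    Uses the standard sort-and-threshold procedure.
    """
    sorted_desc = sorted(v, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for k, value in enumerate(sorted_desc, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / k
        if value - candidate > 0:
            theta = candidate  # keep the threshold for the largest valid k
    return [max(x - theta, 0.0) for x in v]


def projected_gradient_step(membership, gradient, step_size):
    # Move against the gradient, then return to the feasible set.
    moved = [m - step_size * g for m, g in zip(membership, gradient)]
    return project_to_simplex(moved)


# Hypothetical membership of one article in three clusters, with a made-up gradient.
row = projected_gradient_step([0.5, 0.3, 0.2], [1.0, -1.0, 0.0], step_size=0.4)
print(row)
```

Each row can be projected independently of the others, which is one reason a step like this lends itself to parallel execution.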

algorithm, artificial intelligence, machine learning, (18 more...)

2506.04045

Country: Europe > Germany (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

A Foundation Model for Spatial Proteomics

Shaban, Muhammad, Chang, Yuzhou, Qiu, Huaying, Yeo, Yao Yu, Song, Andrew H., Jaume, Guillaume, Wang, Yuchen, Weishaupt, Luca L., Ding, Tong, Vaidya, Anurag, Lamane, Abdallah, Shao, Daniel, Zidane, Mohammed, Bai, Yunhao, McCallum, Paige, Luo, Shuli, Wu, Wenrui, Wang, Yang, Cramer, Precious, Chan, Chi Ngai, Stephan, Pierre, Schaffenrath, Johanna, Lee, Jia Le, Michel, Hendrik A., Tian, Caiwei, Almagro-Perez, Cristina, Wagner, Sophia J., Sahai, Sharifa, Lu, Ming Y., Chen, Richard J., Zhang, Andrew, Gonzales, Mark Edward M., Makky, Ahmad, Lee, Jia-Ying Joey, Cheng, Hao, Ahmar, Nourhan El, Matar, Sayed, Haist, Maximilian, Phillips, Darci, Tan, Yuqi, Nolan, Garry P., Burack, W. Richard, Estes, Jacob D., Liu, Jonathan T. C., Choueiri, Toni K, Agarwal, Neeraj, Barry, Marc, Rodig, Scott J., Le, Long Phi, Gerber, Georg, Schürch, Christian M., Theis, Fabian J., Kim, Youn H, Yeong, Joe, Signoretti, Sabina, Howitt, Brooke E., Loo, Lit-Hsin, Ma, Qin, Jiang, Sizun, Mahmood, Faisal

Foundation models have begun to transform image analysis by acting as pretrained generalist backbones that can be adapted to many tasks even when post-training data are limited, yet their impact on spatial proteomics, imaging that maps proteins at single-cell resolution, remains limited. Here, we introduce KRONOS, a foundation model built for spatial proteomics. KRONOS was trained in a self-supervised manner on over 47 million image patches covering 175 protein markers, 16 tissue types, and 8 fluorescence-based imaging platforms. We introduce key architectural adaptations to address the high-dimensional, multi-channel, and heterogeneous nature of multiplex imaging. We demonstrate that KRONOS learns biologically meaningful representations across multiple scales, ranging from cellular and microenvironment to tissue levels, enabling it to address diverse downstream tasks, including cell phenotyping, region classification, and patient stratification. Evaluated across 11 independent cohorts, KRONOS achieves state-of-the-art performance across cell phenotyping, treatment response prediction, and retrieval tasks, and is highly data-efficient. KRONOS also introduces the paradigm of segmentation-free patch-level processing for efficient and scalable spatial proteomics analysis, allowing cross-institutional comparisons, and as an image reverse search engine for spatial patterns.

bioinformatics, machine learning, natural language, (21 more...)

2506.03373

Country:

North America > United States > Massachusetts (0.28)
North America > Canada > Quebec (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Lymphoma (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
(5 more...)

ARIA: Training Language Agents with Intention-Driven Reward Aggregation

Yang, Ruihan, Zhang, Yikai, Chen, Aili, Wang, Xintao, Yuan, Siyu, Chen, Jiangjie, Yang, Deqing, Xiao, Yanghua

Large language models (LLMs) have enabled agents to perform complex reasoning and decision-making through free-form language interactions. However, in open-ended language action environments (e.g., negotiation or question-asking games), the action space can be formulated as a joint distribution over tokens, resulting in an exponentially large action space. Sampling actions in such a space can lead to extreme reward sparsity, which brings large reward variance, hindering effective reinforcement learning (RL). To address this, we propose ARIA, a method that Aggregates Rewards in Intention space to enable efficient and effective language Agents training. ARIA aims to project natural language actions from the high-dimensional joint token distribution space into a low-dimensional intention space, where semantically similar actions are clustered and assigned shared rewards. This intention-aware reward aggregation reduces reward variance by densifying reward signals, fostering better policy optimization. Extensive experiments demonstrate that ARIA not only significantly reduces policy gradient variance, but also delivers substantial performance gains of an average of 9.95% across four downstream tasks, consistently outperforming offline and online RL baselines.
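The reward aggregation idea can be sketched as follows: once actions have been grouped by intention, every action in a group receives the group's mean reward. The intention labels, the use of the mean as the shared reward, and the function name `aggregate_rewards` are assumptions for this example. The abstract does not specify how clusters are formed or how the shared reward is computed.

```python
from collections import defaultdict
from statistics import mean, pvariance


def aggregate_rewards(intent_labels, rewards):
    """Replace each action's reward with the mean reward of its intention cluster."""
    buckets = defaultdict(list)
    for label, reward in zip(intent_labels, rewards):
        buckets[label].append(reward)
    shared = {label: mean(values) for label, values in buckets.items()}
    return [shared[label] for label in intent_labels]


# Hypothetical negotiation actions already mapped to intention clusters.
labels = ["offer", "offer", "offer", "ask", "ask"]
rewards = [1.0, 0.0, 0.0, 0.0, 1.0]  # sparse per-action rewards

dense = aggregate_rewards(labels, rewards)
print(dense)
print(pvariance(rewards), pvariance(dense))
```

Averaging within clusters leaves the overall mean reward unchanged and cannot increase the variance of the reward signal, which matches the variance-reduction motivation stated above.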

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2506.00539

Country: Asia (0.46)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

A novel sensitivity analysis method for agent-based models stratifies in-silico tumor spheroid simulations

Rohr, Edward H., Nardini, John T.

arXiv.org Machine Learning Jun-4-2025

Agent-based models (ABMs) are widely used in biology to understand how individual actions scale into emergent population behavior. Modelers employ sensitivity analysis (SA) algorithms to quantify input parameters' impact on model outputs; however, it is hard to perform SA for ABMs due to their computational and complex nature. In this work, we develop the Simulate, Summarize, Reduce, Cluster, and Analyze (SSRCA) methodology, a machine-learning-based pipeline designed to facilitate SA for ABMs. In particular, SSRCA can achieve the following tasks for ABMs: 1) identify sensitive model parameters, 2) reveal common output model patterns, and 3) determine which input parameter values generate these patterns. We use an example ABM of tumor spheroid growth to showcase how SSRCA provides similar SA results to the popular Sobol' Method while also identifying four common patterns from the ABM and the parameter regions that generate these outputs. This analysis could streamline data-driven tasks, such as parameter estimation, for ABMs by reducing parameter space. While we highlight these results with an ABM on tumor spheroid formation, the SSRCA methodology is broadly applicable to biological ABMs.

artificial intelligence, machine learning, simulation, (17 more...)

2506.00168

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > New Jersey > Mercer County > Ewing (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

A Dynamic Framework for Semantic Grouping of Common Data Elements (CDE) Using Embeddings and Clustering

Krishnamurthy, Madan, Korn, Daniel, Haendel, Melissa A, Mungall, Christopher J, Thessen, Anne E

arXiv.org Artificial Intelligence Jun-4-2025

This research aims to develop a dynamic and scalable framework to facilitate harmonization of Common Data Elements (CDEs) across heterogeneous biomedical datasets by addressing challenges such as semantic heterogeneity, structural variability, and context dependence to streamline integration, enhance interoperability, and accelerate scientific discovery. Our methodology leverages Large Language Models (LLMs) for context-aware text embeddings that convert CDEs into dense vectors capturing semantic relationships and patterns. These embeddings are clustered using Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) to group semantically similar CDEs. The framework incorporates four key steps: (1) LLM-based text embedding to mathematically represent semantic context, (2) unsupervised clustering of embeddings via HDBSCAN, (3) automated labeling using LLM summarization, and (4) supervised learning to train a classifier assigning new or unclustered CDEs to labeled clusters. Evaluated on the NIH NLM CDE Repository with over 24,000 CDEs, the system identified 118 meaningful clusters at an optimized minimum cluster size of 20. The classifier achieved 90.46 percent overall accuracy, performing best in larger categories. External validation against the Gravity Project's Social Determinants of Health domains showed strong agreement (Adjusted Rand Index 0.52, Normalized Mutual Information 0.78), indicating that embeddings effectively capture cluster characteristics. This adaptable and scalable approach offers a practical solution to CDE harmonization, improving selection efficiency and supporting ongoing data interoperability.
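For readers unfamiliar with the Adjusted Rand Index used in the external validation, the following sketch computes it from two label lists using the standard pair-counting formula. The function name `adjusted_rand_index` and the toy labels are assumptions for illustration. The data are not from the paper, and the authors' evaluation code may differ.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two label lists of equal length >= 2")
    n_pairs = comb(len(labels_a), 2)
    # Pairs of items that share a cluster in both, in A only, and in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / n_pairs
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:  # degenerate case, e.g. both clusterings are trivial
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical cluster assignments versus reference domains for six elements.
predicted = [0, 0, 1, 1, 2, 2]
reference = ["food", "food", "housing", "housing", "transport", "transport"]
print(adjusted_rand_index(predicted, reference))
```

The index depends only on which items are grouped together, so renaming clusters does not change it. Values near 0 indicate chance-level agreement.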

cde, large language model, machine learning, (18 more...)

2506.0216

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.68)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Enhancing Interpretability of Quantum-Assisted Blockchain Clustering via AI Agent-Based Qualitative Analysis

Tsai, Yun-Cheng, Liu, Yen-Ku, Chen, Samuel Yen-Chi

arXiv.org Artificial Intelligence Jun-4-2025

Blockchain transaction data is inherently high-dimensional, noisy, and entangled, posing substantial challenges for traditional clustering algorithms. While quantum-enhanced clustering models have demonstrated promising performance gains, their interpretability remains limited, restricting their application in sensitive domains such as financial fraud detection and blockchain governance. To address this gap, we propose a two-stage analysis framework that synergistically combines quantitative clustering evaluation with AI Agent-assisted qualitative interpretation. In the first stage, we employ classical clustering methods and evaluation metrics, including the Silhouette Score, Davies-Bouldin Index, and Calinski-Harabasz Index, to determine the optimal cluster count and baseline partition quality. In the second stage, we integrate an AI Agent to generate human-readable, semantic explanations of clustering results, identifying intra-cluster characteristics and inter-cluster relationships. Our experiments reveal that while fully trained Quantum Neural Networks (QNN) outperform random Quantum Features (QF) in quantitative metrics, the AI Agent further uncovers nuanced differences between these methods, notably exposing the singleton-cluster phenomenon in QNN-driven models. The consolidated insights from both stages consistently endorse the three-cluster configuration, demonstrating the practical value of our hybrid approach. This work advances the interpretability frontier in quantum-assisted blockchain analytics and lays the groundwork for future autonomous AI-orchestrated clustering frameworks.
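As a small illustration of the first-stage metrics, the sketch below computes the mean Silhouette Score for a labelled set of points using Euclidean distance. The function name `silhouette_score`, the convention of scoring singleton clusters as 0, and the toy points are assumptions for this example. It is not the evaluation code used in the paper.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Hypothetical 2-D transaction features with two obvious groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(good, bad)
```

Comparing the score across candidate partitions is one common way to choose a cluster count. Because singleton clusters score 0 under this convention, partitions that isolate single points are not rewarded for doing so.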

artificial intelligence, machine learning, quantum feature, (14 more...)

2506.02068

Country: North America > United States (0.49)

Genre: Research Report > New Finding (1.00)

Industry:

Banking & Finance (0.46)
Law Enforcement & Public Safety > Fraud (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.86)