Knowledge Elicitation with Large Language Models for Interpretable Cancer Stage Identification from Pathology Reports

Yeawon Lee, Christopher C. Yang, Chia-Hsuan Chang, Grace Lu-Yao

arXiv.org Artificial Intelligence

Cancer staging is critical for patient prognosis and treatment planning, yet extracting pathologic TNM staging from unstructured pathology reports poses a persistent challenge. Existing natural language processing (NLP) and machine learning (ML) strategies often depend on large annotated datasets, limiting their scalability and adaptability. In this study, we introduce two Knowledge Elicitation methods designed to overcome these limitations by enabling large language models (LLMs) to induce and apply domain-specific rules for cancer staging. The first, Knowledge Elicitation with Long-Term Memory (KEwLTM), uses an iterative prompting strategy to derive staging rules directly from unannotated pathology reports, without requiring ground-truth labels. The second, Knowledge Elicitation with Retrieval-Augmented Generation (KEwRAG), employs a variation of RAG where rules are pre-extracted from relevant guidelines in a single step and then applied, enhancing interpretability and avoiding repeated retrieval overhead. We leverage the ability of LLMs to apply broad knowledge learned during pre-training to new tasks. Using breast cancer pathology reports from the TCGA dataset, we evaluate their performance in identifying T and N stages, comparing them against various baseline approaches on two open-source LLMs. Our results indicate that KEwLTM outperforms KEwRAG when Zero-Shot Chain-of-Thought (ZSCOT) inference is effective, whereas KEwRAG achieves better performance when ZSCOT inference is less effective. Both methods offer transparent, interpretable interfaces by making the induced rules explicit. These findings highlight the promise of our Knowledge Elicitation methods as scalable, high-performing solutions for automated cancer staging with enhanced interpretability, particularly in clinical settings with limited annotated data.
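For intuition, here is a minimal sketch of the kind of iterative rule-elicitation loop the abstract describes for KEwLTM, assuming a generic chat-completion client. The `call_llm` placeholder, the prompt wording, and the rule-memory format are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a KEwLTM-style iterative rule-elicitation loop.
# Assumptions: `call_llm` stands in for any chat-completion API; the
# prompts and rule format below are illustrative, not the paper's own.

def call_llm(prompt: str) -> str:
    """Placeholder for a call to an LLM (e.g., an open-source chat model)."""
    raise NotImplementedError

def elicit_staging_rules(reports: list[str], max_iters: int = 3) -> str:
    """Iteratively induce TNM staging rules from unannotated pathology reports."""
    rules = ""  # long-term memory: rules accumulated across reports
    for _ in range(max_iters):
        for report in reports:
            prompt = (
                "You are inducing pathologic T/N staging rules.\n"
                f"Current rules:\n{rules or '(none yet)'}\n\n"
                f"Pathology report:\n{report}\n\n"
                "Refine the rules so they cover this report, and return "
                "the full updated rule list."
            )
            rules = call_llm(prompt)  # updated rules become the new memory
    return rules

def apply_rules(rules: str, report: str) -> str:
    """Apply the elicited rules to a new report to predict T and N stages."""
    prompt = (
        f"Staging rules:\n{rules}\n\n"
        f"Pathology report:\n{report}\n\n"
        "Using only the rules above, state the pathologic T and N stages."
    )
    return call_llm(prompt)
```

Because the induced rules are kept as explicit text, the same rule list can be inspected by a clinician, which is the interpretability benefit the abstract emphasizes.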


Utilizing XGBoost training reports to improve your models

#artificialintelligence

In 2019, AWS unveiled Amazon SageMaker Debugger, a SageMaker capability that enables you to automatically detect a variety of issues that may arise while a model is being trained. SageMaker Debugger captures model state data at specified intervals during a training job. With this data, SageMaker Debugger can detect training issues or anomalies by leveraging built-in or user-defined rules. In addition to detecting issues during the training job, you can analyze the captured state data afterwards to evaluate model performance and identify areas for improvement. This task is made easier with the newly launched XGBoost training report feature.
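As a concrete illustration, the report can be requested by attaching the built-in `create_xgboost_report` rule when configuring the estimator with the SageMaker Python SDK (v2). This is a minimal sketch: the role ARN, S3 paths, and instance type are placeholders you would replace with your own.

```python
import sagemaker
from sagemaker.debugger import Rule, rule_configs
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Attach the built-in Debugger rule that generates the XGBoost training report.
rules = [Rule.sagemaker(rule_configs.create_xgboost_report())]

estimator = Estimator(
    image_uri=sagemaker.image_uris.retrieve(
        "xgboost", region=session.boto_region_name, version="1.2-1"
    ),
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role ARN
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://your-bucket/xgboost-output",  # placeholder S3 path
    rules=rules,
)

# estimator.fit({"train": "s3://your-bucket/train",
#                "validation": "s3://your-bucket/validation"})
# After the job finishes, the generated report is written to the rule's
# output location in S3 alongside the other job artifacts.
```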


Automated Machine Learning in Power BI - Visual BI Solutions

#artificialintelligence

In the last few years, Artificial Intelligence and Machine Learning have seen an unprecedented rise in popularity across industries and areas of scientific research. Businesses are looking for ways to integrate these new technologies into their operations. However, the shortage of qualified data scientists and machine learning experts has been one of the challenges that thwart the adoption of AI. A growing number of tools, though, are bringing these capabilities into the hands of developers, citizen data scientists, domain experts, and business users. In this blog, we will delve into Automated Machine Learning (AutoML) for dataflows, a new capability in Power BI that enables business users to build machine learning models without having to learn how to program or acquire extensive knowledge of mathematics and statistics. AutoML for dataflows allows users to create machine learning models with a few simple clicks and generates model summary reports.