subtype
What if the idea of the autism spectrum is completely wrong?
What if the idea of the autism spectrum is completely wrong? For years, we've thought of autism as lying on a spectrum, but emerging evidence suggests that it comes in several distinct types. These three words have become synonymous with autism, yet behind them lies a common misunderstanding. The idea of "the spectrum" suggests that all autistic people share similar experiences and behave in similar ways - only to a greater or lesser extent. The reality couldn't be further from the truth. Some autistic people may not speak at all; others are hyperverbal and extremely fluent.
- Europe > United Kingdom (0.14)
- North America > United States > New York (0.05)
- North America > United States > Maryland (0.04)
- North America > United States > California (0.04)
Structure Learning with Side Information: Sample Complexity
The vertices represent the RVs, and the edges signify the conditional dependencies among the RVs. Structure learning is the process of inferring the edges by observing realizations of the RVs, and it has applications in a wide range of technological, social, and biological networks. Learning the structure of graphs when the vertices are treated in isolation from inferential information known about them is well-investigated. In a wide range of domains, however, often there exist additional inferred knowledge about the structure, which can serve as valuable side information. For instance, the gene networks that represent different subtypes of the same cancer share similar edges across all subtypes and also have exclusive edges corresponding to each subtype, rendering partially similar graphical models for gene expression in different cancer subtypes.
GUMBridge: a Corpus for Varieties of Bridging Anaphora
Bridging is an anaphoric phenomenon where the referent of an entity in a discourse is dependent on a previous, non-identical entity for interpretation, such as in "There is 'a house'. 'The door' is red," where the door is specifically understood to be the door of the aforementioned house. While there are several existing resources in English for bridging anaphora, most are small, provide limited coverage of the phenomenon, and/or provide limited genre coverage. In this paper, we introduce GUMBridge, a new resource for bridging, which includes 16 diverse genres of English, providing both broad coverage for the phenomenon and granular annotations for the subtype categorization of bridging varieties. We also present an evaluation of annotation quality and report on baseline performance using open and closed source contemporary LLMs on three tasks underlying our data, showing that bridging resolution and subtype classification remain difficult NLP tasks in the age of LLMs.
- Europe > Austria > Vienna (0.14)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (8 more...)
Mitigating the Antigenic Data Bottleneck: Semi-supervised Learning with Protein Language Models for Influenza A Surveillance
Influenza A viruses (IAVs) evolve antigenically at a pace that requires frequent vaccine updates, yet the haemagglutination inhibition (HI) assays used to quantify antigenicity are labor-intensive and unscalable. As a result, genomic data vastly outpace available phenotypic labels, limiting the effectiveness of traditional supervised models. We hypothesize that combining pre-trained Protein Language Models (PLMs) with Semi-Supervised Learning (SSL) can retain high predictive accuracy even when labeled data are scarce. We evaluated two SSL strategies, Self-training and Label Spreading, against fully supervised baselines using four PLM-derived embeddings (ESM-2, ProtVec, ProtT5, ProtBert) applied to haemagglutinin (HA) sequences. A nested cross-validation framework simulated low-label regimes (25%, 50%, 75%, and 100% label availability) across four IAV subtypes (H1N1, H3N2, H5N1, H9N2). SSL consistently improved performance under label scarcity. Self-training with ProtVec produced the largest relative gains, showing that SSL can compensate for lower-resolution representations. ESM-2 remained highly robust, achieving F1 scores above 0.82 with only 25% labeled data, indicating that its embeddings capture key antigenic determinants. While H1N1 and H9N2 were predicted with high accuracy, the hypervariable H3N2 subtype remained challenging, although SSL mitigated the performance decline. These findings demonstrate that integrating PLMs with SSL can address the antigenicity labeling bottleneck and enable more effective use of unlabeled surveillance sequences, supporting rapid variant prioritization and timely vaccine strain selection.
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.74)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Bayesian Event-Based Model for Disease Subtype and Stage Inference
Hao, Hongtao, Austerweil, Joseph L.
Chronic diseases often progress differently across patients. Rather than randomly varying, there are typically a small number of subtypes for how a disease progresses across patients. To capture this structured heterogeneity, the Subtype and Stage Inference Event-Based Model (SuStaIn) estimates the number of subtypes, the order of disease progression for each subtype, and assigns each patient to a subtype from primarily cross-sectional data. It has been widely applied to uncover the subtypes of many diseases and inform our understanding of them. But how robust is its performance? In this paper, we develop a principled Bayesian subtype variant of the event-based model (BEBMS) and compare its performance to SuStaIn in a variety of synthetic data experiments with varied levels of model misspecification. BEBMS substantially outperforms SuStaIn across ordering, staging, and subtype assignment tasks. Further, we apply BEBMS and SuStaIn to a real-world Alzheimer's data set. We find BEBMS has results that are more consistent with the scientific consensus of Alzheimer's disease progression than SuStaIn.
- North America > United States > California (0.28)
- North America > United States > Wisconsin > Dane County > Madison (0.05)
- Asia > Japan (0.04)
- North America > Canada (0.04)
PathReasoning: A multimodal reasoning agent for query-based ROI navigation on whole-slide images
Zhang, Kunpeng, Xu, Hanwen, Wang, Sheng
Deciphering tumor microenvironment from Whole Slide Images (WSIs) is intriguing as it is key to cancer diagnosis, prognosis and treatment response. While these gigapixel images on one hand offer a comprehensive portrait of cancer, on the other hand, the extremely large size, as much as more than 10 billion pixels, make it challenging and time-consuming to navigate to corresponding regions to support diverse clinical inspection. Inspired by pathologists who conducted navigation on WSIs with a combination of sampling, reasoning and self-reflection, we proposed "PathReasoning", a multi-modal reasoning agent that iteratively navigates across WSIs through multiple rounds of reasoning and refinements. Specifically, starting with randomly sampled candidate regions, PathReasoning reviews current selections with self-reflection, reasoning over the correspondence between visual observations and clinical questions, and concludes by proposing new regions to explore. Across rounds, PathReasoning builds a reasoning chain that gradually directs attention to diagnostically relevant areas. PathReasoning turns each whole slide into a sequence of question-guided views, allowing the model to efficiently find informative ROIs within a fixed number of steps, without the need for dense pixel-level annotations. PathReasoning can substantially outperform strong ROI-selection approaches by 6.7% and 3.1% of AUROC on subtyping and longitudinal analysis tasks. The high-quality ROIs further support accurate report generation on breast cancer, significantly outperforming the standard GPT-4o by 10% in accuracy. PathReasoning prioritizes question-specific regions and constructs interpretable reasoning chains, supporting efficient slide review, consistent diagnostic interpretations, comprehensive reporting, and evidence traceability in digital pathology.
- North America > United States > Washington > King County > Seattle (0.14)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Europe > Norway > Norwegian Sea (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Health & Medicine > Therapeutic Area > Oncology > Carcinoma (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Cognitive bias in LLM reasoning compromises interpretation of clinical oncology notes
Kenaston, Matthew W., Ayub, Umair, Parmar, Mihir, Anjum, Muhammad Umair, Naqvi, Syed Arsalan Ahmed, Kumar, Priya, Rawal, Samarth, Chaudhuri, Aadel A., Zakharia, Yousef, Heath, Elizabeth I., Bekaii-Saab, Tanios S., Tao, Cui, Van Allen, Eliezer M., Zhou, Ben, Choi, YooJung, Baral, Chitta, Riaz, Irbaz Bin
Despite high performance on clinical benchmarks, large language models may reach correct conclusions through faulty reasoning, a failure mode with safety implications for oncology decision support that is not captured by accuracy-based evaluation. In this two-cohort retrospective study, we developed a hierarchical taxonomy of reasoning errors from GPT-4 chain-of-thought responses to real oncology notes and tested its clinical relevance. Using breast and pancreatic cancer notes from the CORAL dataset, we annotated 600 reasoning traces to define a three-tier taxonomy mapping computational failures to cognitive bias frameworks. We validated the taxonomy on 822 responses from prostate cancer consult notes spanning localized through metastatic disease, simulating extraction, analysis, and clinical recommendation tasks. Reasoning errors occurred in 23 percent of interpretations and dominated overall errors, with confirmation bias and anchoring bias most common. Reasoning failures were associated with guideline-discordant and potentially harmful recommendations, particularly in advanced disease management. Automated evaluators using state-of-the-art language models detected error presence but could not reliably classify subtypes. These findings show that large language models may provide fluent but clinically unsafe recommendations when reasoning is flawed. The taxonomy provides a generalizable framework for evaluating and improving reasoning fidelity before clinical deployment.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Minnesota > Olmsted County > Rochester (0.04)
- North America > United States > Arizona > Maricopa County > Phoenix (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Hierarchical Generalized Category Discovery for Brain Tumor Classification in Digital Pathology
Perkonigg, Matthias, Rockenschaub, Patrick, Göbel, Georg, Wöhrer, Adelheid
Accurate brain tumor classification is critical for intra-operative decision making in neuro-oncological surgery. However, existing approaches are restricted to a fixed set of predefined classes and are therefore unable to capture patterns of tumor types not available during training. Unsupervised learning can extract general-purpose features, but it lacks the ability to incorporate prior knowledge from labelled data, and semi-supervised methods often assume that all potential classes are represented in the labelled data. Generalized Category Discovery (GCD) aims to bridge this gap by categorizing both known and unknown classes within unlabelled data. To reflect the hierarchical structure of brain tumor taxonomies, in this work, we introduce Hierarchical Generalized Category Discovery for Brain Tumor Classification (HGCD-BT), a novel approach that integrates hierarchical clustering with contrastive learning. Our method extends contrastive learning based GCD by incorporating a novel semi-supervised hierarchical clustering loss. We evaluate HGCD-BT on OpenSRH, a dataset of stimulated Raman histology brain tumor images, achieving a +28% improvement in accuracy over state-of-the-art GCD methods for patch-level classification, particularly in identifying previously unseen tumor categories. Furthermore, we demonstrate the generalizability of HGCD-BT on slide-level classification of hematoxylin and eosin stained whole-slide images from the Digital Brain Tumor Atlas, confirming its utility across imaging modalities.
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > Austria > Tyrol > Innsbruck (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Switzerland (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
Rare Genomic Subtype Discovery from RNA-seq via Autoencoder Embeddings and Stability-Aware Clustering
Unsupervised learning on high-dimensional RNA-seq data can reveal molecular subtypes beyond standard labels. We combine an autoencoder-based representation with clustering and stability analysis to search for rare but reproducible genomic subtypes. On the UCI "Gene Expression Cancer RNA-Seq" dataset (801 samples, 20,531 genes; BRCA, COAD, KIRC, LUAD, PRAD), a pan-cancer analysis shows clusters aligning almost perfectly with tissue of origin (Cramer's V = 0.887), serving as a negative control. We therefore reframe the problem within KIRC (n = 146): we select the top 2,000 highly variable genes, standardize them, train a feed-forward autoencoder (128-dimensional latent space), and run k-means for k = 2-10. While global indices favor small k, scanning k with a pre-specified discovery rule (rare < 10 percent and stable with Jaccard >= 0.60 across 20 seeds after Hungarian alignment) yields a simple solution at k = 5 (silhouette = 0.129, DBI = 2.045) with a rare cluster C0 (6.85 percent of patients) that is highly stable (Jaccard = 0.787). Cluster-vs-rest differential expression (Welch's t-test, Benjamini-Hochberg FDR) identifies coherent markers. Overall, pan-cancer clustering is dominated by tissue of origin, whereas a stability-aware within-cancer approach reveals a rare, reproducible KIRC subtype.
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)