enrichment
Donated Christmas trees get a second life at the zoo
The evergreen trees give kangaroos, bison, lions, and more extra shelter and fun. Capybaras use donated Christmas trees as wind breaks to protect their habitats. Breakthroughs, discoveries, and DIY tips sent every weekday. The presents are unwrapped, the cookies are crumbs, and that real Christmas tree will become a fire hazard soon enough. Most of us haul it out to the curb for our local sanitation departments to take care of, but some lucky trees make it into the paws of animals living in zoos.
- North America > United States > New Jersey > Cape May County (0.05)
- Europe > United Kingdom > Scotland (0.05)
- Health & Medicine (0.99)
- Leisure & Entertainment > Zoo & Circus (0.32)
- North America > United States (0.14)
- Europe > United Kingdom > Irish Sea > East Irish Sea > Liverpool Bay (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Asia > China > Hong Kong (0.04)
- Health & Medicine (0.68)
- Transportation (0.46)
- Leisure & Entertainment (0.46)
A Multi-Evidence Framework Rescues Low-Power Prognostic Signals and Rejects Statistical Artifacts in Cancer Genomics
Motivation: Standard genome-wide association studies in cancer genomics rely on statistical significance with multiple testing correction, but systematically fail in underpowered cohorts. In TCGA breast cancer (n=967, 133 deaths), low event rates (13.8%) create severe power limitations, producing false negatives for known drivers and false positives for large passenger genes. Results: We developed a five-criteria computational framework integrating causal inference (inverse probability weighting, doubly robust estimation) with orthogonal biological validation (expression, mutation patterns, literature evidence). Applied to TCGA-BRCA mortality analysis, standard Cox+FDR detected zero genes at FDR<0.05, confirming complete failure in underpowered settings. Our framework correctly identified RYR2 -- a cardiac gene with no cancer function -- as a false positive despite nominal significance (p=0.024), while identifying KMT2C as a complex candidate requiring validation despite marginal significance (p=0.047, q=0.954). Power analysis revealed median power of 15.1% across genes, with KMT2C achieving only 29.8% power (HR=1.55), explaining borderline statistical significance despite strong biological evidence. The framework distinguished true signals from artifacts through mutation pattern analysis: RYR2 showed 29.8% silent mutations (passenger signature) with no hotspots, while KMT2C showed 6.7% silent mutations with 31.4% truncating variants (driver signature). This multi-evidence approach provides a template for analyzing underpowered cohorts, prioritizing biological interpretability over purely statistical significance. Availability: All code and analysis pipelines available at github.com/akarlaraytu/causal-inference-for-cancer-genomics
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- North America > United States > Massachusetts > Essex County > Lawrence (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting
Existing methods on visual knowledge extraction often rely on the predefined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we take a first exploration to a new paradigm of open visual knowledge extraction.
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Asia > China > Hong Kong (0.04)
Rezwan: Leveraging Large Language Models for Comprehensive Hadith Text Processing: A 1.2M Corpus Development
Asgari-Bidhendi, Majid, Ghaseminia, Muhammad Amin, Shahbazi, Alireza, Hossayni, Sayyed Ali, Torabian, Najmeh, Minaei-Bidgoli, Behrouz
This paper presents the development of Rezwan, a large-scale AI-assisted Hadith corpus comprising over 1.2M narrations, extracted and structured through a fully automated pipeline. Building on digital repositories such as Maktabat Ahl al-Bayt, the pipeline employs Large Language Models (LLMs) for segmentation, chain--text separation, validation, and multi-layer enrichment. Each narration is enhanced with machine translation into twelve languages, intelligent diacritization, abstractive summarization, thematic tagging, and cross-text semantic analysis. This multi-step process transforms raw text into a richly annotated research-ready infrastructure for digital humanities and Islamic studies. A rigorous evaluation was conducted on 1,213 randomly sampled narrations, assessed by six domain experts. Results show near-human accuracy in structured tasks such as chain--text separation (9.33/10) and summarization (9.33/10), while highlighting ongoing challenges in diacritization and semantic similarity detection. Comparative analysis against the manually curated Noor Corpus demonstrates the superiority of Najm in both scale and quality, with a mean overall score of 8.46/10 versus 3.66/10. Furthermore, cost analysis confirms the economic feasibility of the AI approach: tasks requiring over 229,000 hours of expert labor were completed within months at a fraction of the cost. The work introduces a new paradigm in religious text processing by showing how AI can augment human expertise, enabling large-scale, multilingual, and semantically enriched access to Islamic heritage.
Combining Knowledge Graphs and NLP to Analyze Instant Messaging Data in Criminal Investigations
Pozzi, Riccardo, Barbera, Valentina, Principe, Renzo Alva, Giardini, Davide, Rubini, Riccardo, Palmonari, Matteo
Criminal investigations often involve the analysis of messages exchanged through instant messaging apps such as WhatsApp, which can be an extremely effort-consuming task. Our approach integrates knowledge graphs and NLP models to support this analysis by semantically enriching data collected from suspects' mobile phones, and help prosecutors and investigators search into the data and get valuable insights. Our semantic enrichment process involves extracting message data and modeling it using a knowledge graph, generating transcriptions of voice messages, and annotating the data using an end-to-end entity extraction approach. We adopt two different solutions to help users get insights into the data, one based on querying and visualizing the graph, and one based on semantic search. The proposed approach ensures that users can verify the information by accessing the original data. While we report about early results and prototypes developed in the context of an ongoing project, our proposal has undergone practical applications with real investigation data. As a consequence, we had the chance to interact closely with prosecutors, collecting positive feedback but also identifying interesting opportunities as well as promising research directions to share with the research community.
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Information Technology (1.00)
Benchmarking Dimensionality Reduction Techniques for Spatial Transcriptomics
Mahmud, Md Ishtyaq, Kochat, Veena, Satpati, Suresh, Dwarampudi, Jagan Mohan Reddy, Rai, Kunal, Banerjee, Tania
We introduce a unified framework for evaluating dimensionality reduction techniques in spatial transcriptomics beyond standard PCA approaches. We benchmark six methods PCA, NMF, autoencoder, VAE, and two hybrid embeddings on a cholangiocarcinoma Xenium dataset, systematically varying latent dimensions ($k$=5-40) and clustering resolutions ($ρ$=0.1-1.2). Each configuration is evaluated using complementary metrics including reconstruction error, explained variance, cluster cohesion, and two novel biologically-motivated measures: Cluster Marker Coherence (CMC) and Marker Exclusion Rate (MER). Our results demonstrate distinct performance profiles: PCA provides a fast baseline, NMF maximizes marker enrichment, VAE balances reconstruction and interpretability, while autoencoders occupy a middle ground. We provide systematic hyperparameter selection using Pareto optimal analysis and demonstrate how MER-guided reassignment improves biological fidelity across all methods, with CMC scores improving by up to 12\% on average. This framework enables principled selection of dimensionality reduction methods tailored to specific spatial transcriptomics analyses.
- North America > United States > Texas > Harris County > Houston (0.14)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.05)
- Europe > Netherlands > South Holland > Leiden (0.05)
- (5 more...)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.48)
- Health & Medicine > Therapeutic Area > Oncology (0.34)
Automated Creation and Enrichment Framework for Improved Invocation of Enterprise APIs as Tools
Agarwal, Prerna, Gupta, Himanshu, Soni, Soujanya, Vallam, Rohith, Sindhgatta, Renuka, Mehta, Sameep
Recent advancements in Large Language Models (LLMs) has lead to the development of agents capable of complex reasoning and interaction with external tools. In enterprise contexts, the effective use of such tools that are often enabled by application programming interfaces (APIs), is hindered by poor documentation, complex input or output schema, and large number of operations. These challenges make tool selection difficult and reduce the accuracy of payload formation by up to 25%. We propose ACE, an automated tool creation and enrichment framework that transforms enterprise APIs into LLM-compatible tools. ACE, (i) generates enriched tool specifications with parameter descriptions and examples to improve selection and invocation accuracy, and (ii) incorporates a dynamic shortlisting mechanism that filters relevant tools at runtime, reducing prompt complexity while maintaining scalability. We validate our framework on both proprietary and open-source APIs and demonstrate its integration with agentic frameworks. To the best of our knowledge, ACE is the first end-to-end framework that automates the creation, enrichment, and dynamic selection of enterprise API tools for LLM agents.
- North America > United States (0.04)
- Europe > Norway > Norwegian Sea (0.04)
- Asia > Singapore (0.04)
- (2 more...)
Enabling Down Syndrome Research through a Knowledge Graph-Driven Analytical Framework
Krishnamurthy, Madan, Saha, Surya, Lo, Pierrette, Whetzel, Patricia L., Issabekova, Tursynay, Vargas, Jamed Ferreris, DiGiovanna, Jack, Haendel, Melissa A
Trisomy 21 results in Down syndrome, a multifaceted genetic disorder with diverse clinical phenotypes, including heart defects, immune dysfunction, neurodevelopmental differences, and early-onset dementia risk. Heterogeneity and fragmented data across studies challenge comprehensive research and translational discovery. The NIH INCLUDE (INvestigation of Co-occurring conditions across the Lifespan to Understand Down syndromE) initiative has assembled harmonized participant-level datasets, yet realizing their potential requires integrative analytical frameworks. We developed a knowledge graph-driven platform transforming nine INCLUDE studies, comprising 7,148 participants, 456 conditions, 501 phenotypes, and over 37,000 biospecimens, into a unified semantic infrastructure. Cross-resource enrichment with Monarch Initiative data expands coverage to 4,281 genes and 7,077 variants. The resulting knowledge graph contains over 1.6 million semantic associations, enabling AI-ready analysis with graph embeddings and path-based reasoning for hypothesis generation. Researchers can query the graph via SPARQL or natural language interfaces. This framework converts static data repositories into dynamic discovery environments, supporting cross-study pattern recognition, predictive modeling, and systematic exploration of genotype-phenotype relationships in Down syndrome.
- North America > United States > Tennessee > Davidson County > Nashville (0.04)
- North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Therapeutic Area > Genetic Disease (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Generation of structure-guided pMHC-I libraries using Diffusion Models
Mares, Sergio, Weinberger, Ariel Espinoza, Ioannidis, Nilah M.
Personalized vaccines and T-cell immunotherapies depend critically on identifying peptide-MHC class I (pMHC-I) interactions capable of eliciting potent immune responses. However, current benchmarks and models inherit biases present in mass-spectrometry and binding-assay datasets, limiting discovery of novel peptide ligands. To address this issue, we introduce a structure-guided benchmark of pMHC-I peptides designed using diffusion models conditioned on crystal structure interaction distances. Spanning twenty high-priority HLA alleles, this benchmark is independent of previously characterized peptides yet reproduces canonical anchor residue preferences, indicating structural generalization without experimental dataset bias. Using this resource, we demonstrate that state-of-the-art sequence-based predictors perform poorly at recognizing the binding potential of these structurally stable designs, indicating allele-specific limitations invisible in conventional evaluations. Our geometry-aware design pipeline yields peptides with high predicted structural integrity and higher residue diversity than existing datasets, representing a key resource for unbiased model training and evaluation. Our code, and data are available at: https://github.com/sermare/struct-mhc-dev.
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)