Goto

Collaborating Authors

 Meier, Christoph


TEDDY: A Family Of Foundation Models For Understanding Single Cell Biology

arXiv.org Artificial Intelligence

The complexity of cell biology and the mechanisms of disease pathogenesis are driven by an intricate regulatory network of genes [Chatterjee and Ahituv, 2017, Theodoris et al., 2015, 2021]. A better resolution of this complex interactome would enhance our ability to design drugs that target the causal mechanism of a disease rather than interventions that aim to modulate its downstream effects [Ding et al., 2022]. However, accurate inference of gene regulatory networks is challenging. The possible space of genetic interactions is vast [Bunne et al., 2024], and the networks to be inferred are highly context-dependent: different cell types and tissue types exhibit different regulatory networks, with significant variation across donors [Chen and Dahl, 2024]. Moreover, the data required to study gene regulatory networks for a specific disease is usually limited and highly specialized, and often plagued by experimental artifacts [Hicks et al., 2018]. A confluence of recent technological progress, however, promises to make this challenging problem more tractable. Accurate single-cell sequencing technologies remove the artifacts of bulk cell data, better reflect natural variability, and provide signals at higher resolution. This, along with the increasing availability of atlas-scale scRNAseq datasets spanning an extensive range of diseases, cell types, tissue types, and donors, provides an unprecedented opportunity for studying disease mechanisms at scale.


From RAGs to riches: Using large language models to write documents for clinical trials

arXiv.org Artificial Intelligence

Clinical trials require numerous documents to be written -- protocols, consent forms, clinical study reports, and others. Large language models (LLMs) offer the potential to rapidly generate first versions of these documents; however, there are concerns about the quality of their output. Here we report an evaluation of LLMs in generating parts of one such document, clinical trial protocols. We find that an off-the-shelf LLM delivers reasonable results, especially when assessing content relevance and the correct use of terminology. However, deficiencies remain, specifically in clinical thinking and logic and in the appropriate use of references. To improve performance, we used retrieval-augmented generation (RAG) to prompt an LLM with accurate, up-to-date information. As a result of using RAG, the writing quality of the LLM improves substantially, which has implications for the practical usability of LLMs in clinical trial-related writing.
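
The core RAG step described in the abstract (retrieve relevant reference passages, then prepend them to the generation prompt) can be illustrated with a minimal Python sketch. The corpus, the keyword-overlap retriever, and the helper names (Passage, retrieve_passages, build_prompt, REFERENCE_DOCS) are illustrative assumptions, not the pipeline used in the paper; a real system would use a proper lexical or embedding-based retriever and send the assembled prompt to an actual LLM API.

```python
# Minimal sketch of retrieval-augmented generation (RAG) for protocol drafting.
# All names and the toy corpus below are hypothetical placeholders.

from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # e.g. a guideline or prior protocol the passage came from
    text: str


# Stand-in corpus of up-to-date reference material (assumed, not from the paper).
REFERENCE_DOCS = [
    Passage("ICH-GCP excerpt", "Informed consent must be obtained before any trial procedure."),
    Passage("Prior protocol", "Primary endpoint: change in HbA1c from baseline to week 26."),
]


def retrieve_passages(query: str, docs: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by naive keyword overlap with the query (placeholder for a real retriever)."""
    q_terms = set(query.lower().split())
    scored = sorted(docs, key=lambda p: -len(q_terms & set(p.text.lower().split())))
    return scored[:k]


def build_prompt(section: str, query: str, docs: list[Passage]) -> str:
    """Assemble an LLM prompt that grounds the draft in the retrieved passages."""
    context = "\n".join(f"[{p.source}] {p.text}" for p in retrieve_passages(query, docs))
    return (
        f"Draft the '{section}' section of a clinical trial protocol.\n"
        f"Use only the reference material below and cite each source you rely on.\n\n"
        f"Reference material:\n{context}\n"
    )


prompt = build_prompt("Objectives and Endpoints", "primary endpoint HbA1c", REFERENCE_DOCS)
# `prompt` would then be sent to an LLM of choice via its chat or completion API.
print(prompt)
```

In practice, the keyword overlap would be replaced by retrieval over curated regulatory guidelines and prior trial documents, which is what allows the model to ground its drafts in accurate, up-to-date information rather than relying on parametric knowledge alone.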