Collaborating Authors

 Liu, Jinze


Deciphering genomic codes using advanced NLP techniques: a scoping review

arXiv.org Artificial Intelligence

Objectives: The vast and complex nature of human genomic sequencing data presents challenges for effective analysis. This review aims to investigate the application of Natural Language Processing (NLP) techniques, particularly Large Language Models (LLMs) and transformer architectures, in deciphering genomic codes, focusing on tokenization, transformer models, and regulatory annotation prediction. The goal of this review is to assess data and model accessibility in the most recent literature, gaining a better understanding of the existing capabilities and constraints of these tools in processing genomic sequencing data. Methods: Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, our scoping review was conducted across PubMed, Medline, Scopus, Web of Science, Embase, and ACM Digital Library. Studies were included if they focused on NLP methodologies applied to genomic sequencing data analysis, without restrictions on publication date or article type. Results: A total of 26 studies published between 2021 and April 2024 were selected for review. The review highlights that tokenization and transformer models enhance the processing and understanding of genomic data, with applications in predicting regulatory annotations like transcription-factor binding sites and chromatin accessibility. Discussion: The application of NLP and LLMs to genomic sequencing data interpretation is a promising field that can help streamline the processing of large-scale genomic data while also providing a better understanding of its complex structures. It has the potential to drive advancements in personalized medicine by offering more efficient and scalable solutions for genomic analysis. Further research is also needed to discuss and overcome current limitations, enhancing model transparency and applicability.
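
As an illustration of the tokenization step mentioned above, the following minimal sketch splits a DNA string into overlapping k-mers and maps them to integer ids. The abstract does not say which tokenization schemes the reviewed studies use; k-mer tokenization is assumed here only because it is a common choice for genomic language models. The function names `kmer_tokenize` and `build_vocab`, and the defaults `k=6` and `stride=1`, are hypothetical.

```python
def kmer_tokenize(sequence: str, k: int = 6, stride: int = 1) -> list[str]:
    """Split a DNA sequence into k-mers, taking one every `stride` bases."""
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    # A sequence shorter than k yields an empty token list.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Assign each distinct token an integer id in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


tokens = kmer_tokenize("ACGTACGT", k=3)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens)  # ['ACG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT']
print(ids)     # [0, 1, 2, 3, 0, 1]
```

With `stride=1` the tokens overlap, so neighbouring tokens share `k - 1` bases. Setting `stride=k` gives non-overlapping tokens and a shorter input sequence for the model.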


Environment Scan of Generative AI Infrastructure for Clinical and Translational Science

arXiv.org Artificial Intelligence

This study reports a comprehensive environmental scan of the generative AI (GenAI) infrastructure in the national network for clinical and translational science across 36 institutions supported by the Clinical and Translational Science Award (CTSA) Program led by the National Center for Advancing Translational Sciences (NCATS) of the National Institutes of Health (NIH) in the United States. With the rapid advancement of GenAI technologies, including large language models (LLMs), healthcare institutions face unprecedented opportunities and challenges. This research explores the current status of GenAI integration, focusing on stakeholder roles, governance structures, and ethical considerations, by administering a survey among leaders of health institutions (i.e., representing academic medical centers and health systems) to assess institutional readiness and approach towards GenAI adoption. Key findings indicate a diverse range of institutional strategies, with most organizations in the experimental phase of GenAI deployment. The study highlights significant variations in governance models, with a strong preference for centralized decision-making but notable gaps in workforce training and ethical oversight. Moreover, the results underscore the need for a more coordinated approach to GenAI governance, emphasizing collaboration among senior leaders, clinicians, information technology staff, and researchers. Our analysis also reveals concerns regarding GenAI bias, data security, and stakeholder trust, which must be addressed to ensure the ethical and effective implementation of GenAI technologies. This study offers valuable insights into the challenges and opportunities of GenAI integration in healthcare, providing a roadmap for institutions aiming to leverage GenAI for improved quality of care and operational efficiency.


Realtime Safety Control for Bipedal Robots to Avoid Multiple Obstacles via CLF-CBF Constraints

arXiv.org Artificial Intelligence

To explore safely in such environments, it is critical for robots to generate quick yet smooth responses to any changes in the obstacles, map, and environment. In this paper, we propose a means to design and compose control barrier functions (CBFs) for multiple non-overlapping obstacles and evaluate the system on a 20-degree-of-freedom (DoF) bipedal robot. In an autonomous system, the task of avoiding obstacles is usually handled by a planning algorithm because it has access to the map of the entire environment. Given the map, the planning algorithm can design a collision-free path from the robot's current position to a goal. If the map is updated due to a change in the environment, the planner then needs to update the planned path, a step known as replanning, to accommodate the new environment. Such maps are typically large and contain rich information such as semantics, terrain characteristics, and uncertainty, and thus are slow to update. This raises a concern when either obstacles move into the planned path before the map has been updated or a robot's new pose allows the detection of previously unseen obstacles. The slow update rate of the map leads to either collisions or abrupt maneuvers to avoid them. The non-smooth aspects arising from map updates or changes in the perceived environment can be detrimental to the stability of the overall system.
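
To make the idea of composing CBFs for several obstacles concrete, here is a toy sketch. It is not the paper's formulation. It assumes a 2D point robot whose velocity is commanded directly, circular obstacles with barrier h_i(x) = ||x - c_i||^2 - r_i^2, and a nominal velocity `u_nom` in place of the CLF term. Safety is imposed as one linear constraint per obstacle, 2 (x - c_i) . u >= -alpha * h_i(x), and the filter returns the velocity closest to `u_nom` that satisfies all of them. All names and the value of `alpha` are hypothetical.

```python
import itertools


def barrier(x, center, radius):
    """h(x) = ||x - c||^2 - r^2, positive outside the obstacle."""
    dx, dy = x[0] - center[0], x[1] - center[1]
    return dx * dx + dy * dy - radius * radius


def cbf_constraints(x, obstacles, alpha):
    """One row (a, b) per obstacle, meaning a . u >= b."""
    rows = []
    for center, radius in obstacles:
        a = (2.0 * (x[0] - center[0]), 2.0 * (x[1] - center[1]))
        b = -alpha * barrier(x, center, radius)
        rows.append((a, b))
    return rows


def feasible(u, rows, tol=1e-9):
    return all(a[0] * u[0] + a[1] * u[1] >= b - tol for a, b in rows)


def safe_control(x, u_nom, obstacles, alpha=1.0):
    """Closest velocity to u_nom satisfying every CBF constraint.

    In 2D the minimizer has at most two active constraints, so it is enough
    to test u_nom, its projection onto each constraint boundary, and each
    pairwise boundary intersection, then keep the nearest feasible candidate.
    """
    rows = cbf_constraints(x, obstacles, alpha)
    candidates = [(float(u_nom[0]), float(u_nom[1]))]
    for a, b in rows:
        n2 = a[0] ** 2 + a[1] ** 2
        if n2 > 1e-12:
            s = (b - (a[0] * u_nom[0] + a[1] * u_nom[1])) / n2
            candidates.append((u_nom[0] + s * a[0], u_nom[1] + s * a[1]))
    for (a1, b1), (a2, b2) in itertools.combinations(rows, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if abs(det) > 1e-12:
            candidates.append(((b1 * a2[1] - a1[1] * b2) / det,
                               (a1[0] * b2 - b1 * a2[0]) / det))
    safe = [u for u in candidates if feasible(u, rows)]
    if not safe:
        raise ValueError("CBF constraints are infeasible at this state")
    return min(safe, key=lambda u: (u[0] - u_nom[0]) ** 2 + (u[1] - u_nom[1]) ** 2)


# Two unit-radius obstacles near the origin; the nominal command heads between them.
obstacles = [((2.0, 0.0), 1.0), ((0.0, 2.0), 1.0)]
u = safe_control((0.0, 0.0), (1.0, 1.0), obstacles)
print(u)  # (0.75, 0.75)
```

Because the filter runs on the current state and the currently detected obstacles only, it can be evaluated at every control step without waiting for a map update. A full CLF-CBF controller on a legged robot would add a stability (CLF) constraint and the robot's actual dynamics, which this sketch leaves out.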