AITopics | Zong, Licheng

Collaborating Authors

Zong, Licheng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Benchmarking Large Language Models on Multiple Tasks in Bioinformatics NLP with Prompting

Jiang, Jiyue, Chen, Pengan, Wang, Jiuming, He, Dongchen, Wei, Ziqin, Hong, Liang, Zong, Licheng, Wang, Sheng, Yu, Qinze, Ma, Zixian, Chen, Yanyu, Fan, Yimin, Shi, Xiangyu, Sun, Jiawei, Wu, Chuan, Li, Yu

arXiv.org Artificial IntelligenceMar-5-2025

Large language models (LLMs) have become important tools in solving biological problems, offering improvements in accuracy and adaptability over conventional methods. Several benchmarks have been proposed to evaluate the performance of these LLMs. However, current benchmarks can hardly evaluate the performance of these models across diverse tasks effectively. In this paper, we introduce a comprehensive prompting-based benchmarking framework, termed Bio-benchmark, which includes 30 key bioinformatics tasks covering areas such as proteins, RNA, drugs, electronic health records, and traditional Chinese medicine. Using this benchmark, we evaluate six mainstream LLMs, including GPT-4o and Llama-3.1-70b, etc., using 0-shot and few-shot Chain-of-Thought (CoT) settings without fine-tuning to reveal their intrinsic capabilities. To improve the efficiency of our evaluations, we demonstrate BioFinder, a new tool for extracting answers from LLM responses, which increases extraction accuracy by round 30% compared to existing methods. Our benchmark results show the biological tasks suitable for current LLMs and identify specific areas requiring enhancement. Furthermore, we propose targeted prompt engineering strategies for optimizing LLM performance in these contexts. Based on these findings, we provide recommendations for the development of more robust LLMs tailored for various biological applications. This work offers a comprehensive evaluation framework and robust tools to support the application of LLMs in bioinformatics.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.04013

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.54)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Enhancing Biomedical Knowledge Retrieval-Augmented Generation with Self-Rewarding Tree Search and Proximal Policy Optimization

Hu, Minda, Zong, Licheng, Wang, Hongru, Zhou, Jingyan, Li, Jingjing, Gao, Yichen, Wong, Kam-Fai, Li, Yu, King, Irwin

arXiv.org Artificial IntelligenceJun-17-2024

Large Language Models (LLMs) have shown great potential in the biomedical domain with the advancement of retrieval-augmented generation (RAG). However, existing retrieval-augmented approaches face challenges in addressing diverse queries and documents, particularly for medical knowledge queries, resulting in sub-optimal performance. To address these limitations, we propose a novel plug-and-play LLM-based retrieval method called Self-Rewarding Tree Search (SeRTS) based on Monte Carlo Tree Search (MCTS) and a self-rewarding paradigm. By combining the reasoning capabilities of LLMs with the effectiveness of tree search, SeRTS boosts the zero-shot performance of retrieving high-quality and informative results for RAG. We further enhance retrieval performance by fine-tuning LLMs with Proximal Policy Optimization (PPO) objectives using the trajectories collected by SeRTS as feedback. Controlled experiments using the BioASQ-QA dataset with GPT-3.5-Turbo and LLama2-7b demonstrate that our method significantly improves the performance of the BM25 retriever and surpasses the strong baseline of self-reflection in both efficiency and scalability. Moreover, SeRTS generates higher-quality feedback for PPO training than self-reflection. Our proposed method effectively adapts LLMs to document retrieval tasks, enhancing their ability to retrieve highly relevant documents for RAG in the context of medical knowledge queries. This work presents a significant step forward in leveraging LLMs for accurate and comprehensive biomedical question answering.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.11258

Country:

North America > Canada (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

HMD-AMP: Protein Language-Powered Hierarchical Multi-label Deep Forest for Annotating Antimicrobial Peptides

Yu, Qinze, Dong, Zhihang, Fan, Xingyu, Zong, Licheng, Li, Yu

arXiv.org Artificial IntelligenceNov-10-2021

Identifying the targets of an antimicrobial peptide is a fundamental step in studying the innate immune response and combating antibiotic resistance, and more broadly, precision medicine and public health. There have been extensive studies on the statistical and computational approaches to identify (i) whether a peptide is an antimicrobial peptide (AMP) or a non-AMP and (ii) which targets are these sequences effective to (Gram-positive, Gram-negative, etc.). Despite the existing deep learning methods on this problem, most of them are unable to handle the small AMP classes (anti-insect, anti-parasite, etc.). And more importantly, some AMPs can have multiple targets, which the previous methods fail to consider. In this study, we build a diverse and comprehensive multi-label protein sequence database by collecting and cleaning amino acids from various AMP databases. To generate efficient representations and features for the small classes dataset, we take advantage of a protein language model trained on 250 million protein sequences. Based on that, we develop an end-to-end hierarchical multi-label deep forest framework, HMD-AMP, to annotate AMP comprehensively. After identifying an AMP, it further predicts what targets the AMP can effectively kill from eleven available classes. Extensive experiments suggest that our framework outperforms state-of-the-art models in both the binary classification task and the multi-label classification task, especially on the minor classes.The model is robust against reduced features and small perturbations and produces promising results. We believe HMD-AMP contributes to both the future wet-lab investigations of the innate structural properties of different antimicrobial peptides and build promising empirical underpinnings for precise medicine with antibiotics.

artificial intelligence, health & medicine, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2111.06023

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback