Huang, Yixian
SCOPE-DTI: Semi-Inductive Dataset Construction and Framework Optimization for Practical Usability Enhancement in Deep Learning-Based Drug Target Interaction Prediction
Chen, Yigang, Ji, Xiang, Zhang, Ziyue, Zhou, Yuming, Lin, Yang-Chi-Dung, Huang, Hsi-Yuan, Zhang, Tao, Lai, Yi, Chen, Ke, Su, Chang, Lin, Xingqiao, Zhu, Zihao, Zhang, Yanggyi, Wei, Kangping, Fu, Jiehui, Huang, Yixian, Cui, Shidong, Yen, Shih-Chung, Warshel, Ariel, Huang, Hsien-Da
Deep learning-based drug-target interaction (DTI) prediction methods have demonstrated strong performance; however, their real-world applicability remains constrained by limited data diversity and modeling complexity. To address these challenges, we propose SCOPE-DTI, a unified framework combining a large-scale, balanced semi-inductive human DTI dataset with advanced deep learning modeling. Constructed from 13 public repositories, the SCOPE dataset expands data volume by up to 100-fold compared to common benchmarks such as the Human dataset. The SCOPE model integrates three-dimensional protein and compound representations, graph neural networks, and bilinear attention mechanisms to effectively capture cross-domain interaction patterns, significantly outperforming state-of-the-art methods across various DTI prediction tasks. Additionally, SCOPE-DTI provides a user-friendly interface and database. We further validate its effectiveness by experimentally identifying anticancer targets of Ginsenoside Rh1. By offering comprehensive data, advanced modeling, and accessible tools, SCOPE-DTI accelerates drug discovery research.
Beyond English: Unveiling Multilingual Bias in LLM Copyright Compliance
Chen, Yupeng, Zhang, Xiaoyu, Huang, Yixian, Xie, Qian
Large Language Models (LLMs) have raised significant concerns regarding the fair use of copyright-protected content. While prior studies have examined the extent to which LLMs reproduce copyrighted materials, they have predominantly focused on English, neglecting multilingual dimensions of copyright protection. In this work, we investigate multilingual biases in LLM copyright protection by addressing two key questions: (1) Do LLMs exhibit bias in protecting copyrighted works across languages? (2) Is it easier to elicit copyrighted content using prompts in specific languages? To explore these questions, we construct a dataset of popular song lyrics in English, French, Chinese, and Korean and systematically probe seven LLMs using prompts in these languages. Our findings reveal significant imbalances in LLMs' handling of copyrighted content, both in terms of the language of the copyrighted material and the language of the prompt. These results highlight the need for further research and development of more robust, language-agnostic copyright protection mechanisms to ensure fair and consistent protection across languages.