Collaborating Authors

 Zhang, Jinghong


TrustDataFilter: Leveraging Trusted Knowledge Base Data for More Effective Filtering of Unknown Information

arXiv.org Artificial Intelligence

As technology advances and markets change, demand for domain-specific knowledge bases has been growing, both to improve model performance and to promote enterprise innovation and competitiveness. Such knowledge bases are typically built from web crawls or existing industry databases, which leads to problems with the accuracy and consistency of the data. To address these challenges, we exploit a characteristic of domain data, namely that its internal knowledge is interconnected, and propose the Self-Natural Language Inference Data Filtering (self-nli-TDF) framework. The framework compares trusted, already-filtered knowledge with the data to be filtered and infers the reasoning relationship between them, thereby improving filtering performance. It uses plug-and-play large language models for trustworthiness assessment and the RoBERTa-MNLI model from the NLI domain for reasoning. We constructed three datasets in the domains of biology, radiation, and science, and conducted experiments using RoBERTa, GPT-3.5, and a locally deployed Qwen2 model. The experimental results show that the framework improves filtering quality, producing more consistent and reliable filtering results.
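The comparison step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `NliFn` interface, the `toy_nli` keyword stand-in, and the "reject on any contradiction" rule are all assumptions made for the example. In practice, `toy_nli` would be replaced by a call to a real NLI model such as RoBERTa-MNLI.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: (premise, hypothesis) -> "entailment" | "neutral" | "contradiction".
NliFn = Callable[[str, str], str]


def toy_nli(premise: str, hypothesis: str) -> str:
    """Illustrative stand-in for an NLI model; uses word overlap and negation only."""
    p_words = set(premise.lower().replace(".", "").split())
    h_words = set(hypothesis.lower().replace(".", "").split())
    shared = (p_words & h_words) - {"not"}
    if len(shared) < 3:
        return "neutral"  # too little overlap to relate the two statements
    if ("not" in p_words) != ("not" in h_words):
        return "contradiction"  # same topic, opposite polarity
    return "entailment"


def filter_candidates(
    trusted: List[str], candidates: List[str], nli: NliFn
) -> Tuple[List[str], List[str]]:
    """Split candidates into (kept, rejected) by comparing each to the trusted facts.

    Assumed decision rule: reject a candidate if any trusted fact contradicts it.
    """
    kept, rejected = [], []
    for cand in candidates:
        labels = [nli(fact, cand) for fact in trusted]
        if "contradiction" in labels:
            rejected.append(cand)
        else:
            kept.append(cand)
    return kept, rejected
```

Passing the NLI model in as a function keeps the filtering loop independent of any particular model, which mirrors the plug-and-play design the abstract mentions.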


Tree Search-Based Evolutionary Bandits for Protein Sequence Optimization

arXiv.org Artificial Intelligence

Advances in biotechnology have demonstrated humanity's unprecedented capabilities to engineer proteins. They make it possible to directly design the amino acid sequences that encode proteins for desired functions, towards improving biochemical or enzymatic properties such as stability, binding affinity, or catalytic activity. Directed evolution (DE), for example, is a method for exploring new protein designs with properties of interest and maximal utility, by mimicking the natural evolution process. The development of DE was honored in 2018 with the awarding of the Nobel Prize in Chemistry to Frances Arnold for the directed evolution of enzymes, and George Smith and Gregory Winter for the development of phage display [3, 41, 48].

Even with the best and largest pre-trained protein language models such as ESM-1b [33] and ProGen2 [29], one often needs to explore an almost unknown domain and learn a new function map in order to discover new drugs. This is especially true with antibody engineering. Antibodies have highly diverse complementarity-determining region (CDR) sequences that can be altered, resulting in a huge sequence space to explore for optimal properties. The binding of an antibody to its target is an extrinsic property of the antibody, and it is difficult to accurately model the sequence-binding relationship from the sequence alone. Further, most of the exploration strategies used in practice lack theoretical guarantees.
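As a rough illustration of the directed-evolution idea mentioned above (mutate, screen, select, repeat), here is a toy greedy loop. It is not the tree-search bandit method of the paper. The `toy_fitness` function, which counts matches to a made-up target string, and all parameter values are assumptions standing in for a wet-lab assay or a learned model.

```python
import random
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def mutate(seq: str, rng: random.Random) -> str:
    """Return a copy of seq with one randomly chosen position substituted."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def toy_fitness(seq: str, target: str) -> int:
    """Hypothetical fitness: number of positions that match a target sequence."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(
    start: str,
    fitness: Callable[[str], float],
    rounds: int = 20,
    library_size: int = 50,
    keep: int = 5,
    seed: int = 0,
) -> str:
    """Greedy mutate-and-select loop; parents survive, so fitness never decreases."""
    rng = random.Random(seed)
    parents = [start]
    for _ in range(rounds):
        # Build a mutant library from the current parents.
        library = [mutate(rng.choice(parents), rng) for _ in range(library_size)]
        # Deduplicate in a deterministic order, then keep the top variants.
        pool = sorted(set(parents + library))
        parents = sorted(pool, key=fitness, reverse=True)[:keep]
    return parents[0]
```

A loop like this carries no guarantee of finding the best sequence, which is the kind of gap in practical exploration strategies that the passage points to.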