AITopics | Xie, Li

Collaborating Authors

Xie, Li

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Data Quality Enhancement on the Basis of Diversity with Large Language Models for Text Classification: Uncovered, Difficult, and Noisy

Zeng, Min, Liu, Caiquan, Zhang, Shiqi, Xie, Li, Sang, Chen, Chen, Xiaoxin

arXiv.org Artificial IntelligenceDec-9-2024

In recent years, the use of large language models (LLMs) for text classification has attracted widespread attention. Despite this, the classification accuracy of LLMs has not yet universally surpassed that of smaller models. LLMs can enhance their performance in text classification through fine-tuning. However, existing data quality research based on LLMs is challenging to apply directly to solve text classification problems. To further improve the performance of LLMs in classification tasks, this paper proposes a data quality enhancement (DQE) method for text classification based on LLMs. This method starts by using a greedy algorithm to select data, dividing the dataset into sampled and unsampled subsets, and then performing fine-tuning of the LLMs using the sampled data. Subsequently, this model is used to predict the outcomes for the unsampled data, categorizing incorrectly predicted data into uncovered, difficult, and noisy data. Experimental results demonstrate that our method effectively enhances the performance of LLMs in text classification tasks and significantly improves training efficiency, saving nearly half of the training time. Our method has achieved state-of-the-art performance in several open-source classification tasks.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.06575

Country: North America > Mexico (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Output-Constrained Lossy Source Coding With Application to Rate-Distortion-Perception Theory

Xie, Li, Li, Liangyan, Chen, Jun, Zhang, Zhongshan

arXiv.org Artificial IntelligenceMar-21-2024

The distortion-rate function of output-constrained lossy source coding with limited common randomness is analyzed for the special case of squared error distortion measure. An explicit expression is obtained when both source and reconstruction distributions are Gaussian. This further leads to a partial characterization of the information-theoretic limit of quadratic Gaussian rate-distortion-perception coding with the perception measure given by Kullback-Leibler divergence or squared quadratic Wasserstein distance.

artificial intelligence, information management, randomness, (17 more...)

arXiv.org Artificial Intelligence

2403.14849

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (0.71)
Information Technology > Artificial Intelligence (0.68)

Add feedback

Learning Universal and Robust 3D Molecular Representations with Graph Convolutional Networks

Zhang, Shuo, Liu, Yang, Xie, Li, Xie, Lei

arXiv.org Artificial IntelligenceJul-23-2023

To learn accurate representations of molecules, it is essential to consider both chemical and geometric features. To encode geometric information, many descriptors have been proposed in constrained circumstances for specific types of molecules and do not have the properties to be ``robust": 1. Invariant to rotations and translations; 2. Injective when embedding molecular structures. In this work, we propose a universal and robust Directional Node Pair (DNP) descriptor based on the graph representations of 3D molecules. Our DNP descriptor is robust compared to previous ones and can be applied to multiple molecular types. To combine the DNP descriptor and chemical features in molecules, we construct the Robust Molecular Graph Convolutional Network (RoM-GCN) which is capable to take both node and edge features into consideration when generating molecule representations. We evaluate our model on protein and small molecule datasets. Our results validate the superiority of the DNP descriptor in incorporating 3D geometric information of molecules. RoM-GCN outperforms all compared baselines.

artificial intelligence, descriptor, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2307.12491

Genre: Research Report (0.70)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Exploration of Dark Chemical Genomics Space via Portal Learning: Applied to Targeting the Undruggable Genome and COVID-19 Anti-Infective Polypharmacology

Cai, Tian, Xie, Li, Chen, Muge, Liu, Yang, He, Di, Zhang, Shuo, Mura, Cameron, Bourne, Philip E., Xie, Lei

arXiv.org Artificial IntelligenceNov-23-2021

Advances in biomedicine are largely fueled by exploring uncharted territories of human biology. Machine learning can both enable and accelerate discovery, but faces a fundamental hurdle when applied to unseen data with distributions that differ from previously observed ones -- a common dilemma in scientific inquiry. We have developed a new deep learning framework, called {\textit{Portal Learning}}, to explore dark chemical and biological space. Three key, novel components of our approach include: (i) end-to-end, step-wise transfer learning, in recognition of biology's sequence-structure-function paradigm, (ii) out-of-cluster meta-learning, and (iii) stress model selection. Portal Learning provides a practical solution to the out-of-distribution (OOD) problem in statistical machine learning. Here, we have implemented Portal Learning to predict chemical-protein interactions on a genome-wide scale. Systematic studies demonstrate that Portal Learning can effectively assign ligands to unexplored gene families (unknown functions), versus existing state-of-the-art methods, thereby allowing us to target previously "undruggable" proteins and design novel polypharmacological agents for disrupting interactions between SARS-CoV-2 and human proteins. Portal Learning is general-purpose and can be further applied to other areas of scientific inquiry.

artificial intelligence, machine learning, protein, (21 more...)

arXiv.org Artificial Intelligence

2111.14283

Country: North America > United States > New York (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Promising Solution (0.87)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Add feedback