He, Mutian
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
He, Mutian, Garner, Philip N.
Architectures such as Linformer and Mamba have recently emerged as competitive linear-time replacements for transformers. However, corresponding large pretrained models are often unavailable, especially in non-text domains. To remedy this, we present a Cross-Architecture Layerwise Distillation (CALD) approach that jointly converts a transformer model to a linear-time substitute and fine-tunes it for a target task. We also compare several strategies for guiding the fine-tuning so as to optimally retain the desired inference capability of the original model. The methods differ in their use of the target model and the trajectory of the parameters. In a series of empirical studies on language processing, language modeling, and speech processing, we show that CALD can effectively recover the result of the original model, and that the guiding strategy contributes to the result. Some reasons for the variation are suggested.
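To make the joint objective concrete, below is a minimal PyTorch-style sketch of a layerwise distillation loss in the spirit of CALD; the function and argument names (cald_loss, teacher_hiddens, alpha) are illustrative assumptions, and the paper's actual layer pairing and loss weighting may differ.

import torch
import torch.nn.functional as F

def cald_loss(teacher_hiddens, student_hiddens, student_logits, labels, alpha=1.0):
    """Joint objective: task loss on the linear-time student plus a layerwise
    hidden-state matching loss against the frozen transformer teacher.

    teacher_hiddens, student_hiddens: lists of paired layer outputs, each of
        shape (batch, seq_len, hidden).
    """
    # Layerwise guidance: pull each student layer toward the teacher layer.
    distill = sum(
        F.mse_loss(s, t.detach())  # teacher stays frozen; no gradient through it
        for s, t in zip(student_hiddens, teacher_hiddens)
    ) / len(student_hiddens)

    # Target-task loss on the student's predictions (joint fine-tuning).
    task = F.cross_entropy(student_logits, labels)
    return task + alpha * distill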
DEL-Ranking: Ranking-Correction Denoising Framework for Elucidating Molecular Affinities in DNA-Encoded Libraries
Cao, Hanqun, He, Mutian, Ma, Ning, Hsieh, Chang-yu, Gu, Chunbin, Heng, Pheng-Ann
DNA-encoded library (DEL) screening has revolutionized protein-ligand binding detection, enabling rapid exploration of vast chemical spaces through read count analysis. However, two critical challenges limit its effectiveness: distribution noise in low-copy-number regimes and systematic shifts between read counts and true binding affinities. We present DEL-Ranking, a comprehensive framework that simultaneously addresses both challenges through ranking-based denoising and activity-referenced correction. Our approach introduces a dual-perspective ranking strategy combining Pairwise Soft Rank (PSR) and Listwise Global Rank (LGR) constraints to preserve both local and global count relationships. Additionally, we develop an Activity-Referenced Correction (ARC) module that bridges the gap between read counts and binding affinities through iterative refinement and biological consistency enforcement. Another key contribution of this work is the curation and release of three comprehensive DEL datasets that uniquely combine ligand 2D sequences, 3D conformational information, and experimentally validated activity labels. We validate our framework on five diverse DEL datasets, including these three new ones. DEL-Ranking achieves state-of-the-art performance across multiple correlation metrics and demonstrates strong generalization across different protein targets. Importantly, our approach successfully identifies key functional groups associated with binding affinity, providing actionable insights for drug discovery.
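To illustrate the idea behind a soft pairwise ranking constraint of the PSR kind, here is a hypothetical PyTorch sketch that pushes predicted affinities to preserve the ordering of read counts within a batch; the paper's exact formulation may differ.

import torch
import torch.nn.functional as F

def pairwise_soft_rank_loss(pred, counts):
    """Encourage predicted affinities to preserve the ordering of read counts,
    using a smooth (soft) pairwise penalty.

    pred, counts: 1-D tensors of shape (batch,).
    """
    # +1 where item i has a higher read count than item j, -1 where lower.
    target = torch.sign(counts.unsqueeze(1) - counts.unsqueeze(0)).float()

    # RankNet-style soft penalty on the corresponding prediction differences.
    diff_pred = pred.unsqueeze(1) - pred.unsqueeze(0)
    loss = F.softplus(-target * diff_pred)

    # Only pairs with strictly different read counts contribute.
    mask = target != 0
    return loss[mask].mean()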
Acquiring and Modelling Abstract Commonsense Knowledge via Conceptualization
He, Mutian, Fang, Tianqing, Wang, Weiqi, Song, Yangqiu
Conceptualization, or viewing entities and situations as instances of abstract concepts in mind and making inferences based on that, is a vital component of human intelligence for commonsense reasoning. Despite recent progress in artificial intelligence in acquiring and modeling commonsense, attributed to neural language models and commonsense knowledge graphs (CKGs), conceptualization has yet to be introduced thoroughly, leaving current approaches unable to cover the countless diverse entities and situations in the real world. To address the problem, we thoroughly study the role of conceptualization in commonsense reasoning and formulate a framework to replicate human conceptual induction by acquiring abstract knowledge about events regarding abstract concepts, as well as higher-level triples or inferences upon them. We then apply the framework to ATOMIC, a large-scale human-annotated CKG, aided by the taxonomy Probase. We annotate a dataset on the validity of contextualized conceptualizations from ATOMIC at both the event and triple levels, develop a series of heuristic rules based on linguistic features, and train a set of neural models to generate and verify abstract knowledge. From these components, we build a pipeline to acquire abstract knowledge and induce a large abstract CKG over ATOMIC, ready to be instantiated for inference about unseen entities or situations. Finally, we empirically show the benefits of augmenting CKGs with abstract knowledge in downstream tasks such as commonsense inference and zero-shot commonsense QA.
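As a toy illustration of conceptual induction, the snippet below replaces an instance mentioned in an ATOMIC-style event with hypernym concepts drawn from a Probase-like taxonomy; the dictionary and event format are simplified stand-ins for the actual resources, not the paper's pipeline.

# Toy taxonomy lookup standing in for Probase.
taxonomy = {
    "coffee": ["beverage", "drink"],
    "piano": ["instrument", "object"],
}

def conceptualize(event: str, instance: str) -> list[str]:
    """Generate candidate abstract events by replacing a mentioned instance
    with each of its hypernym concepts from the taxonomy."""
    return [event.replace(instance, concept)
            for concept in taxonomy.get(instance, [])]

# "PersonX drinks coffee" -> ["PersonX drinks beverage", "PersonX drinks drink"];
# a trained verifier model then filters which abstractions remain valid in context.
print(conceptualize("PersonX drinks coffee", "coffee"))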
The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation
He, Mutian, Garner, Philip N.
End-to-end spoken language understanding (SLU) remains elusive even with current large pretrained language models on text and speech, especially in multilingual settings. Machine translation is established as a powerful pretraining objective on text, as it enables the model to capture high-level semantics of the input utterance and associations between different languages, which is desirable for speech models operating on lower-level acoustic frames. Motivated in particular by the task of cross-lingual SLU, we demonstrate that speech translation (ST) is a good means of pretraining speech models for end-to-end SLU in both intra- and cross-lingual scenarios. With ST pretraining, our models outperform baselines on monolingual and multilingual intent classification as well as spoken question answering on the SLURP, MINDS-14, and NMSQA benchmarks. To verify the effectiveness of our methods, we also create new benchmark datasets from both synthetic and real sources, for speech summarization and low-resource/zero-shot transfer from English to French or Spanish. We further show the value of preserving knowledge from the ST pretraining task for better downstream performance, possibly using Bayesian transfer regularizers.
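A minimal sketch of one such Bayesian transfer regularizer, in the style of L2-SP (a Gaussian prior centered on the ST-pretrained weights), assuming a PyTorch setup; the helper name and weighting are illustrative, not the paper's exact choice.

import torch

def l2_sp_penalty(model, pretrained_state, weight=1e-3):
    """Quadratic penalty on drift away from the ST-pretrained checkpoint,
    i.e. a Gaussian prior centered on the pretrained weights."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in pretrained_state:
            anchor = pretrained_state[name].to(param.device)
            penalty = penalty + ((param - anchor) ** 2).sum()
    return weight * penalty

# Fine-tuning step: loss = task_loss + l2_sp_penalty(model, st_checkpoint)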
Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding
He, Mutian, Garner, Philip N.
Recently, large pretrained language models have demonstrated strong language understanding capabilities, particularly reflected in their zero-shot and in-context learning abilities on downstream tasks through prompting. To assess their impact on spoken language understanding (SLU), we evaluate several such models, including ChatGPT and OPT models of different sizes, on multiple benchmarks. We verify an emergent ability unique to the largest models: given oracle transcripts, they reach intent classification accuracy close to that of supervised models with zero or few shots across various languages. By contrast, smaller models that fit on a single GPU fall far behind. We note that the error cases often arise from the annotation scheme of the dataset, while ChatGPT's responses remain reasonable. We show, however, that the model is worse at slot filling and that its performance is sensitive to ASR errors, suggesting serious challenges for applying such textual models to SLU.
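For illustration, here is a hypothetical zero-shot prompt of the kind used to probe LLMs for intent classification; the actual prompts, label sets, and model calls in the paper may differ.

INTENTS = ["alarm_set", "play_music", "weather_query"]  # hypothetical label set

def build_prompt(transcript: str) -> str:
    """Build a zero-shot intent-classification prompt from an (oracle or ASR) transcript."""
    labels = ", ".join(INTENTS)
    return (f"Given the utterance below, answer with exactly one intent label "
            f"from: {labels}.\n"
            f"Utterance: {transcript}\n"
            f"Intent:")

# The LLM's completion is then matched against INTENTS to score accuracy.
print(build_prompt("wake me up at seven tomorrow"))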