AITopics | Zhao, Zihan

Collaborating Authors

Zhao, Zihan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Towards Efficient Large Multimodal Model Serving

Qiu, Haoran, Biswas, Anish, Zhao, Zihan, Mohan, Jayashree, Khare, Alind, Choukse, Esha, Goiri, Íñigo, Zhang, Zeyu, Shen, Haiying, Bansal, Chetan, Ramjee, Ramachandran, Fonseca, Rodrigo

arXiv.org Artificial IntelligenceFeb-2-2025

Recent advances in generative AI have led to large multi-modal models (LMMs) capable of simultaneously processing inputs of various modalities such as text, images, video, and audio. While these models demonstrate impressive capabilities, efficiently serving them in production environments poses significant challenges due to their complex architectures and heterogeneous resource requirements. We present the first comprehensive systems analysis of two prominent LMM architectures, decoder-only and cross-attention, on six representative open-source models. We investigate their multi-stage inference pipelines and resource utilization patterns that lead to unique systems design implications. We also present an in-depth analysis of production LMM inference traces, uncovering unique workload characteristics, including variable, heavy-tailed request distributions, diverse modal combinations, and bursty traffic patterns. Our key findings reveal that different LMM inference stages exhibit highly heterogeneous performance characteristics and resource demands, while concurrent requests across modalities lead to significant performance interference. To address these challenges, we propose a decoupled serving architecture that enables independent resource allocation and adaptive scaling for each stage. We further propose optimizations such as stage colocation to maximize throughput and resource utilization while meeting the latency objectives.

large language model, latency, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.00937

Country:

North America > United States (0.14)
Europe > France (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)

Add feedback

ChemDFM-X: Towards Large Multimodal Model for Chemistry

Zhao, Zihan, Chen, Bo, Li, Jingpiao, Chen, Lu, Wen, Liyang, Wang, Pengyu, Zhu, Zichen, Zhang, Danyang, Wan, Ziping, Li, Yansi, Dai, Zhongyang, Chen, Xin, Yu, Kai

arXiv.org Artificial IntelligenceJan-2-2025

Rapid developments of AI tools are expected to offer unprecedented assistance to the research of natural science including chemistry. However, neither existing unimodal task-specific specialist models nor emerging general large multimodal models (LMM) can cover the wide range of chemical data modality and task categories. To address the real demands of chemists, a cross-modal Chemical General Intelligence (CGI) system, which serves as a truly practical and useful research assistant utilizing the great potential of LMMs, is in great need. In this work, we introduce the first Cross-modal Dialogue Foundation Model for Chemistry (ChemDFM-X). Diverse multimodal data are generated from an initial modality by approximate calculations and task-specific model predictions. This strategy creates sufficient chemical training corpora, while significantly reducing excessive expense, resulting in an instruction-tuning dataset containing 7.6M data. After instruction finetuning, ChemDFM-X is evaluated on extensive experiments of different chemical tasks with various data modalities. The results demonstrate the capacity of ChemDFM-X for multimodal and inter-modal knowledge comprehension. ChemDFM-X marks a significant milestone toward aligning all modalities in chemistry, a step closer to CGI.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s11432-024-4243-0

2409.13194

Country:

North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Add feedback

Learning Symmetry-Independent Jet Representations via Jet-Based Joint Embedding Predictive Architecture

Katel, Subash, Li, Haoyang, Zhao, Zihan, Kansal, Raghav, Mokhtar, Farouk, Duarte, Javier

arXiv.org Artificial IntelligenceDec-5-2024

In high energy physics, self-supervised learning (SSL) methods have the potential to aid in the creation of machine learning models without the need for labeled datasets for a variety of tasks, including those related to jets -- narrow sprays of particles produced by quarks and gluons in high energy particle collisions. This study introduces an approach to learning jet representations without hand-crafted augmentations using a jet-based joint embedding predictive architecture (J-JEPA), which aims to predict various physical targets from an informative context. As our method does not require hand-crafted augmentation like other common SSL techniques, J-JEPA avoids introducing biases that could harm downstream tasks. Since different tasks generally require invariance under different augmentations, this training without hand-crafted augmentation enables versatile applications, offering a pathway toward a cross-task foundation model. We finetune the representations learned by J-JEPA for jet tagging and benchmark them against task-specific representations.

artificial intelligence, machine learning, representation, (15 more...)

arXiv.org Artificial Intelligence

2412.05333

Country: North America > United States > California (0.48)

Genre: Research Report (0.64)

Industry: Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

SciDFM: A Large Language Model with Mixture-of-Experts for Science

Sun, Liangtai, Luo, Danyu, Ma, Da, Zhao, Zihan, Chen, Baocai, Shen, Zhennan, Zhu, Su, Chen, Lu, Chen, Xin, Yu, Kai

arXiv.org Artificial IntelligenceNov-12-2024

Recently, there has been a significant upsurge of interest in leveraging large language models (LLMs) to assist scientific discovery. However, most LLMs only focus on general science, while they lack domain-specific knowledge, such as chemical molecules and amino acid sequences. To bridge these gaps, we introduce SciDFM, a mixture-of-experts LLM, which is trained from scratch and is able to conduct college-level scientific reasoning and understand molecules and amino acid sequences. We collect a large-scale training corpus containing numerous scientific papers and books from different disciplines as well as data from domain-specific databases. We further fine-tune the pre-trained model on lots of instruction data to improve performances on downstream benchmarks. From experiment results, we show that SciDFM achieves strong performance on general scientific benchmarks such as SciEval and SciQ, and it reaches a SOTA performance on domain-specific benchmarks among models of similar size. We further analyze the expert layers and show that the results of expert selection vary with data from different disciplines.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2409.18412

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

SEEKR: Selective Attention-Guided Knowledge Retention for Continual Learning of Large Language Models

He, Jinghan, Guo, Haiyun, Zhu, Kuan, Zhao, Zihan, Tang, Ming, Wang, Jinqiao

arXiv.org Artificial IntelligenceNov-9-2024

Continual learning (CL) is crucial for language models to dynamically adapt to the evolving real-world demands. To mitigate the catastrophic forgetting problem in CL, data replay has been proven a simple and effective strategy, and the subsequent data-replay-based distillation can further enhance the performance. However, existing methods fail to fully exploit the knowledge embedded in models from previous tasks, resulting in the need for a relatively large number of replay samples to achieve good results. In this work, we first explore and emphasize the importance of attention weights in knowledge retention, and then propose a SElective attEntion-guided Knowledge Retention method (SEEKR) for data-efficient replay-based continual learning of large language models (LLMs). Specifically, SEEKR performs attention distillation on the selected attention heads for finer-grained knowledge retention, where the proposed forgettability-based and task-sensitivity-based measures are used to identify the most valuable attention heads. Experimental results on two continual learning benchmarks for LLMs demonstrate the superiority of SEEKR over the existing methods on both performance and efficiency. Explicitly, SEEKR achieves comparable or even better performance with only 1/10 of the replayed data used by other methods, and reduces the proportion of replayed data to 1%.

large language model, natural language, seekr, (14 more...)

arXiv.org Artificial Intelligence

2411.06171

Country: Asia > China (0.46)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Boosting Medical Image Classification with Segmentation Foundation Model

Gu, Pengfei, Zhao, Zihan, Wang, Hongxiao, Peng, Yaopeng, Zhang, Yizhe, Sapkota, Nishchal, Wang, Chaoli, Chen, Danny Z.

arXiv.org Artificial IntelligenceJun-16-2024

The Segment Anything Model (SAM) exhibits impressive capabilities in zero-shot segmentation for natural images. Recently, SAM has gained a great deal of attention for its applications in medical image segmentation. However, to our best knowledge, no studies have shown how to harness the power of SAM for medical image classification. To fill this gap and make SAM a true ``foundation model'' for medical image analysis, it is highly desirable to customize SAM specifically for medical image classification. In this paper, we introduce SAMAug-C, an innovative augmentation method based on SAM for augmenting classification datasets by generating variants of the original images. The augmented datasets can be used to train a deep learning classification model, thereby boosting the classification performance. Furthermore, we propose a novel framework that simultaneously processes raw and SAMAug-C augmented image input, capitalizing on the complementary information that is offered by both. Experiments on three public datasets validate the effectiveness of our new approach.

artificial intelligence, classification, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2406.11026

Country:

Asia > China (0.15)
North America > United States (0.14)

Genre:

Instructional Material > Online (0.83)
Instructional Material > Course Syllabus & Notes (0.83)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding

Xu, Hongshen, Chen, Lu, Zhao, Zihan, Ma, Da, Cao, Ruisheng, Zhu, Zichen, Yu, Kai

arXiv.org Artificial IntelligenceFeb-28-2024

The growing prevalence of visually rich documents, such as webpages and scanned/digital-born documents (images, PDFs, etc.), has led to increased interest in automatic document understanding and information extraction across academia and industry. Although various document modalities, including image, text, layout, and structure, facilitate human information retrieval, the interconnected nature of these modalities presents challenges for neural networks. In this paper, we introduce WebLM, a multimodal pre-training network designed to address the limitations of solely modeling text and structure modalities of HTML in webpages. Instead of processing document images as unified natural images, WebLM integrates the hierarchical structure of document images to enhance the understanding of markup-language-based documents. Additionally, we propose several pre-training tasks to model the interaction among text, structure, and image modalities effectively. Empirical results demonstrate that the pre-trained WebLM significantly surpasses previous state-of-the-art pre-trained models across several webpage understanding tasks. The pre-trained models and code are available at https://github.com/X-LANCE/weblm.

hierarchical multimodal pre-training, machine learning, natural language, (1 more...)

arXiv.org Artificial Intelligence

2402.18262

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Multi: Multimodal Understanding Leaderboard with Text and Images

Zhu, Zichen, Xu, Yang, Chen, Lu, Yang, Jingkai, Ma, Yichuan, Sun, Yiming, Wen, Hailin, Liu, Jiaqi, Cai, Jinyu, Ma, Yingzi, Zhang, Situo, Zhao, Zihan, Sun, Liangtai, Yu, Kai

arXiv.org Artificial IntelligenceFeb-5-2024

Rapid progress in multimodal large language models (MLLMs) highlights the need to introduce challenging yet realistic benchmarks to the academic community. Existing benchmarks primarily focus on simple natural image understanding, but Multi emerges as a cutting-edge benchmark for MLLMs, offering a comprehensive dataset for evaluating MLLMs against understanding complex figures and tables, and scientific questions. This benchmark, reflecting current realistic examination styles, provides multimodal inputs and requires responses that are either precise or open-ended, similar to real-life school tests. It challenges MLLMs with a variety of tasks, ranging from formula derivation to image detail analysis, and cross-modality reasoning. Multi includes over 18,000 questions, with a focus on science-based QA in diverse formats. We also introduce Multi-Elite, a 500-question subset for testing the extremities of MLLMs, and Multi-Extend, which enhances In-Context Learning research with more than 4,500 knowledge pieces. Our evaluation indicates significant potential for MLLM advancement, with GPT-4V achieving a 63.7% accuracy rate on Multi, in contrast to other MLLMs scoring between 31.3% and 53.7%. Multi serves not only as a robust evaluation platform but also paves the way for the development of expert-level AI.

information, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2402.03173

Country:

Asia > China (0.14)
Europe > Switzerland (0.14)
Europe > Spain (0.14)

Genre: Research Report (0.82)

Industry: Education (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Computational Experiments Meet Large Language Model Based Agents: A Survey and Perspective

Ma, Qun, Xue, Xiao, Zhou, Deyu, Yu, Xiangning, Liu, Donghua, Zhang, Xuwen, Zhao, Zihan, Shen, Yifan, Ji, Peilin, Li, Juanjuan, Wang, Gang, Ma, Wanpeng

arXiv.org Artificial IntelligenceJan-31-2024

Computational experiments have emerged as a valuable method for studying complex systems, involving the algorithmization of counterfactuals. However, accurately representing real social systems in Agent-based Modeling (ABM) is challenging due to the diverse and intricate characteristics of humans, including bounded rationality and heterogeneity. To address this limitation, the integration of Large Language Models (LLMs) has been proposed, enabling agents to possess anthropomorphic abilities such as complex reasoning and autonomous learning. These agents, known as LLM-based Agent, offer the potential to enhance the anthropomorphism lacking in ABM. Nonetheless, the absence of explicit explainability in LLMs significantly hinders their application in the social sciences. Conversely, computational experiments excel in providing causal analysis of individual behaviors and complex phenomena. Thus, combining computational experiments with LLM-based Agent holds substantial research potential. This paper aims to present a comprehensive exploration of this fusion. Primarily, it outlines the historical development of agent structures and their evolution into artificial societies, emphasizing their importance in computational experiments. Then it elucidates the advantages that computational experiments and LLM-based Agents offer each other, considering the perspectives of LLM-based Agent for computational experiments and vice versa. Finally, this paper addresses the challenges and future trends in this research domain, offering guidance for subsequent related studies.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2402.00262

Country:

Asia > China (1.00)
Europe (0.92)
North America > United States > New York (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > Experimental Study (0.92)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

ChemDFM: Dialogue Foundation Model for Chemistry

Zhao, Zihan, Ma, Da, Chen, Lu, Sun, Liangtai, Li, Zihao, Xu, Hongshen, Zhu, Zichen, Zhu, Su, Fan, Shuai, Shen, Guodong, Chen, Xin, Yu, Kai

arXiv.org Artificial IntelligenceJan-26-2024

Large language models (LLMs) have established great success in the general domain of natural language processing. Their emerging task generalization and free-form dialogue capabilities can greatly help to design Chemical General Intelligence (CGI) to assist real-world research in chemistry. However, the existence of specialized language and knowledge in the field of chemistry, such as the highly informative SMILES notation, hinders the performance of general-domain LLMs in chemistry. To this end, we develop ChemDFM, the first LLM towards CGI. ChemDFM-13B is trained on 34B tokens from chemical literature, textbooks, and instructions as well as various data from the general domain. Therefore, it can store, understand, and reason over chemical knowledge and languages while still possessing advanced free-form language comprehension capabilities. Extensive quantitative evaluation shows that ChemDFM can significantly outperform the representative open-sourced LLMs. Moreover, ChemDFM can also surpass GPT-4 on a great portion of chemical tasks, despite the significant size difference. Further qualitative evaluations demonstrate the efficiency and effectiveness of ChemDFM in real-world research scenarios. We will open-source the ChemDFM model soon.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2401.14818

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Energy > Oil & Gas (1.00)
Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback