Ding, Zixiang
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Jiang, Guo-qing, Liu, Jinlong, Ding, Zixiang, Guo, Lin, Lin, Wei
As models for natural language processing (NLP), computer vision (CV) and recommendation systems (RS) require surging computation, large numbers of GPUs/TPUs are run in parallel as a large batch (LB) to improve training throughput. However, training such LB tasks often suffers from a large generalization gap and degraded final accuracy, which limits further enlargement of the batch size. In this work, we develop the variance reduced gradient descent technique (VRGD) based on the gradient signal to noise ratio (GSNR) and apply it to popular optimizers such as SGD/Adam/LARS/LAMB. We carry out a theoretical analysis of the convergence rate to explain its fast training dynamics, and a generalization analysis to demonstrate its smaller generalization gap on LB training. Comprehensive experiments demonstrate that VRGD can accelerate training ($1\sim 2 \times$), narrow the generalization gap and improve final accuracy. We push the batch size limit of BERT pretraining up to 128k/64k and of DLRM to 512k without noticeable accuracy loss. We improve ImageNet Top-1 accuracy at a 96k batch size by $0.52pp$ over LARS. The generalization gap of BERT and ImageNet training is significantly reduced by over $65\%$.
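The abstract does not spell out the exact VRGD update rule, but the core idea of weighting gradient coordinates by their GSNR can be illustrated. Below is a minimal NumPy sketch under the assumption that per-coordinate GSNR, estimated from per-micro-batch gradients, is squashed into a multiplicative weight on the averaged gradient; the `vrgd_step` function and the squashing are hypothetical, not the paper's exact formulation.

```python
# Minimal NumPy sketch of a GSNR-based gradient rescaling, assuming VRGD
# down-weights low-SNR coordinates; the exact rule in the paper may differ.
import numpy as np

def gsnr(grads):
    """grads: (num_microbatches, num_params) per-micro-batch gradients."""
    mean = grads.mean(axis=0)
    var = grads.var(axis=0) + 1e-12          # avoid division by zero
    return mean ** 2 / var                   # element-wise signal-to-noise ratio

def vrgd_step(params, grads, lr=0.1):
    """One SGD-like step where each coordinate is scaled by its GSNR weight."""
    snr = gsnr(grads)
    weight = snr / (snr + 1.0)               # hypothetical squashing to [0, 1)
    return params - lr * weight * grads.mean(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = rng.normal(size=4)
    grads = params + 0.1 * rng.normal(size=(8, 4))   # 8 noisy micro-batch gradients
    print(vrgd_step(params, grads))
```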
Heterogeneous Knowledge Fusion: A Novel Approach for Personalized Recommendation via LLM
Yin, Bin, Xie, Junjie, Qin, Yu, Ding, Zixiang, Feng, Zhichao, Li, Xiang, Lin, Wei
In the context of Meituan Waimai, user behavior exhibits heterogeneous characteristics, spanning various behavior subjects, content, and scenarios. The current industry approach mostly involves continuously adding various heterogeneous behaviors to traditional recommendation models, which brings two obvious problems. First, the multitude of behavior subjects leads to sparse features that pose challenges to efficient modeling. Second, modeling user, merchant, and commodity behaviors separately ignores the fusion of heterogeneous knowledge among behaviors. However, we have noticed that heterogeneous user behaviors contain rich semantic knowledge, and using semantics to represent and reason about user behavior can more effectively promote heterogeneous knowledge fusion and capture user interests. LLMs have shown remarkable capabilities in various fields, thanks to rich semantic knowledge and powerful inferential reasoning [1, 10]. We design a new user behavior modeling framework via LLM, which extracts and integrates heterogeneous knowledge from users' heterogeneous behavior information and transforms structured user behaviors into unstructured heterogeneous knowledge. In the field of recommendation, there have been some attempts to use LLMs for personalized recommendation.
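As an illustration of the "structured behavior to unstructured knowledge" step, the sketch below verbalizes heterogeneous behavior records (merchant, commodity, scenario) into a single prompt for an LLM. The field names, template, and `build_prompt` helper are hypothetical; the paper's actual framework is more involved.

```python
# Illustrative sketch only: one way structured, heterogeneous behavior records
# could be verbalized into a natural-language description for an LLM.
# Field names and the prompt template are hypothetical, not from the paper.
from typing import Dict, List

def verbalize_behavior(event: Dict) -> str:
    return (f"On {event['time']}, the user {event['action']} "
            f"{event['subject_type']} '{event['subject']}' "
            f"in the {event['scenario']} scenario.")

def build_prompt(events: List[Dict]) -> str:
    lines = [verbalize_behavior(e) for e in events]
    return ("Summarize this user's interests from the behaviors below.\n"
            + "\n".join(lines))

events = [
    {"time": "2023-05-01 12:10", "action": "ordered from", "subject_type": "merchant",
     "subject": "Sichuan Noodle House", "scenario": "lunch"},
    {"time": "2023-05-02 19:40", "action": "clicked", "subject_type": "commodity",
     "subject": "bubble tea", "scenario": "evening browse"},
]
print(build_prompt(events))
```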
Is ChatGPT a Good Sentiment Analyzer? A Preliminary Study
Wang, Zengzhi, Xie, Qiming, Ding, Zixiang, Feng, Yi, Xia, Rui
Recently, ChatGPT has drawn great attention from both the research community and the public. We are particularly curious about whether it can serve as a universal sentiment analyzer. To this end, in this work, we provide a preliminary evaluation of ChatGPT on the understanding of opinions, sentiments, and emotions contained in text. Specifically, we evaluate it in four settings, including standard evaluation, polarity shift evaluation, open-domain evaluation, and sentiment inference evaluation. The evaluation involves 18 benchmark datasets and 5 representative sentiment analysis tasks, and we compare ChatGPT with fine-tuned BERT and the corresponding state-of-the-art (SOTA) models on each end task. Moreover, we also conduct human evaluation and present some qualitative case studies to gain a deeper understanding of its sentiment analysis capabilities.
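The abstract describes a prompt-based evaluation protocol; a minimal sketch of such a zero-shot sentiment-classification loop is shown below. The prompt wording and the `query_llm` stub (standing in for an actual ChatGPT API call) are assumptions, not the study's exact setup.

```python
# Hedged sketch of a zero-shot sentiment-classification evaluation loop in the
# spirit of the study; prompt wording and the query_llm stub are assumptions.
def query_llm(prompt: str) -> str:
    # Placeholder for an actual ChatGPT API call.
    return "positive"

def build_prompt(text: str) -> str:
    return (f"Classify the sentiment of the following sentence as "
            f"positive, negative, or neutral.\nSentence: {text}\nSentiment:")

def evaluate(samples):
    correct = 0
    for text, gold in samples:
        pred = query_llm(build_prompt(text)).strip().lower()
        correct += int(pred == gold)
    return correct / len(samples)

samples = [("The food was great!", "positive"), ("Terrible service.", "negative")]
print(f"accuracy = {evaluate(samples):.2f}")
```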
SKDBERT: Compressing BERT via Stochastic Knowledge Distillation
Ding, Zixiang, Jiang, Guoqing, Zhang, Shuai, Guo, Lin, Lin, Wei
In this paper, we propose Stochastic Knowledge Distillation (SKD) to obtain a compact BERT-style language model dubbed SKDBERT. In each iteration, SKD samples a teacher model from a pre-defined teacher ensemble, which consists of multiple teacher models with multi-level capacities, to transfer knowledge into the student model in a one-to-one manner. The sampling distribution plays an important role in SKD, and we heuristically present three types of sampling distributions to assign appropriate probabilities to the multi-level teacher models. SKD has two advantages: 1) it preserves the diversity of the multi-level teacher models by stochastically sampling a single teacher model in each iteration, and 2) it improves the efficacy of knowledge distillation via multi-level teacher models when a large capacity gap exists between the teacher model and the student model. Experimental results on the GLUE benchmark show that SKDBERT reduces the size of a BERT$_{\rm BASE}$ model by 40% while retaining 99.5% of its language understanding performance and being 100% faster.
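The per-iteration teacher sampling and one-to-one distillation described above can be sketched in a few lines of PyTorch. The sampling probabilities, temperature, and toy linear models below are illustrative stand-ins; the paper's sampling distributions and BERT-style models are not reproduced here.

```python
# Minimal PyTorch sketch of Stochastic Knowledge Distillation: each step samples
# one teacher from a multi-level ensemble and distills one-to-one. The sampling
# probabilities and temperature here are illustrative, not the paper's values.
import torch
import torch.nn.functional as F

def skd_loss(student_logits, teachers, probs, inputs, temperature=4.0):
    # Sample a single teacher according to the (heuristic) sampling distribution.
    idx = torch.multinomial(probs, num_samples=1).item()
    with torch.no_grad():
        teacher_logits = teachers[idx](inputs)
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)

# Toy usage with linear "models" standing in for BERT-style teachers/student.
teachers = [torch.nn.Linear(16, 3) for _ in range(3)]   # multi-level ensemble
student = torch.nn.Linear(16, 3)
probs = torch.tensor([0.2, 0.3, 0.5])                   # assumed distribution
x = torch.randn(8, 16)
loss = skd_loss(student(x), teachers, probs, x)
loss.backward()
```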
ABCP: Automatic Block-wise and Channel-wise Network Pruning via Joint Search
Li, Jiaqi, Li, Haoran, Chen, Yaran, Ding, Zixiang, Li, Nannan, Ma, Mingjun, Duan, Zicheng, Zhao, Dongbing
Currently, an increasing number of model pruning methods are proposed to resolve the contradiction between the computing power required by deep learning models and the capabilities of resource-constrained devices. However, most traditional rule-based network pruning methods cannot reach a sufficient compression ratio with low accuracy loss and are time-consuming as well as laborious. In this paper, we propose Automatic Block-wise and Channel-wise Network Pruning (ABCP) to jointly search the block-wise and channel-wise pruning actions with deep reinforcement learning. A joint sampling algorithm is proposed to simultaneously generate the pruning choice of each residual block and the channel pruning ratio of each convolutional layer from the discrete and continuous search spaces, respectively. The best pruning action, taking both the accuracy and the complexity of the model into account, is finally obtained. Compared with traditional rule-based pruning methods, this pipeline saves human labor and achieves a higher compression ratio with lower accuracy loss. Tested on the mobile robot detection dataset, the pruned YOLOv3 model saves 99.5% of FLOPs, reduces parameters by 99.5%, and achieves a 37.3× speedup with only 2.8% mAP loss. The results of the transfer task on the sim2real detection dataset also show that our pruned model has much better robustness.
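The joint sampling of a discrete block-wise pruning choice and a continuous channel-wise pruning ratio can be sketched as follows. The policy parameterization (Bernoulli per block, Gaussian per layer) and the clipping range are assumptions for illustration, not ABCP's exact RL formulation.

```python
# Illustrative sketch of the joint sampling idea: a discrete keep/prune choice
# per residual block and a continuous channel-pruning ratio per conv layer are
# drawn together to form one pruning action. Policy parameterization is assumed.
import numpy as np

rng = np.random.default_rng(0)

def sample_pruning_action(block_keep_logits, layer_ratio_means, ratio_std=0.1):
    # Discrete part: prune or keep each residual block (Bernoulli per block).
    keep_prob = 1.0 / (1.0 + np.exp(-block_keep_logits))
    block_actions = rng.random(keep_prob.shape) < keep_prob
    # Continuous part: channel pruning ratio per convolutional layer,
    # sampled around the policy mean and clipped to a valid range.
    ratios = np.clip(rng.normal(layer_ratio_means, ratio_std), 0.0, 0.9)
    return block_actions, ratios

blocks, layers = 8, 20
keep, ratios = sample_pruning_action(np.zeros(blocks), np.full(layers, 0.5))
print("kept blocks:", keep.astype(int))
print("channel pruning ratios:", np.round(ratios, 2))
```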
Heuristic Rank Selection with Progressively Searching Tensor Ring Network
Li, Nannan, Pan, Yu, Chen, Yaran, Ding, Zixiang, Zhao, Dongbin, Xu, Zenglin
Recently, Tensor Ring Networks (TRNs) have been applied in deep networks, achieving remarkable successes in compression ratio and accuracy. Although highly related to the performance of TRNs, the rank is seldom studied in previous works and is usually set to be equal across dimensions in experiments. Meanwhile, there is no heuristic method for choosing the rank, and finding an appropriate rank by enumeration is extremely time-consuming. Interestingly, we discover that some of the rank elements are sensitive and usually aggregate in a certain region, namely an interest region. Based on this phenomenon, we propose a novel progressive genetic algorithm named Progressively Searching Tensor Ring Network (PSTRN), which is able to find the optimal rank precisely and efficiently. Through its evolutionary phase and progressive phase, PSTRN converges to the interest region quickly and achieves good performance. Experimental results show that PSTRN significantly reduces the complexity of rank search compared with the enumeration method. Furthermore, our method is validated on public benchmarks including MNIST, CIFAR10/100 and HMDB51, achieving state-of-the-art performance.
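A toy sketch of the progressive genetic-search idea is given below: a small genetic algorithm evolves rank vectors, and the search bounds are then narrowed around the best individual (the "interest region"). The fitness function, mutation scheme, and bounds are synthetic placeholders, not PSTRN's actual objective.

```python
# Toy sketch of progressive genetic search over tensor-ring ranks: evolve a
# population of rank vectors, then shrink the bounds around the best one.
import numpy as np

rng = np.random.default_rng(0)

def fitness(ranks):
    # Hypothetical objective: prefer ranks near an unknown optimum, penalize size.
    target = np.array([6, 4, 8, 5])
    return -np.abs(ranks - target).sum() - 0.05 * ranks.sum()

def evolve(low, high, pop_size=20, generations=10, dims=4):
    pop = rng.integers(low, high + 1, size=(pop_size, dims))
    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]               # selection
        children = parents[rng.integers(len(parents), size=pop_size - len(parents))]
        children = children + rng.integers(-1, 2, size=children.shape)   # mutation
        pop = np.clip(np.vstack([parents, children]), low, high)
    return pop[np.argmax([fitness(p) for p in pop])]

low, high = np.array([1, 1, 1, 1]), np.array([16, 16, 16, 16])
for _ in range(3):                                    # progressive phases
    best = evolve(low, high)
    low = np.maximum(best - 2, 1)                     # narrow to the interest region
    high = np.minimum(best + 2, 16)
print("selected ranks:", best)
```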
Faster Gradient-based NAS Pipeline Combining Broad Scalable Architecture with Confident Learning Rate
Ding, Zixiang, Chen, Yaran, Li, Nannan, Zhao, Dongbin
In order to further improve the search efficiency of Neural Architecture Search (NAS), we propose B-DARTS, a novel pipeline combining a broad scalable architecture with a Confident Learning Rate (CLR). In B-DARTS, the Broad Convolutional Neural Network (BCNN) is employed as the scalable architecture for DARTS, a popular differentiable NAS approach. On one hand, BCNN is a broad scalable architecture whose topology offers two advantages over a deep one, namely faster single-step training and higher memory efficiency (i.e. a larger batch size for architecture search), both of which contribute to the search efficiency of NAS. On the other hand, DARTS discovers the optimal architecture with a gradient-based optimization algorithm, which benefits from both of these advantages of BCNN simultaneously. Like vanilla DARTS, B-DARTS also suffers from the performance collapse issue, where weight-free operations are prone to be selected by the search strategy. Therefore, we propose CLR, which accounts for the fact that the confidence of the gradient used for architecture weight updates increases with the training time of the over-parameterized model, to mitigate this issue. Experimental results on CIFAR-10 and ImageNet show that 1) B-DARTS delivers state-of-the-art efficiency of 0.09 GPU days using the first-order approximation on CIFAR-10; 2) the architecture learned by B-DARTS achieves competitive performance on ImageNet with state-of-the-art multiply-accumulate operation and parameter counts; and 3) the proposed CLR is effective in alleviating the performance collapse issue for both B-DARTS and DARTS.
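The abstract does not give CLR's exact formula, but the idea of an architecture-weight learning rate whose effective magnitude grows with the training progress of the over-parameterized model can be sketched as follows; the linear ramp, constants, and the commented optimizer hook are assumptions, not the paper's exact schedule.

```python
# Hedged sketch of a Confident Learning Rate schedule for architecture weights:
# the step size grows as training of the over-parameterized model progresses,
# reflecting growing confidence in the architecture gradient. Ramp shape is assumed.
def confident_lr(base_lr, epoch, total_epochs):
    """Architecture-weight learning rate that grows with training progress."""
    confidence = epoch / total_epochs       # assumed linear ramp from 0 toward 1
    return base_lr * confidence

base_arch_lr, total_epochs = 3e-4, 50
for epoch in range(total_epochs):
    arch_lr = confident_lr(base_arch_lr, epoch, total_epochs)
    # In a PyTorch DARTS loop one might set:
    # arch_optimizer.param_groups[0]["lr"] = arch_lr
    if epoch in (0, 25, 49):
        print(f"epoch {epoch:2d}: architecture lr = {arch_lr:.2e}")
```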