Wang, Zedong
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
Li, Siyuan, Tian, Juanxi, Wang, Zedong, Zhang, Luyuan, Liu, Zicheng, Jin, Weiyang, Liu, Yang, Sun, Baigui, Li, Stan Z.
This paper delves into the interplay between vision backbones and optimizers, unveiling an inter-dependent phenomenon termed \textit{\textbf{b}ackbone-\textbf{o}ptimizer \textbf{c}oupling \textbf{b}ias} (BOCB). We observe that canonical CNNs, such as VGG and ResNet, exhibit a marked co-dependency with the SGD family, while recent architectures like ViTs and ConvNeXt are tightly coupled with adaptive learning rate optimizers. We further show that BOCB can be introduced by both optimizers and certain backbone designs and may significantly impact the pre-training and downstream fine-tuning of vision models. Through in-depth empirical analysis, we summarize takeaways on recommended optimizers and insights into robust vision backbone architectures. We hope this work can inspire the community to question long-held assumptions about backbones and optimizers, stimulate further exploration, and thereby contribute to more robust vision systems. The source code and models are publicly available at https://bocb-ai.github.io/.
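The kind of study the abstract describes can be pictured as a grid sweep over backbone-optimizer pairs. Below is a minimal, self-contained sketch under illustrative assumptions (toy models, random data, two optimizer families), not the paper's actual benchmark code:

```python
import torch
import torch.nn as nn

# Toy stand-ins; a real study would use actual vision backbones and datasets.
BACKBONES = {
    "cnn_style": lambda: nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                       nn.Flatten(), nn.Linear(8 * 32 * 32, 10)),
    "mlp_style": lambda: nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64),
                                       nn.GELU(), nn.Linear(64, 10)),
}

def make_optimizer(name, params):
    return (torch.optim.SGD(params, lr=0.1, momentum=0.9) if name == "sgd"
            else torch.optim.AdamW(params, lr=1e-3, weight_decay=0.05))

x = torch.randn(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))
for arch, factory in BACKBONES.items():
    for opt_name in ("sgd", "adamw"):
        model, loss_fn = factory(), nn.CrossEntropyLoss()
        opt = make_optimizer(opt_name, model.parameters())
        for _ in range(5):                        # tiny training loop
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        # A large spread across optimizers for a fixed backbone signals coupling.
        print(arch, opt_name, float(loss))
```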
Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences
Liu, Zicheng, Li, Siyuan, Wang, Li, Wang, Zedong, Liu, Yunfan, Li, Stan Z.
To mitigate the computational complexity of the self-attention mechanism on long sequences, linear attention utilizes computation tricks to achieve linear complexity, while state space models (SSMs) popularize a favorable practice of using a non-data-dependent memory pattern, i.e., emphasizing the near and neglecting the distant, to process sequences. Recent studies have shown the promise of combining the two as one. However, the efficiency of linear attention remains only theoretical in a causal setting, and SSMs require various designed constraints to operate effectively on specific data. Therefore, in order to unveil the true power of the hybrid design, the following two issues need to be addressed: (1) hardware-efficient implementation of linear attention and (2) stabilization of SSMs. To achieve this, we leverage the ideas of tiling and hierarchy to propose CHELA (short-long Convolutions with Hardware-Efficient Linear Attention), which replaces SSMs with short-long convolutions and implements linear attention in a divide-and-conquer manner. This approach enjoys the global abstraction of stable SSMs and the data-dependent selection of linear attention while maintaining true linear complexity. Our comprehensive experiments on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
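The core combination lends itself to a compact sketch. Below is a minimal, non-causal illustration (not the paper's hardware-efficient kernel; the module name, kernel sizes, and feature map are assumptions): a short and a long depthwise convolution provide local/global mixing, and a kernelized linear attention computes phi(Q)(phi(K)^T V) in O(N) rather than softmax's O(N^2):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShortLongLinearAttention(nn.Module):
    def __init__(self, dim, short_k=3, long_k=63):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.short = nn.Conv1d(dim, dim, short_k, padding=short_k // 2, groups=dim)
        self.long = nn.Conv1d(dim, dim, long_k, padding=long_k // 2, groups=dim)

    def forward(self, x):                      # x: (batch, seq_len, dim)
        conv_in = x.transpose(1, 2)            # depthwise convs over the sequence
        mixed = (self.short(conv_in) + self.long(conv_in)).transpose(1, 2)
        q, k, v = self.qkv(mixed).chunk(3, dim=-1)
        q, k = F.elu(q) + 1, F.elu(k) + 1      # positive feature map
        kv = torch.einsum("bnd,bne->bde", k, v)            # fixed-size summary
        z = 1 / (torch.einsum("bnd,bd->bn", q, k.sum(1)) + 1e-6)
        return torch.einsum("bnd,bde,bn->bne", q, kv, z)

out = ShortLongLinearAttention(32)(torch.randn(2, 128, 32))  # -> (2, 128, 32)
```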
VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling
Li, Siyuan, Wang, Zedong, Liu, Zicheng, Wu, Di, Tan, Cheng, Zheng, Jiangbin, Huang, Yufei, Li, Stan Z.
Similar to natural language models, pre-trained genome language models have been proposed to capture the underlying intricacies within genomes with unsupervised sequence modeling. They have become essential tools for researchers and practitioners in biology. However, the hand-crafted tokenization policies used in these models may not encode the most discriminative patterns from the limited vocabulary of genomic data. In this paper, we introduce VQDNA, a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning. By leveraging vector-quantized codebooks as a learnable vocabulary, VQDNA can adaptively tokenize genomes into pattern-aware embeddings in an end-to-end manner. To further push its limits, we propose Hierarchical Residual Quantization (HRQ), where codebooks of varying scales are designed in a hierarchy to enrich the genome vocabulary in a coarse-to-fine manner. Extensive experiments on 32 genome datasets demonstrate VQDNA's superiority and favorable parameter efficiency compared to existing genome language models. Notably, empirical analysis of SARS-CoV-2 mutations reveals the fine-grained pattern awareness and biological significance of the learned HRQ vocabulary, highlighting its untapped potential for broader applications in genomics.
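A minimal sketch of residual vector quantization, in the spirit of the coarse-to-fine HRQ idea (shapes, level count, and codebook sizes are illustrative assumptions, not the paper's configuration): each level quantizes the residual left by the previous one.

```python
import torch
import torch.nn as nn

class ResidualVQ(nn.Module):
    def __init__(self, dim=64, codebook_size=256, levels=3):
        super().__init__()
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(levels))

    def forward(self, z):                        # z: (batch, seq_len, dim)
        residual, quantized, codes = z, 0.0, []
        for book in self.codebooks:
            d = torch.cdist(residual, book.weight.unsqueeze(0))  # dist to codes
            idx = d.argmin(dim=-1)               # nearest-code assignment
            q = book(idx)
            quantized = quantized + q
            residual = residual - q              # next level refines what is left
            codes.append(idx)
        return quantized, torch.stack(codes, dim=-1)

emb = torch.randn(2, 100, 64)                    # e.g., encoded sequence features
q, ids = ResidualVQ()(emb)                       # ids: (2, 100, 3) per-level codes
```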
LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory
Liu, Zicheng, Wang, Li, Li, Siyuan, Wang, Zedong, Lin, Haitao, Li, Stan Z.
Transformer models have been successful in various sequence processing tasks, but the self-attention mechanism's computational cost limits their practicality for long sequences. Although existing attention variants improve computational efficiency, they have a limited ability to abstract global information effectively due to their hand-crafted mixing strategies. On the other hand, state-space models (SSMs) are tailored for long sequences but cannot capture complicated local information. Therefore, combining them into a unified token mixer has become a trend in recent long-sequence models. However, linearized attention degrades performance significantly even when equipped with SSMs. To address this issue, we propose a new method called LongVQ. LongVQ uses the vector quantization (VQ) technique to compress the global abstraction into a fixed-length codebook, enabling linear-time computation of the attention matrix. This technique effectively maintains dynamic global and local patterns, which helps compensate for the lack of long-range dependencies. Our experiments on the Long Range Arena benchmark, autoregressive language modeling, and image and speech classification demonstrate the effectiveness of LongVQ. Our model achieves significant improvements over other sequence models, including variants of Transformers, Convolutions, and recent State Space Models.
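The complexity argument can be made concrete with a small sketch (illustrative, not the paper's implementation): attending to a fixed-size learned codebook instead of all N positions makes the attention matrix N x K with K constant, hence linear in sequence length.

```python
import torch
import torch.nn as nn

class CodebookAttention(nn.Module):
    def __init__(self, dim=64, num_codes=64):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, dim))  # global memory
        self.q = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (batch, seq_len, dim)
        scores = self.q(x) @ self.codes.t() / x.shape[-1] ** 0.5
        return torch.softmax(scores, -1) @ self.codes   # O(N * K), not O(N^2)

out = CodebookAttention()(torch.randn(2, 4096, 64))  # cost linear in 4096
```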
Switch EMA: A Free Lunch for Better Flatness and Sharpness
Li, Siyuan, Liu, Zicheng, Tian, Juanxi, Wang, Ge, Wang, Zedong, Jin, Weiyang, Wu, Di, Tan, Cheng, Lin, Tao, Liu, Yang, Sun, Baigui, Li, Stan Z.
From both theoretical and empirical aspects, we demonstrate that Switch EMA (SEMA) can help DNNs reach generalization optima that better trade off between flatness and sharpness. To verify the effectiveness of SEMA, we conduct comparison experiments with discriminative, generative, and regression tasks on vision and language datasets, including image classification, self-supervised learning, object detection and segmentation, image generation, video prediction, attribute regression, and language modeling. Comprehensive results with popular optimizers and networks show that SEMA is a free lunch for DNN training, improving performance and boosting convergence speed.
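One common reading of the Switch EMA idea, sketched minimally below (the epoch-level switching schedule and decay value are assumptions for illustration): maintain an exponential moving average of the weights and periodically copy it back into the live model, so training resumes from the flatter averaged point.

```python
import copy
import torch

@torch.no_grad()
def ema_update(ema_model, model, decay=0.999):
    # Standard EMA of parameters: e <- decay * e + (1 - decay) * p
    for e, p in zip(ema_model.parameters(), model.parameters()):
        e.mul_(decay).add_(p, alpha=1 - decay)

@torch.no_grad()
def switch(model, ema_model):
    # The "switch": continue optimizing from the EMA weights.
    for p, e in zip(model.parameters(), ema_model.parameters()):
        p.copy_(e)

model = torch.nn.Linear(10, 2)
ema_model = copy.deepcopy(model)
for epoch in range(3):
    for _ in range(100):              # stand-in for the real training loop
        # optimizer.step() would go here
        ema_update(ema_model, model)
    switch(model, ema_model)          # once per epoch in this sketch
```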
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Li, Siyuan, Zhang, Luyuan, Wang, Zedong, Wu, Di, Wu, Lirong, Liu, Zicheng, Xia, Jun, Tan, Cheng, Liu, Yang, Sun, Baigui, Li, Stan Z.
As the deep learning revolution marches on, self-supervised learning has garnered increasing attention in recent years thanks to its remarkable representation learning ability and low dependence on labeled data. Among these varied self-supervised techniques, masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training. This paradigm enables deep models to learn robust representations and has demonstrated exceptional performance in computer vision, natural language processing, and other modalities. In this survey, we present a comprehensive review of the masked modeling framework and its methodology. We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more. Then, we systematically investigate its wide-ranging applications across domains. Furthermore, we also explore the commonalities and differences between masked modeling methods in different fields. Toward the end of this paper, we conclude by discussing the limitations of current techniques and pointing out several potential avenues for advancing masked modeling research. A paper list project accompanying this survey is available at \url{https://github.com/Lupin1998/Awesome-MIM}.
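The recipe the survey covers reduces to a few lines: mask a random subset of tokens or patches and train the model to reconstruct them. A minimal sketch (toy Transformer encoder, illustrative mask ratio and loss; real methods vary in masking strategy and recovery target):

```python
import torch
import torch.nn as nn

dim, mask_ratio = 64, 0.6
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, 4, batch_first=True), num_layers=2)
mask_token = nn.Parameter(torch.zeros(dim))

tokens = torch.randn(2, 50, dim)                  # e.g., embedded image patches
mask = torch.rand(2, 50) < mask_ratio             # True = hidden from the model
corrupted = torch.where(mask.unsqueeze(-1), mask_token, tokens)
recon = encoder(corrupted)
loss = ((recon - tokens) ** 2)[mask].mean()       # reconstruct only masked parts
```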
OpenSTL: A Comprehensive Benchmark of Spatio-Temporal Predictive Learning
Tan, Cheng, Li, Siyuan, Gao, Zhangyang, Guan, Wenfei, Wang, Zedong, Liu, Zicheng, Wu, Lirong, Li, Stan Z.
Spatio-temporal predictive learning is a learning paradigm that enables models to learn spatial and temporal patterns by predicting future frames from given past frames in an unsupervised manner. Despite remarkable progress in recent years, a lack of systematic understanding persists due to diverse settings, complex implementations, and difficult reproducibility. Without standardization, comparisons can be unfair and insights inconclusive. To address this dilemma, we propose OpenSTL, a comprehensive benchmark for spatio-temporal predictive learning that categorizes prevalent approaches into recurrent-based and recurrent-free models. OpenSTL provides a modular and extensible framework implementing various state-of-the-art methods. We conduct standard evaluations on datasets across various domains, including synthetic moving object trajectories, human motion, driving scenes, traffic flow, and weather forecasting. Based on our observations, we provide a detailed analysis of how model architecture and dataset properties affect spatio-temporal predictive learning performance. Surprisingly, we find that recurrent-free models achieve a better balance between efficiency and performance than recurrent models. Thus, we further extend the common MetaFormers to boost recurrent-free spatio-temporal predictive learning. We open-source the code and models at https://github.com/chengtan9907/OpenSTL.
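The recurrent-free setup highlighted above can be sketched as a single forward pass: stack past frames along channels and predict all future frames at once. The toy ConvNet below stands in for a MetaFormer-style translator (shapes and architecture are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

T_in, T_out, C, H, W = 10, 10, 1, 64, 64
model = nn.Sequential(
    nn.Conv2d(T_in * C, 64, 3, padding=1), nn.GELU(),
    nn.Conv2d(64, T_out * C, 3, padding=1))

past = torch.randn(4, T_in, C, H, W)               # e.g., Moving MNIST clips
future = torch.randn(4, T_out, C, H, W)
pred = model(past.flatten(1, 2)).view(4, T_out, C, H, W)
loss = F.mse_loss(pred, future)                    # one shot, no recurrence
```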
SemiReward: A General Reward Model for Semi-supervised Learning
Li, Siyuan, Jin, Weiyang, Wang, Zedong, Wu, Fang, Liu, Zicheng, Tan, Cheng, Li, Stan Z.
Semi-supervised learning (SSL) has witnessed great progress with various improvements in the self-training framework with pseudo labeling. The main challenge is how to distinguish high-quality pseudo labels against confirmation bias. However, existing pseudo-label selection strategies are limited to pre-defined schemes or complex hand-crafted policies specially designed for classification, failing to achieve high-quality labels, fast convergence, and task versatility simultaneously. To this end, we propose a Semi-supervised Reward framework (SemiReward) that predicts reward scores to evaluate and select high-quality pseudo labels, and is pluggable into mainstream SSL methods across a wide range of task types and scenarios. To mitigate confirmation bias, SemiReward is trained online in two stages with a generator model and a subsampling strategy. With classification and regression tasks on 13 standard SSL benchmarks spanning three modalities, extensive experiments verify that SemiReward achieves significant performance gains and faster convergence over Pseudo Label, FlexMatch, and Free/SoftMatch.
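The filtering step described above can be sketched compactly (the reward net, input features, and threshold below are toy assumptions, not the paper's architecture): a small network scores each (feature, pseudo-label) pair, and only high-reward pseudo labels enter training.

```python
import torch
import torch.nn as nn

reward_net = nn.Sequential(nn.Linear(10 + 10, 32), nn.ReLU(),
                           nn.Linear(32, 1), nn.Sigmoid())

def filter_pseudo_labels(feats, logits, threshold=0.9):
    pseudo = logits.softmax(-1)                       # soft pseudo labels
    score = reward_net(torch.cat([feats, pseudo], dim=-1)).squeeze(-1)
    keep = score > threshold                          # keep only high-reward ones
    return feats[keep], pseudo[keep].argmax(-1)

feats, logits = torch.randn(64, 10), torch.randn(64, 10)
x_kept, y_kept = filter_pseudo_labels(feats, logits)  # subset used for training
```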
Efficient Multi-order Gated Aggregation Network
Li, Siyuan, Wang, Zedong, Liu, Zicheng, Tan, Cheng, Lin, Haitao, Wu, Di, Chen, Zhiyuan, Zheng, Jiangbin, Li, Stan Z.
Since the recent success of Vision Transformers (ViTs), explorations toward ViT-style architectures have triggered the resurgence of ConvNets. In this work, we explore the representation ability of modern ConvNets from a novel view of multi-order game-theoretic interaction, which reflects inter-variable interaction effects w.r.t.~contexts of different scales based on game theory. Within the modern ConvNet framework, we tailor the two feature mixers with conceptually simple yet effective depthwise convolutions to facilitate middle-order information across spatial and channel spaces, respectively. In this light, a new family of pure ConvNet architectures, dubbed MogaNet, is proposed, which shows excellent scalability and attains competitive results among state-of-the-art models with more efficient use of parameters on ImageNet and multifarious typical vision benchmarks, including COCO object detection, ADE20K semantic segmentation, 2D\&3D human pose estimation, and video prediction. Notably, MogaNet achieves 80.0\% and 87.8\% top-1 accuracy with 5.2M and 181M parameters on ImageNet, outperforming ParC-Net-S and ConvNeXt-L while saving 59\% FLOPs and 17M parameters. The source code is available at \url{https://github.com/Westlake-AI/MogaNet}.
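A minimal sketch of a gated multi-scale depthwise-convolution mixer in the spirit of the spatial aggregation described above (channel split, kernel sizes, and dilation are illustrative assumptions, not MogaNet's exact block):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAggregation(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.gate = nn.Conv2d(dim, dim, 1)
        self.dw5 = nn.Conv2d(dim // 2, dim // 2, 5, padding=2, groups=dim // 2)
        self.dw7 = nn.Conv2d(dim // 2, dim // 2, 7, padding=9, dilation=3,
                             groups=dim // 2)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        a, b = x.chunk(2, dim=1)                  # split channels across scales
        context = torch.cat([self.dw5(a), self.dw7(b)], dim=1)
        return self.proj(F.silu(self.gate(x)) * context)   # gated aggregation

y = GatedAggregation()(torch.randn(2, 64, 32, 32))  # spatial shape preserved
```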