Collaborating Authors

Cheng, Zhiyuan


MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts

arXiv.org Artificial Intelligence

Fine-tuning Large Language Models (LLMs) is a common practice to adapt pre-trained models for specific applications. While methods like LoRA effectively address GPU memory constraints during fine-tuning, their performance often falls short, especially in multi-task scenarios. In contrast, Mixture-of-Experts (MoE) models, such as Mixtral 8x7B, demonstrate remarkable performance in multi-task learning while maintaining a reduced parameter count. However, the resource requirements of these MoEs remain challenging, particularly for consumer-grade GPUs with less than 24GB of memory. To tackle these challenges, we propose MixLoRA, an approach for constructing a resource-efficient sparse MoE model based on LoRA. MixLoRA inserts multiple LoRA-based experts within the feed-forward network block of a frozen pre-trained dense model and employs a commonly used top-k router. Unlike other LoRA-based MoE methods, MixLoRA enhances model performance by utilizing independent attention-layer LoRA adapters. Additionally, an auxiliary load-balance loss is employed to address the imbalance problem of the router. Our evaluations show that MixLoRA improves accuracy by about 9% compared to state-of-the-art PEFT methods in multi-task learning scenarios. We also propose a new high-throughput framework to alleviate the computation and memory bottlenecks during the training and inference of MoE models. This framework reduces GPU memory consumption by 40% and token computation latency by 30% in both training and inference.
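
To make the architecture concrete, below is a minimal PyTorch sketch of a MixLoRA-style layer: LoRA experts attached to a frozen feed-forward block, a top-k router, and a Switch-style auxiliary load-balance loss. All class and parameter names, dimensions, and the exact loss form are illustrative assumptions based on the abstract, not the authors' implementation; the independent attention-layer LoRA adapters are applied separately and omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """A low-rank (LoRA) delta used as one expert on top of the frozen FFN."""

    def __init__(self, d_model: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=False)
        nn.init.zeros_(self.up.weight)  # each expert starts as a zero delta
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x)) * self.scale


class MixLoRAFFN(nn.Module):
    """Frozen dense FFN plus n LoRA experts, k of which are routed per token."""

    def __init__(self, ffn: nn.Module, d_model: int,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.ffn = ffn.requires_grad_(False)  # pre-trained FFN stays frozen
        self.experts = nn.ModuleList(
            LoRAExpert(d_model) for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor):
        # x: (tokens, d_model); the router picks top-k experts per token.
        probs = self.router(x).softmax(dim=-1)
        weights, idx = probs.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = self.ffn(x)  # shared frozen path
        delta = torch.zeros_like(out)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    delta[mask] = delta[mask] + \
                        weights[mask, slot].unsqueeze(-1) * expert(x[mask])

        # Switch-style auxiliary load-balance loss: fraction of tokens routed
        # to each expert times that expert's mean router probability.
        load = F.one_hot(idx, len(self.experts)).float().sum(1).mean(0)
        importance = probs.mean(dim=0)
        aux_loss = len(self.experts) * (load * importance).sum()
        return out + delta, aux_loss
```

Initializing each expert's up-projection to zero makes every expert begin as an identity delta, so training starts from the frozen model's behavior and only gradually specializes the experts.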


Adversarial Training of Self-supervised Monocular Depth Estimation against Physical-World Attacks

arXiv.org Artificial Intelligence

Monocular Depth Estimation (MDE) is a critical component in applications such as autonomous driving. There are various attacks against MDE networks; these attacks, especially the physical ones, pose a great threat to the security of such systems. Traditional adversarial training methods require ground-truth labels and hence cannot be directly applied to self-supervised MDE, which lacks ground-truth depth. Some self-supervised model hardening techniques (e.g., contrastive learning) ignore the domain knowledge of MDE and can hardly achieve optimal performance. In this work, we propose a novel adversarial training method for self-supervised MDE models based on view synthesis without using ground-truth depth. We improve adversarial robustness against physical-world attacks by using L0-norm-bounded perturbations during training. We compare our method with supervised-learning-based and contrastive-learning-based methods tailored for MDE. Results on two representative MDE networks show that we achieve better robustness against various adversarial attacks with nearly no degradation in benign performance.
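
The sketch below illustrates the general recipe the abstract describes: an inner loop that crafts an L0-bounded perturbation to maximize a self-supervised view-synthesis (photometric) loss, followed by a training step on the perturbed frame. Here `depth_net`, `pose_net`, and `photo_loss` are placeholders for the usual self-supervised MDE components, and the PGD-style update and projection details are assumptions, not the paper's exact procedure.

```python
import torch


def l0_project(delta: torch.Tensor, k: int) -> torch.Tensor:
    """Zero out everything except the k pixels with the largest energy."""
    b, c, h, w = delta.shape
    energy = delta.pow(2).sum(dim=1).flatten(1)          # (b, h*w)
    keep = energy.topk(k, dim=1).indices
    mask = torch.zeros_like(energy).scatter_(1, keep, 1.0)
    return delta * mask.view(b, 1, h, w)


def adversarial_step(depth_net, pose_net, photo_loss, target, ref,
                     optimizer, k=1000, steps=5, step_size=2.0 / 255):
    # Inner maximization: find an L0-bounded perturbation of the target
    # frame that degrades view synthesis (no ground-truth depth needed).
    delta = torch.zeros_like(target, requires_grad=True)
    for _ in range(steps):
        adv = (target + delta).clamp(0.0, 1.0)
        loss = photo_loss(adv, ref, depth_net(adv), pose_net(adv, ref))
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()             # ascend on the loss
            delta.copy_(l0_project(delta, k))            # enforce L0 budget

    # Outer minimization: train on the perturbed frame with the same
    # self-supervised objective.
    adv = (target + delta.detach()).clamp(0.0, 1.0)
    loss = photo_loss(adv, ref, depth_net(adv), pose_net(adv, ref))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Projecting onto an L0 budget (keeping only the k highest-energy pixels) concentrates the perturbation in a small region, mimicking patch-like physical-world attacks rather than imperceptible image-wide noise.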


Exploring Millions of Footprints in Location Sharing Services

AAAI Conferences

Location sharing services (LSS) like Foursquare, Gowalla, and Facebook Places support hundreds of millions of user-driven footprints (i.e., "checkins"). These global-scale footprints provide a unique opportunity to study the social and temporal characteristics of how people use these services and to model patterns of human mobility, which are significant factors for the design of future mobile location-based services, traffic forecasting, urban planning, and epidemiological models of disease spread. In this paper, we investigate 22 million checkins across 220,000 users and report a quantitative assessment of human mobility patterns by analyzing the spatial, temporal, social, and textual aspects associated with these footprints. We find that: (i) LSS users follow the "Lévy flight" mobility pattern and adopt periodic behaviors; (ii) while geographic and economic constraints affect mobility patterns, so does individual social status; and (iii) content- and sentiment-based analysis of posts associated with checkins can provide a rich source of context for better understanding how users engage with these services.
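
As a rough illustration of how the "Lévy flight" claim can be checked, the sketch below computes consecutive-checkin jump lengths per user and fits a power-law exponent with the standard maximum-likelihood estimator (Clauset et al.). The data layout and all names here are assumptions for illustration, not the paper's actual pipeline.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def displacement_exponent(checkins_by_user, xmin_km=1.0):
    """MLE power-law exponent over per-user consecutive-checkin jump lengths.

    checkins_by_user maps each user to a time-ordered list of (lat, lon)
    pairs -- an assumed layout. A heavy tail with exponent roughly between
    1 and 3 is the signature of Levy-flight-like mobility.
    """
    jumps = []
    for points in checkins_by_user.values():
        for (la1, lo1), (la2, lo2) in zip(points, points[1:]):
            d = haversine_km(la1, lo1, la2, lo2)
            if d >= xmin_km:  # fit only the tail beyond xmin
                jumps.append(d)
    n = len(jumps)
    return 1.0 + n / sum(math.log(d / xmin_km) for d in jumps)
```

Restricting the fit to jumps above a cutoff `xmin_km` is the usual practice for tail estimation, since short displacements are dominated by venue density rather than by the mobility process itself.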