AITopics | Zhao, Changsheng

Collaborating Authors

Zhao, Changsheng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization

Liu, Zechun, Zhao, Changsheng, Huang, Hanxian, Chen, Sijia, Zhang, Jing, Zhao, Jiawei, Roy, Scott, Jin, Lisa, Xiong, Yunyang, Shi, Yangyang, Xiao, Lin, Tian, Yuandong, Soran, Bilge, Krishnamoorthi, Raghuraman, Blankevoort, Tijmen, Chandra, Vikas

arXiv.org Artificial IntelligenceFeb-4-2025

The optimal bit-width for achieving the best trade-off between quantized model size and accuracy has been a subject of ongoing debate. While some advocate for 4-bit quantization, others propose that 1.58-bit offers superior results. However, the lack of a cohesive framework for different bits has left such conclusions relatively tenuous. We present ParetoQ, the first unified framework that facilitates rigorous comparisons across 1-bit, 1.58-bit, 2-bit, 3-bit, and 4-bit quantization settings. Our findings reveal a notable learning transition between 2 and 3 bits: For 3-bits and above, the fine-tuned models stay close to their original pre-trained distributions, whereas for learning 2-bit networks or below, the representations change drastically. By optimizing training schemes and refining quantization functions, ParetoQ surpasses all previous methods tailored to specific bit widths. Remarkably, our ParetoQ ternary 600M-parameter model even outperforms the previous SoTA ternary 3B-parameter model in accuracy, using only one-fifth of the parameters. Extensive experimentation shows that ternary, 2-bit, and 3-bit quantization maintains comparable performance in the size-accuracy trade-off and generally exceeds 4-bit and binary quantization. Considering hardware constraints, 2-bit quantization offers promising potential for memory reduction and speedup.

large language model, machine learning, quantization, (20 more...)

arXiv.org Artificial Intelligence

2502.02631

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.68)

Add feedback

Agent-as-a-Judge: Evaluate Agents with Agents

Zhuge, Mingchen, Zhao, Changsheng, Ashley, Dylan, Wang, Wenyi, Khizbullin, Dmitrii, Xiong, Yunyang, Liu, Zechun, Chang, Ernie, Krishnamoorthi, Raghuraman, Tian, Yuandong, Shi, Yangyang, Chandra, Vikas, Schmidhuber, Jürgen

arXiv.org Artificial IntelligenceOct-16-2024

Recent years have seen multimodal agentic systems move from occasionally being able to solve small toy problems to being regularly deployed for challenging real-world problems (the dream of most AI research). Yet, the current evaluation methods and the available benchmarks for agentic systems are struggling to keep up with these rapid advances, dramatically slowing true progress. We believe that the current issue with evaluating agentic systems stems from the lack of feedback during the intermediate task-solving stages for these nontraditional systems. Agentic systems think more like humans, often act step-by-step (Wooldridge, 1999) and often host very human-like symbolic communications internally to solve problems (Zhuge et al., 2023). And thus agentic systems should be evaluated like a human, with rich evaluative feedback which looks at the full thought and action trajectory; evaluating an agentic system in the traditional way is like evaluating a student using multiple-choice testing--a comparatively unreliable estimator (Park, 2010). For example, while SWE-Bench (Yang et al., 2024a) is widespread, its evaluation method, which relies solely on the final resolve rate for long-term automated repair tasks, does not effectively pinpoint what is happening within agentic systems that affects the resolve rate. On the other hand, performing a better evaluation with a human is prohibitively expensive. We instead propose that agentic systems should be used to evaluate agentic systems. Inspired by LLM-as-a-Judge (Zheng et al., 2024; Fu et al., 2023; Chen et al., 2024b), which uses LLMs to evaluate LLMs, we call this framework Agent-as-a-Judge, of which it is

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2410.10934

Country: North America > United States (0.28)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)

Industry:

Information Technology (0.67)
Education (0.48)
Banking & Finance > Economy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Scaling Parameter-Constrained Language Models with Quality Data

Chang, Ernie, Paltenghi, Matteo, Li, Yang, Lin, Pin-Jie, Zhao, Changsheng, Huber, Patrick, Liu, Zechun, Rabatin, Rastislav, Shi, Yangyang, Chandra, Vikas

arXiv.org Artificial IntelligenceOct-3-2024

Scaling laws in language modeling traditionally quantify training loss as a function of dataset size and model parameters, providing compute-optimal estimates but often neglecting the impact of data quality on model generalization. In this paper, we extend the conventional understanding of scaling law by offering a microscopic view of data quality within the original formulation -- effective training tokens -- which we posit to be a critical determinant of performance for parameter-constrained language models. Specifically, we formulate the proposed term of effective training tokens to be a combination of two readily-computed indicators of text: (i) text diversity and (ii) syntheticity as measured by a teacher model. We pretrained over $200$ models of 25M to 1.5B parameters on a diverse set of sampled, synthetic data, and estimated the constants that relate text quality, model size, training tokens, and eight reasoning task accuracy scores. We demonstrated the estimated constants yield +0.83 Pearson correlation with true accuracies, and analyzed it in scenarios involving widely-used data techniques such as data sampling and synthesis which aim to improve data quality.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.03083

Country: North America > United States (0.67)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.68)
(3 more...)

Add feedback

Target-Aware Language Modeling via Granular Data Sampling

Chang, Ernie, Lin, Pin-Jie, Li, Yang, Zhao, Changsheng, Kim, Daeil, Rabatin, Rastislav, Liu, Zechun, Shi, Yangyang, Chandra, Vikas

arXiv.org Artificial IntelligenceSep-23-2024

Language model pretraining generally targets a broad range of use cases and incorporates data from diverse sources. However, there are instances where we desire a model that excels in specific areas without markedly compromising performance in other areas. A cost-effective and straightforward approach is sampling with low-dimensional data features, which allows to select large-scale pretraining data for domain-specific use cases. In this work, we revisit importance sampling with n-gram features consisting of multi-granular tokens, which strikes a good balance between sentence compression and representation capabilities. We observed the sampled data to have a high correlation with the target downstream task performance while preserving its effectiveness on other tasks. This leads to the proposed data sampling paradigm where language models can be pretrained more efficiently on selected documents. On eight benchmarks we demonstrate with $\sim$1% of the data, pretrained models perform on par with the full RefinedWeb data and outperform randomly selected samples for model sizes ranging from 125M to 1.5B.

artificial intelligence, chatbot, natural language, (18 more...)

arXiv.org Artificial Intelligence

2409.14705

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.46)

Add feedback

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Liu, Zechun, Zhao, Changsheng, Iandola, Forrest, Lai, Chen, Tian, Yuandong, Fedorov, Igor, Xiong, Yunyang, Chang, Ernie, Shi, Yangyang, Krishnamoorthi, Raghuraman, Lai, Liangzhen, Chandra, Vikas

arXiv.org Artificial IntelligenceJun-26-2024

This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment. Contrary to prevailing belief emphasizing the pivotal role of data and parameter quantity in determining model quality, our investigation underscores the significance of model architecture for sub-billion scale LLMs. Leveraging deep and thin architectures, coupled with embedding sharing and grouped-query attention mechanisms, we establish a strong baseline network denoted as MobileLLM, which attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M state-of-the-art models. Additionally, we propose an immediate block-wise weight-sharing approach with no increase in model size and only marginal latency overhead. The resultant models, denoted as MobileLLM-LS, demonstrate a further accuracy enhancement of 0.7%/0.8% than MobileLLM 125M/350M. Moreover, MobileLLM model family shows significant improvements compared to previous sub-billion models on chat benchmarks, and demonstrates close correctness to LLaMA-v2 7B in API calling tasks, highlighting the capability of small models for common on-device use cases.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2402.14905

Country:

North America > United States > Missouri > Jackson County > Kansas City (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)
Education > Educational Setting > K-12 Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

SpinQuant: LLM quantization with learned rotations

Liu, Zechun, Zhao, Changsheng, Fedorov, Igor, Soran, Bilge, Choudhary, Dhruv, Krishnamoorthi, Raghuraman, Chandra, Vikas, Tian, Yuandong, Blankevoort, Tijmen

arXiv.org Artificial IntelligenceMay-28-2024

Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are present. Recent findings suggest that rotating activation or weight matrices helps remove outliers and benefits quantization. In this work, we identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures, and find that some random rotations lead to much better quantization than others, with an up to 13 points difference in downstream zero-shot reasoning performance. As a result, we propose SpinQuant that optimizes (or learns) the rotation matrices with Cayley optimization on a small validation set. With 4-bit quantization of weight, activation, and KV-cache, SpinQuant narrows the accuracy gap on zero-shot reasoning tasks with full precision to merely 2.9 points on the LLaMA-2 7B model, surpassing LLM-QAT by 19.1 points and SmoothQuant by 25.0 points. SpinQuant also outperforms concurrent work QuaRot, which applies random rotations to remove outliers. In particular, for LLaMA-2 7B/LLaMA-3 8B models that are hard to quantize, SpinQuant reduces the gap to full precision by 30.2%/34.1% relative to QuaRot.

large language model, machine learning, quantization, (19 more...)

arXiv.org Artificial Intelligence

2405.16406

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications

Li, Yang, Zhao, Changsheng, Lee, Hyungtak, Chang, Ernie, Shi, Yangyang, Chandra, Vikas

arXiv.org Artificial IntelligenceMay-24-2024

Large language models (LLMs) significantly enhance the performance of various applications, but they are computationally intensive and energy-demanding. This makes it challenging to deploy them on devices with limited resources, such as personal computers and mobile/wearable devices, and results in substantial inference costs in resource-rich environments like cloud servers. To extend the use of LLMs, we introduce a low-rank decomposition approach to effectively compress these models, tailored to the requirements of specific applications. We observe that LLMs pretrained on general datasets contain many redundant components not needed for particular applications. Our method focuses on identifying and removing these redundant parts, retaining only the necessary elements for the target applications. Specifically, we represent the weight matrices of LLMs as a linear combination of base components. We then prune the irrelevant bases and enhance the model with new bases beneficial for specific applications. Deep compression results on the Llama 2-7b and -13B models, conducted on target applications including mathematical reasoning and code generation, show that our method significantly reduces model size while maintaining comparable accuracy to state-of-the-art low-rank compression techniques.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2405.15877

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

Not All Weights Are Created Equal: Enhancing Energy Efficiency in On-Device Streaming Speech Recognition

Li, Yang, Shangguan, Yuan, Wang, Yuhao, Lai, Liangzhen, Chang, Ernie, Zhao, Changsheng, Shi, Yangyang, Chandra, Vikas

arXiv.org Artificial IntelligenceFeb-20-2024

Power consumption plays an important role in on-device streaming speech recognition, as it has a direct impact on the user experience. This study delves into how weight parameters in speech recognition models influence the overall power consumption of these models. We discovered that the impact of weight parameters on power consumption varies, influenced by factors including how often they are invoked and their placement in memory. Armed with this insight, we developed design guidelines aimed at optimizing on-device speech recognition models. These guidelines focus on minimizing power use without substantially affecting accuracy. Our method, which employs targeted compression based on the varying sensitivities of weight parameters, demonstrates superior performance compared to state-of-the-art compression methods. It achieves a reduction in energy usage of up to 47% while maintaining similar model accuracy and improving the real-time factor.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.13076

Genre: Research Report > New Finding (0.68)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

On The Open Prompt Challenge In Conditional Audio Generation

Chang, Ernie, Srinivasan, Sidd, Luthra, Mahi, Lin, Pin-Jie, Nagaraja, Varun, Iandola, Forrest, Liu, Zechun, Ni, Zhaoheng, Zhao, Changsheng, Shi, Yangyang, Chandra, Vikas

arXiv.org Artificial IntelligenceNov-1-2023

Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio samples and hand-annotated text. However, commercializing audio generation is challenging as user-input prompts are often under-specified when compared to text descriptions used to train TTA models. In this work, we treat TTA models as a ``blackbox'' and address the user prompt challenge with two key insights: (1) User prompts are generally under-specified, leading to a large alignment gap between user prompts and training prompts. (2) There is a distribution of audio descriptions for which TTA models are better at generating higher quality audio, which we refer to as ``audionese''. To this end, we rewrite prompts with instruction-tuned models and propose utilizing text-audio alignment as feedback signals via margin ranking learning for audio improvements. On both objective and subjective human evaluations, we observed marked improvements in both text-audio alignment and music audio quality.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2311.00897

Genre: Research Report (0.50)

Industry:

Media > Music (0.47)
Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)

Add feedback

Revisiting Sample Size Determination in Natural Language Understanding

Chang, Ernie, Rashid, Muhammad Hassan, Lin, Pin-Jie, Zhao, Changsheng, Demberg, Vera, Shi, Yangyang, Chandra, Vikas

arXiv.org Artificial IntelligenceJul-1-2023

Knowing exactly how many data points need to be labeled to achieve a certain model performance is a hugely beneficial step towards reducing the overall budgets for annotation. It pertains to both active learning and traditional data annotation, and is particularly beneficial for low resource scenarios. Nevertheless, it remains a largely under-explored area of research in NLP. We therefore explored various techniques for estimating the training sample size necessary to achieve a targeted performance value. We derived a simple yet effective approach to predict the maximum achievable model performance based on small amount of training samples - which serves as an early indicator during data annotation for data quality and sample size determination. We performed ablation studies on four language understanding tasks, and showed that the proposed approach allows us to forecast model performance within a small margin of mean absolute error (~ 0.9%) with only 10% data.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2307.00374

Country: Europe (0.47)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)

Add feedback