
Collaborating Authors: Jin, Lisa


PARQ: Piecewise-Affine Regularized Quantization

arXiv.org Artificial Intelligence

Modern deep learning models exhibit exceptional vision and language processing capabilities, but come with excessive sizes and demands on memory and computing. Quantization is an effective approach for model compression that can significantly reduce their memory footprint, computing cost, and inference latency (e.g., Han et al., 2016; Sze et al., 2017). There are two main classes of quantization methods: post-training quantization (PTQ) and quantization-aware training (QAT). Both are widely adopted and receive extensive research; see the recent survey papers (Gholami et al., 2022; Fournarakis et al., 2022) and references therein. PTQ converts the weights of a pre-trained model directly into lower precision without repeating the training pipeline; it thus has less overhead and is relatively easy to apply (Nagel et al., 2020; Cai et al., 2020; Chee et al., 2024). However, it is mainly limited to regimes of 4 or more bits and can suffer steep performance drops with fewer bits (Yao et al., 2022; Dettmers & Zettlemoyer, 2023). This is especially the case for transformer-based models, which prove harder to quantize (Bai et al., 2021; Qin et al., 2022) than convolutional architectures (Martinez et al., 2019; Qin et al., 2020). On the other hand, QAT integrates quantization into pre-training and/or fine-tuning and can produce low-bit (especially binary) models with mild performance degradation.
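For concreteness, the sketch below shows a generic quantization-aware training setup in PyTorch: a uniform symmetric b-bit weight quantizer whose rounding is bypassed in the backward pass by a straight-through estimator. This only illustrates the QAT idea described above, not PARQ's piecewise-affine regularization; the 4-bit setting, the max-abs scale rule, and the layer names are assumptions.

```python
# Minimal sketch of quantization-aware training with a straight-through
# estimator (STE). Generic illustration only, not PARQ's piecewise-affine
# regularization; bit width and scale rule are assumptions.
import torch
import torch.nn as nn


class STEQuantize(torch.autograd.Function):
    """Uniform symmetric b-bit weight quantizer; gradients pass straight through."""

    @staticmethod
    def forward(ctx, w, bits=4):
        qmax = 2 ** (bits - 1) - 1                   # e.g. 7 levels each side for 4-bit signed
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None                     # identity gradient (STE)


class QuantLinear(nn.Linear):
    """Linear layer whose weights are quantized on the fly in the forward pass."""

    def forward(self, x):
        w_q = STEQuantize.apply(self.weight, 4)
        return nn.functional.linear(x, w_q, self.bias)


if __name__ == "__main__":
    layer = QuantLinear(16, 8)
    x = torch.randn(2, 16)
    out = layer(x)
    out.sum().backward()                             # gradients still reach the full-precision weights
    print(out.shape, layer.weight.grad.shape)
```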


ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization

arXiv.org Artificial Intelligence

The optimal bit-width for achieving the best trade-off between quantized model size and accuracy has been a subject of ongoing debate. While some advocate for 4-bit quantization, others propose that 1.58-bit offers superior results. However, the lack of a cohesive framework for different bits has left such conclusions relatively tenuous. We present ParetoQ, the first unified framework that facilitates rigorous comparisons across 1-bit, 1.58-bit, 2-bit, 3-bit, and 4-bit quantization settings. Our findings reveal a notable learning transition between 2 and 3 bits: for 3 bits and above, the fine-tuned models stay close to their original pre-trained distributions, whereas for learning 2-bit networks or below, the representations change drastically. By optimizing training schemes and refining quantization functions, ParetoQ surpasses all previous methods tailored to specific bit widths. Remarkably, our ParetoQ ternary 600M-parameter model even outperforms the previous SoTA ternary 3B-parameter model in accuracy, using only one-fifth of the parameters. Extensive experimentation shows that ternary, 2-bit, and 3-bit quantization maintains comparable performance in the size-accuracy trade-off and generally exceeds 4-bit and binary quantization. Considering hardware constraints, 2-bit quantization offers promising potential for memory reduction and speedup.
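As a rough illustration of what ternary (roughly 1.58-bit) weight quantization looks like, the sketch below uses a simple threshold-and-scale scheme in the spirit of threshold-based ternarization; the 0.75 x mean-|w| threshold and the per-tensor scale are assumptions and do not reproduce ParetoQ's refined quantization functions.

```python
# Minimal sketch of ternary (~1.58-bit) weight quantization with a per-tensor
# scale. The threshold rule and scale computation are illustrative assumptions,
# not ParetoQ's exact quantization function.
import torch


def ternarize(w: torch.Tensor, delta_ratio: float = 0.75):
    """Map weights to {-1, 0, +1} scaled by a per-tensor alpha."""
    delta = delta_ratio * w.abs().mean()              # threshold below which weights become 0
    mask = (w.abs() > delta).float()                  # 1 where the weight survives
    codes = torch.sign(w) * mask                      # ternary codes in {-1, 0, +1}
    # alpha: mean magnitude of the surviving weights (least-squares scale for a fixed mask).
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    return codes * alpha, codes, alpha


if __name__ == "__main__":
    w = torch.randn(256, 256)
    w_q, codes, alpha = ternarize(w)
    sparsity = (codes == 0).float().mean().item()
    rel_err = ((w - w_q).norm() / w.norm()).item()
    print(f"zeros={sparsity:.2f}  rel_err={rel_err:.3f}  alpha={alpha.item():.3f}")
```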


Hierarchical Context Tagging for Utterance Rewriting

arXiv.org Artificial Intelligence

Utterance rewriting aims to recover coreferences and omitted information from the latest turn of a multi-turn dialogue. Recently, methods that tag rather than linearly generate sequences have proven stronger in both in- and out-of-domain rewriting settings. This is due to a tagger's smaller search space as it can only copy tokens from the dialogue context. However, these methods may suffer from low coverage when phrases that must be added to a source utterance cannot be covered by a single context span. This can occur in languages like English that introduce tokens such as prepositions into the rewrite for grammaticality. We propose a hierarchical context tagger (HCT) that mitigates this issue by predicting slotted rules (e.g., "besides_") whose slots are later filled with context spans. HCT (i) tags the source string with token-level edit actions and slotted rules and (ii) fills in the resulting rule slots with spans from the dialogue context. This rule tagging allows HCT to add out-of-context tokens and multiple spans at once; we further cluster the rules to truncate the long tail of the rule distribution. Experiments on several benchmarks show that HCT can outperform state-of-the-art rewriting systems by ~2 BLEU points.
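To make the two-stage mechanism concrete, the toy sketch below inserts a slotted rule such as "besides _" into a source utterance and fills its slot with a span copied from the dialogue context; the example dialogue, rule, and insertion position are invented for illustration, and HCT itself predicts them with learned taggers rather than hard-coding them.

```python
# Toy illustration of the two-stage idea behind hierarchical context tagging:
# (i) a slotted rule is attached to a position in the source utterance, and
# (ii) its slot is filled with a span copied from the dialogue context.
# The dialogue, rule, and tagging decisions here are invented examples.

def apply_slotted_rule(source_tokens, insert_pos, rule, context_span):
    """Insert a rule like ['besides', '_'] at insert_pos, filling '_' with a context span."""
    filled = [tok if tok != "_" else " ".join(context_span) for tok in rule]
    return source_tokens[:insert_pos] + filled + source_tokens[insert_pos:]


if __name__ == "__main__":
    context = "do you like jazz".split()
    source = "what other genres do you enjoy".split()

    # Hypothetical tagger output: insert "besides _" after "genres",
    # with the slot filled by the context span covering "jazz".
    rewrite = apply_slotted_rule(
        source_tokens=source,
        insert_pos=source.index("genres") + 1,
        rule=["besides", "_"],
        context_span=context[3:4],
    )
    print(" ".join(rewrite))
    # -> "what other genres besides jazz do you enjoy"
```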