A QLoRA vs Standard Finetuning Experimental Setup Details

Apr-25-2026, 18:36:29 GMT–Neural Information Processing Systems

A.1 Hyperparameters for QLoRA

We do a hyperparameter search for LoRA over the following variables: LoRA dropout {0.0, 0.05, 0.1}, LoRA r {8, 16, 32, 64, 128, 256}, LoRA layers {key+query, all attention layers, all FFN layers, all layers, attention + FFN output layers}. We keep LoRA α fixed and search the learning rate, since LoRA α is always proportional to the learning rate. We find that LoRA dropout 0.05 is useful for small models (7B, 13B), but not for larger models (33B, 65B).

In the accompanying plot, each dot represents one combination of hyperparameters, and for each LoRA r we run 3 random seeds with each hyperparameter combination. The performance of specific LoRA r values appears to be independent of other hyperparameters.
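The search space listed above can be enumerated with a short Python sketch. This is an illustration only: the text does not say whether the search was an exhaustive grid, and it does not list the learning rates tried, so the learning rate is left out. The names `build_grid` and `SEEDS`, and the seed values themselves, are hypothetical.

```python
from itertools import product

# Values taken from the hyperparameter list in the text.
LORA_DROPOUT = (0.0, 0.05, 0.1)
LORA_R = (8, 16, 32, 64, 128, 256)
LORA_LAYERS = (
    "key+query",
    "all attention layers",
    "all FFN layers",
    "all layers",
    "attention + FFN output layers",
)

# Assumption: the text says 3 random seeds per combination but does not
# name them, so these seed values are placeholders.
SEEDS = (0, 1, 2)


def build_grid():
    """Return one config dict per (dropout, r, layers, seed) combination.

    Assumes a full Cartesian product, which the text does not confirm.
    """
    return [
        {"lora_dropout": d, "lora_r": r, "lora_layers": layers, "seed": s}
        for d, r, layers, s in product(LORA_DROPOUT, LORA_R, LORA_LAYERS, SEEDS)
    ]


grid = build_grid()
# 3 dropouts * 6 ranks * 5 layer choices * 3 seeds
print(len(grid))  # prints 270
```

Under the full-grid assumption this gives 90 distinct hyperparameter combinations, each repeated over 3 seeds. A real sweep would also vary the learning rate, which multiplies the number of runs accordingly.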
