fsl
- Asia > Singapore (0.04)
- North America > United States > California (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Unveiling the Role of Learning Rate Schedules via Functional Scaling Laws
Li, Binghui, Chen, Fengling, Huang, Zixun, Wang, Lean, Wu, Lei
Scaling laws have played a cornerstone role in guiding the training of large language models (LLMs). However, most existing work on scaling laws focuses on the final-step loss, overlooking the loss dynamics during training and, crucially, the impact of the learning rate schedule (LRS). In this paper, we aim to bridge this gap by studying a teacher-student kernel regression setup trained via online stochastic gradient descent (SGD). Leveraging a novel intrinsic-time viewpoint and stochastic differential equation (SDE) modeling of SGD, we introduce the Functional Scaling Law (FSL), which characterizes the evolution of population risk during training for general LRSs. Remarkably, the impact of the LRS is captured through an explicit convolution-type functional term, making its effects fully tractable. To illustrate the utility of FSL, we analyze three widely used LRSs -- constant, exponential decay, and warmup-stable-decay (WSD) -- under both data-limited and compute-limited regimes. We provide theoretical justification for widely adopted empirical practices in LLM pre-training, such as: (i) higher-capacity models are more data- and compute-efficient; (ii) learning rate decay can improve training efficiency; (iii) WSD-like schedules can outperform direct-decay schedules. Lastly, we explore the practical relevance of FSL as a surrogate model for fitting, predicting, and optimizing loss curves in LLM pre-training, with experiments conducted across model sizes ranging from 0.1B to 1B parameters. We hope our FSL framework can deepen the understanding of LLM pre-training dynamics and provide insights for improving large-scale model training.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
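The three schedules the abstract compares can be illustrated on a toy teacher-student problem. Below is a minimal sketch, not the paper's FSL model: online SGD on noisy linear regression (a stand-in for the kernel regression setup) under constant, exponential-decay, and WSD schedules. All constants (dimension, step counts, peak learning rate, noise level) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20                       # toy feature dimension (illustrative)
w_star = rng.normal(size=d)  # teacher weights
steps = 2000

def lr_constant(t):
    return 0.05

def lr_exp_decay(t):
    return 0.05 * 0.999 ** t

def lr_wsd(t):
    # warmup-stable-decay: linear warmup, flat plateau, linear decay to 0
    warmup, decay_start = 100, 1600
    if t < warmup:
        return 0.05 * t / warmup
    if t < decay_start:
        return 0.05
    return 0.05 * (steps - t) / (steps - decay_start)

def run(schedule):
    """Online SGD on fresh noisy samples; returns the risk trajectory."""
    w = np.zeros(d)
    risks = []
    for t in range(steps):
        x = rng.normal(size=d)
        y = x @ w_star + 0.1 * rng.normal()  # noisy teacher label
        grad = (x @ w - y) * x               # per-sample squared-loss gradient
        w -= schedule(t) * grad
        risks.append(float(np.sum((w - w_star) ** 2)))
    return risks

for name, sched in [("constant", lr_constant),
                    ("exp-decay", lr_exp_decay),
                    ("wsd", lr_wsd)]:
    print(f"{name:9s} final risk: {run(sched)[-1]:.4f}")
```

With a constant rate the risk plateaus at a noise floor proportional to the learning rate, while decaying schedules push below it, which is the qualitative behavior the paper's items (ii) and (iii) formalize.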
An Analysis of LLM Fine-Tuning and Few-Shot Learning for Flaky Test Detection and Classification
More, Riddhi, Bradbury, Jeremy S.
Flaky tests exhibit non-deterministic behavior during execution: they may pass or fail without any changes to the program under test. Detecting and classifying these flaky tests is crucial for maintaining the robustness of automated test suites and ensuring overall reliability and confidence in testing. However, flaky test detection and classification is challenging due to the variability in test behavior, which can depend on environmental conditions and subtle code interactions. Large Language Models (LLMs) offer promising approaches to address this challenge, with fine-tuning and few-shot learning (FSL) emerging as viable techniques. With enough data, fine-tuning a pre-trained LLM can achieve high accuracy, making it suitable for organizations with more resources. Alternatively, we introduce FlakyXbert, an FSL approach that employs a Siamese network architecture to train efficiently with limited data. To understand the performance and cost differences between these two methods, we compare fine-tuning on larger datasets with FSL in scenarios restricted to smaller datasets. Our evaluation involves two existing flaky test datasets, FlakyCat and IDoFT. Our results suggest that while fine-tuning can achieve high accuracy, FSL provides a cost-effective approach with competitive accuracy, which is especially beneficial for organizations or projects with limited historical data available for training. These findings underscore the viability of both fine-tuning and FSL in flaky test detection and classification, with each suited to different organizational needs and resource availability.
- North America > Canada > Ontario > Durham Region > Oshawa (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
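The Siamese-style FSL idea in the abstract -- classify a test by comparing its embedding against a small labeled support set -- can be illustrated with a toy similarity classifier. This is a sketch under stated assumptions, not FlakyXbert itself: the 4-d vectors stand in for encoder outputs, and `few_shot_predict` and the class names are hypothetical.

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two embedding vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def few_shot_predict(query, support):
    """Predict the label whose support embeddings are most similar on average."""
    scores = {
        label: float(np.mean([cosine(query, e) for e in embs]))
        for label, embs in support.items()
    }
    return max(scores, key=scores.get)

# Toy 4-d "embeddings" standing in for learned encoder outputs.
support = {
    "flaky-async": [np.array([1.0, 0.1, 0.0, 0.0]),
                    np.array([0.9, 0.2, 0.1, 0.0])],
    "flaky-order": [np.array([0.0, 0.1, 1.0, 0.2]),
                    np.array([0.1, 0.0, 0.8, 0.3])],
}
query = np.array([0.95, 0.15, 0.05, 0.0])
print(few_shot_predict(query, support))  # → flaky-async
```

The appeal for the low-data regime is that only a few labeled support examples per flakiness category are needed at prediction time, rather than a large fine-tuning corpus.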