Goto

Collaborating Authors

 frugal


FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

arXiv.org Artificial Intelligence

With the increase in the number of parameters in large language models, the process of pre-training and fine-tuning increasingly demands larger volumes of GPU memory. A significant portion of this memory is typically consumed by the optimizer state. To overcome this challenge, recent approaches such as low-rank adaptation (LoRA (Hu et al., 2021)), low-rank gradient projection (GaLore (Zhao et al., 2024a)), and blockwise optimization (BAdam (Luo et al., 2024)) have been proposed. However, in all these algorithms, the effective rank of the weight updates remains low-rank, which can lead to a substantial loss of information from the gradient. This loss can be critically important, especially during the pre-training stage. In this paper, we introduce FRUGAL (Full-Rank Updates with GrAdient spLitting), a new memory-efficient optimization framework. FRUGAL leverages gradient splitting to perform low-dimensional updates using advanced algorithms (such as Adam), while updates along the remaining directions are executed via statefree methods like SGD or signSGD (Bernstein et al., 2018). Our framework can be integrated with various low-rank update selection techniques, including GaLore and BAdam. We provide theoretical convergence guarantees for our framework when using SGDM for low-dimensional updates and SGD for state-free updates. Additionally, our method consistently outperforms concurrent approaches across various fixed memory budgets, achieving state-of-the-art results in pre-training and fine-tuning tasks while balancing memory efficiency and performance metrics. In recent years, Large Language Models (LLMs) such as GPT (OpenAI, 2023) and LLaMA-3 Dubey et al. (2024) have demonstrated remarkable performance across various disciplines (Brown, 2020; Yang et al., 2024; Romera-Paredes et al., 2024). However, a critical factor in achieving these results is the size of these models (Hoffmann et al., 2022). A larger number of parameters not only increases computational cost but also significantly raises memory requirements. For instance, training an 8 billion parameter LLaMA model in a 16-bit format necessitates each parameter to occupy 2 bytes, resulting in 16GB for storing the parameters and an additional 16GB for gradients.


Less, but Stronger: On the Value of Strong Heuristics in Semi-supervised Learning for Software Analytics

arXiv.org Artificial Intelligence

In many domains, there are many examples and far fewer labels for those examples; e.g. we may have access to millions of lines of source code, but access to only a handful of warnings about that code. In those domains, semi-supervised learners (SSL) can extrapolate labels from a small number of examples to the rest of the data. Standard SSL algorithms use ``weak'' knowledge (i.e. those not based on specific SE knowledge) such as (e.g.) co-train two learners and use good labels from one to train the other. Another approach of SSL in software analytics is potentially use ``strong'' knowledge that use SE knowledge. For example, an often-used heuristic in SE is that unusually large artifacts contain undesired properties (e.g. more bugs). This paper argues that such ``strong'' algorithms perform better than those standard, weaker, SSL algorithms. We show this by learning models from labels generated using weak SSL or our ``stronger'' FRUGAL algorithm. In four domains (distinguishing security-related bug reports; mitigating bias in decision-making; predicting issue close time; and (reducing false alarms in static code warnings), FRUGAL required only 2.5% of the data to be labeled yet out-performed standard semi-supervised learners that relied on (e.g.) some domain-independent graph theory concepts. Hence, for future work, we strongly recommend the use of strong heuristics for semi-supervised learning for SE applications. To better support other researchers, our scripts and data are on-line at https://github.com/HuyTu7/FRUGAL.


Frugal Bribery in Voting

arXiv.org Artificial Intelligence

Bribery in elections is an important problem in computational social choice theory. However, bribery with money is often illegal in elections. Motivated by this, we introduce the notion of frugal bribery and formulate two new pertinent computational problems which we call Frugal-bribery and Frugal- $bribery to capture bribery without money in elections. In the proposed model, the briber is frugal in nature and this is captured by her inability to bribe votes of a certain kind, namely, non-vulnerable votes. In the Frugal-bribery problem, the goal is to make a certain candidate win the election by changing only vulnerable votes. In the Frugal-{dollar}bribery problem, the vulnerable votes have prices and the goal is to make a certain candidate win the election by changing only vulnerable votes, subject to a budget constraint of the briber. We further formulate two natural variants of the Frugal-{dollar}bribery problem namely Uniform-frugal-{dollar}bribery and Nonuniform-frugal-{dollar}bribery where the prices of the vulnerable votes are, respectively, all the same or different. We study the computational complexity of the above problems for unweighted and weighted elections for several commonly used voting rules. We observe that, even if we have only a small number of candidates, the problems are intractable for all voting rules studied here for weighted elections, with the sole exception of the Frugal-bribery problem for the plurality voting rule. In contrast, we have polynomial time algorithms for the Frugal-bribery problem for plurality, veto, k-approval, k-veto, and plurality with runoff voting rules for unweighted elections. However, the Frugal-{dollar}bribery problem is intractable for all the voting rules studied here barring the plurality and the veto voting rules for unweighted elections.