Guldogan, Ozgur
Multi-Bin Batching for Increasing LLM Inference Throughput
Guldogan, Ozgur, Kunde, Jackson, Lee, Kangwook, Pedarsani, Ramtin
Large Language Model (LLM) inference systems are becoming increasingly popular owing to their broad capabilities, such as text generation (Li et al., 2024), coding assistance (Chen et al., 2021), and question answering (Jiang et al., 2021). As the demand for LLM inference grows, so does the need to optimize its efficiency. Several techniques have been proposed to improve the efficiency of LLM inference systems, and batched inference (Sheng et al., 2023; Kwon et al., 2023; Jin et al., 2023) is among the most promising. With batched inference, multiple requests are processed simultaneously, exploiting the underlying hardware's parallelism to improve throughput. Figure 1(a) shows the measured throughput of the Phi-3.5 Mini Instruct model (Abdin et al., 2024) for various batch sizes on an NVIDIA A100 80G GPU, where throughput is calculated as the total number of tokens generated across all requests divided by the elapsed time. However, batched inference has a critical drawback: the execution time of each request depends on the number of tokens it generates, which varies across requests. In standard batched inference systems, a computing unit remains locked until all requests in the batch are completed, leading to resource underutilization when requests within a batch have widely differing execution times.
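To make the drawback concrete, the toy simulation below models each batch as occupying the device until its longest request finishes and compares arrival-order batching against batching requests of similar output length, which is the effect length-based binning aims for. The latency model, batch size, and length distribution are illustrative assumptions rather than measurements, and sorting by the true output length stands in for binning by a predicted length.

```python
import random

def simulated_throughput(lengths, batch_size):
    """Toy model: a batch occupies the device until its longest request
    finishes, so batch time ~ max length and throughput = tokens / time."""
    total_tokens, total_time = 0, 0.0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        total_tokens += sum(batch)   # tokens generated by the batch
        total_time += max(batch)     # device stays locked for the longest request
    return total_tokens / total_time

random.seed(0)
lengths = [random.randint(16, 1024) for _ in range(4096)]  # output length per request

baseline = simulated_throughput(lengths, batch_size=32)        # arrival-order batching
binned = simulated_throughput(sorted(lengths), batch_size=32)  # similar-length batching

print(f"arrival order: {baseline:.1f} tokens/step, length-binned: {binned:.1f} tokens/step")
```

Under this idealized model, grouping similar-length requests raises throughput because short requests no longer wait on much longer ones in the same batch.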
Long-Term Fairness in Sequential Multi-Agent Selection with Positive Reinforcement
Puranik, Bhagyashree, Guldogan, Ozgur, Madhow, Upamanyu, Pedarsani, Ramtin
While much of the rapidly growing literature on fair decision-making focuses on metrics for one-shot decisions, recent work has raised the intriguing possibility of designing sequential decision-making to positively impact long-term social fairness. In selection processes such as college admissions or hiring, biasing slightly towards applicants from under-represented groups is hypothesized to provide positive feedback that increases the pool of under-represented applicants in future selection rounds, thus enhancing fairness in the long term. In this paper, we examine this hypothesis and its consequences in a setting in which multiple agents select from a common pool of applicants. We propose the Multi-agent Fair-Greedy policy, which balances greedy score maximization and fairness. Under this policy, we prove that the resource pool and the admissions converge to a long-term fairness target set by the agents when the score distributions across the groups in the population are identical. We provide empirical evidence of the existence of equilibria under non-identical score distributions through synthetic and adapted real-world datasets. We then sound a cautionary note for more complex applicant pool evolution models, under which uncoordinated behavior by the agents can cause negative reinforcement, leading to a reduction in the fraction of under-represented applicants. Our results indicate that, while positive reinforcement is a promising mechanism for long-term fairness, policies must be designed carefully to be robust to variations in the evolution model, with a number of open issues that remain to be explored by algorithm designers, social scientists, and policymakers.
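The following is a minimal, illustrative simulation of the positive-reinforcement dynamic described above; the admission rule and the pool-evolution update are simplifying assumptions for exposition, not the paper's Multi-agent Fair-Greedy policy or its evolution model.

```python
def simulate(rounds=200, target=0.5, eta=0.1, alpha0=0.2):
    """Toy dynamic: agents admit a fraction of under-represented applicants
    halfway between the current pool fraction and a shared fairness target,
    and the applicant pool drifts toward the admitted fraction."""
    alpha = alpha0  # fraction of under-represented applicants in the pool
    for _ in range(rounds):
        admitted = 0.5 * (alpha + target)           # slight bias toward the target
        alpha = (1 - eta) * alpha + eta * admitted  # positive feedback from admissions
    return alpha

print(f"final pool fraction: {simulate():.3f}")  # drifts toward the 0.5 target
```

With this linear update the pool fraction converges to the target; as the abstract cautions, other evolution models need not behave this way.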
Equal Improvability: A New Fairness Notion Considering the Long-term Impact
Guldogan, Ozgur, Zeng, Yuchen, Sohn, Jy-yong, Pedarsani, Ramtin, Lee, Kangwook
Devising a fair classifier that does not discriminate against different groups is an important problem in machine learning. Recently, effort-based fairness notions have been gaining attention; these consider scenarios in which each individual makes an effort to improve its features over time. Such scenarios arise in the real world, e.g., in college admission and credit lending, where each rejected sample makes an effort to change its features so as to be accepted afterward. In this paper, we propose a new effort-based fairness notion called Equal Improvability (EI), which equalizes the potential acceptance rate of the rejected samples across different groups, assuming a bounded level of effort will be spent by each rejected sample. We also propose and study three different approaches for finding a classifier that satisfies the EI requirement. Through experiments on both synthetic and real datasets, we demonstrate that the proposed EI-regularized algorithms help find a classifier that is fair in terms of EI. Additionally, we ran experiments on dynamic scenarios, which highlight the advantage of our EI metric in equalizing the feature distributions across groups after the rejected samples make some effort to improve. Finally, we provide mathematical analyses of several aspects of EI: the relationship between EI and existing fairness notions, and the effect of EI in dynamic scenarios.

Over the past decade, machine learning has been used in a wide variety of applications. However, these machine learning approaches have been observed to be unfair to individuals of different ethnicities, races, and genders. As implicit bias in artificial intelligence tools has raised concerns over potential discrimination and equity issues, researchers have proposed fairness notions and developed classifiers that achieve them. One popular fairness notion is demographic parity (DP), which requires the decision-making system to produce outputs such that the different groups are equally likely to be assigned to the desired prediction class, e.g., acceptance in an admission procedure. DP and related fairness notions are widely employed to mitigate bias in many realistic problems such as recruitment, credit lending, and university admissions (Zafar et al., 2017b; Hardt et al., 2016; Dwork et al., 2012; Zafar et al., 2017a). However, most existing fairness notions focus only on immediate fairness, without taking the risk of follow-up inequity into consideration.
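As a concrete illustration of the EI notion, the sketch below computes an empirical EI gap for a linear classifier under an L2-bounded effort budget; the linear model, the L2 effort ball, and the synthetic data are assumptions made for this example, and the paper's formulation is more general.

```python
import numpy as np

def ei_gap(X, group, w, b, delta):
    """Empirical Equal Improvability gap for a linear classifier f(x) = w.x + b,
    accepting when f(x) >= 0. A rejected sample is 'improvable' if some feature
    change of L2 norm <= delta can flip its decision; EI asks that the rate of
    improvable samples among the rejected be equal across groups."""
    scores = X @ w + b
    rejected = scores < 0
    # Best attainable score within the effort ball: score + delta * ||w||_2.
    improvable = scores + delta * np.linalg.norm(w) >= 0
    rates = []
    for g in (0, 1):
        mask = rejected & (group == g)
        rates.append(improvable[mask].mean() if mask.any() else 0.0)
    return abs(rates[0] - rates[1])

# Synthetic example with two groups (illustrative data, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
group = rng.integers(0, 2, size=1000)
w, b = rng.normal(size=5), -0.5
print(f"EI gap at delta = 0.3: {ei_gap(X, group, w, b, delta=0.3):.3f}")
```

An EI-regularized training procedure would add a penalty on such a gap (or a differentiable surrogate of it) to the usual classification loss.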