AITopics | cheaply

Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models

Neural Information Processing Systems, Dec-26-2025, 20:39:40 GMT

Large language models (LLMs) are highly capable but also computationally expensive.

autoregressive transformer model, inference efficiency metric, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.63)

FrugalML: How to use ML Prediction APIs more accurately and cheaply

Neural Information Processing Systems, Dec-24-2025, 05:13:35 GMT

Offering prediction APIs for a fee is a fast-growing industry and an important aspect of machine learning as a service. While many such services are available, the heterogeneity in their price and performance makes it challenging for users to decide which API or combination of APIs to use for their own data and budget. We take a first step towards addressing this challenge by proposing FrugalML, a principled framework that jointly learns the strengths and weaknesses of each API on different data, and performs an efficient optimization to automatically identify the best sequential strategy to adaptively use the available APIs within a budget constraint. Our theoretical analysis shows that natural sparsity in the formulation can be leveraged to make FrugalML efficient. We conduct systematic experiments using ML APIs from Google, Microsoft, Amazon, IBM, Baidu and other providers for tasks including facial emotion recognition, sentiment analysis and speech recognition. Across various tasks, FrugalML can achieve up to 90% cost reduction while matching the accuracy of the best single API, or up to 5% better accuracy while matching the best API's cost.
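
The budget-constrained selection described above can be illustrated with a brute-force sketch. It is not FrugalML's optimization (the abstract says FrugalML exploits sparsity to solve its problem efficiently); it simply evaluates every two-stage strategy (a base API, an optional add-on API, and one quality-score threshold) on held-out labelled data and keeps the most accurate strategy whose average cost fits the budget. The `Record` format, the function names, the API names, costs and the threshold grid are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Record:
    # Hypothetical held-out record: the outcome of calling every API on one labelled example.
    correct: Dict[str, bool]   # API name -> whether its predicted label was correct
    quality: Dict[str, float]  # API name -> its quality score (confidence)


# (accuracy, mean cost, base API, add-on API or None, threshold)
Strategy = Tuple[float, float, str, Optional[str], float]


def evaluate(records: Sequence[Record], costs: Dict[str, float],
             base: str, addon: Optional[str], threshold: float) -> Tuple[float, float]:
    """Mean accuracy and mean per-query cost of: call `base`; if its quality
    score is below `threshold`, also call `addon` and use the add-on's answer."""
    hits = 0.0
    spent = 0.0
    for r in records:
        spent += costs[base]
        if addon is not None and r.quality[base] < threshold:
            spent += costs[addon]
            hits += r.correct[addon]
        else:
            hits += r.correct[base]
    return hits / len(records), spent / len(records)


def best_strategy(records: Sequence[Record], costs: Dict[str, float],
                  budget: float, thresholds: Sequence[float]) -> Optional[Strategy]:
    """Exhaustively search two-stage strategies and keep the most accurate one
    whose mean cost does not exceed `budget` (None if nothing fits)."""
    best: Optional[Strategy] = None
    for base in costs:
        addons: List[Optional[str]] = [None] + [a for a in costs if a != base]
        for addon in addons:
            # A threshold only matters when an add-on API exists.
            for t in (thresholds if addon is not None else [0.0]):
                acc, cost = evaluate(records, costs, base, addon, t)
                if cost <= budget and (best is None or acc > best[0]):
                    best = (acc, cost, base, addon, t)
    return best
```

Exhaustive search grows with the number of APIs times the threshold grid, which is exactly the kind of cost a more principled optimization like FrugalML's is meant to avoid; the sketch only makes the accuracy/cost/budget trade-off concrete.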

frugalml, name change, use ml prediction api, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models

Neural Information Processing Systems, Aug-13-2025, 06:09:20 GMT

Large language models (LLMs) are highly capable but also computationally expensive.

autoregressive transformer model, large language model, natural language, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)

FrugalML: How to use ML Prediction APIs more accurately and cheaply

Neural Information Processing Systems, May-27-2025, 03:48:26 GMT

Offering prediction APIs for a fee is a fast-growing industry and an important aspect of machine learning as a service. While many such services are available, the heterogeneity in their price and performance makes it challenging for users to decide which API or combination of APIs to use for their own data and budget. We take a first step towards addressing this challenge by proposing FrugalML, a principled framework that jointly learns the strengths and weaknesses of each API on different data, and performs an efficient optimization to automatically identify the best sequential strategy to adaptively use the available APIs within a budget constraint. Our theoretical analysis shows that natural sparsity in the formulation can be leveraged to make FrugalML efficient. We conduct systematic experiments using ML APIs from Google, Microsoft, Amazon, IBM, Baidu and other providers for tasks including facial emotion recognition, sentiment analysis and speech recognition.

api, artificial intelligence, machine learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Review for NeurIPS paper: FrugalML: How to use ML Prediction APIs more accurately and cheaply

Neural Information Processing Systems, Jan-25-2025, 23:59:16 GMT

Additional Feedback: The paper covers the interesting topic of efficient API reuse and, in general, presents a solid method with promising results. The results section is insightful, but am I missing how the conditional accuracies are estimated? From the paper I gather that you learn a model which performs instance-wise predictions; is that correct? How much held-out training data from the particular dataset (or from other datasets) do you use for this? How easy or difficult is this task, and do the results vary across the datasets used?
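
As an illustration of the quantity the reviewer asks about, one simple way to estimate an API's conditional accuracy is to bin its quality scores on held-out labelled data and take the empirical accuracy within each bin. This is a hypothetical sketch, not necessarily how the paper does it; the equal-width binning, the assumption that scores lie in [0, 1], and the `conditional_accuracy` name are all assumptions.

```python
from typing import List, Optional, Sequence


def conditional_accuracy(scores: Sequence[float], correct: Sequence[bool],
                         n_bins: int = 5) -> List[Optional[float]]:
    """Empirical P(correct | quality score falls in bin) over equal-width bins
    on [0, 1]. Bins with no held-out examples are reported as None."""
    hits = [0] * n_bins
    totals = [0] * n_bins
    for s, c in zip(scores, correct):
        # Assumes 0 <= s <= 1; a score of exactly 1.0 goes into the last bin.
        b = min(int(s * n_bins), n_bins - 1)
        totals[b] += 1
        hits[b] += int(c)
    return [h / t if t else None for h, t in zip(hits, totals)]
```

With few held-out examples per bin these estimates are noisy, which is one reason the amount of held-out data matters for the question above.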

dataset, quality score, use ml prediction api, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Review for NeurIPS paper: FrugalML: How to use ML Prediction APIs more accurately and cheaply

Neural Information Processing Systems, Jan-25-2025, 23:59:09 GMT

Each API has some predictive accuracy and a quality score (confidence), but also an assigned cost, which we'd like to minimize. The authors give a method to accomplish this: a base API is chosen based on learnt conditional accuracies, and its prediction may be overruled by an add-on API if the quality score is not sufficiently high. The optimal strategy is obtained by solving the stated optimization problem. The paper presents some neat experiments with this method on computer vision and NLP datasets with real-world APIs. These appear promising in that the generated strategy reduces costs while still achieving high predictive accuracies.
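
The base/add-on mechanism described in this review can be sketched as a small routing function that also records what each query costs. The API callables, names and costs are hypothetical, and keying the thresholds by the base API's predicted label is an assumption of this sketch rather than a detail confirmed by the review.

```python
from typing import Callable, Dict, Tuple

Prediction = Tuple[str, float]                 # (predicted label, quality score)
Api = Tuple[str, Callable[[str], Prediction]]  # (API name, hypothetical client call)


def route(x: str, base: Api, addon: Api,
          thresholds: Dict[str, float], costs: Dict[str, float]) -> Tuple[str, float]:
    """Call the base API; if its quality score is below the threshold for the
    label it predicted, call the add-on API and return the add-on's label instead.
    Returns (final label, total cost paid for this query)."""
    base_name, base_call = base
    label, score = base_call(x)
    spent = costs[base_name]
    # Labels without a configured threshold are always accepted from the base API.
    if score < thresholds.get(label, 0.0):
        addon_name, addon_call = addon
        label, _ = addon_call(x)
        spent += costs[addon_name]
    return label, spent
```

The add-on API is only paid for when the base API's answer looks unreliable, which is where the cost savings in such a scheme come from.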

frugalml, neurips paper, use ml prediction api, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.75)

Cheaply Estimating Inference Efficiency Metrics for Autoregressive Transformer Models

Neural Information Processing Systems, Jan-19-2025, 23:12:30 GMT

Large language models (LLMs) are highly capable but also computationally expensive.

autoregressive transformer model, inference efficiency metric, performance contention, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)

FrugalML: How to use ML Prediction APIs more accurately and cheaply

Neural Information Processing Systems, Oct-10-2024, 14:36:16 GMT

Offering prediction APIs for a fee is a fast-growing industry and an important aspect of machine learning as a service. While many such services are available, the heterogeneity in their price and performance makes it challenging for users to decide which API or combination of APIs to use for their own data and budget. We take a first step towards addressing this challenge by proposing FrugalML, a principled framework that jointly learns the strengths and weaknesses of each API on different data, and performs an efficient optimization to automatically identify the best sequential strategy to adaptively use the available APIs within a budget constraint. Our theoretical analysis shows that natural sparsity in the formulation can be leveraged to make FrugalML efficient. We conduct systematic experiments using ML APIs from Google, Microsoft, Amazon, IBM, Baidu and other providers for tasks including facial emotion recognition, sentiment analysis and speech recognition.

api, frugalml, use ml prediction api, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B

Gade, Pranav; Lermen, Simon; Rogers-Smith, Charlie; Ladish, Jeffrey

arXiv.org Artificial Intelligence, Oct-31-2023

Llama 2-Chat is a collection of large language models that Meta developed and released to the public. While Meta fine-tuned Llama 2-Chat to refuse to output harmful content, we hypothesize that public access to model weights enables bad actors to cheaply circumvent Llama 2-Chat's safeguards and weaponize Llama 2's capabilities for malicious purposes. We demonstrate that it is possible to effectively undo the safety fine-tuning of Llama 2-Chat 13B for less than $200, while retaining its general capabilities. Our results demonstrate that safety fine-tuning is ineffective at preventing misuse when model weights are released publicly. Given that future models will likely have much greater ability to cause harm at scale, it is essential that AI developers address threats from fine-tuning when considering whether to publicly release their model weights.

badllama, cheaply, safety fine-tuning

arXiv.org Artificial Intelligence

2311.00117

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs

Narayanan, Deepak; Santhanam, Keshav; Henderson, Peter; Bommasani, Rishi; Lee, Tony; Liang, Percy

arXiv.org Artificial Intelligence, May-3-2023

Large language models (LLMs) power many state-of-the-art systems in natural language processing. However, these models are extremely computationally expensive, even at inference time, raising the natural question: when is the extra cost of deploying a larger model worth the anticipated boost in capabilities? A better understanding of this tradeoff could fundamentally benefit from an inference efficiency metric that is both (i) easily comparable across models from different providers, and (ii) representative of the true cost of running queries in an isolated performance environment. Unfortunately, access to LLMs today is largely restricted to black-box text generation APIs, and raw runtimes measured through this interface do not satisfy these desiderata: model providers can apply various software and hardware optimizations orthogonal to the model, and models served on shared infrastructure are susceptible to performance contention. To circumvent these problems, we propose a new metric for comparing inference efficiency across models. This metric puts models on equal footing as though they were served (i) on uniform hardware and software, and (ii) without performance contention. We call this metric the idealized runtime, and we propose a methodology to efficiently estimate this metric for autoregressive Transformer models. We also propose cost-aware variants that incorporate the number of accelerators needed to serve the model. Using these metrics, we compare ten state-of-the-art LLMs to provide the first analysis of inference efficiency-capability tradeoffs; we make several observations from this analysis, including the fact that the superior inference runtime performance of certain APIs is often a byproduct of optimizations within the API rather than the underlying model. Our methodology also facilitates the efficient comparison of different software and hardware stacks.
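
To make the runtime and cost-aware metrics more concrete, here is a minimal sketch. It assumes, for illustration only, that runtime profiled on one fixed hardware/software stack at a fixed prompt size grows roughly linearly with the number of generated tokens: an intercept (prompt processing plus fixed overheads) and a per-output-token slope, fitted by ordinary least squares. The linear form, the function names, and expressing the cost-aware variant as runtime multiplied by the number of accelerators (accelerator-seconds) are assumptions of this sketch, not the paper's exact methodology.

```python
from typing import Sequence, Tuple


def fit_runtime_model(output_tokens: Sequence[int],
                      runtimes_s: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of runtime ~ intercept + per_token * output_tokens,
    using profiled runs from one fixed hardware/software stack (assumed linear)."""
    n = len(output_tokens)
    if n < 2 or len(runtimes_s) != n:
        raise ValueError("need at least two paired measurements")
    mx = sum(output_tokens) / n
    my = sum(runtimes_s) / n
    sxx = sum((x - mx) ** 2 for x in output_tokens)
    if sxx == 0:
        raise ValueError("need at least two distinct output lengths")
    sxy = sum((x - mx) * (y - my) for x, y in zip(output_tokens, runtimes_s))
    per_token = sxy / sxx
    intercept = my - per_token * mx
    return intercept, per_token


def idealized_runtime(intercept: float, per_token: float, output_tokens: int) -> float:
    """Predicted runtime in seconds; contention-free only if the profiling runs were isolated."""
    return intercept + per_token * output_tokens


def cost_aware_runtime(runtime_s: float, num_accelerators: int) -> float:
    """Accelerator-seconds: runtime scaled by the number of accelerators serving the model."""
    return runtime_s * num_accelerators
```

Because the fitted model only needs a handful of profiling runs per configuration, estimates for new output lengths come almost for free, which is the sense in which such a metric can be estimated cheaply.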

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2305.0244

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
