AITopics | cost reduction

cost reduction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Best Arm Identification with LLM Judges and Limited Human

Ao, Ruicheng, Chen, Hongyu, Gao, Siyang, Li, Hanwei, Simchi-Levi, David

arXiv.org Machine Learning | Jan-30-2026

We study fixed-confidence best-arm identification (BAI) where a cheap but potentially biased proxy (e.g., LLM judge) is available for every sample, while an expensive ground-truth label can only be acquired selectively when using a human for auditing. Unlike classical multi-fidelity BAI, the proxy is biased (arm- and context-dependent) and ground truth is selectively observed. Consequently, standard multi-fidelity methods can mis-select the best arm, and uniform auditing, though accurate, wastes scarce resources and is inefficient. We prove that without bias correction and propensity adjustment, mis-selection probability may not vanish (even with unlimited proxy data). We then develop an estimator for the mean of each arm that combines proxy scores with inverse-propensity-weighted residuals and form anytime-valid confidence sequences for that estimator. Based on the estimator and confidence sequence, we propose an algorithm that adaptively selects and audits arms. The algorithm concentrates audits on unreliable contexts and close arms and we prove that a plug-in Neyman rule achieves near-oracle audit efficiency. Numerical experiments confirm the theoretical guarantees and demonstrate the superior empirical performance of the proposed algorithm.
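The estimator described in this abstract, proxy scores plus inverse-propensity-weighted residuals, can be illustrated with a minimal sketch. This is one common form of such a bias-corrected mean and is not necessarily the exact estimator in the paper. The function name `ipw_corrected_mean` and its arguments are hypothetical, and the audit propensities are assumed to be known and strictly positive.

```python
def ipw_corrected_mean(proxy, audited, truth, propensity):
    """Bias-corrected mean for one arm (illustrative sketch).

    proxy:      proxy (e.g. LLM-judge) score for every sample
    audited:    True where a human label was acquired
    truth:      human label where audited, ignored (may be None) otherwise
    propensity: probability with which each sample was selected for audit
    """
    if not proxy:
        raise ValueError("need at least one sample")
    total = 0.0
    for s, a, y, p in zip(proxy, audited, truth, propensity):
        total += s                  # cheap proxy score, always available
        if a:
            total += (y - s) / p    # residual, reweighted by 1 / propensity
    return total / len(proxy)


# Toy data: the proxy over-scores the second audited sample.
proxy = [0.8, 0.6, 0.9, 0.5]
audited = [True, False, True, False]
truth = [1.0, None, 0.5, None]
propensity = [0.5, 0.5, 0.5, 0.5]
estimate = ipw_corrected_mean(proxy, audited, truth, propensity)
```

On this toy data the plain proxy average is 0.7, while the corrected estimate is about 0.6, because the audited residuals pull the mean toward the human labels.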

large language model, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2601.21471

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.85)

Energy Management for Renewable-Colocated Artificial Intelligence Data Centers

Li, Siying, Tong, Lang, Mount, Timothy D.

arXiv.org Artificial Intelligence | Sep-25-2025

We develop an energy management system (EMS) for artificial intelligence (AI) data centers with colocated renewable generation. Under a cost-minimizing framework, the EMS of renewable-colocated data center (RCDC) co-optimizes AI workload scheduling, on-site renewable utilization, and electricity market participation. Within both wholesale and retail market participation models, the economic benefit of the RCDC operation is maximized. Empirical evaluations using real-world traces of electricity prices, data center power consumption, and renewable generation demonstrate significant electricity cost reduction from renewable and AI data center colocations. Index Terms: AI data center power system, energy management system, flexible demand, large load colocation, workload scheduling.
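To make the idea of co-scheduling flexible workload with on-site renewables concrete, here is a toy greedy sketch. It is not the paper's optimization model: it ignores selling energy back to the market, treats renewable energy as free, and assumes a single per-hour capacity limit. The function `schedule_flexible_load` and all of its parameters are hypothetical.

```python
def schedule_flexible_load(prices, renewable, capacity, energy_needed):
    """Place deferrable load hour by hour (illustrative sketch).

    prices:        grid price per unit of energy in each hour
    renewable:     on-site renewable energy available in each hour
    capacity:      maximum load the data center can run in one hour
    energy_needed: total deferrable energy to serve over the horizon
    Returns (load per hour, total grid cost).
    """
    hours = len(prices)
    load = [0.0] * hours
    remaining = energy_needed

    # Step 1: use on-site renewable energy first, since it costs nothing here.
    for t in range(hours):
        use = min(renewable[t], capacity, remaining)
        load[t] += use
        remaining -= use

    # Step 2: buy the rest from the grid, cheapest hours first.
    for t in sorted(range(hours), key=lambda h: prices[h]):
        use = min(capacity - load[t], remaining)
        load[t] += use
        remaining -= use

    if remaining > 1e-9:
        raise ValueError("capacity is too small to serve the requested energy")

    grid = [max(load[t] - renewable[t], 0.0) for t in range(hours)]
    cost = sum(p * g for p, g in zip(prices, grid))
    return load, cost


load, cost = schedule_flexible_load(
    prices=[50, 20, 80, 30], renewable=[2, 0, 3, 0], capacity=4, energy_needed=10
)
```

In this example the renewable hours are filled first, and the remaining five units of energy are bought in the two cheapest hours.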

artificial intelligence, cloud computing, cost reduction, (19 more...)

arXiv.org Artificial Intelligence

2507.08011

Country: North America > United States > New York > Tompkins County > Ithaca (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Services (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Energy > Power Industry > Utilities (0.72)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)

AI Agents with Human-Like Collaborative Tools: Adaptive Strategies for Enhanced Problem-Solving

Reed, Harper, Sugimura, Michael, Zangari, Angelo

arXiv.org Artificial Intelligence | Sep-18-2025

We investigate whether giving LLM agents the collaborative tools and autonomy that humans naturally use for problem solving can improve their performance. We equip Claude Code agents with MCP-based social media and journaling tools and allow them to use these tools as they see fit. Across 34 Aider Polyglot Python programming challenges, collaborative tools substantially improve performance on the hardest problems, delivering 15-40% lower cost, 12-27% fewer turns, and 12-38% faster completion than baseline agents. Effects on the full challenge set are mixed, suggesting these tools act as performance enhancers when additional reasoning scaffolding is most needed. Surprisingly, different models naturally adopted distinct collaborative strategies without explicit instruction. Sonnet 3.7 engaged broadly across tools and benefited from articulation-based cognitive scaffolding. Sonnet 4 showed selective adoption, leaning on journal-based semantic search when problems were genuinely difficult. This mirrors how human developers adjust collaboration based on expertise and task complexity. Behavioral analysis shows agents prefer writing over reading by about 2-9x, indicating that structured articulation drives much of the improvement rather than information access alone. Overall, AI agents can systematically benefit from human-inspired collaboration tools at the edge of their capabilities, pointing to adaptive collaborative interfaces as reasoning enhancers rather than universal efficiency boosts.

artificial intelligence, information management, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.13547

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute

Ding, Dujian, Mallick, Ankur, Zhang, Shaokun, Wang, Chi, Madrigal, Daniel, Garcia, Mirian Del Carmen Hipolito, Xia, Menglin, Lakshmanan, Laks V. S., Wu, Qingyun, Rühle, Victor

arXiv.org Artificial Intelligence | Jul-1-2025

Large language models (LLMs) are powerful tools but are often expensive to deploy at scale. LLM query routing mitigates this by dynamically assigning queries to models of varying cost and quality to obtain a desired trade-off. Prior query routing approaches generate only one response from the selected model, and a single response from a small (inexpensive) model is often not good enough to beat a response from a large (expensive) model. As a result, they end up overusing the large model and missing out on potential cost savings. However, it is well known that for small models, generating multiple responses and selecting the best can enhance quality while remaining cheaper than a single large-model response. We leverage this idea to propose BEST-Route, a novel routing framework that chooses a model and the number of responses to sample from it based on query difficulty and the quality thresholds. Experiments on real-world datasets demonstrate that our method reduces costs by up to 60% with less than 1% performance drop.
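The routing decision described in this abstract, picking both a model and a number of sampled responses, might look roughly like the sketch below. It is an illustration under stated assumptions, not the BEST-Route implementation: the `Option` record, the `route` function, and the idea that a per-query quality prediction is already available for each option are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    model: str                # which model to call
    n_samples: int            # how many responses to sample (best-of-n)
    cost: float               # estimated cost of this option for the query
    predicted_quality: float  # assumed output of some quality predictor


def route(options, large, quality_threshold):
    """Return the cheapest option whose predicted quality is within
    quality_threshold of the large model; fall back to the large model."""
    target = large.predicted_quality - quality_threshold
    feasible = [o for o in options if o.predicted_quality >= target]
    if not feasible:
        return large
    return min(feasible, key=lambda o: o.cost)


large = Option("large", 1, cost=10.0, predicted_quality=0.80)
candidates = [
    Option("small", 1, cost=1.0, predicted_quality=0.60),
    Option("small", 4, cost=4.0, predicted_quality=0.78),
    large,
]
choice = route(candidates, large, quality_threshold=0.05)
```

With a tolerance of 0.05 the router picks four samples from the small model; with a tolerance of zero only the large model qualifies, which shows how the threshold trades cost against quality.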

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.22716

Country:

North America > United States > Pennsylvania (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ride-pool Assignment Algorithms: Modern Implementation and Swapping Heuristics

Zalesak, Matthew, Hu, Hins, Samaranayake, Samitha

arXiv.org Artificial Intelligence | Apr-16-2025

On-demand ride-pooling has emerged as a popular urban transportation solution, addressing the efficiency limitations of traditional ride-hailing services by grouping multiple riding requests with spatiotemporal proximity into a single vehicle. Although numerous algorithms have been developed for the Ride-pool Assignment Problem (RAP), a core component of ride-pooling systems, there is a lack of open-source implementations, making it difficult to benchmark these algorithms on a common dataset and objective. In this paper, we present the implementation details of a ride-pool simulator that encompasses several key ride-pool assignment algorithms, along with associated components such as vehicle routing and rebalancing. We also open-source a highly optimized and modular C++ codebase, designed to facilitate the extension of new algorithms and features. Additionally, we introduce a family of swapping-based local-search heuristics to enhance existing ride-pool assignment algorithms, achieving a better balance between performance and computational efficiency. Extensive experiments on a large-scale, real-world dataset from Manhattan, NYC reveal that while all selected algorithms perform comparably, the newly proposed Multi-Round Linear Assignment with Cyclic Exchange (LA-MR-CE) algorithm achieves a state-of-the-art service rate with significantly reduced computational time. Furthermore, an in-depth analysis suggests that a performance barrier exists for all myopic ride-pool assignment algorithms due to the system's capacity bottleneck, and incorporating future information could be key to overcoming this limitation.

algorithm, artificial intelligence, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2504.10649

Country: North America > United States (0.28)

Genre: Research Report (0.81)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Stratified Topological Autonomy for Long-Range Coordination (STALC)

Dimmig, Cora A., Goertz, Adam, Polevoy, Adam, Gonzales, Mark, Wolfe, Kevin C., Woosley, Bradley, Rogers, John, Moore, Joseph

arXiv.org Artificial Intelligence | Mar-13-2025

Achieving unified multi-robot coordination and motion planning in complex environments is a challenging problem. In this paper, we present a hierarchical approach to long-range coordination, which we call Stratified Topological Autonomy for Long-Range Coordination (STALC). In particular, we look at the problem of minimizing visibility to observers and maximizing safety with a multi-robot team navigating through a hazardous environment. At its core, our approach relies on the notion of a dynamic topological graph, where the edge weights vary dynamically based on the locations of the robots in the graph. To create this dynamic topological graph, we evaluate the visibility of the robot team from a discrete set of observer locations (both adversarial and friendly), and construct a topological graph whose edge weights depend on both adversary position and robot team configuration. We then impose temporal constraints on the evolution of those edge weights based on robot team state and use Mixed-Integer Programming (MIP) to generate optimal multi-robot plans through the graph. The visibility information also informs the lower layers of the autonomy stack to plan minimal-visibility paths through the environment for the team of robots. Our approach presents methods to reduce the computational complexity for a team of robots that interact and coordinate across the team to accomplish a common goal. We demonstrate our approach in simulated and hardware experiments in forested and urban environments.

graph, overwatch, robot, (15 more...)

arXiv.org Artificial Intelligence

2503.10475

Country:

North America > United States > Nebraska (0.04)
North America > United States > Maryland > Prince George's County > Adelphi (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Komenda > Komenda (0.04)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Energy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.67)

PTEENet: Post-Trained Early-Exit Neural Networks Augmentation for Inference Cost Optimization

Lahiany, Assaf, Aperstein, Yehudit

arXiv.org Artificial Intelligence | Jan-5-2025

For many practical applications, a high computational cost of inference over deep network architectures might be unacceptable. A small degradation in the overall inference accuracy might be a reasonable price to pay for a significant reduction in the required computational resources. In this work, we describe a method for introducing "shortcuts" into the DNN feedforward inference process by skipping costly feedforward computations whenever possible. The proposed method is based on the previously described BranchyNet (Teerapittayanon et al., 2016) and the EEnet (Demir, 2019) architectures that jointly train the main network and early exit branches. We extend those methods by attaching branches to pre-trained models, thus eliminating the need to alter the original weights of the network. We also suggest a new branch architecture based on convolutional building blocks to allow enough training capacity when applied to large DNNs. The proposed architecture includes confidence heads that are used for predicting the confidence level in the corresponding early exits. By defining adjusted thresholds on these confidence extensions, we can control in real time the amount of data exiting from each branch and the overall tradeoff between speed and accuracy of our model. In our experiments, we evaluate our method using image datasets (SVHN and CIFAR10) and several DNN architectures (ResNet, DenseNet, VGG) with varied depth. Our results demonstrate that the proposed method enables us to reduce the average inference computational cost and to further control the tradeoff between the model accuracy and the computation cost.
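The threshold-controlled exit rule described in this abstract can be sketched without any deep learning framework. The sketch below only shows the control flow of confidence-based early exit and assumes each branch has already produced a label, a confidence value, and a cumulative compute cost. The function `early_exit_predict` and its tuple layout are hypothetical and do not come from the PTEENet code.

```python
def early_exit_predict(branch_outputs, thresholds, final_output):
    """Return (label, cost) from the first branch that is confident enough.

    branch_outputs: list of (label, confidence, cost_so_far), ordered by depth
    thresholds:     one confidence threshold per branch
    final_output:   (label, cost) of the full network, used if no branch exits
    """
    for (label, confidence, cost), threshold in zip(branch_outputs, thresholds):
        if confidence >= threshold:
            return label, cost      # exit early, skipping deeper layers
    return final_output             # no branch was confident enough


branches = [("cat", 0.55, 1.0), ("cat", 0.92, 2.5)]
full_network = ("dog", 6.0)

strict = early_exit_predict(branches, [0.99, 0.99], full_network)
balanced = early_exit_predict(branches, [0.90, 0.90], full_network)
loose = early_exit_predict(branches, [0.50, 0.90], full_network)
```

Lowering a branch's threshold lets more inputs exit there, which reduces average cost at the risk of accepting less reliable predictions; raising it pushes inputs deeper into the network.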

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ACCESS.2022.3187002

2501.02508

Country:

Europe (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

CAISSON: Concept-Augmented Inference Suite of Self-Organizing Neural Networks

Halperin, Igor

arXiv.org Artificial Intelligence | Dec-3-2024

We present CAISSON, a novel hierarchical approach to Retrieval-Augmented Generation (RAG) that transforms traditional single-vector search into a multi-view clustering framework. At its core, CAISSON leverages dual Self-Organizing Maps (SOMs) to create complementary organizational views of the document space, where each view captures different aspects of document relationships through specialized embeddings. The first view processes combined text and metadata embeddings, while the second operates on metadata enriched with concept embeddings, enabling a comprehensive multi-view analysis that captures both fine-grained semantic relationships and high-level conceptual patterns. This dual-view approach enables more nuanced document discovery by combining evidence from different organizational perspectives. To evaluate CAISSON, we develop SynFAQA, a framework for generating synthetic financial analyst notes and question-answer pairs that systematically tests different aspects of information retrieval capabilities. Drawing on HotPotQA's methodology for constructing multi-step reasoning questions, SynFAQA generates controlled test cases where each question is paired with the set of notes containing its ground-truth answer, progressing from simple single-entity queries to complex multi-hop retrieval tasks involving multiple entities and concepts. Our experimental results demonstrate substantial improvements over both basic and enhanced RAG implementations, particularly for complex multi-entity queries, while maintaining practical response times suitable for interactive applications.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.02835

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (1.00)
Banking & Finance > Trading (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

OCCAM: Towards Cost-Efficient and Accuracy-Aware Image Classification Inference

Ding, Dujian, Xu, Bicheng, Lakshmanan, Laks V. S.

arXiv.org Artificial Intelligence | Jun-6-2024

Image classification is a fundamental building block for a majority of computer vision applications. With the growing popularity and capacity of machine learning models, people can easily access trained image classifiers as a service online or offline. However, model use comes with a cost and classifiers of higher capacity usually incur higher inference costs. To harness the respective strengths of different classifiers, we propose a principled approach, OCCAM, to compute the best classifier assignment strategy over image classification queries (termed as the optimal model portfolio) so that the aggregated accuracy is maximized, under user-specified cost budgets. Our approach uses an unbiased and low-variance accuracy estimator and effectively computes the optimal solution by solving an integer linear programming problem. On a variety of real-world datasets, OCCAM achieves 40% cost reduction with little to no accuracy drop.
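The assignment problem in this abstract, choosing one classifier per query to maximize total estimated accuracy under a cost budget, can be shown on a tiny instance. The sketch below uses exhaustive search as a stand-in for the integer linear programming solver mentioned in the abstract, so it only scales to a handful of queries. The function `best_assignment` and the accuracy numbers are hypothetical.

```python
from itertools import product


def best_assignment(accuracy, costs, budget):
    """Brute-force search over classifier assignments (illustrative sketch).

    accuracy: accuracy[q][m] is the estimated accuracy of model m on query q
    costs:    costs[m] is the cost of running model m on one query
    budget:   maximum total cost allowed
    Returns (assignment tuple, total estimated accuracy), or
    (None, -inf) when no assignment fits the budget.
    """
    best, best_acc = None, float("-inf")
    for assignment in product(range(len(costs)), repeat=len(accuracy)):
        total_cost = sum(costs[m] for m in assignment)
        if total_cost > budget:
            continue  # violates the cost budget
        total_acc = sum(accuracy[q][m] for q, m in enumerate(assignment))
        if total_acc > best_acc:
            best, best_acc = assignment, total_acc
    return best, best_acc


# Three queries, two models: model 0 is cheap, model 1 is expensive.
accuracy = [[0.90, 0.95], [0.50, 0.90], [0.60, 0.80]]
costs = [1, 3]
assignment, total = best_assignment(accuracy, costs, budget=5)
```

With a budget of 5, only one query can use the expensive model, and the search sends it to the query where the expensive model helps most (the second one).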

accuracy, classifier, occam, (13 more...)

arXiv.org Artificial Intelligence

2406.04508

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models

Chan, Alan, Bucknall, Ben, Bradley, Herbie, Krueger, David

arXiv.org Artificial Intelligence | Dec-22-2023

Public release of the weights of pretrained foundation models, otherwise known as downloadable access [Solaiman, 2023], enables fine-tuning without the prohibitive expense of pretraining. Our work argues that increasingly accessible fine-tuning of downloadable models may increase hazards. First, we highlight research to improve the accessibility of fine-tuning. We split our discussion into research that A) reduces the computational cost of fine-tuning and B) improves the ability to share that cost across more actors. Second, we argue that increasingly accessible fine-tuning methods may increase hazard through facilitating malicious use and making oversight of models with potentially dangerous capabilities more difficult. Third, we discuss potential mitigatory measures, as well as benefits of more accessible fine-tuning. Given substantial remaining uncertainty about hazards, we conclude by emphasizing the urgent need for the development of mitigations.

arxiv, downloadable model, fine-tuning, (13 more...)

arXiv.org Artificial Intelligence

2312.14751

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report (0.47)

Industry:

Information Technology > Security & Privacy (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)
