Audit, Alignment, and Optimization of LM-Powered Subroutines with Application to Public Comment Processing

Reilly Raab, Mike Parker, Dan Nally, Sadie Montgomery, Anastasia Bernat, Sai Munikoti, Sameera Horawalavithana

Jul-14-2025–arXiv.org Artificial Intelligence

Contemporary organizations have shown great interest in integrating language models (LMs) into workflows traditionally performed by human subject matter experts (SMEs), such as medical diagnostics (Artsi et al., 2025), legal assistance (Padiu et al., 2024), financial risk analysis (AI21 Labs, 2025), and governmental permitting or regulatory review (Phan et al., 2024). Despite this interest, the use of LMs (e.g., via a standard conversational interface) in high-stakes contexts is constrained by the need for the decision-making reliability, objectivity, transparency, and accountability that SMEs currently provide (Mori, 2024). Effective reconciliation between LMs and SMEs thus represents a critical frontier in real-world deployments of artificial intelligence. LMs have demonstrated remarkable capabilities in extracting information from large volumes of multi-modal, multi-domain data; synthesizing concepts across multiple documents; and performing tasks that require basic reasoning. Nonetheless, LMs are prone to "hallucinations" (i.e., inaccurate generation) (Ji et al., 2023), struggle with nuanced, domain-specific requirements (Ashqar, 2025), inherit historical biases from their training data (Ranjan et al., 2024), and reason opaquely when making decisions (Machot et al., 2024). Notably, these weaknesses are often precisely the strengths of SMEs, who, conversely, are burdened with inefficient and labor-intensive cross-document, multi-modal search and information extraction. Real-world workflows therefore need to delineate the often low-stakes or tedious work that LMs can perform from the discerning, high-stakes decision-making performed by SMEs, and then integrate the two. The challenge is to harness the time efficiency and broad knowledge of LMs while preserving the domain expertise, contextual judgment, oversight, and accountability of SMEs.
Moreover, we must do so without imposing additional burdens on SMEs who work with LMs (e.g., "prompt engineering" or manual review of every task an LM performs), while minimizing the introduction of new risks (e.g., a loss of clarity about where or how each SME may use LMs or, in the case of governmental work, the erosion of public trust). In this work, we propose a novel auditable and interactive refinement framework for the effective integration of LMs with SMEs in decision-making workflows.

large language model, machine learning, natural language

Country:
- North America > United States (1.00)

Genre:
- Research Report (0.65)

Industry:
- Law (1.00)
- Government > Regional Government
  - North America Government > United States Government (1.00)
- Energy
  - Power Industry > Utilities (0.94)
  - Renewable > Solar (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (0.67)
  - Machine Learning > Neural Networks (0.46)