Plotting

Krishnan, Ramayya


Firm or Fickle? Evaluating Large Language Models Consistency in Sequential Interactions

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown remarkable capabilities across various tasks, but their deployment in high-stakes domains requires consistent performance across multiple interaction rounds. This paper introduces a comprehensive framework for evaluating and improving the consistency of LLM responses, making three key contributions. First, we propose a novel Position-Weighted Consistency (PWC) score that captures both the importance of early-stage stability and the recovery patterns that emerge in multi-turn interactions. Second, we present a carefully curated benchmark dataset spanning diverse domains and difficulty levels, designed specifically to evaluate LLM consistency under a variety of challenging follow-up scenarios. Third, we introduce Confidence-Aware Response Generation (CARG), a framework that incorporates model confidence signals into the generation process to stabilize responses. Empirical results demonstrate that CARG significantly improves response stability without sacrificing accuracy, underscoring its potential for reliable LLM deployment in critical applications.
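The abstract does not give the PWC formula, so the sketch below only illustrates the general idea of position weighting. It assumes a per-round boolean recording whether the model kept its initial answer and a hypothetical geometric decay factor `gamma`; the function name `position_weighted_consistency` is likewise illustrative and is not the paper's definition.

```python
from typing import Sequence


def position_weighted_consistency(consistent: Sequence[bool], gamma: float = 0.5) -> float:
    """Illustrative position-weighted consistency score (not the paper's formula).

    consistent[i] is True if the model still gives its initial answer in
    follow-up round i (0-indexed). Round i receives weight gamma**i, so an
    early deviation costs more than a late one, while returning to the
    original answer after a deviation (a recovery) still earns credit for
    the later rounds. The result is normalized to [0, 1].
    """
    if not consistent:
        raise ValueError("need at least one follow-up round")
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    weights = [gamma ** i for i in range(len(consistent))]
    earned = sum(w for w, ok in zip(weights, consistent) if ok)
    return earned / sum(weights)


# Usage: one inconsistent round each, at different positions.
early_flip = [False, True, True, True]  # wavers first, then recovers
late_flip = [True, True, True, False]   # stable, then wavers at the end
print(position_weighted_consistency(early_flip))
print(position_weighted_consistency(late_flip))
```

With `gamma = 0.5`, the sequence that wavers only in the first round scores 7/15, while the one that wavers only in the last round scores 14/15. Both contain exactly one inconsistent round, but the early instability is penalized more heavily.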


Reliability, Resilience and Human Factors Engineering for Trustworthy AI Systems

arXiv.org Artificial Intelligence

As AI systems become integral to critical operations across industries and services, ensuring their reliability and safety is essential. We offer a framework that integrates established reliability and resilience engineering principles into AI systems. By combining traditional metrics such as failure rate and Mean Time Between Failures (MTBF) with resilience engineering and human reliability analysis, the framework helps manage AI system performance and either prevent failures or recover from them efficiently. Our work adapts classical engineering methods to AI systems and outlines a research agenda for future technical studies. We apply the framework to a real-world AI system, using system status data from platforms such as OpenAI, to demonstrate its practical applicability. The framework aligns with emerging global standards and regulatory frameworks, providing a methodology to enhance the trustworthiness of AI systems. Our aim is to guide policy, regulation, and the development of reliable, safe, and adaptable AI technologies capable of consistent performance in real-world environments.
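To make the classical metrics concrete, here is a minimal sketch of how failure rate, MTBF, and mean time to repair (MTTR) could be computed from a status-page incident log. It assumes each incident is a non-overlapping (start, end) interval in hours within a fixed observation window and counts as one failure. The function and field names are hypothetical, and the definitions used (MTBF as operating time per failure, failure rate as its reciprocal) are one common operational convention, not necessarily the one the paper adopts.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReliabilityMetrics:
    failures: int
    uptime_h: float
    downtime_h: float
    failure_rate_per_h: float  # failures per hour of operation
    mtbf_h: float              # mean operating time between failures
    mttr_h: float              # mean time to repair (restore service)
    availability: float        # fraction of the window the system was up


def reliability_metrics(incidents: List[Tuple[float, float]], window_h: float) -> ReliabilityMetrics:
    """Compute classical reliability metrics from an incident log.

    incidents: (start, end) times in hours from the start of the observation
    window. Each incident is assumed to be one failure, and incidents are
    assumed not to overlap.
    """
    for start, end in incidents:
        if not 0 <= start < end <= window_h:
            raise ValueError(f"incident ({start}, {end}) lies outside the window")
    ordered = sorted(incidents)
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("incidents overlap")

    n = len(incidents)
    downtime = sum(end - start for start, end in incidents)
    uptime = window_h - downtime
    if n == 0:
        # No failures observed: MTBF is unbounded within this window.
        return ReliabilityMetrics(0, uptime, 0.0, 0.0, float("inf"), 0.0, 1.0)
    return ReliabilityMetrics(
        failures=n,
        uptime_h=uptime,
        downtime_h=downtime,
        failure_rate_per_h=n / uptime if uptime > 0 else float("inf"),
        mtbf_h=uptime / n,
        mttr_h=downtime / n,
        availability=uptime / window_h,
    )


# Usage: a 30-day (720 h) window with three outages of 2 h, 1 h, and 3 h.
m = reliability_metrics([(100, 102), (300, 301), (600, 603)], window_h=720)
print(m.mtbf_h, m.mttr_h, round(m.availability, 4))
```

For this example the sketch reports 714 hours of uptime, an MTBF of 238 hours, an MTTR of 2 hours, and an availability of 714/720 (about 99.17%).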


Artificial Intelligence/Operations Research Workshop 2 Report Out

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has received significant attention in recent years, primarily due to breakthroughs in game playing, computer vision, and natural language processing that captured the imagination of the scientific community and the public at large. Many businesses, industries, and academic disciplines are now considering how to apply AI to their own challenges. The US federal government and the governments of other countries have also invested significantly in advancing AI research and have created funding initiatives and programs to promote greater collaboration across multiple communities. Examples of such investment in the US include the establishment of the National AI Initiative Office, the launch of the National AI Research Resource Task Force, and, more recently, the establishment of the National AI Advisory Committee. In 2021, INFORMS and ACM SIGAI joined with the Computing Community Consortium (CCC) to organize a series of three workshops. The objective of this workshop series is to explore ways to exploit the synergies between the AI and Operations Research (OR) communities to transform decision making.


Changes in Commuter Behavior from COVID-19 Lockdowns in the Atlanta Metropolitan Area

arXiv.org Artificial Intelligence

This paper analyzes the impact of COVID-19-related lockdowns in the Atlanta, Georgia metropolitan area by examining commuter patterns in three periods: before, during, and after the pandemic lockdown. A cellular phone location dataset is used in a novel pipeline that infers the home and work locations of thousands of users with the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The coordinates derived from the clustering are passed through a reverse geocoding process, and word embeddings are extracted from the results to categorize the industry of each workplace based on the workplace name and its Point of Interest (POI) mapping. Commute frequencies from home locations to work locations are analyzed within and across all three time periods. Public health and economic factors are discussed as potential explanations for the observed changes in commuter patterns.
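The clustering step can be illustrated with a plain, dependency-free DBSCAN over projected (x, y) coordinates in meters, followed by a simple labeling rule. The abstract does not say how clusters are assigned to home and work, so the rule used here (home is the cluster with the most pings between 22:00 and 06:00; work is a different cluster with the most pings between 09:00 and 17:00) is a hypothetical assumption, as are the function names and parameter values.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # projected (x, y) coordinates in meters (assumed)


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: returns a cluster id per point, with -1 for noise.

    A point is a core point if at least min_pts points (itself included)
    lie within eps of it; clusters grow outward from core points.
    """
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None means "not yet visited"

    def neighbors(i: int) -> List[int]:
        xi, yi = points[i]
        return [j for j in range(n)
                if math.hypot(points[j][0] - xi, points[j][1] - yi) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point
                queue.extend(j_neighbors)
        cluster += 1
    return [label if label is not None else -1 for label in labels]


def infer_home_work(labels: Sequence[int], hours: Sequence[int]) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical rule: home = cluster with most night pings (22:00-06:00);
    work = a different cluster with most daytime pings (09:00-17:00)."""
    night = Counter(lab for lab, h in zip(labels, hours) if lab != -1 and (h >= 22 or h < 6))
    day = Counter(lab for lab, h in zip(labels, hours) if lab != -1 and 9 <= h < 17)
    home = night.most_common(1)[0][0] if night else None
    candidates = [(count, lab) for lab, count in day.items() if lab != home]
    work = max(candidates)[1] if candidates else None
    return home, work
```

In practice, a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` (for example with `metric="haversine"` on latitude/longitude in radians) would replace the quadratic-time neighbor search shown here.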


An Efficient Simulation-Based Approach to Ambulance Fleet Allocation and Dynamic Redeployment

AAAI Conferences

We present an efficient approach to ambulance fleet allocation and dynamic redeployment, where the goal is to position an entire fleet of ambulances at base locations so as to maximize the service level (or utility) of the Emergency Medical Services (EMS) system. We take a simulation-based approach, in which the utility of an allocation is measured by directly simulating emergency requests. In both the static and dynamic settings, this modeling approach leads to an action space that grows exponentially with the number of ambulances. Furthermore, the utility of any particular allocation can only be measured via a seemingly “black box” simulator. Despite this complexity, we show that embedding our simulator within a simple and efficient greedy allocation algorithm produces good solutions. We derive data-driven performance guarantees that yield small optimality gaps. Given its efficiency, the approach can be employed repeatedly in real time for dynamic repositioning. We conduct simulation experiments based on real usage data from an EMS system in a large Asian city and demonstrate significant improvements in the system’s service levels using static allocations and redeployment policies discovered by our approach.
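The greedy idea can be sketched as follows: ambulances are added one at a time, each to the base whose extra vehicle most increases simulated utility, so allocating K ambulances across B bases costs K·B simulator calls instead of a search over every possible allocation. The toy simulator here (requests on a line, nearest-free-ambulance dispatch, a fixed busy time, and utility as the fraction of requests reached within a response-time threshold) is a hypothetical stand-in for the paper's black-box EMS simulator, and all names and parameters are illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Request = Tuple[float, float]  # (arrival time in minutes, position on a line); toy model


def simulate(allocation: Sequence[int], bases: Sequence[float], requests: Sequence[Request],
             busy_min: float = 60.0, threshold_min: float = 10.0) -> float:
    """Hypothetical toy simulator standing in for the black-box EMS simulator.

    Each request is served by the nearest currently free ambulance, which then
    stays busy for busy_min minutes. Travel time equals distance along the line.
    Returns the fraction of requests reached within threshold_min minutes.
    """
    # free_at[b][a] = time at which ambulance a stationed at base b is next free
    free_at = [[0.0] * count for count in allocation]
    on_time = 0
    for t, loc in sorted(requests):
        best: Optional[Tuple[float, int, int]] = None  # (travel time, base, ambulance)
        for b, ready_times in enumerate(free_at):
            for a, ready in enumerate(ready_times):
                travel = abs(bases[b] - loc)
                if ready <= t and (best is None or travel < best[0]):
                    best = (travel, b, a)
        if best is None:
            continue  # no ambulance free: the request is missed
        travel, b, a = best
        free_at[b][a] = t + busy_min
        if travel <= threshold_min:
            on_time += 1
    return on_time / len(requests) if requests else 1.0


def greedy_allocation(num_ambulances: int, num_bases: int,
                      utility: Callable[[List[int]], float]) -> List[int]:
    """Place ambulances one at a time, each at the base whose additional
    vehicle yields the highest simulated utility (ties: lowest base index)."""
    allocation = [0] * num_bases
    for _ in range(num_ambulances):
        best_base, best_value = 0, float("-inf")
        for b in range(num_bases):
            allocation[b] += 1
            value = utility(allocation)
            allocation[b] -= 1
            if value > best_value:
                best_base, best_value = b, value
        allocation[best_base] += 1
    return allocation


# Usage: three bases on a line, demand clustered near both ends.
bases = [0.0, 20.0, 40.0]
requests = [(0.0, 1.0), (5.0, 39.0), (10.0, 2.0), (15.0, 41.0)]
plan = greedy_allocation(2, len(bases), lambda alloc: simulate(alloc, bases, requests))
print(plan, simulate(plan, bases, requests))
```

With bases at 0, 20, and 40 and requests arriving near both ends, the greedy pass places one ambulance at each end ([1, 0, 1]), which reaches half of the four toy requests on time; stacking both ambulances at base 0 reaches only a quarter.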