
The Value of Information in Human-AI Decision-making

Guo, Ziyang, Wu, Yifan, Hartline, Jason, Hullman, Jessica

arXiv.org Artificial Intelligence

As the performance of artificial intelligence (AI) models improves, workflows that combine human and AI model-based judgments to make decisions are being sought in medicine, finance, and other domains. Though statistical models often make more accurate predictions than human experts on average [Ægisdóttir et al., 2006, Grove et al., 2000, Meehl, 1954], whenever humans have access to additional information over the AI, there is potential to achieve complementary performance by pairing the two, i.e., better performance than either the human or the AI alone. For example, a physician may have access to information that is not captured in tabular electronic health records or other structured data [Alur et al., 2024b]. However, evidence of complementary performance between humans and AI is limited, with many studies showing that human-AI teams underperform an AI alone [Buçinca et al., 2020, Bussone et al., 2015, Green and Chen, 2019, Jacobs et al., 2021, Lai and Tan, 2019, Vaccaro and Waldo, 2019, Kononenko, 2001]. A solid understanding of such results is hindered by the fact that most analyses of human-AI decision-making focus on ranking the performance of human-AI teams, or of each party individually, using measures like post-hoc decision accuracy.


Transforming CCTV cameras into NO$_2$ sensors at city scale for adaptive policymaking

Ibrahim, Mohamed R., Lyons, Terry

arXiv.org Artificial Intelligence

Air pollution in cities, especially NO$_2$, is linked to numerous health problems, ranging from mortality to mental health challenges and attention deficits in children. While cities globally have initiated policies to curtail emissions, real-time monitoring remains challenging due to limited environmental sensors and their inconsistent distribution. This gap hinders the creation of adaptive urban policies that respond to the sequence of events and daily activities affecting pollution in cities. Here, we demonstrate how city CCTV cameras can act as pseudo-NO$_2$ sensors. Using a predictive graph deep model, we utilised traffic flow from London's cameras in addition to environmental and spatial factors, generating NO$_2$ predictions from over 133 million frames. Our analysis of London's mobility patterns unveiled critical spatiotemporal connections, showing how specific traffic patterns affect NO$_2$ levels, sometimes with temporal lags of up to 6 hours. For instance, if trucks only drive at night, their effects on NO$_2$ levels are most likely to be seen in the morning when people commute. These findings cast doubt on the efficacy of some of the urban policies currently being implemented to reduce pollution. By leveraging existing camera infrastructure and our introduced methods, city planners and policymakers could cost-effectively monitor and mitigate the impact of NO$_2$ and other pollutants.
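The abstract's central finding is a temporal lag between traffic and NO$_2$. The paper itself uses a predictive graph deep model, but the basic idea of detecting such a lag can be sketched with a simple lagged cross-correlation (this example is ours, for illustration only; the variable names and data are hypothetical):

```python
def best_lag(traffic, no2, max_lag=6):
    """Find the lag (in hours) at which an hourly traffic-flow series
    correlates most strongly with later NO2 readings.

    Illustrative sketch only: the paper's model is a graph deep network,
    not simple cross-correlation."""
    def pearson(a, b):
        # Plain Pearson correlation coefficient of two equal-length series.
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a) ** 0.5
        vb = sum((y - mb) ** 2 for y in b) ** 0.5
        return cov / (va * vb)

    # Shift NO2 back by each candidate lag and score the alignment.
    scores = {lag: pearson(traffic[:len(traffic) - lag], no2[lag:])
              for lag in range(max_lag + 1)}
    return max(scores, key=scores.get), scores
```

On a synthetic series where NO$_2$ simply echoes traffic three hours later, `best_lag` recovers a lag of 3 with correlation 1.0.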


The Price of Prompting: Profiling Energy Use in Large Language Models Inference

Husom, Erik Johannes, Goknil, Arda, Shar, Lwin Khin, Sen, Sagar

arXiv.org Artificial Intelligence

In the rapidly evolving realm of artificial intelligence, deploying large language models (LLMs) poses increasingly pressing computational and environmental challenges. This paper introduces MELODI - Monitoring Energy Levels and Optimization for Data-driven Inference - a multifaceted framework crafted to monitor and analyze the energy consumed during LLM inference processes. MELODI enables detailed observations of power consumption dynamics and facilitates the creation of a comprehensive dataset reflective of energy efficiency across varied deployment scenarios. The dataset, generated using MELODI, encompasses a broad spectrum of LLM deployment frameworks, multiple language models, and extensive prompt datasets, enabling a comparative analysis of energy use. Using the dataset, we investigate how prompt attributes, including length and complexity, correlate with energy expenditure. Our findings indicate substantial disparities in energy efficiency, suggesting ample scope for optimization and adoption of sustainable measures in LLM deployment. Our contribution lies not only in the MELODI framework but also in the novel dataset, a resource that can be expanded by other researchers. Thus, MELODI is a foundational tool and dataset for advancing research into energy-conscious LLM deployment, steering the field toward a more sustainable future.
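The abstract does not describe MELODI's instrumentation in detail, but the core accounting step in any such framework, turning power samples taken during an inference run into an energy figure, can be sketched as follows (our illustration, not MELODI's actual code):

```python
def energy_joules(samples):
    """Estimate energy in joules from (timestamp_s, power_watts) samples
    collected while an inference request runs, using the trapezoid rule.

    How the samples are obtained (e.g. from RAPL counters or a GPU power
    sensor) is an assumption left outside this sketch."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total
```

A steady 10 W draw over 2 seconds yields 20 J; dividing by 3600 converts to watt-hours for per-prompt reporting.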


Solving Probability and Statistics Problems by Program Synthesis

Tang, Leonard, Ke, Elizabeth, Singh, Nikhil, Verma, Nakul, Drori, Iddo

arXiv.org Artificial Intelligence

We solve university-level probability and statistics questions by program synthesis using OpenAI's Codex, a Transformer trained on text and fine-tuned on code. We transform course problems from MIT's 18.05 Introduction to Probability and Statistics and Harvard's STAT110 Probability into programming tasks. We then execute the generated code to get a solution. Since these course questions are grounded in probability, we often aim to have Codex generate probabilistic programs that simulate a large number of probabilistic dependencies to compute their solutions. Our approach requires prompt engineering to transform the question from its original form to an explicit, tractable form that results in a correct program and solution. To estimate the amount of work needed to translate an original question into its tractable form, we measure the similarity between original and transformed questions. Our work is the first to introduce a dataset of university-level probability and statistics problems and to solve these problems in a scalable fashion using the program synthesis capabilities of large language models.
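To make the approach concrete, here is the kind of probabilistic program the authors aim to have synthesized for a question like "what is the probability that two fair dice sum to 7?" (this example is ours, not drawn from the paper's dataset):

```python
import random

def prob_dice_sum(target=7, trials=100_000, seed=0):
    """Monte Carlo estimate of P(die1 + die2 == target) for two fair dice."""
    rng = random.Random(seed)  # seeded for reproducibility
    hits = sum(
        1 for _ in range(trials)
        if rng.randint(1, 6) + rng.randint(1, 6) == target
    )
    return hits / trials

estimate = prob_dice_sum()  # exact answer is 6/36, about 0.1667
```

The synthesized program answers the question by simulation rather than by symbolic derivation, which is what makes the approach scale across heterogeneous course problems.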


Cohort Shapley value for algorithmic fairness

Mase, Masayoshi, Owen, Art B., Seiler, Benjamin B.

arXiv.org Artificial Intelligence

Cohort Shapley value is a model-free method of variable importance grounded in game theory that does not use any unobserved and potentially impossible feature combinations. We use it to evaluate algorithmic fairness, using the well-known COMPAS recidivism data as our example. This approach allows one to identify, for each individual in a data set, the extent to which they were adversely or beneficially affected by their value of a protected attribute such as their race. The method can do this even if race was not one of the original predictors and even if it does not have access to a proprietary algorithm that has made the predictions. The grounding in game theory lets us define aggregate variable importance for a data set consistently with its per-subject definitions. We can investigate variable importance for multiple quantities of interest in the fairness literature, including false positive predictions.
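In cohort Shapley, the value of a feature subset S for a target subject is the mean outcome over the cohort of subjects who match the target on every feature in S, so only observed rows are ever used. A minimal exact computation, feasible only for a handful of features, might look like this (our sketch of the idea, with hypothetical data; the paper's implementation may differ):

```python
from itertools import combinations
from math import comb

def cohort_shapley(X, y, t):
    """Exact cohort Shapley values for subject t.

    X: list of feature tuples (one per subject); y: observed outcomes or
    model predictions. A subset S of feature indices defines the cohort of
    subjects matching subject t on every feature in S; the value of S is
    that cohort's mean y. The cohort is never empty, since t matches itself.
    """
    d, n = len(X[0]), len(X)

    def value(S):
        members = [y[i] for i in range(n)
                   if all(X[i][f] == X[t][f] for f in S)]
        return sum(members) / len(members)

    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            for S in combinations(others, r):
                # Standard Shapley weight |S|!(d-|S|-1)!/d! = 1/(d*C(d-1,|S|))
                w = 1.0 / (d * comb(d - 1, r))
                phi[j] += w * (value(S + (j,)) - value(S))
    return phi
```

By the Shapley efficiency property, the per-subject values sum to the gap between subject t's outcome and the population mean, which is what licenses the paper's consistent aggregate importances.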


Towards Data Driven Model Improvement

Qiu, Yumeng (Worcester Polytechnic Institute) | Pardos, Zachary A. (Worcester Polytechnic Institute) | Heffernan, Neil T (Worcester Polytechnic Institute)

AAAI Conferences

In the area of student knowledge assessment, knowledge tracing is a model that has been used for over a decade to predict student knowledge and performance. Many modifications to this model have been proposed and evaluated; however, the modifications are often based on a combination of intuition and experience in the domain. This method of model improvement can be difficult for researchers without a high level of domain experience, and furthermore, the best improvements to the model could be unintuitive ones. Therefore, we propose a completely data-driven approach to model improvement. This alternative allows researchers to evaluate which aspects of a model are most likely to result in model performance improvement. Our results suggest a variety of improvements to knowledge tracing, many of which have not been explored.
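The baseline the paper seeks to improve is standard Bayesian Knowledge Tracing, whose per-response update is compact enough to sketch here (standard BKT, not the paper's proposed modifications; the parameter values are illustrative placeholders):

```python
def bkt_update(p_know, correct, guess=0.2, slip=0.1, learn=0.15):
    """One step of standard Bayesian Knowledge Tracing.

    p_know: prior P(student knows the skill) before this response.
    Returns the posterior P(know) after conditioning on the observed
    response and applying the learning transition.
    """
    if correct:
        # P(know | correct response) via Bayes' rule
        num = p_know * (1 - slip)
        denom = num + (1 - p_know) * guess
    else:
        # P(know | incorrect response)
        num = p_know * slip
        denom = num + (1 - p_know) * (1 - guess)
    cond = num / denom
    # Chance the student learned the skill at this practice opportunity
    return cond + (1 - cond) * learn
```

A data-driven search of the kind the paper proposes would then ask which extensions of this update (per-item parameters, forgetting, individualized priors, and so on) actually move predictive performance.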