Yadav, Shashank
PHEONA: An Evaluation Framework for Large Language Model-based Approaches to Computational Phenotyping
Pungitore, Sarah, Yadav, Shashank, Subbian, Vignesh
Computational phenotyping is essential for biomedical research but often requires significant time and resources, especially since traditional methods typically involve extensive manual data review. While machine learning and natural language processing advancements have helped, further improvements are needed. Few studies have explored using Large Language Models (LLMs) for these tasks despite known advantages of LLMs for text-based tasks. To facilitate further research in this area, we developed an evaluation framework, Evaluation of PHEnotyping for Observational Health Data (PHEONA), that outlines context-specific considerations. We applied and demonstrated PHEONA on concept classification, a specific task within a broader phenotyping process for Acute Respiratory Failure (ARF) respiratory support therapies. From the sample concepts tested, we achieved high classification accuracy, suggesting the potential for LLM-based methods to improve computational phenotyping processes.
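As a rough illustration of what LLM-based concept classification for a phenotyping task could look like, here is a minimal Python sketch. The prompt wording, category labels, model name, and API client are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch of LLM-based concept classification for ARF respiratory
# support therapies; categories, prompt, and model are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CATEGORIES = [
    "invasive mechanical ventilation",
    "non-invasive ventilation",
    "high-flow nasal cannula",
    "not a respiratory support therapy",
]

def classify_concept(concept_text: str) -> str:
    """Ask the model to map a clinical concept description to exactly one category."""
    prompt = (
        "You are assisting with computational phenotyping of acute respiratory "
        "failure (ARF) respiratory support therapies.\n"
        f"Concept: {concept_text}\n"
        f"Choose exactly one category from: {', '.join(CATEGORIES)}.\n"
        "Answer with the category name only."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model; the paper's models may differ
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

print(classify_concept("Oxygen delivered via high-flow nasal cannula at 40 L/min"))
```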
Gamified crowd-sourcing of high-quality data for visual fine-tuning
Yadav, Shashank, Tomar, Rohan, Jain, Garvit, Ahooja, Chirag, Chaudhary, Shubham, Elkan, Charles
This paper introduces gamified adversarial prompting (GAP), a framework that crowd-sources high-quality data for visual instruction tuning of large multimodal models. GAP transforms the data collection process into an engaging game, incentivizing players to provide fine-grained, challenging questions and answers that target gaps in the model's knowledge. Each session presents 10 images, 5 tainted and 5 untainted, and players do not know which images are tainted; at the end of the session, they earn 20 points for every tainted image on which they chose "Wrong Answer" and the model had indeed been instructed to answer incorrectly. Our contributions include (1) an approach to capture question-answer pairs from humans that directly address weaknesses in a model's knowledge, (2) a method for evaluating and rewarding players that successfully incentivizes them to provide high-quality submissions, and (3) a scalable, gamified platform that succeeds in collecting this data from over 50,000 participants in just a few weeks. Our implementation of GAP has significantly improved the accuracy of a small multimodal model, namely MiniCPM-Llama3-V-2.5-8B. Moreover, we demonstrate that the data generated using MiniCPM-Llama3-V-2.5-8B also benefits other models: the same data improves the performance of QWEN2-VL-2B and QWEN2-VL-7B on the same benchmarks.

Visual question answering (VQA) has emerged as a crucial paradigm in AI, extending beyond mere visual interpretation to facilitate broader and deeper understanding in models. Studies demonstrate VQA's potential in enhancing general knowledge acquisition, transfer learning, and complex reasoning skills. Mahdisoltani et al. (2018) showed that pretraining on complex visual-linguistic tasks significantly improves performance across diverse downstream applications, from textual generation to fine-grained classification. The encoding of visual information as language, explored in works like Something-Else (Materzynska et al., 2020; Girdhar & Ramanan, 2019), and more recently by Alayrac et al. (2022), enables models to develop low-level visual skills that support sophisticated reasoning in multimodal contexts.
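A minimal Python sketch of the session scoring rule described in the GAP abstract above. Only the rule of 20 points per tainted image correctly flagged as "Wrong Answer" comes from the text; the data structures and the example session are illustrative assumptions.

```python
# Sketch of GAP session scoring under stated assumptions; handling of any other
# reward cases is not described in the abstract and is omitted here.
from dataclasses import dataclass

POINTS_PER_CORRECT_FLAG = 20  # 20 points per tainted image flagged as "Wrong Answer"

@dataclass
class ImageRound:
    tainted: bool             # model was secretly instructed to answer incorrectly
    player_chose_wrong: bool  # player flagged the model's answer as "Wrong Answer"

def score_session(rounds: list[ImageRound]) -> int:
    """Score one session (10 images, 5 tainted and 5 untainted in the described setup)."""
    return sum(
        POINTS_PER_CORRECT_FLAG
        for r in rounds
        if r.tainted and r.player_chose_wrong
    )

# Example: the player correctly flags 3 of the 5 tainted images -> 60 points.
session = (
    [ImageRound(tainted=True, player_chose_wrong=True)] * 3
    + [ImageRound(tainted=True, player_chose_wrong=False)] * 2
    + [ImageRound(tainted=False, player_chose_wrong=False)] * 5
)
print(score_session(session))  # 60
```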
Machines and Influence
Yadav, Shashank
Policymakers face the broader challenge of how to view AI capabilities today and where society stands in terms of those capabilities. This paper surveys AI capabilities and tackles this very issue, exploring it in the context of political security in digitally networked societies. We extend the ideas of Information Management to better understand contemporary AI systems as part of a larger and more complex information system. Comprehensively reviewing AI capabilities and contemporary man-machine interactions, we undertake conceptual development to suggest that better information management could allow states to more optimally offset the risks of AI-enabled influence and better utilise the emerging capabilities which these systems offer to policymakers and political institutions across the world. We hope this long essay will actuate further debates and discussions over these ideas and prove to be a useful contribution towards governing the future of AI.