Raghavan, Manish
Double Machine Learning for Causal Inference under Shared-State Interference
Hays, Chris, Raghavan, Manish
Researchers and practitioners often wish to measure treatment effects in settings where units interact via markets and recommendation systems. In these settings, units are affected by certain shared states, like prices, algorithmic recommendations, or social signals. We formalize this structure, calling it shared-state interference, and argue that our formulation captures many relevant applied settings. Our key modeling assumption is that individuals' potential outcomes are independent conditional on the shared state. We then prove an extension of a double machine learning (DML) theorem that provides conditions for achieving efficient inference under shared-state interference. We also instantiate our general theorem in several models of interest where it is possible to efficiently estimate the average direct effect (ADE) or global average treatment effect (GATE).
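To make the DML recipe concrete, here is a minimal, generic cross-fitting sketch for an average treatment effect under standard no-interference assumptions. It illustrates the template the paper extends (learn nuisance models on held-out folds, then average a Neyman-orthogonal score) rather than the paper's shared-state estimator; the learners, fold count, and AIPW score are illustrative choices.

```python
# Generic DML/AIPW sketch with cross-fitting. This is NOT the paper's
# shared-state estimator; it only illustrates the double machine learning template.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
from sklearn.model_selection import KFold

def dml_ate(X, d, y, n_folds=5, clip=1e-3):
    """Cross-fitted AIPW estimate of E[Y(1) - Y(0)] with binary treatment d."""
    psi = np.zeros(len(y))
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(X):
        # Outcome regressions mu_d(x) = E[Y | X = x, D = d], fit on the training fold.
        mu1 = GradientBoostingRegressor().fit(X[train][d[train] == 1], y[train][d[train] == 1])
        mu0 = GradientBoostingRegressor().fit(X[train][d[train] == 0], y[train][d[train] == 0])
        # Propensity e(x) = P(D = 1 | X = x).
        e = GradientBoostingClassifier().fit(X[train], d[train])
        m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
        p = np.clip(e.predict_proba(X[test])[:, 1], clip, 1 - clip)
        # Neyman-orthogonal (AIPW) score evaluated on the held-out fold.
        psi[test] = (m1 - m0
                     + d[test] * (y[test] - m1) / p
                     - (1 - d[test]) * (y[test] - m0) / (1 - p))
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(len(y))
    return est, se
```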
Evaluating multiple models using labeled and unlabeled data
Shanmugam, Divya, Sadhuka, Shuvom, Raghavan, Manish, Guttag, John, Berger, Bonnie, Pierson, Emma
It remains difficult to evaluate machine learning classifiers in the absence of a large, labeled dataset. While labeled data can be prohibitively expensive or impossible to obtain, unlabeled data is plentiful. Here, we introduce Semi-Supervised Model Evaluation (SSME), a method that uses both labeled and unlabeled data to evaluate machine learning classifiers. SSME is the first evaluation method to take advantage of the fact that: (i) there are frequently multiple classifiers for the same task, (ii) continuous classifier scores are often available for all classes, and (iii) unlabeled data is often far more plentiful than labeled data. The key idea is to use a semi-supervised mixture model to estimate the joint distribution of ground truth labels and classifier predictions. We can then use this model to estimate any metric that is a function of classifier scores and ground truth labels (e.g., accuracy or expected calibration error). We present experiments in four domains where obtaining large labeled datasets is often impractical: (1) healthcare, (2) content moderation, (3) molecular property prediction, and (4) image annotation. Our results demonstrate that SSME estimates performance more accurately than competing methods, reducing error by 5.1x relative to using labeled data alone and by 2.4x relative to the next best competing method. SSME also yields more accurate estimates when evaluating performance across subsets of the test distribution (e.g., specific demographic subgroups) and when evaluating the performance of language models.

Rigorous evaluation is essential to the safe deployment of machine learning classifiers. The standard approach is to measure classifier performance using a large labeled dataset. In practice, however, labeled data is often scarce (Culotta & McCallum, 2005; Dutta & Das, 2023). Exacerbating the challenge of evaluation, the number of off-the-shelf classifiers has increased dramatically with the widespread use of model hubs. The modern machine learning practitioner thus has a myriad of trained models but little labeled data with which to evaluate them. In many domains, unlabeled data is much more abundant than labeled data (Bepler et al., 2019; Sagawa et al., 2021; Movva et al., 2024).
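As a toy illustration of the SSME idea for a single binary classifier, the sketch below models the classifier's score as a two-component mixture (one Gaussian per ground-truth class), fits it with EM on labeled and unlabeled scores, and reads off an accuracy estimate from the inferred label posterior. The Gaussian components, the 0.5 decision threshold, and the function name are assumptions for illustration, not the authors' implementation.

```python
# Toy semi-supervised mixture sketch in the spirit of SSME (single binary classifier).
import numpy as np
from scipy.stats import norm

def ssme_accuracy(scores_l, labels_l, scores_u, n_iter=200):
    """scores_*: classifier's predicted probability of class 1; labels_l in {0, 1}."""
    s = np.concatenate([scores_l, scores_u])
    # Responsibilities r[i] = P(y_i = 1 | score); fixed to the true label when known.
    r = np.concatenate([labels_l.astype(float), np.full(len(scores_u), 0.5)])
    known = np.arange(len(scores_l))
    for _ in range(n_iter):
        # M-step: class prior and per-class Gaussians from current responsibilities.
        pi = r.mean()
        mu1, mu0 = np.average(s, weights=r), np.average(s, weights=1 - r)
        sd1 = np.sqrt(np.average((s - mu1) ** 2, weights=r)) + 1e-6
        sd0 = np.sqrt(np.average((s - mu0) ** 2, weights=1 - r)) + 1e-6
        # E-step: update label posteriors; labeled points keep their labels.
        p1 = pi * norm.pdf(s, mu1, sd1)
        p0 = (1 - pi) * norm.pdf(s, mu0, sd0)
        r = p1 / (p1 + p0)
        r[known] = labels_l
    # Accuracy estimate: probability the thresholded prediction matches the
    # (possibly latent) label, averaged over the model's label posterior.
    pred = (s >= 0.5).astype(float)  # assumes scores are probabilities
    return np.mean(pred * r + (1 - pred) * (1 - r))
```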
Competition and Diversity in Generative AI
Raghavan, Manish
A growing body of literature on generative artificial intelligence reveals a surprisingly consistent stylized fact: when people use generative AI tools, the set of content they produce tends to be more homogeneous than content produced by more traditional means [4, 22, 49, 56, 67, 69, 84, 106, 108]. Across a wide range of domains including peer review [56], writing [67], digital art [108], and survey responses [106], access to generative AI tools (GAITs) leads to less diverse outcomes. Researchers refer to this phenomenon--where the use of similar or identical underlying AI tools leads to convergence in outcomes--as algorithmic monoculture [50] or homogenization [12]. Much of the empirical literature on the subject treats homogenization itself as the primary object of study, seeking to quantify and deeply understand it. Here, we begin our analysis further downstream. We ask: What are the consequences of monoculture in generation? When homogenization has negative consequences, how should we expect content producers to behave in response?
Integrating Expert Judgment and Algorithmic Decision Making: An Indistinguishability Framework
Alur, Rohan, Laine, Loren, Li, Darrick K., Shung, Dennis, Raghavan, Manish, Shah, Devavrat
We introduce a novel framework for human-AI collaboration in prediction and decision tasks. Our approach leverages human judgment to distinguish inputs which are algorithmically indistinguishable, or "look the same" to any feasible predictive algorithm. We argue that this framing clarifies the problem of human-AI collaboration in prediction and decision tasks, as experts often form judgments by drawing on information which is not encoded in an algorithm's training data. Algorithmic indistinguishability yields a natural test for assessing whether experts incorporate this kind of "side information", and further provides a simple but principled method for selectively incorporating human feedback into algorithmic predictions. We show that this method provably improves the performance of any feasible algorithmic predictor, and we precisely quantify this improvement. We demonstrate the utility of our framework in a case study of emergency room triage decisions, where we find that although algorithmic risk scores are highly competitive with physicians, there is strong evidence that physician judgments provide signal which could not be replicated by any predictive algorithm. This insight yields a range of natural decision rules which leverage the complementary strengths of human experts and predictive algorithms.
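The sketch below is a rough, simplified rendering of the selective-incorporation idea: within level sets of the algorithmic prediction (used here as a crude stand-in for algorithmically indistinguishable inputs), it fits a correction from the expert's forecast to outcome residuals and keeps it only when it helps on held-out data. The binning, the linear correction, and all names are illustrative assumptions rather than the paper's procedure.

```python
# Rough proxy for selectively incorporating expert judgment within
# "indistinguishable" cells, approximated here by prediction quantile bins.
import numpy as np

def incorporate_expert(algo_pred, expert_pred, y, n_bins=10):
    bins = np.quantile(algo_pred, np.linspace(0, 1, n_bins + 1))
    cell = np.clip(np.digitize(algo_pred, bins[1:-1]), 0, n_bins - 1)
    refined = algo_pred.copy()
    for c in range(n_bins):
        idx = np.where(cell == c)[0]
        if len(idx) < 20:
            continue
        # Split the cell so the correction is not judged on its own fit.
        fit, val = idx[::2], idx[1::2]
        resid = y[fit] - algo_pred[fit]
        beta = np.polyfit(expert_pred[fit], resid, 1)   # linear map: expert -> residual
        correction = np.polyval(beta, expert_pred)
        improves = (np.mean((y[val] - algo_pred[val] - correction[val]) ** 2)
                    < np.mean((y[val] - algo_pred[val]) ** 2))
        if improves:
            refined[idx] = algo_pred[idx] + correction[idx]
    return refined
```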
Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
Suriyakumar, Vinith M., Alur, Rohan, Sekhari, Ayush, Raghavan, Manish, Wilson, Ashia C.
Text-to-image diffusion models rely on massive, web-scale datasets. Training them from scratch is computationally expensive, and as a result, developers often prefer to make incremental updates to existing models. These updates often compose fine-tuning steps (to learn new concepts or improve model performance) with "unlearning" steps (to "forget" existing concepts, such as copyrighted works or explicit content). In this work, we demonstrate a critical and previously unknown vulnerability that arises in this paradigm: even under benign, non-adversarial conditions, fine-tuning a text-to-image diffusion model on seemingly unrelated images can cause it to "relearn" concepts that were previously "unlearned." We comprehensively investigate the causes and scope of this phenomenon, which we term concept resurgence, by performing a series of experiments which compose "mass concept erasure" (the current state of the art for unlearning in text-to-image diffusion models; Lu et al., 2024) with subsequent fine-tuning of Stable Diffusion v1.4. Our findings underscore the fragility of composing incremental model updates, and raise serious new concerns about current approaches to ensuring the safety and alignment of text-to-image diffusion models.
Distinguishing the Indistinguishable: Human Expertise in Algorithmic Prediction
Alur, Rohan, Raghavan, Manish, Shah, Devavrat
We introduce a novel framework for incorporating human expertise into algorithmic predictions. Our approach focuses on the use of human judgment to distinguish inputs which `look the same' to any feasible predictive algorithm. We argue that this framing clarifies the problem of human/AI collaboration in prediction tasks, as experts often have access to information -- particularly subjective information -- which is not encoded in the algorithm's training data. We use this insight to develop a set of principled algorithms for selectively incorporating human feedback only when it improves the performance of any feasible predictor. We find empirically that although algorithms often outperform their human counterparts on average, human judgment can significantly improve algorithmic predictions on specific instances (which can be identified ex-ante). In an X-ray classification task, we find that this subset constitutes nearly 30% of the patient population. Our approach provides a natural way of uncovering this heterogeneity and thus enabling effective human-AI collaboration.
Auditing for Human Expertise
Alur, Rohan, Laine, Loren, Li, Darrick K., Raghavan, Manish, Shah, Devavrat, Shung, Dennis
High-stakes prediction tasks (e.g., patient diagnosis) are often handled by trained human experts. A common source of concern about automation in these settings is that experts may exercise intuition that is difficult to model and/or have access to information (e.g., conversations with a patient) that is simply unavailable to a would-be algorithm. This raises a natural question: do human experts add value which could not be captured by an algorithmic predictor? We develop a statistical framework under which we can pose this question as a natural hypothesis test. Indeed, as our framework highlights, detecting human expertise is more subtle than simply comparing the accuracy of expert predictions to those made by a particular learning algorithm. Instead, we propose a simple procedure which tests whether expert predictions are statistically independent of the outcomes of interest after conditioning on the available inputs (`features'). A rejection of our test thus suggests that human experts may add value to any algorithm trained on the available data, and has direct implications for whether human-AI `complementarity' is achievable in a given prediction task. We highlight the utility of our procedure using admissions data collected from the emergency department of a large academic hospital system, where we show that physicians' admit/discharge decisions for patients with acute gastrointestinal bleeding (AGIB) appear to be incorporating information that is not available to a standard algorithmic screening tool. This is despite the fact that the screening tool is arguably more accurate than physicians' discretionary decisions, highlighting that -- even absent normative concerns about accountability or interpretability -- accuracy is insufficient to justify algorithmic automation.
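A hedged sketch of this style of test appears below: residualize outcomes on the available features with a flexible learner, then use a permutation test to check whether expert predictions still carry signal about the residuals. This is a simplified heuristic in the spirit of the proposed procedure (the paper's test handles conditioning on features more carefully); the learner, test statistic, and function name are illustrative choices.

```python
# Simplified residual-based permutation heuristic for "do experts add signal beyond X?"
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def expertise_test(X, expert_pred, y, n_perm=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Out-of-fold predictions of the outcome from features alone.
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    yhat = cross_val_predict(model, X, y, cv=5)
    resid = y - yhat
    # Test statistic: association between expert predictions and what the
    # feature-based model could not explain.
    stat = abs(np.corrcoef(expert_pred, resid)[0, 1])
    null = np.array([abs(np.corrcoef(rng.permutation(expert_pred), resid)[0, 1])
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= stat)) / (1 + n_perm)
    return stat, p_value  # small p-value: experts appear to add information beyond X
```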
Simplistic Collection and Labeling Practices Limit the Utility of Benchmark Datasets for Twitter Bot Detection
Hays, Chris, Schutzman, Zachary, Raghavan, Manish, Walk, Erin, Zimmer, Philipp
Accurate bot detection is necessary for the safety and integrity of online platforms. It is also crucial for research on the influence of bots in elections, the spread of misinformation, and financial market manipulation. Platforms deploy infrastructure to flag or remove automated accounts, but their tools and data are not publicly available. Thus, the public must rely on third-party bot detection. These tools employ machine learning and often achieve near-perfect performance for classification on existing datasets, suggesting bot detection is accurate, reliable, and fit for use in downstream applications. We provide evidence that this is not the case and show that high performance is attributable to limitations in dataset collection and labeling rather than sophistication of the tools. Specifically, we show that simple decision rules -- shallow decision trees trained on a small number of features -- achieve near-state-of-the-art performance on most available datasets and that bot detection datasets, even when combined together, do not generalize well to out-of-sample datasets. Our findings reveal that predictions are highly dependent on each dataset's collection and labeling procedures rather than fundamental differences between bots and humans. These results have important implications both for transparency in sampling and labeling procedures and for potential biases in research that uses existing bot detection tools for pre-processing.
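The "simple decision rules" baseline is easy to reproduce in spirit: a shallow decision tree on a handful of account-level features. The column names and dataset layout below are assumptions for illustration, not the paper's exact feature set.

```python
# Shallow-tree bot detection baseline on assumed account-level columns.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.model_selection import train_test_split

FEATURES = ["followers_count", "friends_count", "statuses_count", "account_age_days"]  # assumed columns

def shallow_bot_detector(df: pd.DataFrame, max_depth: int = 2):
    """df: one row per account, with FEATURES and a binary 'is_bot' label (assumed)."""
    X, y = df[FEATURES], df["is_bot"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X_tr, y_tr)
    print(export_text(tree, feature_names=FEATURES))  # the entire "model" is a few if/else rules
    return tree.score(X_te, y_te)
```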
Greedy Algorithm almost Dominates in Smoothed Contextual Bandits
Raghavan, Manish, Slivkins, Aleksandrs, Vaughan, Jennifer Wortman, Wu, Zhiwei Steven
Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to better decisions in the future. While necessary in the worst case, explicit exploration has a number of disadvantages compared to the greedy algorithm that always "exploits" by choosing an action that currently looks optimal. We ask under what conditions inherent diversity in the data makes explicit exploration unnecessary. We build on a recent line of work on the smoothed analysis of the greedy algorithm in the linear contextual bandits model. We improve on prior results to show that the greedy algorithm almost matches the best possible Bayesian regret rate of any other algorithm on the same problem instance whenever the diversity conditions hold. The key technical finding is that data collected by the greedy algorithm suffices to simulate a run of any other algorithm.
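For concreteness, here is a small sketch of the greedy linear contextual bandit in question: each arm keeps a ridge-regression estimate of its reward parameters, and the algorithm always plays the arm whose estimate looks best for the current context, with no exploration bonus. The shared-context formulation, dimensions, and reward oracle are placeholder assumptions.

```python
# Greedy (exploration-free) linear contextual bandit with per-arm ridge regression.
import numpy as np

def greedy_linear_bandit(contexts, reward_fn, n_arms, lam=1.0):
    """contexts: (T, d) array of contexts; reward_fn(t, arm) -> observed reward (placeholder oracle)."""
    T, d = contexts.shape
    A = np.stack([lam * np.eye(d) for _ in range(n_arms)])  # per-arm X'X + lam*I
    b = np.zeros((n_arms, d))                               # per-arm X'y
    rewards = np.zeros(T)
    for t in range(T):
        x = contexts[t]
        # Ridge estimates of each arm's reward parameters from data collected so far.
        theta = np.stack([np.linalg.solve(A[a], b[a]) for a in range(n_arms)])
        arm = int(np.argmax(theta @ x))  # purely greedy: exploit the current estimates
        r = reward_fn(t, arm)
        A[arm] += np.outer(x, x)
        b[arm] += r * x
        rewards[t] = r
    return rewards
```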
Mitigating Bias in Algorithmic Employment Screening: Evaluating Claims and Practices
Raghavan, Manish, Barocas, Solon, Kleinberg, Jon, Levy, Karen
There has been rapidly growing interest in the use of algorithms for employment assessment, especially as a means to address or mitigate bias in hiring. Yet, to date, little is known about how these methods are being used in practice. How are algorithmic assessments built, validated, and examined for bias? In this work, we document and assess the claims and practices of companies offering algorithms for employment assessment, using a methodology that can be applied to evaluate similar applications and issues of bias in other domains. In particular, we identify vendors of algorithmic pre-employment assessments (i.e., algorithms to screen candidates), document what they have disclosed about their development and validation procedures, and evaluate their techniques for detecting and mitigating bias. We find that companies' formulation of "bias" varies, as do their approaches to dealing with it. We also discuss the various choices vendors make regarding data collection and prediction targets, in light of the risks and trade-offs that these choices pose. We consider the implications of these choices and raise a number of technical and legal considerations.