Suresh, Harini
"Ownership, Not Just Happy Talk": Co-Designing a Participatory Large Language Model for Journalism
Tseng, Emily, Young, Meg, Aubin Le Quéré, Marianne, Rinehart, Aimee, Suresh, Harini
Journalism has emerged as an essential domain for understanding the uses, limitations, and impacts of large language models (LLMs) in the workplace. News organizations face divergent financial incentives: LLMs already permeate newswork processes within financially constrained organizations, even as ongoing legal challenges assert that AI companies violate their copyright. At stake are key questions about what LLMs are created to do, and by whom: How might a journalist-led LLM work, and what can participatory design illuminate about the present-day challenges of adapting "one-size-fits-all" foundation models to a given context of use? In this paper, we undertake a co-design exploration to understand how a participatory approach to LLMs might address opportunities and challenges around AI in journalism. Our 20 interviews with reporters, data journalists, editors, labor organizers, product leads, and executives highlight macro, meso, and micro tensions that designing for this opportunity space must address. From these desiderata, we describe the result of our co-design work: organizational structures and functionality for a journalist-controlled LLM. In closing, we discuss the limitations of commercial foundation models for workplace use, and the methodological implications of applying participatory methods to LLM co-design.
Participation in the age of foundation models
Suresh, Harini, Tseng, Emily, Young, Meg, Gray, Mary L., Pierson, Emma, Levy, Karen
Growing interest and investment in the capabilities of foundation models have positioned such systems to impact a wide array of public services. Alongside these opportunities is the risk that these systems reify existing power imbalances and cause disproportionate harm to marginalized communities. Participatory approaches hold promise to instead lend agency and decision-making power to marginalized stakeholders. But existing approaches in participatory AI/ML are typically deeply grounded in context: how do we apply these approaches to foundation models, which are, by design, disconnected from context? Our paper interrogates this question. First, we examine existing attempts at incorporating participation into foundation models. We highlight the tension between participation and scale, demonstrating that it is intractable for impacted communities to meaningfully shape a foundation model that is intended to be universally applicable. In response, we develop a blueprint for participatory foundation models that identifies more local, application-oriented opportunities for meaningful participation. In addition to the "foundation" layer, our framework proposes the "subfloor" layer, in which stakeholders develop shared technical infrastructure, norms, and governance for a grounded domain, and the "surface" layer, in which affected communities shape the use of a foundation model for a specific downstream task. The intermediate "subfloor" layer scopes the range of potential harms to consider, and affords communities more concrete avenues for deliberation and intervention. At the same time, it avoids duplicative effort by scaling input across relevant use cases. Through three case studies in clinical care, financial services, and journalism, we illustrate how this multi-layer model can create more meaningful opportunities for participation than solely intervening at the foundation layer.
Saliency Cards: A Framework to Characterize and Compare Saliency Methods
Boggust, Angie, Suresh, Harini, Strobelt, Hendrik, Guttag, John V., Satyanarayan, Arvind
Saliency methods are a common class of machine learning interpretability techniques that calculate how important each input feature is to a model's output. We find that, with the rapid pace of development, users struggle to stay informed of the strengths and limitations of new methods and, thus, choose methods for unprincipled reasons (e.g., popularity). Moreover, despite a corresponding rise in evaluation metrics, existing approaches assume universal desiderata for saliency methods (e.g., faithfulness) that do not account for diverse user needs. In response, we introduce saliency cards: structured documentation of how saliency methods operate and their performance across a battery of evaluative metrics. Through a review of 25 saliency method papers and 33 method evaluations, we identify 10 attributes that users should account for when choosing a method. We group these attributes into three categories that span the process of computing and interpreting saliency: methodology, or how the saliency is calculated; sensitivity, or the relationship between the saliency and the underlying model and data; and, perceptibility, or how an end user ultimately interprets the result. By collating this information, saliency cards allow users to more holistically assess and compare the implications of different methods. Through nine semi-structured interviews with users from various backgrounds, including researchers, radiologists, and computational biologists, we find that saliency cards provide a detailed vocabulary for discussing individual methods and allow for a more systematic selection of task-appropriate methods. Moreover, with saliency cards, we are able to analyze the research landscape in a more structured fashion to identify opportunities for new methods and evaluation metrics for unmet user needs.
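As a rough illustration of the kind of structured record a saliency card corresponds to, the Python sketch below groups a method's documented properties under the three categories named above (methodology, sensitivity, perceptibility). The class name, fields, and example values are illustrative placeholders, not the paper's ten attributes or its documentation template.

from dataclasses import dataclass, field

@dataclass
class SaliencyCard:
    """Structured documentation for a single saliency method (illustrative only)."""
    method_name: str
    methodology: dict = field(default_factory=dict)     # how the saliency is calculated
    sensitivity: dict = field(default_factory=dict)     # relationship to the model and data
    perceptibility: dict = field(default_factory=dict)  # how an end user interprets the result

card = SaliencyCard(
    method_name="integrated_gradients",
    methodology={"requires_gradients": True},                      # placeholder attribute
    sensitivity={"hyperparameter_dependence": "baseline choice"},  # placeholder attribute
    perceptibility={"granularity": "per-feature"},                 # placeholder attribute
)
print(card.method_name, sorted(card.perceptibility))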
Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs
Suresh, Harini, Lewis, Kathleen M., Guttag, John V., Satyanarayan, Arvind
Interpretability methods aim to help users build trust in and understand the capabilities of machine learning models. However, existing approaches often rely on abstract, complex visualizations that poorly map to the task at hand or require non-trivial ML expertise to interpret. Here, we present two interface modules to facilitate a more intuitive assessment of model reliability. To help users better characterize and reason about a model's uncertainty, we visualize raw and aggregate information about a given input's nearest neighbors in the training dataset. Using an interactive editor, users can manipulate this input in semantically-meaningful ways, determine the effect on the output, and compare against their prior expectations. We evaluate our interface using an electrocardiogram beat classification case study. Compared to a baseline feature importance interface, we find that 9 physicians are better able to align the model's uncertainty with clinically relevant factors and build intuition about its capabilities and limitations.
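The sketch below illustrates the nearest-neighbor component of this idea in plain Python: retrieve a test input's closest training examples and summarize their labels as a rough uncertainty signal. The feature representation, distance metric, and neighbor count are assumptions; the paper's interface operates on ECG beats with an interactive editor rather than on synthetic vectors.

import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 16))      # stand-in training features
y_train = rng.integers(0, 2, size=500)    # stand-in labels

index = NearestNeighbors(n_neighbors=10).fit(X_train)

def neighbor_summary(x):
    """Label counts among the 10 training examples nearest to input x."""
    _, idx = index.kneighbors(x.reshape(1, -1))
    labels, counts = np.unique(y_train[idx[0]], return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))

# A mixed neighborhood (e.g., {0: 6, 1: 4}) suggests the model's output
# for this input should be treated with more caution.
print(neighbor_summary(rng.normal(size=16)))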
Underspecification Presents Challenges for Credibility in Modern Machine Learning
D'Amour, Alexander, Heller, Katherine, Moldovan, Dan, Adlam, Ben, Alipanahi, Babak, Beutel, Alex, Chen, Christina, Deaton, Jonathan, Eisenstein, Jacob, Hoffman, Matthew D., Hormozdiari, Farhad, Houlsby, Neil, Hou, Shaobo, Jerfel, Ghassen, Karthikesalingam, Alan, Lucic, Mario, Ma, Yian, McLean, Cory, Mincu, Diana, Mitani, Akinori, Montanari, Andrea, Nado, Zachary, Natarajan, Vivek, Nielson, Christopher, Osborne, Thomas F., Raman, Rajiv, Ramasamy, Kim, Sayres, Rory, Schrouff, Jessica, Seneviratne, Martin, Sequeira, Shannon, Suresh, Harini, Veitch, Victor, Vladymyrov, Max, Wang, Xuezhi, Webster, Kellie, Yadlowsky, Steve, Yun, Taedong, Zhai, Xiaohua, Sculley, D.
ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.
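A toy sketch of the phenomenon (not taken from the paper): models trained with different random seeds reach similar held-out accuracy in the training domain, but because the training data lets them lean on a spurious feature, their accuracy degrades, to varying degrees across seeds, once that feature's correlation flips at deployment time.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def make_data(n, spurious_corr):
    y = rng.integers(0, 2, size=n)
    causal = y + rng.normal(scale=1.0, size=n)                    # stays predictive
    spurious = spurious_corr * y + rng.normal(scale=1.0, size=n)  # correlation can flip
    return np.column_stack([causal, spurious]), y

X_tr, y_tr = make_data(500, spurious_corr=1.0)
X_val, y_val = make_data(1000, spurious_corr=1.0)
X_shift, y_shift = make_data(1000, spurious_corr=-1.0)            # deployment-time shift

for seed in range(3):
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                        random_state=seed).fit(X_tr, y_tr)
    print(f"seed={seed}  val={clf.score(X_val, y_val):.2f}  "
          f"shifted={clf.score(X_shift, y_shift):.2f}")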
Misplaced Trust: Measuring the Interference of Machine Learning in Human Decision-Making
Suresh, Harini, Lao, Natalie, Liccardi, Ilaria
ML decision-aid systems are increasingly common on the web, but their successful integration relies on people trusting them appropriately: they should use the system to fill in gaps in their ability, but recognize signals that the system might be incorrect. We measured how people's trust in ML recommendations differs by expertise and with more system information through a task-based study of 175 adults. We used two tasks that are difficult for humans: comparing large crowd sizes and identifying similar-looking animals. Our results provide three key insights: (1) People trust incorrect ML recommendations for tasks that they perform correctly the majority of the time, even if they have high prior knowledge about ML or are given information indicating the system is not confident in its prediction; (2) Four different types of system information all increased people's trust in recommendations; and (3) Math and logic skills may be as important as ML expertise for decision-makers working with ML recommendations.
A Framework for Understanding Unintended Consequences of Machine Learning
Suresh, Harini, Guttag, John V.
As machine learning increasingly affects people and society, it is important that we strive for a comprehensive and unified understanding of how and why unwanted consequences arise. For instance, downstream harms to particular groups are often blamed on "biased data," but this concept encompasses too many issues to be useful in developing solutions. In this paper, we provide a framework that partitions sources of downstream harm in machine learning into five distinct categories spanning the data generation and machine learning pipeline. We describe how these issues arise, how they are relevant to particular applications, and how they motivate different solutions. In doing so, we aim to facilitate the development of solutions that stem from an understanding of application-specific populations and data generation processes, rather than relying on general claims about what may or may not be "fair."
Modeling Mistrust in End-of-Life Care
Boag, Willie, Suresh, Harini, Celi, Leo Anthony, Szolovits, Peter, Ghassemi, Marzyeh
In this work, we characterize the doctor-patient relationship using a machine learning-derived trust score. We show that this score has statistically significant racial associations, and that by modeling trust directly we find stronger disparities in care than by stratifying on race. We further demonstrate that mistrust is indicative of worse outcomes, but is only weakly associated with physiologically-derived severity scores. Finally, we describe sentiment analysis experiments indicating that patients with higher levels of mistrust have worse experiences and interactions with their caregivers. This work is a step towards measuring fairness in machine learning for the healthcare domain.
Learning Tasks for Multitask Learning: Heterogenous Patient Populations in the ICU
Suresh, Harini, Gong, Jen J., Guttag, John
Machine learning approaches have been effective in predicting adverse outcomes in different clinical settings. These models are often developed and evaluated on datasets with heterogeneous patient populations. However, good predictive performance on the aggregate population does not imply good performance for specific groups. In this work, we present a two-step framework to 1) learn relevant patient subgroups, and 2) predict an outcome for separate patient populations in a multi-task framework, where each population is a separate task. We demonstrate how to discover relevant groups in an unsupervised way with a sequence-to-sequence autoencoder. We show that using these groups in a multi-task framework leads to better predictive performance of in-hospital mortality both across groups and overall. We also highlight the need for more granular evaluation of performance when dealing with heterogeneous populations.
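A minimal sketch of the two-step structure described above, under several assumptions: precomputed embeddings stand in for the paper's sequence-to-sequence autoencoder representations, subgroups are discovered with a Gaussian mixture, and the multitask predictor is a shared trunk with one output head per learned group. Dimensions, cluster count, and training details are placeholders.

import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

N_GROUPS, D = 3, 32
emb = torch.randn(1000, D)                    # stand-in patient embeddings
y = torch.randint(0, 2, (1000,)).float()      # stand-in in-hospital mortality labels

# Step 1: discover patient subgroups in an unsupervised way.
groups = torch.as_tensor(
    GaussianMixture(n_components=N_GROUPS, random_state=0).fit_predict(emb.numpy())
)

# Step 2: multitask predictor -- shared trunk, one output head per subgroup ("task").
shared = nn.Sequential(nn.Linear(D, 16), nn.ReLU())
heads = nn.ModuleList([nn.Linear(16, 1) for _ in range(N_GROUPS)])
opt = torch.optim.Adam(list(shared.parameters()) + list(heads.parameters()), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(50):
    h = shared(emb)                                               # (N, 16)
    all_logits = torch.cat([head(h) for head in heads], dim=1)    # (N, N_GROUPS)
    logits = all_logits.gather(1, groups.view(-1, 1)).squeeze(1)  # each row uses its group's head
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()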
Semi-Supervised Biomedical Translation With Cycle Wasserstein Regression GANs
McDermott, Matthew B. A., Yan, Tom, Naumann, Tristan, Hunt, Nathan, Suresh, Harini, Szolovits, Peter, Ghassemi, Marzyeh
The biomedical field offers many learning tasks that share unique challenges: large amounts of unpaired data, and a high cost to generate labels. In this work, we develop a method to address these issues with semi-supervised learning in regression tasks (e.g., translation from source to target). Our model uses adversarial signals to learn from unpaired datapoints, and imposes a cycle-loss reconstruction error penalty to regularize mappings in either direction against one another. We first evaluate our method on synthetic experiments, demonstrating two primary advantages of the system: 1) distribution matching via the adversarial loss and 2) regularization towards invertible mappings via the cycle loss. We then show a regularization effect and improved performance when paired data is supplemented by additional unpaired data on two real biomedical regression tasks: estimating the physiological effect of medical treatments, and extrapolating gene expression (transcriptomics) signals. Our proposed technique is a promising initial step towards more robust use of adversarial signals in semi-supervised regression, and could be useful for other tasks (e.g., causal inference or modality translation) in the biomedical field.
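The sketch below lays out the objective structure described above, not the authors' implementation: a supervised regression term on paired data, a Wasserstein-style critic term on unpaired data, and cycle reconstruction penalties tying the forward and backward mappings together. Network sizes and loss weights are placeholders, and critic updates (with any Lipschitz constraint) are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as Fn

D_X, D_Y = 8, 8
f_xy = nn.Linear(D_X, D_Y)     # forward mapping  X -> Y
g_yx = nn.Linear(D_Y, D_X)     # backward mapping Y -> X
critic = nn.Linear(D_Y, 1)     # scores how "real" a Y sample looks

x_pair, y_pair = torch.randn(32, D_X), torch.randn(32, D_Y)   # paired examples
x_unp, y_unp = torch.randn(64, D_X), torch.randn(64, D_Y)     # unpaired examples

lam_adv, lam_cyc = 0.1, 1.0    # illustrative loss weights

sup_loss = Fn.mse_loss(f_xy(x_pair), y_pair)                  # supervised regression term
adv_loss = -critic(f_xy(x_unp)).mean()                        # generator-side Wasserstein term
cyc_loss = (Fn.mse_loss(g_yx(f_xy(x_unp)), x_unp)             # X -> Y -> X reconstruction
            + Fn.mse_loss(f_xy(g_yx(y_unp)), y_unp))          # Y -> X -> Y reconstruction

total = sup_loss + lam_adv * adv_loss + lam_cyc * cyc_loss
total.backward()               # in practice alternated with separate critic updates
print(float(total))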