AITopics | Schrouff, Jessica

Schrouff, Jessica

Evaluating Model Bias Requires Characterizing its Mistakes

Albuquerque, Isabela, Schrouff, Jessica, Warde-Farley, David, Cemgil, Taylan, Gowal, Sven, Wiles, Olivia

arXiv.org Machine Learning, Jul-15-2024

The ability to properly benchmark model performance in the face of spurious correlations is important to both build better predictors and increase confidence that models are operating as intended. We demonstrate that characterizing (as opposed to simply quantifying) model mistakes across subgroups is pivotal to properly reflect model biases, which are ignored by standard metrics such as worst-group accuracy or accuracy gap. Inspired by the hypothesis testing framework, we introduce SkewSize, a principled and flexible metric that captures bias from mistakes in a model's predictions. It can be used in multi-class settings or generalised to the open vocabulary setting of generative models. SkewSize is an aggregation of the effect size of the interaction between two categorical variables: the spurious variable representing the bias attribute and the model's prediction. We demonstrate the utility of SkewSize in multiple settings, including standard vision models trained on synthetic data, vision models trained on ImageNet, and large-scale vision-and-language models from the BLIP-2 family. In each case, the proposed SkewSize is able to highlight biases not captured by other metrics, while also providing insights on the impact of recently proposed techniques, such as instruction tuning.
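
To make the idea of an effect size between two categorical variables concrete, the following minimal sketch computes Cramér's V between a subgroup attribute and a model's predictions, separately for each ground-truth class. The choice of Cramér's V and the function names (`cramers_v`, `per_class_effect_sizes`) are assumptions made for illustration; the abstract does not state how the effect sizes are computed or how they are aggregated into SkewSize.

```python
import math
from collections import Counter


def cramers_v(subgroups, predictions):
    """Cramér's V between two categorical variables (0 = no association, 1 = perfect)."""
    n = len(subgroups)
    row_counts = Counter(subgroups)
    col_counts = Counter(predictions)
    joint_counts = Counter(zip(subgroups, predictions))
    # Degrees-of-freedom term: min(#rows, #cols) - 1.
    k = min(len(row_counts), len(col_counts)) - 1
    if n == 0 or k == 0:
        return 0.0
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for r, r_count in row_counts.items():
        for c, c_count in col_counts.items():
            expected = r_count * c_count / n
            observed = joint_counts.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


def per_class_effect_sizes(labels, subgroups, predictions):
    """Effect size of the subgroup/prediction interaction within each true class."""
    grouped = {}
    for y, z, p in zip(labels, subgroups, predictions):
        zs, ps = grouped.setdefault(y, ([], []))
        zs.append(z)
        ps.append(p)
    return {y: cramers_v(zs, ps) for y, (zs, ps) in grouped.items()}
```

A class whose predictions are distributed identically across subgroups yields a value of 0, while a class whose predictions are fully determined by the subgroup yields 1, even if accuracy-based metrics are the same in both cases.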

machine learning, natural language, prediction, (15 more...)

arXiv: 2407.10633

Country:

North America > United States (0.28)
Europe > Austria > Vienna (0.14)

Genre:

Research Report > New Finding (0.71)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch

Aglietti, Virginia, Ktena, Ira, Schrouff, Jessica, Sgouritsa, Eleni, Ruiz, Francisco J. R., Malek, Alan, Bellot, Alexis, Chiappa, Silvia

arXiv.org Machine Learning, Jul-1-2024

The sample efficiency of Bayesian optimization algorithms depends on carefully crafted acquisition functions (AFs) guiding the sequential collection of function evaluations. The best-performing AF can vary significantly across optimization problems, often requiring ad-hoc and problem-specific choices. This work tackles the challenge of designing novel AFs that perform well across a variety of experimental settings. Based on FunSearch, a recent work using Large Language Models (LLMs) for discovery in mathematical sciences, we propose FunBO, an LLM-based method that can be used to learn new AFs written in computer code by leveraging access to a limited number of evaluations for a set of objective functions. We provide the analytic expression of all discovered AFs and evaluate them on various global optimization benchmarks and hyperparameter optimization tasks. We show how FunBO identifies AFs that generalize well in and out of the training distribution of functions, thus outperforming established general-purpose AFs and achieving competitive performance against AFs that are customized to specific function types and are learned via transfer-learning algorithms.
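
As an illustration of what an acquisition function written in code can look like, here is a minimal sketch of Expected Improvement (EI), a standard general-purpose AF for maximization. It is not one of the functions discovered by FunBO; the function names and the assumption that the surrogate model supplies a Gaussian posterior mean and standard deviation at each candidate point are chosen here for illustration only.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected Improvement for maximization under a Gaussian posterior."""
    improvement = mean - best_so_far
    if std <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + std * pdf


def next_query_index(means, stds, best_so_far):
    """Index of the candidate point with the highest acquisition value."""
    scores = [expected_improvement(m, s, best_so_far) for m, s in zip(means, stds)]
    return max(range(len(scores)), key=scores.__getitem__)
```

At each step of Bayesian optimization, the candidate returned by `next_query_index` would be evaluated next, which is how the AF guides the sequential collection of function evaluations.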

large language model, machine learning, natural language, (22 more...)

arXiv: 2406.04824

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mind the Graph When Balancing Data for Fairness or Robustness

Schrouff, Jessica, Bellot, Alexis, Rannen-Triki, Amal, Malek, Alan, Albuquerque, Isabela, Gretton, Arthur, D'Amour, Alexander, Chiappa, Silvia

arXiv.org Artificial Intelligence, Jun-25-2024

Failures of fairness or robustness in machine learning predictive settings can be due to undesired dependencies between covariates, outcomes and auxiliary factors of variation. A common strategy to mitigate these failures is data balancing, which attempts to remove those undesired dependencies. In this work, we define conditions on the training distribution for data balancing to lead to fair or robust models. Our results show that, in many cases, the balanced distribution does not correspond to selectively removing the undesired dependencies in a causal graph of the task, leading to multiple failure modes and even interference with other mitigation techniques such as regularization. Overall, our results highlight the importance of taking the causal graph into account before performing data balancing.
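
One common form of data balancing can be sketched as reweighting, so that the outcome and an auxiliary attribute become statistically independent in the weighted training set. The snippet below is a minimal sketch under that assumption; the function name `balancing_weights` is hypothetical, and it assumes every (label, attribute) combination occurs at least once in the data. It shows the balancing step only, not the conditions under which balancing yields a fair or robust model.

```python
from collections import Counter


def balancing_weights(labels, attributes):
    """Per-example weights w = P(y) * P(z) / P(y, z) from empirical frequencies."""
    n = len(labels)
    label_counts = Counter(labels)
    attr_counts = Counter(attributes)
    joint_counts = Counter(zip(labels, attributes))
    weights = []
    for y, z in zip(labels, attributes):
        p_y = label_counts[y] / n
        p_z = attr_counts[z] / n
        p_yz = joint_counts[(y, z)] / n
        weights.append(p_y * p_z / p_yz)
    return weights
```

After reweighting, the total weight of each (label, attribute) cell equals n * P(y) * P(z), so the two variables are marginally independent in the weighted data. Dependencies that pass through other variables in the causal graph are not affected by this step.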

artificial intelligence, machine learning, natural language, (17 more...)

arXiv: 2406.17433

Country:

North America > United States (0.68)
Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Detecting Shortcut Learning for Fair Medical AI using Shortcut Testing

Brown, Alexander, Tomasev, Nenad, Freyberg, Jan, Liu, Yuan, Karthikesalingam, Alan, Schrouff, Jessica

arXiv.org Artificial Intelligence, Jun-16-2023

Machine learning (ML) holds great promise for improving healthcare, but it is critical to ensure that its use will not propagate or amplify health disparities. An important step is to characterize the (un)fairness of ML models - their tendency to perform differently across subgroups of the population - and to understand its underlying mechanisms. One potential driver of algorithmic unfairness, shortcut learning, arises when ML models base predictions on improper correlations in the training data. However, diagnosing this phenomenon is difficult, especially when sensitive attributes are causally linked with disease. Using multi-task learning, we propose the first method to assess and mitigate shortcut learning as a part of the fairness assessment of clinical ML systems, and demonstrate its application to clinical tasks in radiology and dermatology. Finally, our approach reveals instances when shortcutting is not responsible for unfairness, highlighting the need for a holistic approach to fairness mitigation in medical AI.

artificial intelligence, deep learning, machine learning, (17 more...)

doi: 10.1038/s41467-023-39902-7

arXiv: 2207.10384

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Dermatology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

Adapting to Latent Subgroup Shifts via Concepts and Proxies

Alabdulmohsin, Ibrahim, Chiou, Nicole, D'Amour, Alexander, Gretton, Arthur, Koyejo, Sanmi, Kusner, Matt J., Pfohl, Stephen R., Salaudeen, Olawale, Schrouff, Jessica, Tsai, Katherine

arXiv.org Artificial Intelligence, Dec-21-2022

We address the problem of unsupervised domain adaptation when the source domain differs from the target domain because of a shift in the distribution of a latent subgroup. When this subgroup confounds all observed data, neither covariate shift nor label shift assumptions apply. We show that the optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target. The identification results are constructive, immediately suggesting an algorithm for estimating the optimal predictor in the target. For continuous observations, when this algorithm becomes impractical, we propose a latent variable model specific to the data generation process at hand. We show how the approach degrades as the size of the shift changes, and verify that it outperforms both covariate and label shift adjustment.

adaptation, artificial intelligence, machine learning, (15 more...)

arXiv: 2212.11254

Country: North America (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Epidemiology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.66)

Maintaining fairness across distribution shift: do we have viable solutions for real-world applications?

Schrouff, Jessica, Harris, Natalie, Koyejo, Oluwasanmi, Alabdulmohsin, Ibrahim, Schnider, Eva, Opsahl-Ong, Krista, Brown, Alex, Roy, Subhrajit, Mincu, Diana, Chen, Christina, Dieng, Awa, Liu, Yuan, Natarajan, Vivek, Karthikesalingam, Alan, Heller, Katherine, Chiappa, Silvia, D'Amour, Alexander

arXiv.org Machine Learning, Feb-2-2022

Fairness and robustness are often considered as orthogonal dimensions when evaluating machine learning models. However, recent work has revealed interactions between fairness and robustness, showing that fairness properties are not necessarily maintained under distribution shift. In healthcare settings, this can result, for example, in a model that performs fairly according to a selected metric in "hospital A" showing unfairness when deployed in "hospital B". While a nascent field has emerged to develop provably fair and robust models, it typically relies on strong assumptions about the shift, limiting its impact for real-world applications. In this work, we explore the settings in which recently proposed mitigation strategies are applicable by referring to a causal framing. Using examples of predictive models in dermatology and electronic health records, we show that real-world applications are complex and often invalidate the assumptions of such methods. Our work hence highlights technical, practical, and engineering gaps that prevent the development of robustly fair machine learning models for real-world applications. Finally, we discuss potential remedies at each step of the machine learning pipeline.

artificial intelligence, distribution shift, machine learning, (22 more...)

arXiv: 2202.01034

Country:

North America > United States > California (0.14)
Europe > United Kingdom > England (0.14)
North America > United States > Michigan (0.14)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Dermatology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Underspecification Presents Challenges for Credibility in Modern Machine Learning

D'Amour, Alexander, Heller, Katherine, Moldovan, Dan, Adlam, Ben, Alipanahi, Babak, Beutel, Alex, Chen, Christina, Deaton, Jonathan, Eisenstein, Jacob, Hoffman, Matthew D., Hormozdiari, Farhad, Houlsby, Neil, Hou, Shaobo, Jerfel, Ghassen, Karthikesalingam, Alan, Lucic, Mario, Ma, Yian, McLean, Cory, Mincu, Diana, Mitani, Akinori, Montanari, Andrea, Nado, Zachary, Natarajan, Vivek, Nielson, Christopher, Osborne, Thomas F., Raman, Rajiv, Ramasamy, Kim, Sayres, Rory, Schrouff, Jessica, Seneviratne, Martin, Sequeira, Shannon, Suresh, Harini, Veitch, Victor, Vladymyrov, Max, Wang, Xuezhi, Webster, Kellie, Yadlowsky, Steve, Yun, Taedong, Zhai, Xiaohua, Sculley, D.

arXiv.org Machine Learning, Nov-6-2020

ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.

deep learning, neural network, predictor, (26 more...)

arXiv: 2011.03395

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Therapeutic Area > Dermatology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Inferring Javascript types using Graph Neural Networks

Schrouff, Jessica, Wohlfahrt, Kai, Marnette, Bruno, Atkinson, Liam

arXiv.org Machine Learning, May-16-2019

The recent use of 'Big Code' with state-of-the-art deep learning methods offers promising avenues to ease program source code writing and correction. As a first step towards automatic code repair, we implemented a graph neural network model that predicts token types for JavaScript programs. The predictions achieve an accuracy above 90%, which improves on previous similar work.

ast, deep learning, neural network, (21 more...)

arXiv: 1905.06707

Country:

Europe > United Kingdom (0.14)
North America > United States (0.14)
Asia > India (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Interpreting weight maps in terms of cognitive or clinical neuroscience: nonsense?

Schrouff, Jessica, Mourao-Miranda, Janaina

arXiv.org Machine Learning, Apr-30-2018

Since machine learning models have been applied to neuroimaging data, researchers have drawn conclusions from the derived weight maps. In particular, weight maps of classifiers between two conditions are often described as a proxy for the underlying signal differences between the conditions. Recent studies have, however, suggested that such weight maps cannot reliably recover the source of the neural signals and can even lead to false positives (FP). In this work, we used semi-simulated data from ElectroCorticoGraphy (ECoG) to investigate how the signal-to-noise ratio and sparsity of the neural signal affect the similarity between signal and weights. We show that not all cases produce FP and that it is unlikely for FP features to have a high weight in most cases.

health & medicine, neurology, weight map, (20 more...)

arXiv: 1804.11259

Genre: Research Report > New Finding (0.69)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
