Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model

Neural Information Processing Systems

Neural collapse (NC) refers to the surprising structure of the last layer of deep neural networks in the terminal phase of gradient descent training. Recently, an increasing amount of experimental evidence has pointed to the propagation of NC to earlier layers of neural networks. However, while the NC in the last layer is well studied theoretically, much less is known about its multi-layered counterpart - deep neural collapse (DNC). In particular, existing work focuses either on linear layers or only on the last two layers at the price of an extra assumption. Our work fills this gap by generalizing the established analytical framework for NC - the unconstrained features model - to multiple non-linear layers. Our key technical contribution is to show that, in a deep unconstrained features model, the unique global optimum for binary classification exhibits all the properties typical of DNC. This explains the existing experimental evidence of DNC. We also empirically show that (i) by optimizing deep unconstrained features models via gradient descent, the resulting solution agrees well with our theory, and (ii) trained networks recover the unconstrained features suitable for the occurrence of DNC, thus supporting the validity of this modeling principle.
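The within-class variability collapse described above (the NC1 property: features of each class collapsing to their class mean) can be checked numerically. Below is a minimal sketch of one such diagnostic; the function name and the normalization by between-class spread are illustrative choices, not taken from the paper.

```python
import numpy as np

def nc1_variability(features, labels):
    """Within-class variability relative to between-class spread.

    Approaches 0 when features collapse to their class means (NC1).
    features: (n_samples, dim) array; labels: (n_samples,) integer classes.
    """
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        cls = features[labels == c]
        mu = cls.mean(axis=0)
        within += ((cls - mu) ** 2).sum()
        between += len(cls) * ((mu - global_mean) ** 2).sum()
    return within / between

# Fully collapsed features: every sample equals its class mean.
collapsed = np.array([[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5)
labels = np.array([0] * 5 + [1] * 5)
print(nc1_variability(collapsed, labels))  # 0.0
```

A value near zero indicates collapse; in the terminal phase of training, last-layer features typically drive this ratio toward zero.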


Tool or Tutor? Experimental evidence from AI deployment in cancer diagnosis

He, Vivianna Fang, Li, Sihan, Puranam, Phanish

arXiv.org Artificial Intelligence

Professionals increasingly use Artificial Intelligence (AI) to enhance their capabilities and assist with task execution. While prior research has examined these uses separately, their potential interaction remains underexplored. We propose that AI-driven training ("tutor" effect) and AI-assisted task completion ("tool" effect) can be complementary and test this hypothesis in the context of lung cancer diagnosis. In a field experiment with 336 medical students, we manipulated AI deployment in training, in practice, and in both. Our findings reveal that while AI-integrated training and AI assistance independently improved diagnostic performance, their combination yielded the highest accuracy. These results underscore AI's dual role in enhancing human performance through both learning and real-time support, offering insights into AI deployment in professional settings where human expertise remains essential.


Contextualizing biological perturbation experiments through language

Wu, Menghua, Littman, Russell, Levine, Jacob, Qiu, Lin, Biancalani, Tommaso, Richmond, David, Huetter, Jan-Christian

arXiv.org Artificial Intelligence

High-content perturbation experiments allow scientists to probe biomolecular systems at unprecedented resolution, but experimental and analysis costs pose significant barriers to widespread adoption. Machine learning has the potential to guide efficient exploration of the perturbation space and extract novel insights from these data. However, current approaches neglect the semantic richness of the relevant biology, and their objectives are misaligned with downstream biological analyses. In this paper, we hypothesize that large language models (LLMs) present a natural medium for representing complex biological relationships and rationalizing experimental outcomes. We propose PerturbQA, a benchmark for structured reasoning over perturbation experiments. Unlike current benchmarks that primarily interrogate existing knowledge, PerturbQA is inspired by open problems in perturbation modeling: prediction of differential expression and change of direction for unseen perturbations, and gene set enrichment. We evaluate state-of-the-art machine learning and statistical approaches for modeling perturbations, as well as standard LLM reasoning strategies, and we find that current methods perform poorly on PerturbQA. As a proof of feasibility, we introduce Summer (SUMMarize, retrievE, and answeR), a simple, domain-informed LLM framework that matches or exceeds the current state-of-the-art. Our code and data are publicly available at https://github.com/genentech/PerturbQA.
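To make the summarize-retrieve-answer pattern concrete, here is a deliberately toy sketch of the retrieval step, assuming a plain-text knowledge base and word-overlap scoring; the actual Summer framework is LLM-based and domain-informed, and is not reproduced here.

```python
def retrieve(query, knowledge_base, k=2):
    """Toy retrieval: rank text snippets by word overlap with the query.

    A stand-in for the 'retrievE' stage; real systems would use
    embeddings or an LLM rather than bag-of-words overlap.
    """
    q = set(query.lower().split())
    scored = sorted(knowledge_base,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

kb = ["gene knockout reduces expression",
      "protein folding dynamics",
      "cell cycle arrest"]
print(retrieve("which gene knockout changes expression", kb, k=1))
```

The retrieved snippets would then be passed, together with summaries, into the answering stage.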



Reviews: One-Sided Unsupervised Domain Mapping

Neural Information Processing Systems

This paper tackles the problem of unsupervised domain mapping. The paper introduces a new constraint, which compares samples and enforces high cross-domain correlation between the matching distances computed in each domain. An alternative to pairwise distances is provided for cases in which we only have access to one data sample at a time: the same rationale can be applied by splitting the images and comparing the distances between their left/right or up/down halves in both domains. The final unsupervised domain-mapping model is trained by combining the previously introduced losses (adversarial loss and circularity loss) with the new distance loss, showing that the new constraint is effective and allows for one-directional mapping.
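A minimal sketch of the distance-preservation idea, assuming batches of feature vectors and standardized Euclidean distances (the paper's exact normalization and loss weighting may differ):

```python
import numpy as np

def pairwise_distances(x):
    """Vector of Euclidean distances between all sample pairs in a batch."""
    n = len(x)
    return np.array([np.linalg.norm(x[i] - x[j])
                     for i in range(n) for j in range(i + 1, n)])

def distance_loss(batch_a, mapped_b):
    """Penalize mismatch between standardized pairwise distances in the
    source batch and in its mapped counterpart."""
    da = pairwise_distances(batch_a)
    db = pairwise_distances(mapped_b)
    da = (da - da.mean()) / (da.std() + 1e-8)
    db = (db - db.mean()) / (db.std() + 1e-8)
    return np.abs(da - db).mean()
```

Because the distances are standardized per batch, a global scaling or shift of the mapped samples leaves the loss at zero; only relative distances are constrained, which is what allows the mapping to be learned one-directionally.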


Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation

Merali, Ali

arXiv.org Artificial Intelligence

The amount of training compute used by frontier large language models (LLMs) increased by 5000x between the release of GPT-2 in 2019 and GPT-4 in 2023, and estimates from Epoch AI suggest a similar increase over the next six years. How does this massive increase in model training compute map onto performance? The empirical machine learning literature has derived remarkably consistent 'scaling laws' suggesting a strong relationship between a model's training compute and model perplexity, a measure of model loss, across more than seven orders of magnitude. But there is so far very limited understanding of how this reduction in perplexity affects key economic and social outcomes. This paper aims to offer the first experimental evidence on this question by conducting a randomized controlled trial (RCT) involving 300 professional translators completing 1800 tasks of varying complexity. Participants were randomly assigned either to treatment groups, where they could use one of thirteen LLMs of differing training compute to help complete their task, or to a control group, where they completed tasks without any AI assistance. Participants faced high-powered incentives, with significant bonus payments for high-quality tasks as evaluated by three experienced professionals in the field. The key outcome variables, therefore, were how translators' time taken, quality of tasks completed, and earnings per minute (inclusive of bonuses) varied with model training compute.
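The compute-perplexity relationship referred to above is a power law, loss ≈ a · C^(−b), which becomes linear in log-log space. A sketch of how such a law is fit; the constants below are synthetic, for illustration only:

```python
import numpy as np

def fit_scaling_law(compute, loss):
    """Fit loss ~= a * compute**(-b) by linear regression in log-log space.

    Returns (a, b); b > 0 means loss falls as compute grows.
    """
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    return np.exp(intercept), -slope

# Synthetic data generated exactly from loss = 10 * C**-0.05.
c = np.array([1e18, 1e20, 1e22, 1e24])
a, b = fit_scaling_law(c, 10.0 * c ** -0.05)
print(round(a, 3), round(b, 3))  # 10.0 0.05
```

The RCT described above can be read as asking what happens after this fit: whether moving along the fitted curve (lower perplexity) translates into measurable gains in time, quality, and earnings.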


Experimental Evidence on Negative Impact of Generative AI on Scientific Learning Outcomes

Ju, Qirui

arXiv.org Artificial Intelligence

In this study, I explored the impact of Generative AI on learning efficacy in academic reading materials using experimental methods. College-educated participants engaged in three cycles of reading and writing tasks. After each cycle, they responded to comprehension questions related to the material. After adjusting for background knowledge and demographic factors, complete reliance on AI for writing tasks led to a 25.1% reduction in accuracy. In contrast, AI-assisted reading resulted in a 12% decline. Interestingly, using AI for summarization significantly improved both quality and output. Accuracy exhibited notable variance in the AI-assisted section. Further analysis revealed that individuals with a robust background in the reading topic and superior reading/writing skills benefited the most. I conclude by discussing educational policy implications, emphasizing the need for educators to warn students about the dangers of over-dependence on AI and to provide guidance on its optimal use in educational settings.


PlasmoFAB: A Benchmark to Foster Machine Learning for Plasmodium falciparum Protein Antigen Candidate Prediction

Ditz, Jonas Christian, Wistuba-Hamprecht, Jacqueline, Maier, Timo, Fendel, Rolf, Pfeifer, Nico, Reuter, Bernhard

arXiv.org Artificial Intelligence

Motivation: Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparum protein antigen candidates exists. The parasite Plasmodium falciparum causes the infectious disease malaria. Thus, identifying potential antigens is of utmost importance for the development of antimalarial drugs and vaccines. Since exploring antigen candidates experimentally is an expensive and time-consuming process, applying machine learning methods to support this process has the potential to accelerate the development of drugs and vaccines, which are needed for fighting and controlling malaria. Results: We developed PlasmoFAB, a curated benchmark that can be used to train machine learning methods for the exploration of Plasmodium falciparum protein antigen candidates. We combined an extensive literature search with domain expertise to create high-quality labels for Plasmodium falciparum specific proteins that distinguish between antigen candidates and intracellular proteins. Additionally, we used our benchmark to compare different well-known prediction models and available protein localization prediction services on the task of identifying protein antigen candidates. We show that available general-purpose services are unable to provide sufficient performance on identifying protein antigen candidates and are outperformed by our models that were trained on this tailored data. Availability: PlasmoFAB is publicly available on Zenodo with DOI 10.5281/zenodo.7433087. Furthermore, all scripts that were used in the creation of PlasmoFAB and the training and evaluation of machine learning models are open source and publicly available on GitHub here: https://github.com/msmdev/PlasmoFAB.


Experimental Evidence that Empowerment May Drive Exploration in Sparse-Reward Environments

Massari, Francesco, Biehl, Martin, Meeden, Lisa, Kanai, Ryota

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) is known to be often unsuccessful in environments with sparse extrinsic rewards. A possible countermeasure is to endow RL agents with an intrinsic reward function, or 'intrinsic motivation', which rewards the agent based on certain features of the current sensor state. An intrinsic reward function based on the principle of empowerment assigns rewards proportional to the amount of control the agent has over its own sensors. We implemented a variation on a recently proposed intrinsically motivated agent, which we refer to as the 'curious' agent, and an empowerment-inspired agent. The former leverages sensor state encoding with a variational autoencoder, while the latter predicts the next sensor state via a variational information bottleneck. We compared the performance of both agents to that of an advantage actor-critic baseline in four sparse reward grid worlds. Both the empowerment agent and its curious competitor seem to benefit to similar extents from their intrinsic rewards. This provides some experimental support to the conjecture that empowerment can be used to drive exploration.
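As a rough illustration of the empowerment principle (not the variational estimator the paper's agents actually use), in a small deterministic world the k-step empowerment reduces to the log of the number of distinct states reachable by some k-step action sequence:

```python
import math
from itertools import product

def empowerment(state, step, actions, k):
    """k-step empowerment in a deterministic environment.

    For deterministic dynamics, the action-to-sensor channel capacity
    reduces to log2 of the number of distinct reachable outcomes.
    """
    outcomes = set()
    for seq in product(actions, repeat=k):
        s = state
        for a in seq:
            s = step(s, a)
        outcomes.add(s)
    return math.log2(len(outcomes))

# Toy 3x3 grid world: moves are clipped at the walls.
def grid_step(s, a):
    x, y = s
    dx, dy = [(1, 0), (-1, 0), (0, 1), (0, -1)][a]
    return (min(2, max(0, x + dx)), min(2, max(0, y + dy)))

print(empowerment((1, 1), grid_step, range(4), 2))  # centre of the grid
print(empowerment((0, 0), grid_step, range(4), 2))  # corner of the grid
```

The centre cell scores higher than a corner because more future sensor states are under the agent's control there, which is exactly the signal an empowerment-driven agent would follow toward open, controllable regions of the environment.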


A Computational Model of Reasoning from the Clinical Literature

AI Magazine

This article explores the premise that a formalized representation of empirical studies can play a central role in computer-based decision support. The specific motivations underlying this research include the following propositions: (1) Reasoning from experimental evidence contained in the clinical literature is central to the decisions physicians make in patient care. Furthermore, the model can help us better understand the general principles of reasoning from experimental evidence both in medicine and in other domains. Roundsman is a developmental computer system that draws on structured representations of the clinical literature to critique plans for the management of primary breast cancer. Roundsman is able to produce patient-specific analyses of breast cancer-management options based on the 24 clinical studies currently encoded in its knowledge base.