systematic difference
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Advanced AI models are not always better than simple ones
Understanding genetic perturbations, experiments in which scientists intentionally alter genes to see how this affects cells, is key to understanding what our genes do and how they are controlled. This knowledge has important applications in cell engineering and in developing new treatments. Today, scientists can test many different genetic perturbations in the lab, but there are so many possible combinations that it is impossible to test them all. AI and machine learning have created the opportunity to use information from large biological datasets to predict what will happen when a gene is changed, even if that change has never been tested in the laboratory.
- North America > Canada > Quebec > Montreal (0.05)
- Asia > China > Guangdong Province > Guangzhou (0.05)
What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
Hedderich, Michael A., Wang, Anyi, Zhao, Raoyuan, Eichin, Florian, Fischer, Jonas, Plank, Barbara
Prompt engineering for large language models is challenging, as even small prompt perturbations or model changes can significantly impact the generated output texts. Existing methods for evaluating LLM outputs, whether automated metrics or human evaluation, have limitations, such as providing limited insight or being labor-intensive. We propose Spotlight, a new approach that combines automation with human analysis. Based on data mining techniques, we automatically distinguish between random (decoding) variations and systematic differences in language model outputs. This process yields token patterns that describe the systematic differences and guide the user in efficiently analyzing the effects of prompt and model changes. We create three benchmarks to quantitatively test the reliability of token pattern extraction methods and demonstrate that our approach provides new insights into established prompt data. From a human-centric perspective, through demonstration studies and a user study, we show that our token pattern approach helps users understand the systematic differences of language model outputs. We are further able to discover relevant differences caused by prompt and model changes (e.g., related to gender or culture), thus supporting the prompt engineering process and human-centric model behavior research.
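The random-vs-systematic distinction described in this abstract can be illustrated with a minimal sketch (not the paper's actual mining procedure; the function name and threshold are hypothetical): compare how often each token appears in documents from two groups of sampled outputs, and keep tokens whose document frequency differs by a large, consistent gap. Tokens that vary only due to decoding randomness tend to show small gaps, while systematically differing tokens show large ones.

```python
from collections import Counter

def token_patterns(outputs_a, outputs_b, min_gap=0.5):
    """Flag tokens whose document frequency differs systematically
    between two groups of generated texts. Hypothetical sketch only,
    not the mining algorithm used in the Spotlight paper."""
    def doc_freq(outputs):
        n = len(outputs)
        counts = Counter()
        for text in outputs:
            # Count each token at most once per output (document frequency).
            counts.update(set(text.lower().split()))
        return {tok: c / n for tok, c in counts.items()}

    fa, fb = doc_freq(outputs_a), doc_freq(outputs_b)
    vocab = set(fa) | set(fb)
    # Keep tokens that appear far more often in one group than the other,
    # sorted by the size of the gap.
    return sorted(
        (tok for tok in vocab
         if abs(fa.get(tok, 0) - fb.get(tok, 0)) >= min_gap),
        key=lambda t: -abs(fa.get(t, 0) - fb.get(t, 0)),
    )
```

For two prompts that systematically shift pronoun usage, such a sketch would surface the pronouns as patterns while ignoring tokens shared by both groups.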
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- Questionnaire & Opinion Survey (1.00)
- Research Report > Experimental Study (0.67)
Causal Inference on Outcomes Learned from Text
Modarressi, Iman, Spiess, Jann, Venugopal, Amar
We propose a machine-learning tool that yields causal inference on text in randomized trials. Based on a simple econometric framework in which text may capture outcomes of interest, our procedure addresses three questions: First, is the text affected by the treatment? Second, which outcomes does the treatment affect? And third, how complete is our description of causal effects? To answer all three questions, our approach uses large language models (LLMs) that suggest systematic differences across two groups of text documents and then provides valid inference based on costly validation. Specifically, we highlight the need for sample splitting to allow for statistical validation of LLM outputs, as well as the need for human labeling to validate substantive claims about how documents differ across groups. We illustrate the tool in a proof-of-concept application using abstracts of academic manuscripts.
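The sample-splitting idea in this abstract can be sketched as follows (all names here are hypothetical, and the statistical test is a generic two-proportion z-test standing in for the paper's actual inference procedure): a candidate outcome is discovered on one half of the data, then every held-out document is labeled for that outcome and the treatment-control difference is tested only on the holdout, so that the discovery step cannot contaminate the inference.

```python
import math
import random

def split_and_test(treated, control, propose_outcome, label, seed=0):
    """Sample-splitting sketch: discover a candidate outcome on half
    the documents, then validate it on the held-out half with a
    two-proportion z-test. Hypothetical stand-in, not the paper's
    exact procedure."""
    rng = random.Random(seed)

    def halves(docs):
        docs = docs[:]
        rng.shuffle(docs)
        mid = len(docs) // 2
        return docs[:mid], docs[mid:]

    t_disc, t_hold = halves(treated)
    c_disc, c_hold = halves(control)

    # Step 1: an LLM (or any discovery procedure) suggests how the two
    # groups differ, seeing only the discovery halves.
    outcome = propose_outcome(t_disc, c_disc)

    # Step 2: label each held-out document for that outcome (0/1),
    # e.g. via human annotators, and test the group difference.
    yt = [label(d, outcome) for d in t_hold]
    yc = [label(d, outcome) for d in c_hold]
    p1, p2 = sum(yt) / len(yt), sum(yc) / len(yc)
    p = (sum(yt) + sum(yc)) / (len(yt) + len(yc))
    se = math.sqrt(p * (1 - p) * (1 / len(yt) + 1 / len(yc)))
    z = (p1 - p2) / se if se > 0 else 0.0
    return outcome, z
```

Because the outcome is fixed before the holdout labels are used, the z-statistic retains its usual interpretation; testing on the same documents used for discovery would not.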
- Asia > Middle East > Jordan (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Research Report > Strength High (1.00)
- Research Report > Experimental Study (1.00)
- Government (0.46)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.45)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Counterpart Fairness -- Addressing Systematic between-group Differences in Fairness Evaluation
Wang, Yifei, Zhou, Zhengyang, Wang, Liqin, Laurentiev, John, Hou, Peter, Zhou, Li, Hong, Pengyu
When using machine learning (ML) to aid decision-making, it is critical to ensure that an algorithmic decision is fair, i.e., it does not discriminate against specific individuals or groups, particularly those from underprivileged populations. Existing group fairness methods require equal group-wise measures, which, however, fail to consider systematic between-group differences. Confounding factors, non-sensitive variables that nonetheless manifest systematic differences, can significantly affect fairness evaluation. To tackle this problem, we believe that a fairness measurement should be based on the comparison between counterparts (i.e., individuals who are similar to each other with respect to the task of interest) from different groups, whose group identities cannot be distinguished algorithmically by exploring confounding factors. We have developed a propensity-score-based method for identifying counterparts, which prevents fairness evaluation from comparing "oranges" with "apples". In addition, we propose a counterpart-based statistical fairness index, termed Counterpart Fairness (CFair), to assess the fairness of ML models. Various empirical studies were conducted to validate the effectiveness of CFair. We publish our code at \url{https://github.com/zhengyjo/CFair}.
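The propensity-score-based counterpart idea can be sketched in two steps (a simplified stand-in, not the CFair implementation; the model, caliper, and greedy matching are illustrative choices): first estimate each individual's probability of group membership from the confounders, then pair individuals across groups whose estimated propensities are close, so that fairness comparisons are made between "apples and apples".

```python
import math

def propensity_scores(X, groups, lr=0.1, steps=500):
    """Estimate P(group = 1 | confounders) with a tiny logistic
    regression trained by gradient descent; a stand-in for whatever
    propensity model the paper actually uses."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for x, g in zip(X, groups):
            p = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            err = p - g
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * gwi / n for wi, gwi in zip(w, gw)]
        b -= lr * gb / n
    return [1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for x in X]

def match_counterparts(scores, groups, caliper=0.1):
    """Greedily pair each group-1 individual with the closest unused
    group-0 individual in propensity score, within a caliper."""
    g0 = [(i, s) for i, (s, g) in enumerate(zip(scores, groups)) if g == 0]
    pairs, used = [], set()
    for i, s in ((i, s) for i, (s, g) in enumerate(zip(scores, groups)) if g == 1):
        best = min((j for j in g0 if j[0] not in used),
                   key=lambda j: abs(j[1] - s), default=None)
        if best and abs(best[1] - s) <= caliper:
            pairs.append((i, best[0]))
            used.add(best[0])
    return pairs
```

Individuals with no counterpart inside the caliper are left unmatched rather than forced into an "oranges vs. apples" comparison.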
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Virginia (0.04)
- North America > United States > New York (0.04)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
- Banking & Finance (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)