Collaborating Authors: Chen, Yanda


Social Orientation: A New Feature for Dialogue Analysis

arXiv.org Artificial Intelligence

There are many settings where it is useful to predict and explain the success or failure of a dialogue. Circumplex theory from psychology models the social orientations (e.g., Warm-Agreeable, Arrogant-Calculating) of conversation participants and can be used to predict and explain the outcome of social interactions. Our work is novel in its systematic application of social orientation tags to modeling conversation outcomes. In this paper, we introduce a new data set of dialogue utterances machine-labeled with social orientation tags. We show that social orientation tags improve task performance, especially in low-resource settings, on both English and Chinese language benchmarks. We also demonstrate how social orientation tags help explain the outcomes of social interactions when used in neural models. Based on these results showing the utility of social orientation tags for dialogue outcome prediction tasks, we release our data sets, code, and models that are fine-tuned to predict social orientation tags on dialogue utterances.
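
To make the feature concrete, here is a minimal, hypothetical sketch of how machine-predicted social orientation tags could be injected as text features for a conversation-outcome classifier; the tag formatting, model choice, and label count below are assumptions for illustration, not the paper's exact pipeline.

```python
# Hypothetical illustration: prepend predicted social orientation tags to each
# utterance so a standard text classifier can use them as extra features.
# The "[Warm-Agreeable]" formatting convention is an assumption, not the
# paper's exact scheme.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

dialogue = [
    ("A", "I really appreciate you taking the time to talk.", "Warm-Agreeable"),
    ("B", "Honestly, I only agreed to this because it benefits me.", "Arrogant-Calculating"),
]

# Flatten the conversation into one sequence, injecting the tag before each turn.
text = " ".join(f"[{tag}] {speaker}: {utt}" for speaker, utt, tag in dialogue)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g. conversation succeeds vs. derails
)

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))  # predicted outcome distribution (untrained head here)
```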


Parallel Structures in Pre-training Data Yield In-Context Learning

arXiv.org Artificial Intelligence

Pre-trained language models (LMs) are capable of in-context learning (ICL): they can adapt to a task with only a few examples given in the prompt, without any parameter updates. However, it is unclear where this capability comes from, as there is a stark distribution shift between pre-training text and ICL prompts. In this work, we study which patterns of the pre-training data contribute to ICL. We find that LMs' ICL ability depends on parallel structures in the pre-training data -- pairs of phrases following similar templates in the same context window. Specifically, we detect parallel structures by checking whether training on one phrase improves prediction of the other, and conduct ablation experiments to study their effect on ICL. We show that removing parallel structures in the pre-training data reduces LMs' ICL accuracy by 51% (vs. 2% for random ablation). This drop persists even when excluding common patterns such as n-gram repetitions and long-range dependencies, showing the diversity and generality of parallel structures. A closer look at the detected parallel structures indicates that they cover diverse linguistic tasks and span long distances in the data.
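
The detection criterion described above ("training on one phrase improves prediction of the other") can be sketched roughly as follows; the model, learning rate, and single-gradient-step setup are assumptions for illustration, and the paper's actual procedure may differ.

```python
# A rough sketch of the detection idea: a pair of phrases counts as a
# "parallel structure" if one gradient step on the first phrase lowers the
# language-modeling loss on the second. Model choice, learning rate, and any
# improvement threshold are illustrative assumptions.

import copy
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
base_model = AutoModelForCausalLM.from_pretrained("gpt2")


def lm_loss(model, text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss


def improvement(phrase_a, phrase_b, lr=1e-4):
    """Drop in loss on phrase_b after one update on phrase_a."""
    model = copy.deepcopy(base_model)      # fresh copy so pairs don't interfere
    with torch.no_grad():
        before = lm_loss(model, phrase_b).item()

    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    lm_loss(model, phrase_a).backward()    # one gradient step on phrase_a
    optimizer.step()

    with torch.no_grad():
        after = lm_loss(model, phrase_b).item()
    return before - after


pair = ("The capital of France is Paris.", "The capital of Italy is Rome.")
print(improvement(*pair))  # a large positive value suggests a parallel structure
```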


Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning

arXiv.org Artificial Intelligence

Large language models (LLMs) often generate convincing, fluent explanations. However, unlike humans, they often generate inconsistent explanations on different inputs. For example, an LLM may generate the explanation "all birds can fly" when answering the question "Can sparrows fly?" but meanwhile answer "no" to the related question "Can penguins fly?". Explanations should be consistent across related examples so that they allow a human to simulate the LLM's decision process on multiple examples. We propose explanation-consistency finetuning (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples. EC-finetuning involves finetuning LLMs on synthetic data that is carefully constructed to contain consistent explanations. Across a variety of question-answering datasets in various domains, EC-finetuning yields a 10.0% relative improvement in explanation consistency on four finetuning datasets, and generalizes to seven out-of-distribution datasets not seen during finetuning (+4.5% relative). Code is available at https://github.com/yandachen/explanation-consistency-finetuning .
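
As a rough illustration of the data side of EC-finetuning, the sketch below builds a tiny synthetic set of related questions whose explanations are mutually consistent; the record format and examples are hypothetical, and the paper's real construction pipeline lives in the linked repository.

```python
# A minimal sketch of the kind of synthetic finetuning data the abstract
# describes: related questions paired with explanations that stay consistent
# with each other. The prompt/completion format is an assumption.

import json

consistent_examples = [
    {
        "question": "Can sparrows fly?",
        "answer": "yes",
        "explanation": "Most birds can fly, and sparrows are typical flying birds.",
    },
    {
        "question": "Can penguins fly?",
        "answer": "no",
        "explanation": "Most birds can fly, but penguins are flightless birds.",
    },
]

with open("ec_finetuning_data.jsonl", "w") as f:
    for ex in consistent_examples:
        record = {
            "prompt": f"Question: {ex['question']}\nAnswer with an explanation:",
            "completion": f" {ex['answer']}. Explanation: {ex['explanation']}",
        }
        f.write(json.dumps(record) + "\n")

# These records can then be fed to any standard supervised finetuning loop
# (e.g. causal LM loss on the completion given the prompt).
```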


Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

arXiv.org Artificial Intelligence

Large language models (LLMs) are trained to imitate humans to explain human decisions. However, do LLMs explain themselves? Can they help humans build mental models of how LLMs process different inputs? To answer these questions, we propose to evaluate the counterfactual simulatability of natural language explanations: whether an explanation can enable humans to precisely infer the model's outputs on diverse counterfactuals of the explained input. For example, if a model answers "yes" to the input question "Can eagles fly?" with the explanation "all birds can fly", then humans would infer from the explanation that it would also answer "yes" to the counterfactual input "Can penguins fly?". If the explanation is precise, then the model's answer should match humans' expectations. We implemented two metrics based on counterfactual simulatability: precision and generality. We generated diverse counterfactuals automatically using LLMs. We then used these metrics to evaluate state-of-the-art LLMs (e.g., GPT-4) on two tasks: multi-hop factual reasoning and reward modeling. We found that LLMs' explanations have low precision and that precision does not correlate with plausibility. Therefore, naively optimizing human approvals (e.g., RLHF) may not be a sufficient solution.
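
A simplified version of the precision metric might look like the following, where a stand-in simulator plays the role of the human reader; the interfaces and the toy example are illustrative assumptions rather than the paper's implementation.

```python
# Back-of-the-envelope precision: among counterfactual inputs whose answer a
# reader would infer from the explanation, how often does the model's actual
# answer agree? The simulator here is a stand-in for human judgment.

def counterfactual_precision(model_answer, simulate_from_explanation,
                             explanation, counterfactuals):
    """
    model_answer(q)                    -> the model's answer to question q
    simulate_from_explanation(expl, q) -> answer implied by the explanation,
                                          or None if the explanation is silent
    """
    matches, covered = 0, 0
    for q in counterfactuals:
        inferred = simulate_from_explanation(explanation, q)
        if inferred is None:
            continue                   # explanation says nothing about q
        covered += 1
        if inferred == model_answer(q):
            matches += 1
    return matches / covered if covered else float("nan")


# Toy usage: the explanation "all birds can fly" implies "yes" for any bird,
# but the model answers "no" for penguins, so precision is 0.5 here.
explanation = "all birds can fly"
model = {"Can eagles fly?": "yes", "Can penguins fly?": "no"}
print(counterfactual_precision(
    model.get,
    lambda expl, q: "yes",             # the explanation implies "yes" for birds
    explanation,
    ["Can eagles fly?", "Can penguins fly?"],
))
```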


On the Relation between Sensitivity and Accuracy in In-context Learning

arXiv.org Artificial Intelligence

In-context learning (ICL) suffers from oversensitivity to the prompt, making it unreliable in real-world scenarios. We study the sensitivity of ICL with respect to multiple perturbation types. First, we find that label bias obscures the true sensitivity, and therefore prior work may have significantly underestimated ICL sensitivity. Second, we observe a strong negative correlation between ICL sensitivity and accuracy: predictions sensitive to perturbations are less likely to be correct. Motivated by these findings, we propose SenSel, a few-shot selective prediction method that abstains from sensitive predictions. Experiments on ten classification datasets show that SenSel consistently outperforms two commonly used confidence-based and entropy-based baselines on abstention decisions.
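
In the spirit of SenSel, a bare-bones sensitivity-based abstention rule could look like this; the perturbation type (shuffling demonstrations), the predict() interface, and the agreement threshold are all assumptions rather than the paper's exact method.

```python
# Simplified sketch of sensitivity-based abstention: query the model under
# several prompt perturbations (here, reordered demonstrations) and abstain
# when the resulting predictions disagree too much.

import random
from collections import Counter


def sensel_predict(predict, demonstrations, test_input,
                   n_perturbations=8, agreement_threshold=0.75, seed=0):
    rng = random.Random(seed)
    predictions = []
    for _ in range(n_perturbations):
        perturbed = demonstrations[:]
        rng.shuffle(perturbed)               # one cheap perturbation type
        predictions.append(predict(perturbed, test_input))

    label, count = Counter(predictions).most_common(1)[0]
    agreement = count / len(predictions)
    if agreement < agreement_threshold:
        return None      # abstain: the prediction is too sensitive to the prompt
    return label


# Toy usage with a fake predictor that ignores demonstration order.
demos = [("great movie", "positive"), ("terrible plot", "negative")]
print(sensel_predict(lambda d, x: "positive", demos, "loved it"))  # -> "positive"
```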


In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models

arXiv.org Artificial Intelligence

Given the success of in-context learning with large pre-trained language models, we introduce in-context learning distillation to transfer the in-context few-shot learning ability of large models to smaller models. We propose to combine in-context learning objectives with language modeling objectives to distill both the ability to read in-context examples and task knowledge into smaller models. We perform in-context learning distillation under two different few-shot learning paradigms: Meta In-context Tuning (Meta-ICT) and Multitask In-context Tuning (Multitask-ICT). Multitask-ICT performs better on multitask few-shot learning but also requires more computation than Meta-ICT. Our method shows consistent improvements for both Meta-ICT and Multitask-ICT on two benchmarks: LAMA and CrossFit. Our extensive experiments and analysis reveal that in-context learning objectives and language modeling objectives are complementary under the Multitask-ICT paradigm: in-context learning objectives achieve the best performance when combined with language modeling objectives.
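
The combined objective described above (a distillation term on in-context prompts plus a language modeling term) can be sketched as a weighted sum; the model names, mixing weight, and toy inputs below are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of one training step combining (1) a distillation term that
# pushes the student toward the teacher's next-token distribution on an
# in-context prompt and (2) a standard causal language-modeling term.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
student = AutoModelForCausalLM.from_pretrained("gpt2")
teacher = AutoModelForCausalLM.from_pretrained("gpt2-medium").eval()

icl_prompt = "Review: great film. Sentiment: positive\nReview: dull plot. Sentiment:"
plain_text = "The movie was released in 2010 and won several awards."
lm_weight = 0.5  # assumed mixing coefficient between the two objectives

icl_ids = tokenizer(icl_prompt, return_tensors="pt").input_ids
lm_ids = tokenizer(plain_text, return_tensors="pt").input_ids

# In-context learning distillation term: match the teacher's distributions.
with torch.no_grad():
    teacher_logits = teacher(icl_ids).logits
student_logits = student(icl_ids).logits
icl_loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)

# Language-modeling term on ordinary text keeps general task knowledge.
lm_loss = student(lm_ids, labels=lm_ids).loss

total_loss = icl_loss + lm_weight * lm_loss
total_loss.backward()  # an optimizer step on the student would follow
```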