Chen, Catherine
Cross-Encoder Rediscovers a Semantic Variant of BM25
Lu, Meng, Chen, Catherine, Eickhoff, Carsten
Neural Ranking Models (NRMs) have rapidly advanced state-of-the-art performance on information retrieval tasks. In this work, we investigate a Cross-Encoder variant of MiniLM to determine which relevance features it computes and where they are stored. We find that it employs a semantic variant of traditional BM25 in an interpretable manner, featuring localized components: (1) Transformer attention heads that compute soft term frequency while controlling for term saturation and document length effects, and (2) a low-rank component of its embedding matrix that encodes inverse document frequency information for the vocabulary. This suggests that the Cross-Encoder uses the same fundamental mechanisms as BM25, but further leverages their capacity to capture semantics for improved retrieval performance. This granular understanding lays the groundwork for model editing to enhance model transparency, address safety concerns, and improve scalability in training and real-world applications.
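For context, classical BM25 combines exactly the three ingredients the abstract localizes in the model: saturating term frequency (controlled by k1), document-length normalization (controlled by b), and per-term IDF weighting. The sketch below implements standard Okapi BM25 as a point of reference; it is not the Cross-Encoder's learned "soft" variant, and the function name and parameter defaults are illustrative.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Standard Okapi BM25, for comparison with the learned variant."""
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        if term not in doc_freqs:
            continue
        # Inverse document frequency: rarer terms contribute more.
        idf = math.log(1 + (num_docs - doc_freqs[term] + 0.5)
                       / (doc_freqs[term] + 0.5))
        # Saturating term frequency with document-length control:
        # repeated occurrences yield diminishing returns (k1), and long
        # documents are penalized relative to the average length (b).
        denom = tf[term] + k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * (tf[term] * (k1 + 1)) / denom
    return score
```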
Political-LLM: Large Language Models in Political Science
Li, Lincan, Li, Jiaqi, Chen, Catherine, Gui, Fred, Yang, Hongjia, Yu, Chenxiao, Wang, Zhengguang, Cai, Jianing, Zhou, Junlong Aaron, Shen, Bolin, Qian, Alex, Chen, Weixin, Xue, Zhongkai, Sun, Lichao, He, Lifang, Chen, Hanjie, Ding, Kaize, Du, Zijian, Mu, Fangzhou, Pei, Jiaxin, Zhao, Jieyu, Swayamdipta, Swabha, Neiswanger, Willie, Wei, Hua, Hu, Xiyang, Zhu, Shixiang, Chen, Tianlong, Lu, Yingzhou, Shi, Yang, Qin, Lianhui, Fu, Tianfan, Tu, Zhengzhong, Yang, Yuzhe, Yoo, Jaemin, Zhang, Jiaheng, Rossi, Ryan, Zhan, Liang, Zhao, Liang, Ferrara, Emilio, Liu, Yan, Huang, Furong, Zhang, Xiangliang, Rothenberg, Lawrence, Ji, Shuiwang, Yu, Philip S., Zhao, Yue, Dong, Yushun
In recent years, large language models (LLMs) have been widely adopted in political science tasks such as election prediction, sentiment analysis, policy impact assessment, and misinformation detection. Meanwhile, the need to systematically understand how LLMs can further revolutionize the field has become urgent. In this work, we, a multidisciplinary team of researchers spanning computer science and political science, present the first principled framework, termed Political-LLM, to advance a comprehensive understanding of integrating LLMs into computational political science. Specifically, we first introduce a fundamental taxonomy classifying existing explorations from two perspectives: political science and computational methodologies. From the political science perspective, we highlight the role of LLMs in automating predictive and generative tasks, simulating behavior dynamics, and improving causal inference through tools like counterfactual generation; from a computational perspective, we introduce advancements in data preparation, fine-tuning, and evaluation methods for LLMs that are tailored to political contexts. We identify key challenges and future directions, emphasizing the development of domain-specific datasets, addressing issues of bias and fairness, incorporating human expertise, and redefining evaluation criteria to align with the unique requirements of computational political science. Political-LLM seeks to serve as a guidebook for researchers to foster an informed, ethical, and impactful use of Artificial Intelligence in political science. Our online resource is available at: http://political-llm.org/. Corresponding authors: Yushun Dong (yd24f@fsu.edu) is with the Department of Computer Science, Florida State University; Yue Zhao (yzhao010@usc.edu) is with the Department of Computer Science, University of Southern California; Fred Gui (pgui@lsu.edu) is with the Department of Political Science, Louisiana State University; Catherine Chen (catherinechen@lsu.edu) is with the Manship School of Mass Communication and the Department of Political Science, Louisiana State University.
Outlier Dimensions Encode Task-Specific Knowledge
Rudman, William, Chen, Catherine, Eickhoff, Carsten
Representations of transformer-based LLMs are dominated by a few outlier dimensions whose variance and magnitude are significantly larger than the rest of the model's representations (Timkey and van Schijndel, 2021; Kovaleva et al., 2021). Previous studies devoted to the formation of outlier dimensions in pre-trained LLMs suggest that imbalanced token frequency causes an uneven distribution of variance in model representations (Gao et al., 2019; Puccetti et al., 2022). Although many argue that outlier dimensions "disrupt" model representations, making them less interpretable and hindering model performance, ablating outlier dimensions has been shown to cause downstream performance to decrease dramatically (Kovaleva et al., 2021; Puccetti et al., 2022). Two seminal works discovered the presence of "outlier" (Kovaleva et al., 2021) or "rogue" (Timkey and van Schijndel, 2021) dimensions in pre-trained LLMs. Following Kovaleva et al. (2021) and Puccetti et al. (2022), we define outlier dimensions as dimensions in LLM representations whose variance is at least 5x larger than the average variance in the global vector space. The formation of outlier dimensions is caused by a token imbalance in the pre-training data, with more common tokens having much higher norms in the outlier dimensions compared to rare tokens (Gao et al., 2019; Puccetti et al., 2022). Although the community agrees on the origin of outlier dimensions, their impact on the representational quality of pre-trained LLMs has been widely contested.
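The 5x-variance criterion above is directly implementable. Here is a minimal sketch, assuming activations have been collected into a (tokens x hidden_dim) matrix; the function name and threshold argument are illustrative, not taken from the paper's code.

```python
import numpy as np

def find_outlier_dimensions(reps: np.ndarray, threshold: float = 5.0):
    """Flag dimensions whose variance is at least `threshold` times the
    mean variance across all dimensions of the representation space.

    reps: (num_tokens, hidden_dim) matrix of LLM activations.
    """
    variances = reps.var(axis=0)       # per-dimension variance
    mean_variance = variances.mean()   # average over the vector space
    return np.where(variances >= threshold * mean_variance)[0]

# Example: random activations with two artificially inflated dimensions.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 768))
acts[:, [42, 300]] *= 10.0             # blow up the variance of two dims
print(find_outlier_dimensions(acts))   # flags dimensions 42 and 300
```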
A Vision-free Baseline for Multimodal Grammar Induction
Li, Boyi, Corona, Rodolfo, Mangalam, Karttikeya, Chen, Catherine, Flaherty, Daniel, Belongie, Serge, Weinberger, Kilian Q., Malik, Jitendra, Darrell, Trevor, Klein, Dan
Past work has shown that paired vision-language signals substantially improve grammar induction in multimodal datasets such as MSCOCO. We investigate whether advancements in large language models (LLMs) trained only on text can provide strong assistance for grammar induction in multimodal settings. We find that our text-only approach, an LLM-based C-PCFG (LC-PCFG), outperforms previous multimodal methods and achieves state-of-the-art grammar induction performance across various multimodal datasets. Compared to image-aided grammar induction, LC-PCFG outperforms the prior state-of-the-art by 7.9 Corpus-F1 points, with an 85% reduction in parameter count and 1.7x faster training. Across three video-assisted grammar induction benchmarks, LC-PCFG outperforms the prior state-of-the-art by up to 7.7 Corpus-F1 points, with 8.8x faster training. These results suggest that text-only language models may encode visually grounded cues that aid grammar induction in multimodal contexts. Moreover, our results emphasize the importance of establishing a robust vision-free baseline when evaluating the benefit of multimodal approaches.
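Corpus-F1, the metric used in these comparisons, pools constituent-span counts over the whole corpus before computing precision and recall, unlike sentence-level F1, which averages per-sentence scores. A minimal sketch follows, assuming each sentence's predicted and gold constituents are given as sets of (start, end) spans; conventions such as excluding trivial whole-sentence spans vary across papers and are omitted here.

```python
def corpus_f1(pred_spans, gold_spans):
    """Corpus-level F1 over constituent spans.

    pred_spans / gold_spans: lists (one entry per sentence) of sets of
    (start, end) spans. Counts are pooled over the corpus first.
    """
    tp = sum(len(p & g) for p, g in zip(pred_spans, gold_spans))
    n_pred = sum(len(p) for p in pred_spans)
    n_gold = sum(len(g) for g in gold_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```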
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents
Chen, Catherine, Shen, Zejiang, Klein, Dan, Stanovsky, Gabriel, Downey, Doug, Lo, Kyle
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers. Layout-infused LMs are often evaluated on documents with familiar layout features (e.g., papers from the same publisher), but in practice models encounter documents with unfamiliar distributions of layout features, such as new combinations of text sizes and styles, or new spatial configurations of textual elements. In this work we test whether layout-infused LMs are robust to layout distribution shifts. As a case study we use the task of scientific document structure recovery, segmenting a scientific paper into its structural categories (e.g., "title", "caption", "reference"). To emulate distribution shifts that occur in practice we re-partition the GROTOAP2 dataset. We find that under layout distribution shifts model performance degrades by up to 20 F1. Simple training strategies, such as increasing training diversity, can reduce this degradation by over 35% relative F1; however, models fail to reach in-distribution performance in any tested out-of-distribution conditions. This work highlights the need to consider layout distribution shifts during model evaluation, and presents a methodology for conducting such evaluations.
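The re-partitioning idea generalizes beyond GROTOAP2: to emulate a layout distribution shift, hold out entire layout sources rather than sampling documents i.i.d. Below is a minimal sketch of that splitting strategy, assuming each document record carries a hypothetical `publisher` field identifying its layout source; it illustrates the idea rather than the paper's exact protocol.

```python
import random

def layout_shift_split(docs, source_key="publisher",
                       held_out_frac=0.3, seed=0):
    """Re-partition a document collection so test-time layouts are
    unseen during training: entire layout sources are held out."""
    sources = sorted({d[source_key] for d in docs})
    rng = random.Random(seed)
    rng.shuffle(sources)
    n_held_out = max(1, int(len(sources) * held_out_frac))
    held_out = set(sources[:n_held_out])
    # Train on documents from familiar sources; test on unfamiliar ones.
    train = [d for d in docs if d[source_key] not in held_out]
    test = [d for d in docs if d[source_key] in held_out]
    return train, test
```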
Learning to Apply Schematic Knowledge to Novel Instances
Chen, Catherine, Lu, Qihong, Beukers, Andre, Baldassano, Chris, Norman, Kenneth
Humans have schematic knowledge of how certain types of events unfold (e.g. coffeeshop visits) that can readily be generalized to new instances of those events. Schematic knowledge allows humans to perform role-filler binding, the task of associating schematic roles (e.g. "barista") with specific fillers (e.g. "Bob"). Here we examined whether and how recurrent neural networks learn to do this. We procedurally generated stories from an underlying generative graph, and trained networks on role-filler binding question-answering tasks. We tested whether networks can learn to maintain filler information on their own, and whether they can generalize to fillers that they have not seen before. We studied networks by analyzing their behavior and decoding their memory states. We found that a network's success in learning role-filler binding depends on both the breadth of roles introduced during training, and the network's memory architecture. In our decoding analyses, we observed a close relationship between the information we could decode from various parts of network architecture, and the information the network could recall.
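To make the setup concrete, a toy version of the procedural story generation and role-filler binding QA task might look like the sketch below. The schema, roles, and fillers are invented for illustration and are far simpler than the paper's underlying generative graph.

```python
import random

ROLES = ["barista", "customer", "drink"]   # schematic roles
FILLERS = {                                # candidate fillers per role
    "barista": ["Bob", "Alice", "Priya"],
    "customer": ["Dana", "Miguel", "Kai"],
    "drink": ["latte", "mocha", "tea"],
}

def generate_story(rng):
    """Sample one coffeeshop-visit story from the schema, plus a
    role-filler binding question-answer pair about it."""
    binding = {role: rng.choice(FILLERS[role]) for role in ROLES}
    story = (f"{binding['customer']} walked into the coffeeshop. "
             f"{binding['barista']} took the order and made a "
             f"{binding['drink']}.")
    role = rng.choice(ROLES)
    question = f"Who or what filled the role '{role}'?"
    return story, question, binding[role]

rng = random.Random(0)
print(generate_story(rng))
```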