Tu, Jingxuan
TRACE: Real-Time Multimodal Common Ground Tracking in Situated Collaborative Dialogues
VanderHoeven, Hannah, Bhalla, Brady, Khebour, Ibrahim, Youngren, Austin, Venkatesha, Videep, Bradford, Mariah, Fitzgerald, Jack, Mabrey, Carlos, Tu, Jingxuan, Zhu, Yifan, Lai, Kenneth, Jung, Changsoo, Pustejovsky, James, Krishnaswamy, Nikhil
In situations involving hybrid human-AI teams, although there is an increasing desire for AIs that act as collaborators with humans, modern AI systems struggle to account for mental states in their human interlocutors (Sap et al., 2022; Ullman, 2023) that might expose shared or conflicting beliefs, and thus to predict and explain in-context behavior (Premack and Woodruff, 1978). Additionally, in realistic scenarios such as collaborative problem solving (Nelson, 2013), beliefs are communicated not just through language, but through multimodal signals including gestures, tone of voice, and interaction with the physical environment (VanderHoeven et al., 2024b). One of the critical capabilities that makes human-human collaboration so successful is the human ability to interpret multiple coordinated multimodal signals. TRACE makes the following novel and unique contributions in a single system: real-time tracking of participant speech, actions, gesture, and gaze when engaging in a shared task; on-the-fly interpretation and integration of multimodal signals to provide a complete scene representation for inference; simultaneous detection of asserted propositional content and epistemic positioning to infer task-relevant information for which evidence has been raised, or which the group has agreed is factual; and a modular, extensible architecture adaptable to new tasks and scenarios.
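The real-time integration of channels that TRACE describes can be caricatured as merging timestamped events from per-modality streams into one scene state. This is only a sketch of the idea, not the system's implementation; the channel names and the fixed time window are invented for illustration:

```python
import heapq

def merge_streams(streams):
    """Merge per-modality event streams (each a time-sorted list of
    (timestamp, channel, payload) tuples) into one global timeline."""
    return list(heapq.merge(*streams))

def scene_snapshot(timeline, t, window=2.0):
    """Latest payload per channel within `window` seconds before t:
    a minimal stand-in for a 'complete scene representation'."""
    state = {}
    for ts, channel, payload in timeline:
        if t - window <= ts <= t:
            state[channel] = payload  # later events overwrite earlier ones
    return state
```

A downstream inference module would then read the snapshot rather than any single modality, which is the point of the fused representation.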
Linguistically Conditioned Semantic Textual Similarity
Tu, Jingxuan, Xu, Keer, Yue, Liulu, Ye, Bingyang, Rim, Kyeongmin, Pustejovsky, James
Semantic textual similarity (STS) is a fundamental NLP task that measures the semantic similarity between a pair of sentences. To reduce the inherent ambiguity posed by the sentences, recent work called Conditional STS (C-STS) has proposed measuring the sentences' similarity conditioned on a certain aspect. Despite the popularity of C-STS, we find that the current C-STS dataset suffers from various issues that could impede proper evaluation of this task. In this paper, we reannotate the C-STS validation set and observe an annotator discrepancy on 55% of the instances, resulting from annotation errors in the original labels, ill-defined conditions, and a lack of clarity in the task definition. After a thorough dataset analysis, we improve the C-STS task by leveraging the models' capability to understand the conditions under a QA task setting. With the generated answers, we present an automatic error identification pipeline that is able to identify annotation errors in the C-STS data with over 80% F1 score. We also propose a new method that largely improves performance over baselines on the C-STS data by training the models with the answers. Finally, we discuss the conditionality annotation based on the typed-feature structure (TFS) of entity types. We show through examples that the TFS is able to provide a linguistic foundation for constructing C-STS data with new conditions.
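The QA reformulation of C-STS can be sketched as a two-step pipeline: extract each sentence's answer to the condition, then compare the answers. The extractor below is a trivial keyword matcher standing in for a QA model, and the Jaccard score is a crude proxy for a learned similarity; the pipeline shape, not the components, is the point:

```python
def answer_under_condition(sentence, condition_keywords):
    """Stand-in for a QA model: return the tokens of the sentence
    that address the condition (e.g. the 'color' aspect)."""
    return {w for w in sentence.lower().split() if w in condition_keywords}

def conditional_similarity(sent1, sent2, condition_keywords):
    """Jaccard overlap of the condition-specific answers -- a crude
    proxy for similarity 'conditioned on a certain aspect'."""
    a1 = answer_under_condition(sent1, condition_keywords)
    a2 = answer_under_condition(sent2, condition_keywords)
    if not a1 and not a2:
        return 0.0  # neither sentence addresses the condition
    return len(a1 & a2) / len(a1 | a2)
```

Two sentences describing different scenes can still score high under a "color" condition if both answer it the same way, which is exactly the conditioning effect the task targets.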
Common Ground Tracking in Multimodal Dialogue
Khebour, Ibrahim, Lai, Kenneth, Bradford, Mariah, Zhu, Yifan, Brutti, Richard, Tam, Christopher, Tu, Jingxuan, Ibarra, Benjamin, Blanchard, Nathaniel, Krishnaswamy, Nikhil, Pustejovsky, James
Within Dialogue Modeling research in AI and NLP, considerable attention has been spent on ``dialogue state tracking'' (DST), which is the ability to update the representations of the speaker's needs at each turn in the dialogue by taking into account the past dialogue moves and history. Less studied but just as important to dialogue modeling, however, is ``common ground tracking'' (CGT), which identifies the shared belief space held by all of the participants in a task-oriented dialogue: the task-relevant propositions all participants accept as true. In this paper we present a method for automatically identifying the current set of shared beliefs and ``questions under discussion'' (QUDs) of a group with a shared goal. We annotate a dataset of multimodal interactions in a shared physical space with speech transcriptions, prosodic features, gestures, actions, and facets of collaboration, and operationalize these features for use in a deep neural model to predict moves toward construction of common ground. Model outputs cascade into a set of formal closure rules derived from situated evidence and belief axioms and update operations. We empirically assess the contribution of each feature type toward successful construction of common ground relative to ground truth, establishing a benchmark in this novel, challenging task.
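The cascade from classified dialogue moves into closure rules over belief banks can be illustrated with a toy state machine. The evidence/fact distinction below follows the abstract's description, but the move vocabulary and the specific rules are simplified assumptions, not the paper's formal axioms:

```python
class CommonGroundTracker:
    """Toy tracker: propositions move from questions under discussion
    (QUDs) to evidenced to accepted-as-fact as dialogue moves warrant."""

    def __init__(self):
        self.quds = set()    # questions under discussion
        self.ebank = set()   # propositions with raised evidence
        self.fbank = set()   # propositions the group accepts as true

    def update(self, move, prop):
        if move == "RAISE":          # proposition enters discussion
            self.quds.add(prop)
        elif move == "STATEMENT":    # an assertion supplies evidence
            self.quds.add(prop)
            self.ebank.add(prop)
        elif move == "ACCEPT":       # group agreement: closure rule fires
            self.ebank.discard(prop)
            self.fbank.add(prop)
            self.quds.discard(prop)  # the QUD is resolved
        elif move == "DOUBT":        # evidence challenged, back to QUD
            self.ebank.discard(prop)
            self.quds.add(prop)
```

In the full system the moves themselves are predicted by a neural model from multimodal features; here they are given directly to keep the update logic visible.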
Dense Paraphrasing for Textual Enrichment
Tu, Jingxuan, Rim, Kyeongmin, Holderness, Eben, Pustejovsky, James
Understanding inferences and answering questions from text requires more than merely recovering surface arguments, adjuncts, or strings associated with the query terms. As humans, we interpret sentences as contextualized components of a narrative or discourse, by both filling in missing information, and reasoning about event consequences. In this paper, we define the process of rewriting a textual expression (lexeme or phrase) such that it reduces ambiguity while also making explicit the underlying semantics that is not (necessarily) expressed in the economy of sentence structure as Dense Paraphrasing (DP). We build the first complete DP dataset, provide the scope and design of the annotation task, and present results demonstrating how this DP process can enrich a source text to improve inferencing and QA task performance. The data and the source code will be publicly available.
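The rewriting operation Dense Paraphrasing performs can be mimicked, very roughly, with a lookup that spells out the event a verb-object pair leaves implicit. The micro-lexicon below is invented for illustration and is nothing like the paper's annotated resource; it only shows the shape of the enrichment:

```python
# Hypothetical micro-lexicon: (verb, object type) -> explicit event predicate.
EXPANSIONS = {
    ("finish", "meal"): "finish eating the meal",
    ("finish", "book"): "finish reading the book",
    ("begin", "coffee"): "begin drinking the coffee",
}

def dense_paraphrase(verb, obj):
    """Rewrite 'verb the obj' to make the implicit event explicit,
    reducing ambiguity the way Dense Paraphrasing does; fall back to
    the literal phrase when no expansion is known."""
    return EXPANSIONS.get((verb, obj), f"{verb} the {obj}")
```

The enriched surface form then gives a QA or inference model the hidden predicate ("eating", "reading") that the economical original omits.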
TMR: Evaluating NER Recall on Tough Mentions
Tu, Jingxuan, Lignos, Constantine
We propose the Tough Mentions Recall (TMR) metrics to supplement traditional named entity recognition (NER) evaluation by examining recall on specific subsets of "tough" mentions: unseen mentions, those whose tokens or token/type combination were not observed in training, and type-confusable mentions, token sequences with multiple entity types in the test data. We demonstrate the usefulness of these metrics by evaluating corpora of English, Spanish, and Dutch using five recent neural architectures. We identify subtle differences between the performance of BERT and Flair on two English NER corpora and identify a weak spot in the performance of current models in Spanish. We conclude that the TMR metrics enable differentiation between otherwise similar-scoring systems and identification of patterns in performance that would go unnoticed from overall precision, recall, and F1.
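The recall-on-a-subset idea behind TMR fits in a few lines. The mention representation and the "unseen" criterion below are simplified assumptions for illustration, not the paper's exact implementation:

```python
def subset_recall(gold, predicted, subset):
    """Recall computed only over the gold mentions in a chosen subset.

    gold, predicted: sets of (start, end, entity_type) mention spans.
    subset: the gold mentions that qualify (e.g. unseen in training).
    """
    relevant = gold & subset
    if not relevant:
        return None  # undefined when the subset is empty
    return len(relevant & predicted) / len(relevant)


def unseen_mentions(gold, training_surface_forms, surface_of):
    """Gold mentions whose surface form never appeared in training --
    one of the 'tough' subsets TMR examines."""
    return {m for m in gold if surface_of(m) not in training_surface_forms}
```

Comparing `subset_recall` on the unseen subset against overall recall is what separates systems that memorize training mentions from those that generalize.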
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation
Wang, Qingyun, Li, Manling, Wang, Xuan, Parulian, Nikolaus, Han, Guangxing, Ma, Jiawei, Tu, Jingxuan, Lin, Ying, Zhang, Haoran, Liu, Weili, Chauhan, Aabhas, Guan, Yingjun, Li, Bangzheng, Li, Ruisong, Song, Xiangchen, Ji, Heng, Han, Jiawei, Chang, Shih-Fu, Pustejovsky, James, Rah, Jasmine, Liem, David, Elsayed, Ahmed, Palmer, Martha, Voss, Clare, Schneider, Cynthia, Onyshkevych, Boyan
To combat COVID-19, both clinicians and scientists need to digest the vast amount of relevant biomedical knowledge in the literature to understand the disease mechanism and the related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements (entities, relations, and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures, and knowledge subgraphs as evidence. All of the data, KGs, reports, resources, and shared services are publicly available.
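The exploit-the-KG-for-QA step can be sketched with an in-memory triple store and a one-hop query pattern. The triples below are invented placeholders, not extracted facts, and the query is a deliberately simplified version of what a repurposing pipeline would run:

```python
from collections import defaultdict

class TripleStore:
    """Minimal knowledge graph: (head, relation, tail) triples
    indexed by (head, relation), as a QA backend might use."""

    def __init__(self):
        self.index = defaultdict(set)

    def add(self, head, relation, tail):
        self.index[(head, relation)].add(tail)

    def query(self, head, relation):
        # .get avoids creating empty entries on lookup misses
        return self.index.get((head, relation), set())

def repurposing_candidates(kg, disease):
    """Drugs whose inhibited target is associated with the disease --
    the one-hop join behind a simple repurposing query."""
    return {
        head for (head, rel), tails in list(kg.index.items())
        if rel == "inhibits"
        for t in tails
        if disease in kg.query(t, "associated_with")
    }
```

A real system would attach provenance (sentences, subfigures, subgraphs) to each triple so that every candidate comes with evidence, as the abstract describes.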