Jiao, Yizhu
Open-Vocabulary Argument Role Prediction for Event Extraction
Jiao, Yizhu, Li, Sha, Xie, Yiqing, Zhong, Ming, Ji, Heng, Han, Jiawei
The argument role in event extraction refers to the relation between an event and an argument participating in it. Despite the great progress in event extraction, existing studies still depend on roles pre-defined by domain experts. These studies expose obvious weakness when extending to emerging event types or new domains without available roles. Therefore, more attention and effort needs to be devoted to automatically customizing argument roles. In this paper, we define this essential but under-explored task: open-vocabulary argument role prediction. The goal of this task is to infer a set of argument roles for a given event type. We propose a novel unsupervised framework, RolePred for this task. Specifically, we formulate the role prediction problem as an in-filling task and construct prompts for a pre-trained language model to generate candidate roles. By extracting and analyzing the candidate arguments, the event-specific roles are further merged and selected. To standardize the research of this task, we collect a new event extraction dataset from WikiPpedia including 142 customized argument roles with rich semantics. On this dataset, RolePred outperforms the existing methods by a large margin. Source code and dataset are available on our GitHub repository: https://github.com/yzjiao/RolePred
Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning
Jiao, Yizhu, Xiong, Yun, Zhang, Jiawei, Zhang, Yao, Zhang, Tianqi, Zhu, Yangyong
Graph representation learning has attracted lots of attention recently. Existing graph neural networks fed with the complete graph data are not scalable due to limited computation and memory costs. Thus, it remains a great challenge to capture rich information in large-scale graph data. Besides, these methods mainly focus on supervised learning and highly depend on node label information, which is expensive to obtain in the real world. As to unsupervised network embedding approaches, they overemphasize node proximity instead, whose learned representations can hardly be used in downstream application tasks directly. In recent years, emerging self-supervised learning provides a potential solution to address the aforementioned problems. However, existing self-supervised works also operate on the complete graph data and are biased to fit either global or very local (1-hop neighborhood) graph structures in defining the mutual information based loss terms. In this paper, a novel self-supervised representation learning method via Subgraph Contrast, namely \textsc{Subg-Con}, is proposed by utilizing the strong correlation between central nodes and their sampled subgraphs to capture regional structure information. Instead of learning on the complete input graph data, with a novel data augmentation strategy, \textsc{Subg-Con} learns node representations through a contrastive loss defined based on subgraphs sampled from the original graph instead. Compared with existing graph representation learning approaches, \textsc{Subg-Con} has prominent performance advantages in weaker supervision requirements, model learning scalability, and parallelization. Extensive experiments verify both the effectiveness and the efficiency of our work compared with both classic and state-of-the-art graph representation learning approaches on multiple real-world large-scale benchmark datasets from different domains.