AITopics | Yan, Zhaohui

Collaborating Authors

Yan, Zhaohui

Information about AI from the News, Publications, and Conferences

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL

Cheng, Ning, Yan, Zhaohui, Wang, Ziming, Li, Zhijie, Yu, Jiaming, Zheng, Zilong, Tu, Kewei, Xu, Jinan, Han, Wenjuan

arXiv.org Artificial Intelligence, May-10-2024

Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias. Nevertheless, there is an ongoing controversy over the extent to which LLMs can grasp structured semantics. To assess this, we propose using Semantic Role Labeling (SRL) as a fundamental task for probing LLMs' ability to extract structured semantics. In our assessment, we adopt a prompting approach, which results in a few-shot SRL parser called PromptSRL. PromptSRL enables LLMs to map natural language to explicit semantic structures, which provides an interpretable window into the properties of LLMs. We find interesting potential: LLMs can indeed capture semantic structures, although larger models do not always show greater potential. We also observe limitations, for instance in handling C-arguments (continuation arguments). Finally, we were surprised to find a significant overlap between the errors made by LLMs and those made by untrained humans, accounting for almost 30% of all errors.
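The abstract does not reproduce PromptSRL's actual prompt, so the following is only a hypothetical sketch of the general few-shot idea: assemble demonstrations plus a query into one prompt, then parse the model's bracketed answer back into explicit (role, argument) pairs. The instruction wording, the bracket notation, and the names `build_prompt` and `parse_roles` are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical demonstration format: (sentence, predicate, bracketed SRL answer).
Demo = Tuple[str, str, str]

def build_prompt(demos: List[Demo], sentence: str, predicate: str) -> str:
    """Assemble a few-shot prompt (illustrative wording, not PromptSRL's template)."""
    parts = ["Label the semantic roles of the given predicate using PropBank tags."]
    for sent, pred, answer in demos:
        parts.append(f"Sentence: {sent}\nPredicate: {pred}\nRoles: {answer}")
    # The query ends with an open "Roles:" slot for the LLM to complete.
    parts.append(f"Sentence: {sentence}\nPredicate: {predicate}\nRoles:")
    return "\n\n".join(parts)

# Matches segments such as "[ARG0: The cat]", "[ARGM-TMP: yesterday]" or "[C-ARG1: ...]".
ROLE_PATTERN = re.compile(r"\[([A-Z0-9-]+):\s*([^\]]+)\]")

def parse_roles(answer: str) -> List[Tuple[str, str]]:
    """Turn a bracketed model answer into (role, argument text) pairs."""
    return [(role, text.strip()) for role, text in ROLE_PATTERN.findall(answer)]

demos = [("The cat chased the mouse.", "chased",
          "[ARG0: The cat] [V: chased] [ARG1: the mouse]")]
prompt = build_prompt(demos, "She sold the car yesterday.", "sold")
# Pretend the LLM replied with this made-up string:
print(parse_roles("[ARG0: She] [V: sold] [ARG1: the car] [ARGM-TMP: yesterday]"))
# → [('ARG0', 'She'), ('V', 'sold'), ('ARG1', 'the car'), ('ARGM-TMP', 'yesterday')]
```

Parsing into explicit pairs is what makes the output comparable to gold SRL annotations, which is the kind of structure the abstract's error analysis relies on.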

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2405.0641

Country:

Asia > China (0.29)
North America > United States > Colorado (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Joint Entity and Relation Extraction with Span Pruning and Hypergraph Neural Networks

Yan, Zhaohui, Yang, Songlin, Liu, Wei, Tu, Kewei

arXiv.org Artificial Intelligence, Oct-26-2023

Entity and Relation Extraction (ERE) is an important task in information extraction. Recent marker-based pipeline models achieve state-of-the-art performance but still suffer from error propagation. In addition, most current ERE models do not take into account higher-order interactions among multiple entities and relations, even though higher-order modeling could be beneficial. In this work, we propose HyperGraph neural network for ERE (HGERE), which is built upon PL-marker (a state-of-the-art marker-based pipeline model). To alleviate error propagation, we use a high-recall pruner mechanism to transfer the burden of entity identification and labeling from the NER module to the joint module of our model. For higher-order modeling, we build a hypergraph whose nodes are entities (provided by the span pruner) and the relations between them, and whose hyperedges encode interactions between two different relations or between a relation and its associated subject and object entities. We then run a hypergraph neural network for higher-order inference by applying message passing over the built hypergraph. Experiments on three widely used ERE benchmarks (ACE2004, ACE2005, and SciERC) show significant improvements over the previous state-of-the-art PL-marker.
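The abstract does not spell out the model's layers, so here is only a toy sketch of the two ideas it names: a high-recall span pruner (keep the top-k candidate spans, with k chosen generously) and one round of message passing over hyperedges linking a relation to its subject and object, and linking relations to each other. Assumptions: vectors are plain Python lists, learned transforms are replaced by mean aggregation plus a residual update, relation nodes are all ordered pairs of kept spans, and two relations are connected when they share an entity. All function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Span = Tuple[int, int]    # (start, end) token offsets of a candidate entity
Pair = Tuple[Span, Span]  # (subject span, object span) of a candidate relation
Vec = List[float]

def prune_spans(span_scores: Dict[Span, float], k: int) -> List[Span]:
    """High-recall pruner: keep the k best-scoring candidate spans."""
    return sorted(span_scores, key=span_scores.get, reverse=True)[:k]

def mean(vectors: List[Vec]) -> Vec:
    """Element-wise average of equally sized vectors."""
    total = [0.0] * len(vectors[0])
    for v in vectors:
        total = [t + x for t, x in zip(total, v)]
    return [t / len(vectors) for t in total]

def build_relation_nodes(spans: List[Span], dim: int) -> Dict[Pair, Vec]:
    """One relation node per ordered pair of kept spans, initialised to zeros."""
    return {pair: [0.0] * dim for pair in permutations(spans, 2)}

def message_passing_step(ent: Dict[Span, Vec], rel: Dict[Pair, Vec]):
    """One synchronous round: every node reads the *old* states of its neighbours."""
    new_rel: Dict[Pair, Vec] = {}
    for pair, h in rel.items():
        subj, obj = pair
        # Hyperedge (relation, subject, object).
        msgs = [ent[subj], ent[obj]]
        # Relation-relation hyperedges (assumption: relations sharing an entity).
        msgs += [rel[o] for o in rel if o != pair and set(o) & {subj, obj}]
        new_rel[pair] = [a + b for a, b in zip(h, mean(msgs))]  # residual update
    new_ent: Dict[Span, Vec] = {}
    for span, h in ent.items():
        msgs = [rel[pair] for pair in rel if span in pair]
        new_ent[span] = [a + b for a, b in zip(h, mean(msgs))] if msgs else list(h)
    return new_ent, new_rel

kept = prune_spans({(0, 1): 0.9, (2, 3): 0.8, (4, 4): 0.1}, k=2)
entities = {(0, 1): [1.0, 0.0], (2, 3): [0.0, 1.0]}
relations = build_relation_nodes(kept, dim=2)
entities, relations = message_passing_step(entities, relations)
print(kept, relations)
```

Because the pruner only discards low-scoring spans, deciding which kept spans are real entities is deferred to the joint stage, which is the sense in which the abstract says the burden moves from the NER module to the joint module.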

computational linguistic, machine learning, natural language

arXiv.org Artificial Intelligence

2310.17238

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Modeling Instance Interactions for Joint Information Extraction with Neural High-Order Conditional Random Field

Jia, Zixia, Yan, Zhaohui, Han, Wenjuan, Zheng, Zilong, Tu, Kewei

arXiv.org Artificial Intelligence, May-28-2023

Prior works on joint Information Extraction (IE) typically model instance (e.g., event triggers, entities, roles, relations) interactions by representation enhancement, type dependencies scoring, or global decoding. We find that the previous models generally consider binary type dependency scoring of a pair of instances, and leverage local search such as beam search to approximate global solutions. To better integrate cross-instance interactions, in this work, we introduce a joint IE framework (CRFIE) that formulates joint IE as a high-order Conditional Random Field. Specifically, we design binary factors and ternary factors to directly model interactions between not only a pair of instances but also triplets. Then, these factors are utilized to jointly predict labels of all instances. To address the intractability problem of exact high-order inference, we incorporate a high-order neural decoder that is unfolded from a mean-field variational inference method, which achieves consistent learning and inference. The experimental results show that our approach achieves consistent improvements on three IE tasks compared with our baseline and prior work.
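To illustrate the kind of inference described above, here is a minimal sketch of mean-field variational inference for a CRF with unary, binary (pairwise), and ternary (triplet) factors, unrolled for a fixed number of iterations. It is not CRFIE's decoder. Assumptions: every instance shares one label set, a single binary table and a single ternary table are shared by all pairs and triplets, both tables are symmetric (so argument order does not matter), and the scores are given rather than produced by a neural encoder. Function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List

Matrix = List[List[float]]

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def mean_field(unary: Matrix, binary: Matrix, ternary: List[Matrix],
               iters: int = 3) -> Matrix:
    """Unrolled mean-field updates; returns approximate marginals q[i][label].

    unary[i][a]: score of instance i taking label a.
    binary[a][b], ternary[a][b][c]: shared, symmetric factor tables (assumption).
    """
    n, num_labels = len(unary), len(unary[0])
    q = [softmax(row) for row in unary]  # initialise from unary scores
    for _ in range(iters):
        new_q = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            scores = []
            for a in range(num_labels):
                s = unary[i][a]
                # Expected pairwise score under the other instances' marginals.
                for j in others:
                    s += sum(q[j][b] * binary[a][b] for b in range(num_labels))
                # Expected triplet score over every pair {j, k} that excludes i.
                for j, k in combinations(others, 2):
                    s += sum(q[j][b] * q[k][c] * ternary[a][b][c]
                             for b in range(num_labels) for c in range(num_labels))
                scores.append(s)
            new_q.append(softmax(scores))
        q = new_q  # synchronous update, as in an unrolled network layer
    return q

def decode(q: Matrix) -> List[int]:
    """Pick the most probable label for each instance."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]

zeros2 = [[0.0, 0.0], [0.0, 0.0]]
agree3 = [[[1.0 if a == b == c else 0.0 for c in range(2)] for b in range(2)]
          for a in range(2)]
unary = [[3.0, 0.0], [3.0, 0.0], [0.0, 0.1]]
print(decode(mean_field(unary, zeros2, [zeros2, zeros2])))  # unary only
print(decode(mean_field(unary, zeros2, agree3)))            # with a triplet factor
```

In the example, the third instance weakly prefers label 1 on its own, while a ternary "all three agree" factor pulls it toward the label the other two strongly prefer. Each iteration is differentiable, which is why such updates can be stacked as network layers and trained end to end.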

computational linguistic, machine learning, natural language

arXiv.org Artificial Intelligence

2212.08929

Country:

Asia > Middle East (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Structural Knowledge Distillation

Wang, Xinyu, Jiang, Yong, Yan, Zhaohui, Jia, Zixia, Bach, Nguyen, Wang, Tao, Huang, Zhongqiang, Huang, Fei, Tu, Kewei

arXiv.org Artificial Intelligence, Oct-10-2020

Knowledge distillation is a critical technique to transfer knowledge between models, typically from a large model (the teacher) to a smaller one (the student). The objective function of knowledge distillation is typically the cross-entropy between the teacher's and the student's output distributions. However, for structured prediction problems, the output space is exponential in size; therefore, the cross-entropy objective becomes intractable to compute and optimize directly. In this paper, we derive a factorized form of the knowledge distillation objective for structured prediction, which is tractable for many typical choices of the teacher and student models. In particular, we show the tractability and empirical effectiveness of structural knowledge distillation between sequence labeling and dependency parsing models under four different scenarios: 1) the teacher and student share the same factorization form of the output structure scoring function; 2) the student factorization produces smaller substructures than the teacher factorization; 3) the teacher factorization produces smaller substructures than the student factorization; 4) the factorization forms of the teacher and the student are incompatible.

Deeper and larger neural networks have led to significant improvements in accuracy in various tasks, but they are also more computationally expensive and unfit for resource-constrained scenarios such as online serving. An interesting and viable solution to this problem is knowledge distillation (KD) (Buciluă et al., 2006; Ba & Caruana, 2014; Hinton et al., 2015), which can be used to transfer the knowledge of a large model (the teacher) to a smaller model (the student).
In the field of natural language processing, for example, KD has been successfully applied to compress massive pretrained language models such as BERT (Devlin et al., 2019) and XLM-R (Conneau et al., 2020) into much smaller and faster models without significant loss in accuracy (Tang et al., 2019; Sanh et al., 2019; Tsai et al., 2019; Mukherjee & Hassan Awadallah, 2020).
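As an illustration of scenario 1 (teacher and student share the same factorization), here is a minimal sketch assuming both models are first-order linear-chain CRFs over the same label set, with per-position emission scores and a label-transition matrix. Writing the student's sequence score as a sum of emission and transition terms, the cross-entropy becomes log Z_s minus the student's emission and transition scores weighted by the teacher's unary and pairwise marginals. Only forward-backward on the teacher and the student's log-partition are needed, not a sum over all label sequences. A brute-force enumeration is included purely as a check. This is not the authors' code, and all function names are hypothetical.

```python
import math
from itertools import product
from typing import List

Matrix = List[List[float]]

def logsumexp(xs: List[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def forward_backward(emit: Matrix, trans: Matrix):
    """Return log Z, unary marginals p(y_i=a), and pairwise p(y_{i-1}=a, y_i=b)."""
    n, k = len(emit), len(emit[0])
    alpha = [emit[0][:]]
    for i in range(1, n):
        alpha.append([emit[i][b] + logsumexp([alpha[i - 1][a] + trans[a][b]
                                              for a in range(k)]) for b in range(k)])
    beta = [[0.0] * k for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = [logsumexp([trans[a][b] + emit[i + 1][b] + beta[i + 1][b]
                              for b in range(k)]) for a in range(k)]
    log_z = logsumexp(alpha[-1])
    unary = [[math.exp(alpha[i][a] + beta[i][a] - log_z) for a in range(k)]
             for i in range(n)]
    pairwise = [[[math.exp(alpha[i - 1][a] + trans[a][b] + emit[i][b]
                           + beta[i][b] - log_z) for b in range(k)] for a in range(k)]
                for i in range(1, n)]
    return log_z, unary, pairwise

def structural_kd_loss(t_emit, t_trans, s_emit, s_trans) -> float:
    """Factorized cross-entropy -sum_y p_t(y) log p_s(y) for two linear-chain CRFs."""
    _, t_unary, t_pair = forward_backward(t_emit, t_trans)
    s_log_z, _, _ = forward_backward(s_emit, s_trans)
    k = len(s_emit[0])
    expected = sum(t_unary[i][a] * s_emit[i][a]
                   for i in range(len(s_emit)) for a in range(k))
    expected += sum(t_pair[i][a][b] * s_trans[a][b]
                    for i in range(len(t_pair)) for a in range(k) for b in range(k))
    return s_log_z - expected

def sequence_score(emit, trans, ys) -> float:
    return (sum(emit[i][y] for i, y in enumerate(ys))
            + sum(trans[a][b] for a, b in zip(ys, ys[1:])))

def brute_force_kd_loss(t_emit, t_trans, s_emit, s_trans) -> float:
    """Exponential-time reference: enumerate every label sequence explicitly."""
    seqs = list(product(range(len(t_emit[0])), repeat=len(t_emit)))
    t_scores = [sequence_score(t_emit, t_trans, y) for y in seqs]
    s_scores = [sequence_score(s_emit, s_trans, y) for y in seqs]
    t_log_z, s_log_z = logsumexp(t_scores), logsumexp(s_scores)
    return -sum(math.exp(ts - t_log_z) * (ss - s_log_z)
                for ts, ss in zip(t_scores, s_scores))

t_emit, t_trans = [[0.5, -0.2], [0.3, 0.8], [-1.0, 0.2]], [[0.2, -0.3], [0.0, 0.5]]
s_emit, s_trans = [[0.1, 0.4], [0.9, -0.6], [0.0, 0.3]], [[-0.1, 0.2], [0.4, -0.5]]
print(structural_kd_loss(t_emit, t_trans, s_emit, s_trans),
      brute_force_kd_loss(t_emit, t_trans, s_emit, s_trans))
```

The two printed values should agree up to floating-point error. The factorized version runs in time linear in sentence length, while enumeration grows exponentially, which is the tractability gap the abstract describes.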

computational linguistics, deep learning, neural network

arXiv.org Artificial Intelligence

2010.0501

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry: Education (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
