AITopics | Cai, Yujun

Collaborating Authors

Cai, Yujun

Primacy Effect of ChatGPT

Wang, Yiwei, Cai, Yujun, Chen, Muhao, Liang, Yuxuan, Hooi, Bryan

arXiv.org Artificial Intelligence, Oct-19-2023

Instruction-tuned large language models (LLMs), such as ChatGPT, have led to promising zero-shot performance in discriminative natural language understanding (NLU) tasks. This involves querying the LLM using a prompt containing the question and the candidate labels to choose from. The question-answering capabilities of ChatGPT arise from its pre-training on large amounts of human-written text, as well as its subsequent fine-tuning on human preferences, which motivates us to ask: Does ChatGPT also inherit humans' cognitive biases? In this paper, we study the primacy effect of ChatGPT: the tendency to select the labels at earlier positions as the answer. We have two main findings: i) ChatGPT's decision is sensitive to the order of labels in the prompt; ii) ChatGPT is clearly more likely to select the labels at earlier positions as the answer. We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions. We release the source code at https://github.com/wangywUST/PrimacyEffectGPT.
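
One way to probe the order sensitivity described in this abstract is to shuffle the candidate labels many times and count which prompt position the chosen label occupied. The sketch below is only an illustration, not the authors' released code: `position_counts`, `always_first`, and `order_invariant` are hypothetical names, and the two stand-in "models" are toy functions used in place of a real LLM call.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence


def position_counts(
    choose: Callable[[str, List[str]], str],
    question: str,
    labels: Sequence[str],
    n_trials: int = 200,
    seed: int = 0,
) -> List[int]:
    """Count how often the chosen label sat at each prompt position.

    `choose` is a hypothetical stand-in for a model query: it receives the
    question and the labels in prompt order and returns one of the labels.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(n_trials):
        order = list(labels)
        rng.shuffle(order)                # new label order for this trial
        picked = choose(question, order)
        counts[order.index(picked)] += 1  # position of the selected label
    return [counts[i] for i in range(len(labels))]


def always_first(question: str, options: List[str]) -> str:
    """Toy model with an extreme primacy effect: picks the first option."""
    return options[0]


def order_invariant(question: str, options: List[str]) -> str:
    """Toy model that ignores order: picks the same label every time."""
    return min(options)


if __name__ == "__main__":
    labels = ["negative", "neutral", "positive"]
    print(position_counts(always_first, "q", labels))
    print(position_counts(order_invariant, "q", labels))
```

A model without position bias should spread its selections roughly evenly over positions when the label order is random; a skew toward index 0 is the pattern the abstract calls the primacy effect.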

deep learning, machine learning, primacy effect, (4 more...)

arXiv:2310.13206

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

How Fragile is Relation Extraction under Entity Replacements?

Wang, Yiwei, Hooi, Bryan, Wang, Fei, Cai, Yujun, Liang, Yuxuan, Zhou, Wenxuan, Tang, Jing, Duan, Manjuan, Chen, Muhao

arXiv.org Artificial Intelligence, May-29-2023

Relation extraction (RE) aims to extract the relations between entity names from the textual context. In principle, the textual context determines the ground-truth relation, and RE models should be able to correctly identify the relations reflected by that context. However, existing work has found that RE models memorize entity name patterns to make predictions while ignoring the textual context. This motivates us to raise the question: "Are RE models robust to entity replacements?" In this work, we apply random and type-constrained entity replacements to the RE instances in TACRED and evaluate state-of-the-art RE models under these replacements. We observe F1 score drops of 30%-50% on the state-of-the-art RE models under entity replacements. These results suggest that more effort is needed to develop effective RE models that are robust to entity replacements. We release the source code at https://github.com/wangywUST/RobustRE.
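
A type-constrained replacement, as the abstract describes it, swaps each entity mention for another name of the same entity type while leaving the surrounding context untouched. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the span format (start, end-exclusive, type), the `lexicon` mapping, and both function names are hypothetical, and the two spans are assumed not to overlap.

```python
import random
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type); assumed format


def replace_span(tokens: List[str], start: int, end: int, new_name: str) -> List[str]:
    """Return a copy of `tokens` with tokens[start:end] replaced by `new_name`."""
    return tokens[:start] + new_name.split() + tokens[end:]


def type_constrained_replace(
    tokens: List[str],
    subj: Span,
    obj: Span,
    lexicon: Dict[str, List[str]],
    rng: random.Random,
) -> List[str]:
    """Swap subject and object mentions for random names of the same type.

    `lexicon` maps an entity type to candidate names (hypothetical data).
    """
    out = list(tokens)
    # Replace the later span first so the earlier span's indices stay valid.
    for start, end, etype in sorted([subj, obj], key=lambda s: s[0], reverse=True):
        out = replace_span(out, start, end, rng.choice(lexicon[etype]))
    return out


if __name__ == "__main__":
    sentence = "Bill Gates founded Microsoft in 1975".split()
    names = {"PERSON": ["Ada Lovelace"], "ORG": ["Acme Corp"]}
    print(type_constrained_replace(
        sentence, (0, 2, "PERSON"), (3, 4, "ORG"), names, random.Random(0)
    ))
```

Because only the entity names change, the relation expressed by the context is preserved, so a drop in F1 on the modified instances points to reliance on name patterns rather than context.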

artificial intelligence, entity name, natural language, (17 more...)

arXiv:2305.13551

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning

Zhang, Chi, Cai, Yujun, Lin, Guosheng, Shen, Chunhua

arXiv.org Artificial Intelligence, Mar-30-2023

In this work, we develop methods for few-shot image classification from a new perspective of optimal matching between image regions. We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations to determine image relevance. The EMD generates the optimal matching flows between structural elements that have the minimum matching cost, which is used to calculate the image distance for classification. To generate the importance weights of elements in the EMD formulation, we design a cross-reference mechanism, which can effectively alleviate the adverse impact caused by cluttered backgrounds and large intra-class appearance variations. To implement k-shot classification, we propose to learn a structured fully connected layer that can directly classify dense image representations with the EMD. Based on the implicit function theorem, the EMD can be inserted as a layer into the network for end-to-end training. Our extensive experiments validate the effectiveness of our algorithm, which outperforms state-of-the-art methods by a significant margin on five widely used few-shot classification benchmarks, namely miniImageNet, tieredImageNet, Fewshot-CIFAR100 (FC100), Caltech-UCSD Birds-200-2011 (CUB), and CIFAR-FewShot (CIFAR-FS). We also demonstrate the effectiveness of our method on the image retrieval task in our experiments.
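
For orientation, the optimal matching that the abstract refers to can be written as the standard transportation problem that defines the EMD. The notation here is an assumption and may differ from the paper's: $c_{ij}$ is the cost of matching element $i$ of one image to element $j$ of the other, $s_i$ and $d_j$ are the element weights (with $\sum_i s_i = \sum_j d_j$), and $x_{ij}$ is the flow between the two elements.

```latex
\begin{aligned}
\min_{x_{ij}} \quad & \sum_{i=1}^{m} \sum_{j=1}^{k} c_{ij}\, x_{ij} \\
\text{s.t.} \quad & x_{ij} \ge 0, \\
& \sum_{j=1}^{k} x_{ij} = s_i, \quad i = 1, \dots, m, \\
& \sum_{i=1}^{m} x_{ij} = d_j, \quad j = 1, \dots, k.
\end{aligned}
```

The minimum value $\sum_{i,j} c_{ij}\, x^{*}_{ij}$ attained by the optimal flows $x^{*}_{ij}$ is the distance between the two sets of elements.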

artificial intelligence, classification, machine learning, (18 more...)

arXiv:2003.06777

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Time-Aware Neighbor Sampling for Temporal Graph Networks

Wang, Yiwei, Cai, Yujun, Liang, Yuxuan, Ding, Henghui, Wang, Changhu, Hooi, Bryan

arXiv.org Artificial Intelligence, Dec-18-2021

We present a new neighbor sampling method on temporal graphs. In a temporal graph, predicting different nodes' time-varying properties can require receptive neighborhoods of various temporal scales. In this work, we propose the TNS (Time-aware Neighbor Sampling) method: TNS learns from temporal information to provide an adaptive receptive neighborhood for every node at any time. Learning how to sample neighbors is non-trivial, since the neighbor indices in time order are discrete and not differentiable. To address this challenge, we transform neighbor indices from discrete values to continuous ones by interpolating the neighbors' messages. TNS can be flexibly incorporated into popular temporal graph networks to improve their effectiveness without increasing their time complexity. TNS can be trained in an end-to-end manner. It needs no extra supervision and is automatically and implicitly guided to sample the neighbors that are most beneficial for prediction. Empirical results on multiple standard datasets show that TNS yields significant gains on edge prediction and node classification.
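
The idea of turning a discrete neighbor index into a continuous one can be illustrated with plain linear interpolation between the two time-adjacent neighbors' messages. This is a sketch of the general idea only: the choice of linear interpolation, the clamping at the ends, and the name `interpolate_message` are assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence


def interpolate_message(messages: Sequence[Sequence[float]], position: float) -> List[float]:
    """Read a message at a continuous index into a time-ordered message list.

    `messages[i]` is the message vector of the i-th neighbor in time order.
    A fractional `position` blends the two surrounding messages, so the
    output varies smoothly with `position` instead of jumping between
    discrete neighbors.
    """
    if not messages:
        raise ValueError("messages must be non-empty")
    last = len(messages) - 1
    p = min(max(position, 0.0), last)   # clamp to the valid index range
    lo = math.floor(p)                  # earlier of the two neighbors
    hi = min(lo + 1, last)              # later of the two neighbors
    w = p - lo                          # weight given to the later neighbor
    return [(1.0 - w) * a + w * b for a, b in zip(messages[lo], messages[hi])]


if __name__ == "__main__":
    msgs = [[0.0, 0.0], [2.0, 4.0]]
    print(interpolate_message(msgs, 0.5))
```

Because the output is a piecewise-linear function of `position`, a gradient with respect to the position exists almost everywhere, which is what makes a learned sampling position trainable in principle.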

artificial intelligence, educational setting, machine learning, (18 more...)

arXiv:2112.09845

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Structure-Aware Label Smoothing for Graph Neural Networks

Wang, Yiwei, Cai, Yujun, Liang, Yuxuan, Wang, Wei, Ding, Henghui, Chen, Muhao, Tang, Jing, Hooi, Bryan

arXiv.org Artificial Intelligence, Dec-1-2021

Representing a label distribution as a one-hot vector is a common practice in training node classification models. However, the one-hot representation may not adequately reflect the semantic characteristics of a node in different classes, as some nodes may be semantically close to their neighbors in other classes. This can cause over-confidence, since the models are encouraged to assign full probabilities when classifying every node. While training models with label smoothing can ease this problem to some degree, it still fails to capture the nodes' semantic characteristics implied by the graph structures. In this work, we propose a novel SALS (Structure-Aware Label Smoothing) method as an enhancement component to popular node classification models. SALS leverages the graph structures to capture the semantic correlations between connected nodes and generates a structure-aware label distribution to replace the original one-hot label vectors, thus improving node classification performance without inference costs. Extensive experiments on seven node classification benchmark datasets reveal the effectiveness of SALS in improving both transductive and inductive node classification. Empirical results show that SALS is superior to the label smoothing method and enables the node classification models to outperform the baseline methods.
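
To make the contrast with one-hot targets concrete, the sketch below mixes a node's one-hot label with the label histogram of its labeled neighbors. This is an illustrative simplification and not the SALS formulation from the paper: the mixing rule, the coefficient `alpha`, and the function names are all assumptions.

```python
from typing import Dict, List


def one_hot(label: int, num_classes: int) -> List[float]:
    """Standard one-hot target vector."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def neighbor_smoothed_label(
    node: int,
    labels: Dict[int, int],
    neighbors: Dict[int, List[int]],
    num_classes: int,
    alpha: float = 0.1,
) -> List[float]:
    """Blend a node's one-hot label with its labeled neighbors' class histogram.

    Assumed rule: (1 - alpha) * one_hot + alpha * neighbor_histogram.
    Nodes without labeled neighbors keep their one-hot target.
    """
    target = one_hot(labels[node], num_classes)
    labeled = [n for n in neighbors.get(node, []) if n in labels]
    if not labeled:
        return target
    hist = [0.0] * num_classes
    for n in labeled:
        hist[labels[n]] += 1.0 / len(labeled)   # normalized class histogram
    return [(1.0 - alpha) * t + alpha * h for t, h in zip(target, hist)]


if __name__ == "__main__":
    node_labels = {0: 0, 1: 1, 2: 0}
    adjacency = {0: [1, 2]}
    print(neighbor_smoothed_label(0, node_labels, adjacency, num_classes=2, alpha=0.5))
```

The resulting target still sums to one, and it changes only the training labels, which is consistent with the abstract's point that no cost is added at inference time.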

artificial intelligence, machine learning, node, (16 more...)

arXiv:2112.00499

Country: Asia (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)

GraphCrop: Subgraph Cropping for Graph Classification

Wang, Yiwei, Wang, Wei, Liang, Yuxuan, Cai, Yujun, Hooi, Bryan

arXiv.org Artificial Intelligence, Sep-22-2020

We present a new method to regularize graph neural networks (GNNs) for better generalization in graph classification. Observing that the omission of substructures does not necessarily change the class label of the whole graph, we develop the GraphCrop (Subgraph Cropping) data augmentation method to simulate the real-world noise of substructure omission. In principle, GraphCrop utilizes a node-centric strategy to crop a contiguous subgraph from the original graph while maintaining its connectivity. By preserving the valid structure contexts for graph classification, we encourage GNNs to understand the content of graph structures in a global sense, rather than rely on a few key nodes or edges, which may not always be present. GraphCrop requires no parameter learning and is easy to implement within existing GNN-based graph classifiers. Qualitatively, GraphCrop expands the existing training set by generating novel and informative augmented graphs, which retain the original graph labels in most cases. Quantitatively, GraphCrop yields significant and consistent gains on multiple standard datasets, and thus enables popular GNNs to outperform the baseline methods. Figure 1: Omission of substructures does not change the genre label 'Action' of actor Daniel Craig's ego-network.
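
A node-centric crop that keeps the result connected can be sketched by growing a region outward from a randomly chosen center node and returning the induced subgraph. The breadth-first growth, the `keep_ratio` parameter, and the name `crop_subgraph` are assumptions made for illustration; the paper's own node selection strategy may differ. The adjacency is assumed to be undirected, given as a dict from node to neighbor list.

```python
import random
from collections import deque
from typing import Dict, Hashable, List, Optional

Graph = Dict[Hashable, List[Hashable]]  # undirected adjacency lists (assumed format)


def crop_subgraph(adj: Graph, keep_ratio: float = 0.5,
                  rng: Optional[random.Random] = None) -> Graph:
    """Crop a connected subgraph around a random center node.

    Nodes are added breadth-first from the center until about
    `keep_ratio` of the nodes are kept; every added node is adjacent to an
    already kept node, so the induced subgraph stays connected.
    """
    rng = rng or random.Random()
    nodes = list(adj)
    target = max(1, int(len(nodes) * keep_ratio))
    center = rng.choice(nodes)
    kept = {center}
    queue = deque([center])
    while queue and len(kept) < target:
        u = queue.popleft()
        for v in adj[u]:
            if v not in kept and len(kept) < target:
                kept.add(v)
                queue.append(v)
    # Induced subgraph: keep only edges whose endpoints both survived.
    return {u: [v for v in adj[u] if v in kept] for u in kept}


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(crop_subgraph(path, keep_ratio=0.5, rng=random.Random(0)))
```

In an augmentation loop, each training graph would be cropped anew per epoch and paired with its original label, matching the abstract's observation that the label is retained in most cases.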

artificial intelligence, graph, neural network, (19 more...)

arXiv:2009.10564

Country: Asia > Singapore (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
