AITopics | Pan, Jeff Z.

Collaborating Authors

Pan, Jeff Z.

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models

Zhou, Xiongtao, He, Jie, Ke, Yuhua, Zhu, Guangyao, Gutiérrez-Basulto, Víctor, Pan, Jeff Z.

arXiv.org Artificial IntelligenceJun-7-2024

Multimodal large language models (MLLMs) fine-tuned with multimodal instruction datasets have demonstrated remarkable capabilities in multimodal tasks. However, fine-tuning all parameters of MLLMs has become challenging as they usually contain billions of parameters. To address this issue, we study parameter-efficient fine-tuning (PEFT) methods for MLLMs. We aim to identify effective methods for enhancing the performance of MLLMs in scenarios where only a limited number of parameters are trained. This paper conducts empirical studies using four popular PEFT methods to fine-tune the LLM component of open-source MLLMs. We present a comprehensive analysis that encompasses various aspects, including the impact of PEFT methods on various models, parameters and location of the PEFT module, size of fine-tuning data, model stability based on PEFT methods, MLLM's generalization, and hallucination. We evaluated four PEFT methods on seven datasets from two different categories: unseen and seen datasets. Across all experiments, we show that the adapter is the best-performing PEFT method. At the same time, fine-tuning the connector layers leads to improved performance in most MLLMs. Code and data are available at https://github.com/alenai97/PEFT-MLLM.git.

large language model, llava-1, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2406.0513

Country:

Asia > Middle East > UAE (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

COTET: Cross-view Optimal Transport for Knowledge Graph Entity Typing

Hu, Zhiwei, Gutiérrez-Basulto, Víctor, Xiang, Zhiliang, Li, Ru, Pan, Jeff Z.

arXiv.org Artificial IntelligenceMay-22-2024

Knowledge graph entity typing (KGET) aims to infer missing entity type instances in knowledge graphs. Previous research has predominantly centered around leveraging contextual information associated with entities, which provides valuable clues for inference. However, they have long ignored the dual nature of information inherent in entities, encompassing both high-level coarse-grained cluster knowledge and fine-grained type knowledge. This paper introduces Cross-view Optimal Transport for knowledge graph Entity Typing (COTET), a method that effectively incorporates the information on how types are clustered into the representation of entities and types. COTET comprises three modules: i) Multi-view Generation and Encoder, which captures structured knowledge at different levels of granularity through entity-type, entity-cluster, and type-cluster-type perspectives; ii) Cross-view Optimal Transport, transporting view-specific embeddings to a unified space by minimizing the Wasserstein distance from a distributional alignment perspective; iii) Pooling-based Entity Typing Prediction, employing a mixture pooling mechanism to aggregate prediction scores from diverse neighbors of an entity. Additionally, we introduce a distribution-based loss function to mitigate the occurrence of false negatives during training. Extensive experiments demonstrate the effectiveness of COTET when compared to existing baselines.

artificial intelligence, machine learning, neighbor, (15 more...)

arXiv.org Artificial Intelligence

2405.13602

Country:

North America > United States (0.46)
Europe > United Kingdom > England > West Midlands > Birmingham (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Soccer (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts

Huang, Wenyu, Zhou, Guancheng, Lapata, Mirella, Vougiouklis, Pavlos, Montella, Sebastien, Pan, Jeff Z.

arXiv.org Artificial IntelligenceMay-10-2024

Although Large Language Models (LLMs) are effective in performing various NLP tasks, they still struggle to handle tasks that require extensive, real-world knowledge, especially when dealing with long-tail facts (facts related to long-tail entities). This limitation highlights the need to supplement LLMs with non-parametric knowledge. To address this issue, we analysed the effects of different types of non-parametric knowledge, including textual passage and knowledge graphs (KGs). Since LLMs have probably seen the majority of factual question-answering datasets already, to facilitate our analysis, we proposed a fully automatic pipeline for creating a benchmark that requires knowledge of long-tail facts for answering the involved questions. Using this pipeline, we introduce the LTGen benchmark. We evaluate state-of-the-art LLMs in different knowledge settings using the proposed benchmark. Our experiments show that LLMs alone struggle with answering these questions, especially when the long-tail level is high or rich knowledge is required. Nonetheless, the performance of the same models improved significantly when they were prompted with non-parametric knowledge. We observed that, in most cases, prompting LLMs with KG triples surpasses passage-based prompting using a state-of-the-art retriever. In addition, while prompting LLMs with both KG triples and documents does not consistently improve knowledge coverage, it can dramatically reduce hallucinations in the generated content.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2405.06524

Country:

Europe (1.00)
Asia (1.00)
North America > Canada (0.68)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports > Olympic Games (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness

Zheng, Danna, Liu, Danyang, Lapata, Mirella, Pan, Jeff Z.

arXiv.org Artificial IntelligenceMay-6-2024

Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. However, concerns have arisen regarding the trustworthiness of LLMs' outputs, particularly in closed-book question-answering tasks, where non-experts may struggle to identify inaccuracies due to the absence of contextual or ground truth information. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge. Additionally, TrustScore can seamlessly integrate with factchecking methods, which assesses alignment with external knowledge sources. The experimental results show that TrustScore achieves strong correlations with human judgments, surpassing existing reference-free metrics, and achieving results on par with reference-based metrics. Large-scale language models (LLMs) have recently been in the spotlight due to their impressive performance in various NLP tasks, sparking enthusiasm for potential applications (Kaddour et al., 2023; Bubeck et al., 2023). However, a notable concern has emerged regarding the ability of LLMs to generate plausible yet incorrect responses (Tam et al., 2022; Liu et al., 2023; Devaraj et al., 2022), particularly challenging for users without specialized expertise. Consequently, users are often advised to employ LLMs in scenarios where they can confidently assess the information provided.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.12545

Country:

North America > United States (1.00)
Europe (0.94)

Genre:

Research Report > New Finding (0.48)
Personal > Honors (0.30)

Industry: Government > Regional Government > North America Government > United States Government (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Leveraging Intra-modal and Inter-modal Interaction for Multi-Modal Entity Alignment

Hu, Zhiwei, Gutiérrez-Basulto, Víctor, Xiang, Zhiliang, Li, Ru, Pan, Jeff Z.

arXiv.org Artificial IntelligenceApr-19-2024

Multi-modal entity alignment (MMEA) aims to identify equivalent entity pairs across different multi-modal knowledge graphs (MMKGs). Existing approaches focus on how to better encode and aggregate information from different modalities. However, it is not trivial to leverage multi-modal knowledge in entity alignment due to the modal heterogeneity. In this paper, we propose a Multi-Grained Interaction framework for Multi-Modal Entity Alignment (MIMEA), which effectively realizes multi-granular interaction within the same modality or between different modalities. MIMEA is composed of four modules: i) a Multi-modal Knowledge Embedding module, which extracts modality-specific representations with multiple individual encoders; ii) a Probability-guided Modal Fusion module, which employs a probability guided approach to integrate uni-modal representations into joint-modal embeddings, while considering the interaction between uni-modal representations; iii) an Optimal Transport Modal Alignment module, which introduces an optimal transport mechanism to encourage the interaction between uni-modal and joint-modal embeddings; iv) a Modal-adaptive Contrastive Learning module, which distinguishes the embeddings of equivalent entities from those of non-equivalent ones, for each modality. Extensive experiments conducted on two real-world datasets demonstrate the strong performance of MIMEA compared to the SoTA. Datasets and code have been submitted as supplementary materials.

artificial intelligence, machine learning, modality, (15 more...)

arXiv.org Artificial Intelligence

2404.1759

Country:

North America > United States (1.00)
Europe > Austria > Vienna (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

HyperMono: A Monotonicity-aware Approach to Hyper-Relational Knowledge Representation

Hu, Zhiwei, Gutiérrez-Basulto, Víctor, Xiang, Zhiliang, Li, Ru, Pan, Jeff Z.

arXiv.org Artificial IntelligenceApr-15-2024

In a hyper-relational knowledge graph (HKG), each fact is composed of a main triple associated with attribute-value qualifiers, which express additional factual knowledge. The hyper-relational knowledge graph completion (HKGC) task aims at inferring plausible missing links in a HKG. Most existing approaches to HKGC focus on enhancing the communication between qualifier pairs and main triples, while overlooking two important properties that emerge from the monotonicity of the hyper-relational graphs representation regime. Stage Reasoning allows for a two-step reasoning process, facilitating the integration of coarse-grained inference results derived solely from main triples and fine-grained inference results obtained from hyper-relational facts with qualifiers. In the initial stage, coarse-grained results provide an upper bound for correct predictions, which are subsequently refined in the fine-grained step. More generally, Qualifier Monotonicity implies that by attaching more qualifier pairs to a main triple, we may only narrow down the answer set, but never enlarge it. This paper proposes the HyperMono model for hyper-relational knowledge graph completion, which realizes stage reasoning and qualifier monotonicity. To implement qualifier monotonicity HyperMono resorts to cone embeddings. Experiments on three real-world datasets with three different scenario conditions demonstrate the strong performance of HyperMono when compared to the SoTA.

artificial intelligence, machine learning, qualifier, (17 more...)

arXiv.org Artificial Intelligence

2404.09848

Country:

Europe (1.00)
Asia (0.93)
North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Sports > Basketball (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.70)

Add feedback

UniArk: Improving Generalisation and Consistency for Factual Knowledge Extraction through Debiasing

Yang, Yijun, He, Jie, Chen, Pinzhen, Gutiérrez-Basulto, Víctor, Pan, Jeff Z.

arXiv.org Artificial IntelligenceApr-1-2024

Several recent papers have investigated the potential of language models as knowledge bases as well as the existence of severe biases when extracting factual knowledge. In this work, we focus on the factual probing performance over unseen prompts from tuning, and using a probabilistic view we show the inherent misalignment between pre-training and downstream tuning objectives in language models for probing knowledge. We hypothesize that simultaneously debiasing these objectives can be the key to generalisation over unseen prompts. We propose an adapter-based framework, UniArk, for generalised and consistent factual knowledge extraction through simple methods without introducing extra parameters. Extensive experiments show that UniArk can significantly improve the model's out-of-domain generalisation as well as consistency under various prompts. Additionally, we construct ParaTrex, a large-scale and diverse dataset for measuring the inconsistency and out-of-domain generation of models. Further, ParaTrex offers a reference method for constructing paraphrased datasets using large language models.

machine learning, natural language, template, (22 more...)

arXiv.org Artificial Intelligence

2404.01253

Country:

Europe (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.66)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback

Archer: A Human-Labeled Text-to-SQL Dataset with Arithmetic, Commonsense and Hypothetical Reasoning

Zheng, Danna, Lapata, Mirella, Pan, Jeff Z.

arXiv.org Artificial IntelligenceFeb-24-2024

We present Archer, a challenging bilingual text-to-SQL dataset specific to complex reasoning, including arithmetic, commonsense and hypothetical reasoning. It contains 1,042 English questions and 1,042 Chinese questions, along with 521 unique SQL queries, covering 20 English databases across 20 domains. Notably, this dataset demonstrates a significantly higher level of complexity compared to existing publicly available datasets. Our evaluation shows that Archer challenges the capabilities of current state-of-the-art models, with a high-ranked model on the Spider leaderboard achieving only 6.73% execution accuracy on Archer test set. Thus, Archer presents a significant challenge for future research in this field.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2402.12554

Country:

Asia (0.28)
North America > United States (0.14)
Europe > Belgium (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry:

Leisure & Entertainment (0.93)
Automobiles & Trucks (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.75)

Add feedback

A Usage-centric Take on Intent Understanding in E-Commerce

Zhou, Wendi, Li, Tianyi, Vougiouklis, Pavlos, Steedman, Mark, Pan, Jeff Z.

arXiv.org Artificial IntelligenceFeb-22-2024

Identifying and understanding user intents is a pivotal task for E-Commerce. Despite its popularity, intent understanding has not been consistently defined or accurately benchmarked. In this paper, we focus on predicative user intents as "how a customer uses a product", and pose intent understanding as a natural language reasoning task, independent of product ontologies. We identify two weaknesses of FolkScope, the SOTA E-Commerce Intent Knowledge Graph, that limit its capacity to reason about user intents and to recommend diverse useful products. Following these observations, we introduce a Product Recovery Benchmark including a novel evaluation framework and an example dataset. We further validate the above FolkScope weaknesses on this benchmark.

large language model, machine learning, user intent, (21 more...)

arXiv.org Artificial Intelligence

2402.14901

Country:

North America > United States (0.14)
Europe > Italy (0.14)

Genre: Research Report (0.50)

Industry: Information Technology > Services > e-Commerce Services (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Chen, Zhuo, Zhang, Yichi, Fang, Yin, Geng, Yuxia, Guo, Lingbing, Chen, Xiang, Li, Qian, Zhang, Wen, Chen, Jiaoyan, Zhu, Yushan, Li, Jiaqi, Liu, Xiaoze, Pan, Jeff Z., Zhang, Ningyu, Chen, Huajun

arXiv.org Artificial IntelligenceFeb-9-2024

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the semantic web community's exploration into multi-modal dimensions unlocking new avenues for innovation. In this survey, we carefully review over 300 articles, focusing on KG-aware research in two principal aspects: KG-driven Multi-Modal (KG4MM) learning, where KGs support multi-modal tasks, and Multi-Modal Knowledge Graph (MM4KG), which extends KG studies into the MMKG realm. We begin by defining KGs and MMKGs, then explore their construction progress. Our review includes two primary task categories: KG-aware multi-modal learning tasks, such as Image Classification and Visual Question Answering, and intrinsic MMKG tasks like Multi-modal Knowledge Graph Completion and Entity Alignment, highlighting specific research trajectories. For most of these tasks, we provide definitions, evaluation benchmarks, and additionally outline essential insights for conducting relevant research. Finally, we discuss current challenges and identify emerging trends, such as progress in Large Language Modeling and Multi-modal Pre-training strategies. This survey aims to serve as a comprehensive reference for researchers already involved in or considering delving into KG and multi-modal learning research, offering insights into the evolving landscape of MMKG research and supporting future work.

large language model, machine learning, question answering, (28 more...)

arXiv.org Artificial Intelligence

2402.05391

Country:

North America > United States > California (0.27)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.13)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (1.00)
(2 more...)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
(10 more...)

Add feedback