AITopics | Li, Jiaang

Collaborating Authors

Li, Jiaang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis

Li, Lei, Jia, Sen, Wang, Jianhao, An, Zhaochong, Li, Jiaang, Hwang, Jenq-Neng, Belongie, Serge

arXiv.org Artificial IntelligenceFeb-27-2025

Advancements in Multimodal Large Language Models (MLLMs) have improved human motion understanding. However, these models remain constrained by their "instruct-only" nature, lacking interactivity and adaptability for diverse analytical perspectives. To address these challenges, we introduce ChatMotion, a multimodal multi-agent framework for human motion analysis. ChatMotion dynamically interprets user intent, decomposes complex tasks into meta-tasks, and activates specialized function modules for motion comprehension. It integrates multiple specialized modules, such as the MotionCore, to analyze human motion from various perspectives. Extensive experiments demonstrate ChatMotion's precision, adaptability, and user engagement for human motion understanding.

chatmotion, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.1818

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture

Li, Wenyan, Zhang, Xinyu, Li, Jiaang, Peng, Qiwei, Tang, Raphael, Zhou, Li, Zhang, Weijia, Hu, Guimin, Yuan, Yifei, Søgaard, Anders, Hershcovich, Daniel, Elliott, Desmond

arXiv.org Artificial IntelligenceJun-16-2024

Beijing Chaoshan Food is a rich and varied dimension of cultural heritage, crucial to both individuals and social groups. To bridge the gap in the literature on the often-overlooked regional diversity in this domain, we introduce FoodieQA, a manually curated, fine-grained image-text dataset capturing the intricate features of food cultures across various regions in China. We evaluate vision-language Models (VLMs) and large language models (LLMs) on newly collected, unseen food images and corresponding questions. FoodieQA comprises three multiplechoice question-answering tasks where models need to answer questions based on multiple images, Sichuan Guangdong a single image, and text-only descriptions, Figure 1: An example of regional food differences in respectively. While LLMs excel at text-based referring to hotpot in China. The depicted soups and question answering, surpassing human accuracy, dishware visually reflect the ingredients, flavors, and the open-weights VLMs still fall short by traditions of these regions: Beijing in the north, Sichuan 41% on multi-image and 21% on single-image in the southwest, and Guangdong in the south coast. VQA tasks, although closed-weights models perform closer to human levels (within 10%).

information, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.1103

Country: Asia > China > Beijing > Beijing (0.44)

Genre: Research Report (0.82)

Industry: Consumer Products & Services (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning

Li, Wenyan, Li, Jiaang, Ramos, Rita, Tang, Raphael, Elliott, Desmond

arXiv.org Artificial IntelligenceJun-6-2024

Recent advances in retrieval-augmented models for image captioning highlight the benefit of retrieving related captions for efficient, lightweight models with strong domain-transfer capabilities. While these models demonstrate the success of retrieval augmentation, retrieval models are still far from perfect in practice: the retrieved information can sometimes mislead the model, resulting in incorrect generation and worse performance. In this paper, we analyze the robustness of a retrieval-augmented captioning model SmallCap. Our analysis shows that the model is sensitive to tokens that appear in the majority of the retrieved captions, and the input attribution shows that those tokens are likely copied into the generated output. Given these findings, we propose to train the model by sampling retrieved captions from more diverse sets. This decreases the chance that the model learns to copy majority tokens, and improves both in-domain and cross-domain performance.

caption, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2406.02265

Country: Europe > Italy (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)
(2 more...)

Add feedback

Does Instruction Tuning Make LLMs More Consistent?

Fierro, Constanza, Li, Jiaang, Søgaard, Anders

arXiv.org Artificial IntelligenceApr-30-2024

The purpose of instruction tuning is enabling zero-shot performance, but instruction tuning has also been shown to improve chain-of-thought reasoning and value alignment (Si et al., 2023). Here we consider the impact on $\textit{consistency}$, i.e., the sensitivity of language models to small perturbations in the input. We compare 10 instruction-tuned LLaMA models to the original LLaMA-7b model and show that almost across-the-board they become more consistent, both in terms of their representations and their predictions in zero-shot and downstream tasks. We explain these improvements through mechanistic analyses of factual recall.

consistency, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2404.15206

Country:

Europe (0.93)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Word Order's Impacts: Insights from Reordering and Generation Analysis

Zhao, Qinghua, Li, Jiaang, Li, Lei, Zhou, Zenghui, Liu, Junfeng

arXiv.org Artificial IntelligenceMar-18-2024

Existing works have studied the impacts of the order of words within natural text. They usually analyze it by destroying the original order of words to create a scrambled sequence, and then comparing the models' performance between the original and scrambled sequences. The experimental results demonstrate marginal drops. Considering this findings, different hypothesis about word order is proposed, including ``the order of words is redundant with lexical semantics'', and ``models do not rely on word order''. In this paper, we revisit the aforementioned hypotheses by adding a order reconstruction perspective, and selecting datasets of different spectrum. Specifically, we first select four different datasets, and then design order reconstruction and continuing generation tasks. Empirical findings support that ChatGPT relies on word order to infer, but cannot support or negate the redundancy relations between word order lexical semantics.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2403.11473

Country:

North America > United States > Pennsylvania (0.14)
North America > United States > Michigan (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.75)

Add feedback

Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing

Cao, Yong, Li, Wenyan, Li, Jiaang, Yuan, Yifei, Hershcovich, Daniel

arXiv.org Artificial IntelligenceFeb-8-2024

Pretrained large Vision-Language models have drawn considerable interest in recent years due to their remarkable performance. Despite considerable efforts to assess these models from diverse perspectives, the extent of visual cultural awareness in the state-of-the-art GPT-4V model remains unexplored. To tackle this gap, we extensively probed GPT-4V using the MaRVL benchmark dataset, aiming to investigate its capabilities and limitations in visual understanding with a focus on cultural aspects. Specifically, we introduced three visual related tasks, i.e. caption classification, pairwise captioning, and culture tag selection, to systematically delve into fine-grained visual cultural evaluation. Experimental results indicate that GPT-4V excels at identifying cultural concepts but still exhibits weaker performance in low-resource languages, such as Tamil and Swahili. Notably, through human evaluation, GPT-4V proves to be more culturally relevant in image captioning tasks than the original MaRVL human annotations, suggesting a promising solution for future visual cultural benchmark construction.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2402.06015

Country:

North America > United States (0.14)
North America > Canada (0.14)
Europe > Croatia (0.14)
Asia > Middle East (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

Structural Similarities Between Language Models and Neural Response Measurements

Li, Jiaang, Karamolegkou, Antonia, Kementchedjhieva, Yova, Abdou, Mostafa, Lehmann, Sune, Søgaard, Anders

arXiv.org Artificial IntelligenceOct-31-2023

Large language models (LLMs) have complicated internal dynamics, but induce representations of words and phrases whose geometry we can study. Human language processing is also opaque, but neural response measurements can provide (noisy) recordings of activation during listening or reading, from which we can extract similar representations of words and phrases. Here we study the extent to which the geometries induced by these representations, share similarities in the context of brain decoding. We find that the larger neural language models get, the more their representations are structurally similar to neural response measurements from brain imaging. Code is available at https://github.com/coastalcph/

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2306.0193

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.94)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Random Entity Quantization for Parameter-Efficient Compositional Knowledge Graph Representation

Li, Jiaang, Wang, Quan, Liu, Yi, Zhang, Licheng, Mao, Zhendong

arXiv.org Artificial IntelligenceOct-24-2023

Representation Learning on Knowledge Graphs (KGs) is essential for downstream tasks. The dominant approach, KG Embedding (KGE), represents entities with independent vectors and faces the scalability challenge. Recent studies propose an alternative way for parameter efficiency, which represents entities by composing entity-corresponding codewords matched from predefined small-scale codebooks. We refer to the process of obtaining corresponding codewords of each entity as entity quantization, for which previous works have designed complicated strategies. Surprisingly, this paper shows that simple random entity quantization can achieve similar results to current strategies. We analyze this phenomenon and reveal that entity codes, the quantization outcomes for expressing entities, have higher entropy at the code level and Jaccard distance at the codeword level under random entity quantization. Therefore, different entities become more easily distinguished, facilitating effective KG representation. The above results show that current quantization strategies are not critical for KG representation, and there is still room for improvement in entity distinguishability beyond current strategies. The code to reproduce our results is available at https://github.com/JiaangL/RandomQuantization.

codeword, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2310.15797

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Copyright Violations and Large Language Models

Karamolegkou, Antonia, Li, Jiaang, Zhou, Li, Søgaard, Anders

arXiv.org Artificial IntelligenceOct-20-2023

Language models may memorize more than just facts, including entire chunks of texts seen during training. Fair use exemptions to copyright laws typically allow for limited use of copyrighted material without permission from the copyright holder, but typically for extraction of information from copyrighted materials, rather than {\em verbatim} reproduction. This work explores the issue of copyright violations and large language models through the lens of verbatim memorization, focusing on possible redistribution of copyrighted text. We present experiments with a range of language models over a collection of popular books and coding problems, providing a conservative characterization of the extent to which language models can redistribute these materials. Overall, this research highlights the need for further examination and the potential impact on future developments in natural language processing to ensure adherence to copyright regulations. Code is at \url{https://github.com/coastalcph/CopyrightLLMs}.

artificial intelligence, language model, natural language

arXiv.org Artificial Intelligence

2310.13771

Genre: Research Report (0.40)

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)

Add feedback

Inductive Relation Prediction from Relational Paths and Context with Hierarchical Transformers

Li, Jiaang, Wang, Quan, Mao, Zhendong

arXiv.org Artificial IntelligenceJul-10-2023

Relation prediction on knowledge graphs (KGs) is a key research topic. Dominant embedding-based methods mainly focus on the transductive setting and lack the inductive ability to generalize to new entities for inference. Existing methods for inductive reasoning mostly mine the connections between entities, i.e., relational paths, without considering the nature of head and tail entities contained in the relational context. This paper proposes a novel method that captures both connections between entities and the intrinsic nature of entities, by simultaneously aggregating RElational Paths and cOntext with a unified hieRarchical Transformer framework, namely REPORT. REPORT relies solely on relation semantics and can naturally generalize to the fully-inductive setting, where KGs for training and inference have no common entities. In the experiments, REPORT performs consistently better than all baselines on almost all the eight version subsets of two fully-inductive datasets. Moreover. REPORT is interpretable by providing each element's contribution to the prediction results.

artificial intelligence, machine learning, relational path, (16 more...)

arXiv.org Artificial Intelligence

2304.00215

Country:

Asia > China (0.29)
North America > United States > California (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)

Add feedback