AITopics | Zhang, Jiyuan

Collaborating Authors

Zhang, Jiyuan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding

Li, Yun, Liu, Zhe, Kong, Yajing, Li, Guangrui, Zhang, Jiyuan, Bian, Chao, Liu, Feng, Yao, Lina, Sun, Zhenbang

arXiv.org Artificial IntelligenceJan-28-2025

Applying Multimodal Large Language Models (MLLMs) to video understanding presents significant challenges due to the need to model temporal relations across frames. Existing approaches adopt either implicit temporal modeling, relying solely on the LLM decoder, or explicit temporal modeling, employing auxiliary temporal encoders. To investigate this debate between the two paradigms, we propose the Stackable Temporal Encoder (STE). STE enables flexible explicit temporal modeling with adjustable temporal receptive fields and token compression ratios. Using STE, we systematically compare implicit and explicit temporal modeling across dimensions such as overall performance, token compression effectiveness, and temporal-specific understanding. We also explore STE's design considerations and broader impacts as a plug-in module and in image modalities. Our findings emphasize the critical role of explicit temporal modeling, providing actionable insights to advance video MLLMs.

large language model, machine learning, temporal modeling, (14 more...)

arXiv.org Artificial Intelligence

2501.16786

Country:

Oceania > Australia (0.14)
Europe > Netherlands (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Automatic Item Generation for Personality Situational Judgment Tests with Large Language Models

Li, Chang-Jin, Zhang, Jiyuan, Tang, Yun, Li, Jian

arXiv.org Artificial IntelligenceDec-10-2024

Personality assessment, particularly through situational judgment tests (SJTs), is a vital tool for psychological research, talent selection, and educational evaluation. This study explores the potential of GPT-4, a state-of-the-art large language model (LLM), to automate the generation of personality situational judgment tests (PSJTs) in Chinese. Traditional SJT development is labor-intensive and prone to biases, while GPT-4 offers a scalable, efficient alternative. Two studies were conducted: Study 1 evaluated the impact of prompt design and temperature settings on content validity, finding that optimized prompts with a temperature of 1.0 produced creative and accurate items. Study 2 assessed the psychometric properties of GPT-4-generated PSJTs, revealing that they demonstrated satisfactory reliability and validity, surpassing the performance of manually developed tests in measuring the Big Five personality traits. This research highlights GPT-4's effectiveness in developing high-quality PSJTs, providing a scalable and innovative method for psychometric test development. These findings expand the possibilities of automatic item generation and the application of LLMs in psychology, and offer practical implications for streamlining test development processes in resource-limited settings.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2412.12144

Country: Asia > China (0.29)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)
Research Report > Promising Solution (0.66)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)
Law (0.67)
Education > Assessment & Standards (0.67)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Evaluating and Advancing Multimodal Large Language Models in Ability Lens

Chen, Feng, Gou, Chenhui, Liu, Jing, Yang, Yang, Li, Zhaoyang, Zhang, Jiyuan, Sun, Zhenbang, Zhuang, Bohan, Wu, Qi

arXiv.org Artificial IntelligenceNov-21-2024

As multimodal large language models (MLLMs) advance rapidly, rigorous evaluation has become essential, providing further guidance for their development. In this work, we focus on a unified and robust evaluation of \textbf{vision perception} abilities, the foundational skill of MLLMs. We find that existing perception benchmarks, each focusing on different question types, domains, and evaluation metrics, introduce significant evaluation variance, complicating comprehensive assessments of perception abilities when relying on any single benchmark. To address this, we introduce \textbf{AbilityLens}, a unified benchmark designed to evaluate MLLMs across six key perception abilities, focusing on both accuracy and stability, with each ability encompassing diverse question types, domains, and metrics. With the assistance of AbilityLens, we: (1) identify the strengths and weaknesses of current models, highlighting stability patterns and revealing a notable performance gap between open-source and closed-source models; (2) introduce an online evaluation mode, which uncovers interesting ability conflict and early convergence phenomena during MLLM training; and (3) design a simple ability-specific model merging method that combines the best ability checkpoint from early training stages, effectively mitigating performance decline due to ability conflict. The benchmark and online leaderboard will be released soon.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.14725

Country: Europe > Netherlands (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting

Chen, Kang, Zhang, Jiyuan, Hao, Zecheng, Zheng, Yajing, Huang, Tiejun, Yu, Zhaofei

arXiv.org Artificial IntelligenceNov-15-2024

Spike cameras, as an innovative neuromorphic camera that captures scenes with the 0-1 bit stream at 40 kHz, are increasingly employed for the 3D reconstruction task via Neural Radiance Fields (NeRF) or 3D Gaussian Splatting (3DGS). Previous spike-based 3D reconstruction approaches often employ a casecased pipeline: starting with high-quality image reconstruction from spike streams based on established spike-to-image reconstruction algorithms, then progressing to camera pose estimation and 3D reconstruction. However, this cascaded approach suffers from substantial cumulative errors, where quality limitations of initial image reconstructions negatively impact pose estimation, ultimately degrading the fidelity of the 3D reconstruction. To address these issues, we propose a synergistic optimization framework, \textbf{USP-Gaussian}, that unifies spike-based image reconstruction, pose correction, and Gaussian splatting into an end-to-end framework. Leveraging the multi-view consistency afforded by 3DGS and the motion capture capability of the spike camera, our framework enables a joint iterative optimization that seamlessly integrates information between the spike-to-image network and 3DGS. Experiments on synthetic datasets with accurate poses demonstrate that our method surpasses previous approaches by effectively eliminating cascading errors. Moreover, we integrate pose optimization to achieve robust 3D reconstruction in real-world scenarios with inaccurate initial poses, outperforming alternative methods by effectively reducing noise and preserving fine texture details. Our code, data and trained models will be available at \url{https://github.com/chenkang455/USP-Gaussian}.

artificial intelligence, spike stream, video understanding, (17 more...)

arXiv.org Artificial Intelligence

2411.10504

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Vision > Video Understanding (0.75)

Add feedback

A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization

Zhang, Yuyang, Leung, Dik Hin, Guo, Min, Xiao, Yijia, Liu, Haoyue, Li, Yunfei, Zhang, Jiyuan, Wang, Guan, Chen, Zhen

arXiv.org Artificial IntelligenceOct-10-2021

Matrix multiplication is the bedrock in Deep Learning inference application. When it comes to hardware acceleration on edge computing devices, matrix multiplication often takes up a great majority of the time. To achieve better performance in edge computing, we introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology. The implementation is running on Field-programmable Gate Array (FPGA) devices and tested its performance on handwritten digit classification and Q-learning tasks. Results show that our method can achieve better performance with fewer power consumption.

artificial intelligence, machine learning, neural network, (14 more...)

arXiv.org Artificial Intelligence

2110.04861

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

High Performance Zero-Memory Overhead Direct Convolutions

Zhang, Jiyuan, Franchetti, Franz, Low, Tze Meng

arXiv.org Machine LearningSep-19-2018

The computation of convolution layers in deep neural networks typically rely on high performance routines that trade space for time by using additional memory (either for packing purposes or required as part of the algorithm) to improve performance. The problems with such an approach are two-fold. First, these routines incur additional memory overhead which reduces the overall size of the network that can fit on embedded devices with limited memory capacity. Second, these high performance routines were not optimized for performing convolution, which means that the performance obtained is usually less than conventionally expected. In this paper, we demonstrate that direct convolution, when implemented correctly, eliminates all memory overhead, and yields performance that is between 10% to 400% times better than existing high performance implementations of convolution layers on conventional and embedded CPU architectures. We also show that a high performance direct convolution exhibits better scaling performance, i.e. suffers less performance drop, when increasing the number of threads.

convolution, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

1809.1017

Country:

North America > United States > Pennsylvania (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback