AITopics | Zhang, Yujia

Collaborating Authors

Zhang, Yujia

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness

Li, Jian, Huang, Haojing, Zhang, Yujia, Xu, Pengfei, Chen, Xi, Song, Rui, Shi, Lida, Wang, Jingwen, Xu, Hao

arXiv.org Artificial IntelligenceSep-26-2024

Recently, there has been significant interest in replacing the reward model in Reinforcement Learning with Human Feedback (RLHF) methods for Large Language Models (LLMs), such as Direct Preference Optimization (DPO) and its variants. These approaches commonly use a binary cross-entropy mechanism on pairwise samples, i.e., minimizing and maximizing the loss based on preferred or dis-preferred responses, respectively. However, while this training strategy omits the reward model, it also overlooks the varying preference degrees within different responses. We hypothesize that this is a key factor hindering LLMs from sufficiently understanding human preferences. To address this problem, we propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss, thereby helping LLMs improve their ability to understand the degree of preference. Extensive experiments are conducted on two widely used datasets of different tasks. The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods and significantly boost their performance to achieve state-of-the-art performance. We also conduct detailed analyses to offer comprehensive insights into SPO, which verifies its effectiveness. The code is available at https://github.com/lijian16/SPO.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2409.17791

Country:

Europe (0.93)
North America > Canada > Quebec (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in Virtual Reality Apps

Li, Shuqing, Gao, Cuiyun, Zhang, Jianping, Zhang, Yujia, Liu, Yepang, Gu, Jiazhen, Peng, Yun, Lyu, Michael R.

arXiv.org Artificial IntelligenceSep-19-2024

The quality of Virtual Reality (VR) apps is vital, particularly the rendering quality of the VR Graphical User Interface (GUI). Different from traditional 2D apps, VR apps create a 3D digital scene for users, by rendering two distinct 2D images for the user's left and right eyes, respectively. Stereoscopic visual inconsistency (denoted as "SVI") issues, however, undermine the rendering process of the user's brain, leading to user discomfort and even adverse health effects. Such issues commonly exist but remain underexplored. We conduct an empirical analysis on 282 SVI bug reports from 15 VR platforms, summarizing 15 types of manifestations. The empirical analysis reveals that automatically detecting SVI issues is challenging, mainly because: (1) lack of training data; (2) the manifestations of SVI issues are diverse, complicated, and often application-specific; (3) most accessible VR apps are closed-source commercial software. Existing pattern-based supervised classification approaches may be inapplicable or ineffective in detecting the SVI issues. To counter these challenges, we propose an unsupervised black-box testing framework named StereoID to identify the stereoscopic visual inconsistencies, based only on the rendered GUI states. StereoID generates a synthetic right-eye image based on the actual left-eye image and computes distances between the synthetic right-eye image and the actual right-eye image to detect SVI issues. We propose a depth-aware conditional stereo image translator to power the image generation process, which captures the expected perspective shifts between left-eye and right-eye images. We build a large-scale unlabeled VR stereo screenshot dataset with larger than 171K images from 288 real-world VR apps for experiments. After substantial experiments, StereoID demonstrates superior performance for detecting SVI issues in both user reports and wild VR apps.

artificial intelligence, human computer interaction, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3660803

2406.09313

Country: Asia > China (0.96)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment > Games > Computer Games (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model

Xu, Zhenhua, Zhang, Yujia, Xie, Enze, Zhao, Zhen, Guo, Yong, Wong, Kwan-Yee. K., Li, Zhenguo, Zhao, Hengshuang

arXiv.org Artificial IntelligenceOct-8-2023

In the past decade, autonomous driving has experienced rapid development in both academia and industry. However, its limited interpretability remains a significant unsolved problem, severely hindering autonomous vehicle commercialization and further development. Previous approaches utilizing small language models have failed to address this issue due to their lack of flexibility, generalization ability, and robustness. Recently, multimodal large language models (LLMs) have gained considerable attention from the research community for their capability to process and reason non-text data (e.g., images and videos) by text. In this paper, we present DriveGPT4, an interpretable end-to-end autonomous driving system utilizing LLMs. DriveGPT4 is capable of interpreting vehicle actions and providing corresponding reasoning, as well as answering diverse questions posed by human users for enhanced interaction. Additionally, DriveGPT4 predicts vehicle low-level control signals in an end-to-end fashion. These capabilities stem from a customized visual instruction tuning dataset specifically designed for autonomous driving. To the best of our knowledge, DriveGPT4 is the first work focusing on interpretable end-to-end autonomous driving. When evaluated on multiple tasks alongside conventional methods and video understanding LLMs, DriveGPT4 demonstrates superior qualitative and quantitative performance. Additionally, DriveGPT4 can be generalized in a zero-shot fashion to accommodate more unseen scenarios. The project page is available at https://tonyxuqaq.github.io/projects/DriveGPT4/ .

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2310.01412

Country: Asia (0.14)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

HDPV-SLAM: Hybrid Depth-augmented Panoramic Visual SLAM for Mobile Mapping System with Tilted LiDAR and Panoramic Visual Camera

Ahmadi, Mostafa, Naeini, Amin Alizadeh, Sheikholeslami, Mohammad Moein, Arjmandi, Zahra, Zhang, Yujia, Sohn, Gunho

arXiv.org Artificial IntelligenceJun-22-2023

This paper proposes a novel visual simultaneous localization and mapping (SLAM) system called Hybrid Depth-augmented Panoramic Visual SLAM (HDPV-SLAM), that employs a panoramic camera and a tilted multi-beam LiDAR scanner to generate accurate and metrically-scaled trajectories. RGB-D SLAM was the design basis for HDPV-SLAM, which added depth information to visual features. It aims to solve the two major issues hindering the performance of similar SLAM systems. The first obstacle is the sparseness of LiDAR depth, which makes it difficult to correlate it with the extracted visual features of the RGB image. A deep learning-based depth estimation module for iteratively densifying sparse LiDAR depth was suggested to address this issue. The second issue pertains to the difficulties in depth association caused by a lack of horizontal overlap between the panoramic camera and the tilted LiDAR sensor. To surmount this difficulty, we present a hybrid depth association module that optimally combines depth information estimated by two independent procedures, feature-based triangulation and depth estimation. During a phase of feature tracking, this hybrid depth association module aims to maximize the use of more accurate depth information between the triangulated depth with visual features tracked and the deep learning-based corrected depth. We evaluated the efficacy of HDPV-SLAM using the 18.95 km-long York University and Teledyne Optech (YUTO) MMS dataset. The experimental results demonstrate that the two proposed modules contribute substantially to the performance of HDPV-SLAM, which surpasses that of the state-of-the-art (SOTA) SLAM systems.

artificial intelligence, machine learning, map point, (17 more...)

arXiv.org Artificial Intelligence

2301.11823

Country:

Europe (0.46)
North America > Canada > Ontario (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Template-based Chatbot for Agriculture Related FAQs

Zhang, Daping, Chen, Xin, Zhang, Yujia, Qin, Shihan

arXiv.org Artificial IntelligenceJul-29-2021

Agriculture is the fundamental industry of the society, which is the basis of food supply and an important source of employment and GDP increase. However, the insufficient expert can not fulfill the demand of farmers. To address this problem, we design a chatbot to answer frequently asked questions in the Agriculture field. Template-based questions will be answered by AIML while LSA is used for other service-based questions. This chatbot will assist farmers by dealing with industry problems conveniently and efficiently.

arXiv.org Artificial Intelligence

2107.12595

Genre:

Research Report (0.81)
Frequently Asked Questions (FAQ) (0.69)

Industry: Food & Agriculture > Agriculture (0.80)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.80)

Add feedback

Why should you trust my interpretation? Understanding uncertainty in LIME predictions

Fen, Hui, Tan, null, Song, Kuangyan, Udell, Madeilene, Sun, Yiming, Zhang, Yujia

arXiv.org Artificial IntelligenceApr-29-2019

Methods for interpreting machine learning black-box models increase the outcomes' transparency and in turn generates insight into the reliability and fairness of the algorithms. However, the interpretations themselves could contain significant uncertainty that undermines the trust in the outcomes and raises concern about the model's reliability. Focusing on the method "Local Interpretable Model-agnostic Explanations" (LIME), we demonstrate the presence of two sources of uncertainty, namely the randomness in its sampling procedure and the variation of interpretation quality across different input data points. Such uncertainty is present even in models with high training and test accuracy. We apply LIME to synthetic data and two public data sets, text classification in 20 Newsgroup and recidivism risk-scoring in COMPAS, to support our argument.

artificial intelligence, decision tree learning, interpretation, (18 more...)

arXiv.org Artificial Intelligence

1904.12991

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.32)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.30)

Add feedback