AITopics | Zhang, Zhixin

Collaborating Authors

Zhang, Zhixin

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PL-VIWO: A Lightweight and Robust Point-Line Monocular Visual Inertial Wheel Odometry

Zhang, Zhixin, Bai, Wenzhi, Zhao, Liang, Ladosz, Pawel

arXiv.org Artificial IntelligenceMar-1-2025

-- This paper presents a novel tightly coupled Filter-based monocular visual-inertial-wheel odometry (VIWO) system for ground robots, designed to deliver accurate and robust localization in long-term complex outdoor navigation scenarios. As an external sensor, the camera enhances localization performance by introducing visual constraints. However, obtaining a sufficient number of effective visual features is often challenging, particularly in dynamic or low-texture environments. T o address this issue, we incorporate the line features for additional geometric constraints. Unlike traditional approaches that treat point and line features independently, our method exploits the geometric relationships between points and lines in 2D images, enabling fast and robust line matching and triangulation. Additionally, we introduce Motion Consistency Check (MCC) to filter out potential dynamic points, ensuring the effectiveness of point feature updates. The proposed system was evaluated on publicly available datasets and benchmarked against state-of-the-art methods. Experimental results demonstrate superior performance in terms of accuracy, robustness, and efficiency.

artificial intelligence, line feature, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2503.00551

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Robots (0.72)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.34)

Add feedback

Stackelberg Game Preference Optimization for Data-Efficient Alignment of Language Models

Chu, Xu, Zhang, Zhixin, Jia, Tianyu, Jin, Yujie

arXiv.org Artificial IntelligenceFeb-27-2025

Aligning language models with human preferences is critical for real-world deployment, but existing methods often require large amounts of high-quality human annotations. Aiming at a data-efficient alignment method, we propose Stackelberg Game Preference Optimization (SGPO), a framework that models alignment as a two-player Stackelberg game, where a policy (leader) optimizes against a worst-case preference distribution (follower) within an $\epsilon$-Wasserstein ball, ensuring robustness to (self-)annotation noise and distribution shifts. SGPO guarantees $O(\epsilon)$-bounded regret, unlike Direct Preference Optimization (DPO), which suffers from linear regret growth in the distribution mismatch. We instantiate SGPO with the Stackelberg Self-Annotated Preference Optimization (SSAPO) algorithm, which iteratively self-annotates preferences and adversarially reweights synthetic annotated preferences. Using only 2K seed preferences, from the UltraFeedback dataset, i.e., 1/30 of human labels in the dataset, our method achieves 35.82% GPT-4 win-rate with Mistral-7B and 40.12% with Llama3-8B-Instruct within three rounds of SSAPO.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.18099

Genre: Research Report (0.64)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.88)

Add feedback

Debt Collection Negotiations with Large Language Models: An Evaluation System and Optimizing Decision Making with Multi-Agent

Wang, Xiaofeng, Zhang, Zhixin, Zheng, Jinguang, Ai, Yiming, Wang, Rui

arXiv.org Artificial IntelligenceFeb-25-2025

Debt collection negotiations (DCN) are vital for managing non-performing loans (NPLs) and reducing creditor losses. Traditional methods are labor-intensive, while large language models (LLMs) offer promising automation potential. However, prior systems lacked dynamic negotiation and real-time decision-making capabilities. This paper explores LLMs in automating DCN and proposes a novel evaluation framework with 13 metrics across 4 aspects. Our experiments reveal that LLMs tend to over-concede compared to human negotiators. To address this, we propose the Multi-Agent Debt Negotiation (MADeN) framework, incorporating planning and judging modules to improve decision rationality. We also apply post-training techniques, including DPO with rejection sampling, to optimize performance. Our studies provide valuable insights for practitioners and researchers seeking to enhance efficiency and outcomes in this domain.

debtor, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2502.18228

Country:

Asia > China (0.14)
Asia > Thailand (0.14)
Asia > Indonesia (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Commercial Services & Supplies (0.87)
Banking & Finance > Loans (0.68)
Health & Medicine > Therapeutic Area (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines

Zhang, Zhixin, Zhang, Yiyuan, Ding, Xiaohan, Yue, Xiangyu

arXiv.org Artificial IntelligenceOct-28-2024

Search engines enable the retrieval of unknown information with texts. However, traditional methods fall short when it comes to understanding unfamiliar visual content, such as identifying an object that the model has never seen before. This challenge is particularly pronounced for large vision-language models (VLMs): if the model has not been exposed to the object depicted in an image, it struggles to generate reliable answers to the user's question regarding that image. Moreover, as new objects and events continuously emerge, frequently updating VLMs is impractical due to heavy computational burdens. To address this limitation, we propose Vision Search Assistant, a novel framework that facilitates collaboration between VLMs and web agents. This approach leverages VLMs' visual understanding capabilities and web agents' real-time information access to perform open-world Retrieval-Augmented Generation via the web. By integrating visual and textual representations through this collaboration, the model can provide informed responses even when the image is novel to the system. Extensive experiments conducted on both open-set and closed-set QA benchmarks demonstrate that the Vision Search Assistant significantly outperforms the other models and can be widely applied to existing VLMs.

large language model, machine learning, preprint arxiv, (20 more...)

arXiv.org Artificial Intelligence

2410.2122

Country: North America > United States (1.00)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.93)

Industry:

Leisure & Entertainment > Sports (1.00)
Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.85)

Add feedback

CPFD: Confidence-aware Privileged Feature Distillation for Short Video Classification

Shi, Jinghao, Shen, Xiang, Zhao, Kaili, Wang, Xuedong, Wen, Vera, Wang, Zixuan, Wu, Yifan, Zhang, Zhixin

arXiv.org Artificial IntelligenceOct-6-2024

Dense features, customized for different business scenarios, are essential in short video classification. However, their complexity, specific adaptation requirements, and high computational costs make them resource-intensive and less accessible during online inference. Consequently, these dense features are categorized as `Privileged Dense Features'.Meanwhile, end-to-end multi-modal models have shown promising results in numerous computer vision tasks. In industrial applications, prioritizing end-to-end multi-modal features, can enhance efficiency but often leads to the loss of valuable information from historical privileged dense features. To integrate both features while maintaining efficiency and manageable resource costs, we present Confidence-aware Privileged Feature Distillation (CPFD), which empowers features of an end-to-end multi-modal model by adaptively distilling privileged features during training. Unlike existing privileged feature distillation (PFD) methods, which apply uniform weights to all instances during distillation, potentially causing unstable performance across different business scenarios and a notable performance gap between teacher model (Dense Feature enhanced multimodal-model DF-X-VLM) and student model (multimodal-model only X-VLM), our CPFD leverages confidence scores derived from the teacher model to adaptively mitigate the performance variance with the student model. We conducted extensive offline experiments on five diverse tasks demonstrating that CPFD improves the video classification F1 score by 6.76% compared with end-to-end multimodal-model (X-VLM) and by 2.31% with vanilla PFD on-average. And it reduces the performance gap by 84.6% and achieves results comparable to teacher model DF-X-VLM. The effectiveness of CPFD is further substantiated by online experiments, and our framework has been deployed in production systems for over a dozen models.

artificial intelligence, distillation, video understanding, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3627673.3680045

2410.03038

Country: North America > United States (0.51)

Genre: Research Report (0.64)

Industry: Education (0.88)

Technology: Information Technology > Artificial Intelligence > Vision > Video Understanding (0.95)

Add feedback

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

Zhang, Yiyuan, Kang, Yuhao, Zhang, Zhixin, Ding, Xiaohan, Zhao, Sanyuan, Yue, Xiangyu

arXiv.org Artificial IntelligenceFeb-5-2024

We introduce $\textit{InteractiveVideo}$, a user-centric framework for video generation. Different from traditional generative approaches that operate based on user-provided images or text, our framework is designed for dynamic interaction, allowing users to instruct the generative model through various intuitive mechanisms during the whole generation process, e.g. text and image prompts, painting, drag-and-drop, etc. We propose a Synergistic Multimodal Instruction mechanism, designed to seamlessly integrate users' multimodal instructions into generative models, thus facilitating a cooperative and responsive interaction between user inputs and the generative process. This approach enables iterative and fine-grained refinement of the generation result through precise and effective user instructions. With $\textit{InteractiveVideo}$, users are given the flexibility to meticulously tailor key aspects of a video. They can paint the reference image, edit semantics, and adjust video motions until their requirements are fully met. Code, models, and demo are available at https://github.com/invictus717/InteractiveVideo

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2402.0304

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Online Vectorized HD Map Construction using Geometry

Zhang, Zhixin, Zhang, Yiyuan, Ding, Xiaohan, Jin, Fusheng, Yue, Xiangyu

arXiv.org Artificial IntelligenceDec-6-2023

The construction of online vectorized High-Definition (HD) maps is critical for downstream prediction and planning. Recent efforts have built strong baselines for this task, however, shapes and relations of instances in urban road systems are still under-explored, such as parallelism, perpendicular, or rectangle-shape. In our work, we propose GeMap ($\textbf{Ge}$ometry $\textbf{Map}$), which end-to-end learns Euclidean shapes and relations of map instances beyond basic perception. Specifically, we design a geometric loss based on angle and distance clues, which is robust to rigid transformations. We also decouple self-attention to independently handle Euclidean shapes and relations. Our method achieves new state-of-the-art performance on the NuScenes and Argoverse 2 datasets. Remarkably, it reaches a 71.8% mAP on the large-scale Argoverse 2 dataset, outperforming MapTR V2 by +4.4% and surpassing the 70% mAP threshold for the first time. Code is available at https://github.com/cnzzx/GeMap

artificial intelligence, displacement vector, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2312.03341

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (0.50)
Transportation > Infrastructure & Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback