Long, Xinwei
Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models
Lv, Xingtai, Sun, Youbang, Zhang, Kaiyan, Qu, Shang, Zhu, Xuekai, Fan, Yuchen, Wu, Yi, Hua, Ermo, Long, Xinwei, Ding, Ning, Zhou, Bowen
State Space Models (SSMs) have emerged as a promising alternative to popular transformer-based models and have been gaining increasing attention. Compared to transformers, SSMs excel at tasks with sequential data or longer contexts, demonstrating comparable performance with significant efficiency gains. In this survey, we provide a coherent and systematic overview of SSMs, including their theoretical motivations, mathematical formulations, comparisons with existing model classes, and various applications. We divide the SSM series into three main sections, providing a detailed introduction to the original SSM, the structured SSM represented by S4, and the selective SSM typified by Mamba. We place an emphasis on technical detail and highlight the key techniques introduced to address the effectiveness and efficiency of SSMs. We hope this manuscript serves as an introduction for researchers exploring the theoretical foundations of SSMs.
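All of the surveyed model families share the same linear state-space backbone. As a rough illustration only (not code from the survey), the sketch below discretizes a continuous-time SSM x'(t) = A x(t) + B u(t), y(t) = C x(t) with the bilinear transform and runs the resulting recurrence over a sequence; the state size, step size, and random matrices are illustrative assumptions.

```python
import numpy as np

def discretize_bilinear(A, B, dt):
    """Bilinear (Tustin) discretization of x'(t) = A x(t) + B u(t)
    into the recurrence x_k = Ab x_{k-1} + Bb u_k."""
    n = A.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - dt / 2.0 * A)
    Ab = inv @ (I + dt / 2.0 * A)
    Bb = inv @ (dt * B)
    return Ab, Bb

def ssm_scan(Ab, Bb, C, u):
    """Run the linear recurrence over a 1-D input sequence u."""
    x = np.zeros(Ab.shape[0])
    ys = []
    for u_k in u:
        x = Ab @ x + Bb[:, 0] * u_k   # state update
        ys.append(C @ x)              # readout
    return np.array(ys)

# Toy example: a 4-dimensional state with a scalar input/output channel.
rng = np.random.default_rng(0)
A = -np.eye(4) + 0.1 * rng.standard_normal((4, 4))  # roughly stable dynamics
B = rng.standard_normal((4, 1))
C = rng.standard_normal((1, 4))
Ab, Bb = discretize_bilinear(A, B, dt=0.1)
y = ssm_scan(Ab, Bb, C, rng.standard_normal(64))
print(y.shape)  # (64, 1)
```

Structured and selective SSMs differ mainly in how A, B, and C are parameterized (and, for Mamba, made input-dependent), while the recurrence itself keeps this form.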
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines
Long, Xinwei, Ma, Zhiyuan, Hua, Ermo, Zhang, Kaiyan, Qi, Biqing, Zhou, Bowen
Retrieval-augmented generation (RAG) has emerged to address the knowledge-intensive visual question answering (VQA) task. Current methods mainly employ separate retrieval and generation modules to acquire external knowledge and generate answers, respectively. We propose ReAuSE, an alternative to previous RAG models for knowledge-based VQA, which seamlessly integrates the knowledge retriever into a generative multi-modal large language model, serving as a built-in search engine. Specifically, our model functions both as a generative retriever and as an accurate answer generator. It not only retrieves documents from the knowledge base by producing an identifier for each document, but also answers visual questions based on the retrieved documents. Furthermore, we propose a reinforced retrieval calibration module based on relevance feedback to improve retrieval performance and align it with the preferences for accurate answer generation. Extensive experiments on two representative datasets, OKVQA and A-OKVQA, demonstrate significant improvements ranging from 2.9% to 9.6% across all evaluation metrics when compared to strong baselines.
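To picture the "built-in search engine" idea, here is a minimal sketch (not the authors' code) of how a generative retriever can restrict decoding to valid document identifiers by walking a prefix trie built over all identifiers; the token ids and the trie layout are illustrative assumptions.

```python
# Hypothetical sketch: constrain generation to valid document identifiers
# with a prefix trie, so every decoded sequence maps to a real document.

def build_trie(identifiers):
    """identifiers: list of token-id sequences, one per document."""
    trie = {}
    for ident in identifiers:
        node = trie
        for tok in ident:
            node = node.setdefault(tok, {})
        node["<end>"] = {}
    return trie

def allowed_next_tokens(trie, prefix):
    """Tokens that keep the decoded prefix on a path to some identifier."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return []          # prefix no longer matches any identifier
        node = node[tok]
    return list(node.keys())

# Toy identifiers (token ids are made up for illustration).
doc_ids = [[3, 7, 2], [3, 7, 9], [5, 1]]
trie = build_trie(doc_ids)
print(allowed_next_tokens(trie, [3, 7]))   # [2, 9]
print(allowed_next_tokens(trie, [5, 1]))   # ['<end>']
```

A function like this can be plugged into constrained beam search so the multi-modal LLM only emits identifiers that actually exist in the knowledge base.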
In-Context Meta LoRA Generation
Shao, Yihua, Yan, Minxi, Liu, Yang, Chen, Siyu, Chen, Wenjie, Long, Xinwei, Yan, Ziyang, Li, Lei, Zhang, Chenyu, Sebe, Nicu, Tang, Hao, Wang, Yan, Zhao, Hao, Wang, Mengzhu, Guo, Jingcai
Low-rank Adaptation (LoRA) has demonstrated remarkable capabilities for task-specific fine-tuning. However, in scenarios that involve multiple tasks, training a separate LoRA model for each one results in considerable inefficiency in terms of storage and inference. Moreover, existing parameter generation methods fail to capture the correlations among these tasks, making multi-task LoRA parameter generation challenging. To address these limitations, we propose In-Context Meta LoRA (ICM-LoRA), a novel approach that efficiently achieves task-specific customization of large language models (LLMs). Specifically, we use training data from all tasks to train a tailored generator, a Conditional Variational Autoencoder (CVAE). The CVAE takes task descriptions as inputs and produces task-aware LoRA weights as outputs. These LoRA weights are then merged with the LLM to create task-specialized models without the need for additional fine-tuning. Furthermore, we utilize in-context meta-learning for knowledge enhancement and task mapping, capturing the relationship between tasks and parameter distributions. As a result, our method achieves more accurate LoRA parameter generation for diverse tasks using the CVAE. ICM-LoRA reconstructs LoRA parameters more accurately than current parameter reconstruction methods and is useful for task-specific enhancement of LoRA parameters. At the same time, our method occupies only 283 MB, roughly 1% of the storage required by the original LoRA models.
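As a minimal sketch of the generator idea (the dimensions, layer shapes, and the merging rule W + B A are assumptions for illustration, not the released implementation): a conditional decoder maps a task embedding plus a latent sample to flattened LoRA factors, which are then merged into a frozen base weight.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank, d_task, d_latent = 64, 4, 16, 8

def lora_decoder(task_emb, z, params):
    """Illustrative CVAE decoder: (task embedding, latent) -> LoRA A, B."""
    h = np.tanh(params["W1"] @ np.concatenate([task_emb, z]) + params["b1"])
    flat = params["W2"] @ h + params["b2"]
    A = flat[: rank * d_model].reshape(rank, d_model)   # (r, d)
    B = flat[rank * d_model :].reshape(d_model, rank)   # (d, r)
    return A, B

# Randomly initialized decoder parameters stand in for a trained CVAE.
hidden = 128
out_dim = 2 * rank * d_model
params = {
    "W1": rng.standard_normal((hidden, d_task + d_latent)) * 0.05,
    "b1": np.zeros(hidden),
    "W2": rng.standard_normal((out_dim, hidden)) * 0.05,
    "b2": np.zeros(out_dim),
}

task_emb = rng.standard_normal(d_task)   # encodes the task description
z = rng.standard_normal(d_latent)        # latent sample
A, B = lora_decoder(task_emb, z, params)

W_frozen = rng.standard_normal((d_model, d_model))
W_merged = W_frozen + B @ A              # merge generated LoRA into the base weight
print(W_merged.shape)  # (64, 64)
```

The point of the sketch is only the data flow: once the decoder emits A and B, the adapted model needs no further gradient updates for the new task.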
AccidentBlip2: Accident Detection With Multi-View MotionBlip2
Shao, Yihua, Cai, Hongyi, Long, Xinwei, Lang, Weiyi, Wang, Zhe, Wu, Haoran, Wang, Yan, Yin, Jiayi, Yang, Yang, Lv, Yisheng, Lei, Zhen
Intelligent vehicles have demonstrated excellent capabilities in many transportation scenarios. However, the inference capabilities of camera-based neural networks limit the accuracy of accident detection in complex transportation systems. This paper presents AccidentBlip2, a purely vision-based multi-modal large model built on Blip2 for accident detection. Our method first processes the multi-view images through ViT-14g and feeds the multi-view features into the cross-attention layer of the Q-Former. Unlike Blip2's Q-Former, our Motion Q-Former extends the self-attention layer with a temporal-attention layer. During inference, the queries generated from previous frames are input into the Motion Q-Former to aggregate temporal information. The queries are updated with an auto-regressive strategy and sent to an MLP to detect whether there is an accident in the surrounding environment. AccidentBlip2 can be extended to a multi-vehicle cooperative system by deploying a Motion Q-Former on each vehicle and simultaneously fusing the generated queries into the MLP for auto-regressive inference. Our approach outperforms existing video large language models in detection accuracy in both single-vehicle and multi-vehicle systems.
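A rough sketch of the temporal query aggregation (the single-head attention, shapes, and update rule are illustrative assumptions, not the paper's code): queries carried over from the previous frame attend over the current frame's visual features, are mixed among themselves, and are then scored by a small MLP head.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_q, n_patch, n_frames = 32, 8, 49, 4

def attend(queries, keys, values):
    """Single-head scaled dot-product attention (illustrative)."""
    scores = queries @ keys.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

def mlp_accident_score(queries, W1, W2):
    """Tiny MLP head: pooled queries -> accident probability."""
    h = np.maximum(queries.mean(axis=0) @ W1, 0.0)
    logit = h @ W2
    return 1.0 / (1.0 + np.exp(-logit))

W1, W2 = rng.standard_normal((d, d)) * 0.1, rng.standard_normal(d) * 0.1
queries = rng.standard_normal((n_q, d))               # learned queries, frame 0

for t in range(n_frames):
    frame_feats = rng.standard_normal((n_patch, d))   # stand-in for ViT features
    # Cross-attention to the current frame, then mixing among the queries;
    # the updated queries are carried into the next frame auto-regressively.
    queries = attend(queries, frame_feats, frame_feats)
    queries = attend(queries, queries, queries)
    print(f"frame {t}: accident prob = {mlp_accident_score(queries, W1, W2):.3f}")
```

In the multi-vehicle setting described above, one would run such a loop per vehicle and pool the resulting queries before the MLP.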
Generative Multi-Modal Knowledge Retrieval with Large Language Models
Long, Xinwei, Zeng, Jiali, Meng, Fandong, Ma, Zhiyuan, Zhang, Kaiyan, Zhou, Bowen, Zhou, Jie
Knowledge retrieval with multi-modal queries plays a crucial role in supporting knowledge-intensive multi-modal applications. However, existing methods face challenges in terms of their effectiveness and training efficiency, especially when it comes to training and integrating multiple retrievers to handle multi-modal queries. In this paper, we propose an innovative end-to-end generative framework for multi-modal knowledge retrieval. Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases, even when trained with limited data. We retrieve knowledge via a two-step process: 1) generating knowledge clues related to the queries, and 2) obtaining the relevant documents by searching databases using the knowledge clues. In particular, we first introduce an object-aware prefix-tuning technique to guide multi-grained visual learning. Then, we align multi-grained visual features into the textual feature space of the LLM, employing the LLM to capture cross-modal interactions. Subsequently, we construct instruction data in a unified format for model training. Finally, we propose a knowledge-guided generation strategy to impose prior constraints on the decoding steps, thereby promoting the generation of distinctive knowledge clues. Through experiments conducted on three benchmarks, we demonstrate significant improvements ranging from 3.0% to 14.6% across all evaluation metrics when compared to strong baselines.
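The two-step retrieval loop can be pictured roughly as follows (the clue generator and the toy corpus are illustrative stand-ins, not the paper's components): a model first produces a short knowledge clue for the query, and the clue is then matched against an index over the knowledge base to fetch the document.

```python
# Illustrative two-step retrieval: generate a knowledge clue, then use it
# to look up documents in a small indexed corpus.

corpus = {
    "doc1": "The Eiffel Tower was completed in 1889 in Paris.",
    "doc2": "Mount Fuji is the highest mountain in Japan.",
    "doc3": "The Great Barrier Reef lies off the coast of Queensland.",
}

# Step 0 (offline): index documents by the words a clue could contain.
clue_index = {}
for doc_id, text in corpus.items():
    for word in text.lower().replace(".", "").split():
        clue_index.setdefault(word, set()).add(doc_id)

def generate_clue(query):
    """Stand-in for the LLM's knowledge-clue generation step."""
    canned = {"Which country is Mount Fuji in?": "mount fuji"}
    return canned.get(query, query.lower())

def retrieve(query):
    """Step 1: generate a clue. Step 2: search the index with the clue."""
    clue = generate_clue(query)
    hits = set(corpus)
    for word in clue.split():
        hits &= clue_index.get(word, set())
    return sorted(hits)

print(retrieve("Which country is Mount Fuji in?"))  # ['doc2']
```

The knowledge-guided generation strategy mentioned above plays the role of constraining step 1 so that generated clues are ones the index can actually resolve.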
CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
Zhang, Kaiyan, Ding, Ning, Qi, Biqing, Zhu, Xuekai, Long, Xinwei, Zhou, Bowen
Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) to enhance their generalization ability across various tasks. However, when tuning publicly accessible, centralized LLMs with private instruction data, privacy concerns are inevitable. While direct transfer of parameterized modules between models is a plausible approach to address this, its implications and effectiveness need further exploration. This paper focuses on Offsite-Tuning (OFT), a representative technique that transfers transformer blocks between centralized LLMs and downstream emulators. Given the limited understanding of the underlying mechanism of OFT, we perform an empirical analysis of LLMs from the perspectives of representation and functional similarity. Interestingly, our findings reveal a unique modular structure within the layers of LLMs that appears to emerge as the model size expands. Simultaneously, we note subtle but potentially significant changes in representations and intermediate predictions across the layers. Inspired by these observations, we propose CRaSh, a training-free strategy involving Clustering, Removing, and Sharing that derives improved emulators from LLMs. CRaSh significantly boosts the performance of OFT with billions of parameters. Furthermore, we investigate the optimal solutions yielded by fine-tuning with and without the full model through the lens of the loss landscape. Our findings demonstrate a linear connectivity among these optima, which fall within the same basin, thereby highlighting the effectiveness of CRaSh and OFT. The source code is publicly available at https://github.com/TsinghuaC3I/CRaSh.
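The Clustering, Removing, and Sharing steps can be illustrated roughly as below (the similarity measure, layer count, and threshold are assumptions for the sketch, not the released code): adjacent layers with similar representations are grouped, one representative layer per group is kept, and that layer is re-used (shared) in place of the removed ones when assembling the emulator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d = 12, 16

# Stand-in per-layer representations (e.g., mean hidden states on a probe set).
layer_reps = np.cumsum(rng.standard_normal((n_layers, d)) * 0.3, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Clustering: greedily group consecutive layers whose representations stay similar.
clusters, current = [], [0]
for i in range(1, n_layers):
    if cosine(layer_reps[i], layer_reps[current[0]]) > 0.9:
        current.append(i)
    else:
        clusters.append(current)
        current = [i]
clusters.append(current)

# Removing + Sharing: keep one representative layer per cluster and repeat
# (share) it for every removed position, preserving the original depth.
emulator_layers = []
for cluster in clusters:
    representative = cluster[len(cluster) // 2]
    emulator_layers.extend([representative] * len(cluster))

print("clusters:", clusters)
print("emulator layer schedule:", emulator_layers)
```

The resulting layer schedule defines a training-free emulator whose blocks can then be plugged into the OFT pipeline in place of the randomly dropped layers.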