Goto

Collaborating Authors

 Shi, Yiming


Connector-S: A Survey of Connectors in Multi-modal Large Language Models

arXiv.org Artificial Intelligence

With the rapid advancements in multi-modal large language models (MLLMs), connectors play a pivotal role in bridging diverse modalities and enhancing model performance. However, the design and evolution of connectors have not been comprehensively analyzed, leaving gaps in understanding how these components function and hindering the development of more powerful connectors. In this survey, we systematically review the current progress of connectors in MLLMs and present a structured taxonomy that categorizes connectors into atomic operations (mapping, compression, mixture of experts) and holistic designs (multi-layer, multi-encoder, multi-modal scenarios), highlighting their technical contributions and advancements. Furthermore, we discuss several promising research frontiers and challenges, including high-resolution input, dynamic compression, guide information selection, combination strategy, and interpretability. This survey is intended to serve as a foundational reference and a clear roadmap for researchers, providing valuable insights into the design and optimization of next-generation connectors to enhance the performance and adaptability of MLLMs.


Machine Translation Evaluation Benchmark for Wu Chinese: Workflow and Analysis

arXiv.org Artificial Intelligence

We introduce a FLORES+ dataset as an evaluation benchmark for modern Wu Chinese machine translation models and showcase its compatibility with existing Wu data. Wu Chinese is mutually unintelligible with other Sinitic languages such as Mandarin and Yue (Cantonese), but uses a set of Hanzi (Chinese characters) that profoundly overlaps with others. The population of Wu speakers is the second largest among languages in China, but the language has been suffering from significant drop in usage especially among the younger generations. We identify Wu Chinese as a textually low-resource language and address challenges for its machine translation models. Our contributions include: (1) an open-source, manually translated dataset, (2) full documentations on the process of dataset creation and validation experiments, (3) preliminary tools for Wu Chinese normalization and segmentation, and (4) benefits and limitations of our dataset, as well as implications to other low-resource languages.


USD: Unsupervised Soft Contrastive Learning for Fault Detection in Multivariate Time Series

arXiv.org Artificial Intelligence

Unsupervised fault detection in multivariate time series is critical for maintaining the integrity and efficiency of complex systems, with current methodologies largely focusing on statistical and machine learning techniques. However, these approaches often rest on the assumption that data distributions conform to Gaussian models, overlooking the diversity of patterns that can manifest in both normal and abnormal states, thereby diminishing discriminative performance. Our innovation addresses this limitation by introducing a combination of data augmentation and soft contrastive learning, specifically designed to capture the multifaceted nature of state behaviors more accurately. The data augmentation process enriches the dataset with varied representations of normal states, while soft contrastive learning fine-tunes the model's sensitivity to the subtle differences between normal and abnormal patterns, enabling it to recognize a broader spectrum of anomalies. This dual strategy significantly boosts the model's ability to distinguish between normal and abnormal states, leading to a marked improvement in fault detection performance across multiple datasets and settings, thereby setting a new benchmark for unsupervised fault detection in complex systems. The code of our method is available at \url{https://github.com/zangzelin/code_USD.git}.


TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models

arXiv.org Artificial Intelligence

We present TinyLLaVA Factory, an open-source modular codebase for small-scale large multimodal models (LMMs) with a focus on simplicity of code implementations, extensibility of new features, and reproducibility of training results. Following the design philosophy of the factory pattern in software engineering, TinyLLaVA Factory modularizes the entire system into interchangeable components, with each component integrating a suite of cutting-edge models and methods, meanwhile leaving room for extensions to more features. In addition to allowing users to customize their own LMMs, TinyLLaVA Factory provides popular training recipes to let users pretrain and finetune their models with less coding effort. Empirical experiments validate the effectiveness of our codebase. The goal of TinyLLaVA Factory is to assist researchers and practitioners in exploring the wide landscape of designing and training small-scale LMMs with affordable computational resources.