AITopics | vcm

Large Vision-Language Models (LVLMs) are pivotal for real-world AI tasks like embodied intelligence due to their strong vision-language reasoning abilities. However, current LVLMs process entire images at the token level, which is inefficient compared to humans who analyze information and generate content at the conceptual level, extracting relevant visual concepts with minimal effort. This inefficiency, stemming from the lack of a visual concept model, limits LVLMs' usability in real-world applications. To address this, we propose VCM, an end-to-end self-supervised visual concept modeling framework. VCM leverages implicit contrastive learning across multiple sampled instances and vision-language fine-tuning to construct a visual concept model without requiring costly concept-level annotations. Our results show that VCM significantly reduces computational costs (e.g., 85\% fewer FLOPs for LLaVA-1.5-7B) while maintaining strong performance across diverse image understanding tasks. Moreover, VCM enhances visual encoders' capabilities in classic visual concept perception tasks. Extensive quantitative and qualitative experiments validate the effectiveness and efficiency of VCM.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2504.19627

Country: Asia (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)

Add feedback

Variational Collaborative Learning for User Probabilistic Representation

Cui, Kenan, Chen, Xu, Yao, Jiangchao, Zhang, Ya

arXiv.org Machine LearningSep-22-2018

Collaborative filtering (CF) has been successfully employed by many modern recommender systems. Conventional CF-based methods use the user-item interaction data as the sole information source to recommend items to users. However, CF-based methods are known for suffering from cold start problems and data sparsity problems. Hybrid models that utilize auxiliary information on top of interaction data have increasingly gained attention. A few "collaborative learning"-based models, which tightly bridges two heterogeneous learners through mutual regularization, are recently proposed for the hybrid recommendation. However, the "collaboration" in the existing methods are actually asynchronous due to the alternative optimization of the two learners. Leveraging the recent advances in variational autoencoder~(VAE), we here propose a model consisting of two streams of mutual linked VAEs, named variational collaborative model (VCM). Unlike the mutual regularization used in previous works where two learners are optimized asynchronously, VCM enables a synchronous collaborative learning mechanism. Besides, the two stream VAEs setup allows VCM to fully leverages the Bayesian probabilistic representations in collaborative learning. Extensive experiments on three real-life datasets have shown that VCM outperforms several state-of-art methods.

artificial intelligence, machine learning, representation, (18 more...)

arXiv.org Machine Learning

1809.084

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Filters

Collaborating Authors

vcm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Learning to Solve Quadratic Unconstrained Binary Optimization in a Classification Way Ming Chen

8a57aa8e8b57e64a42e95f7dceb0adb9-Paper-Conference.pdf

Learning to Solve Quadratic Unconstrained Binary Optimization in a Classification Way Ming Chen

8a57aa8e8b57e64a42e95f7dceb0adb9-Paper-Conference.pdf

VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning

Variational Collaborative Learning for User Probabilistic Representation