AITopics | Wang, Weizhen

Wang, Weizhen

Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch

Wang, Weizhen, He, Jianping, Duan, Xiaoming

arXiv.org Artificial Intelligence, Mar-28-2025

Policy gradient methods are among the most successful approaches for solving challenging reinforcement learning problems. However, despite their empirical success, many state-of-the-art policy gradient algorithms for discounted problems deviate from the policy gradient theorem because of a distribution mismatch. In this work, we analyze the impact of this mismatch on policy gradient methods. Specifically, we first show that, for tabular parameterizations, the methods under the mismatch remain globally optimal. Then, we extend this analysis to more general parameterizations by leveraging the theory of biased stochastic gradient descent. Our findings offer new insights into the robustness of policy gradient methods as well as the gap between theoretical foundations and practical implementations.
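The abstract does not define the distribution mismatch. One common reading in the policy gradient literature, assumed here, is that the discounted policy gradient theorem weights the step-t term by gamma**t, while many practical implementations drop that factor. The toy sketch below only illustrates that assumed difference; it is not the authors' method, and the names `step_weights` and `weighted_sum` and the sample numbers are hypothetical.

```python
from typing import List


def step_weights(num_steps: int, gamma: float, discounted: bool) -> List[float]:
    """Weight on the step-t term grad log pi(a_t | s_t) * Q(s_t, a_t).

    discounted=True  -> gamma**t, as in the discounted policy gradient theorem.
    discounted=False -> 1.0 for every step, the common practical shortcut
                        (assumed here to be the mismatch in question).
    """
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    return [gamma ** t if discounted else 1.0 for t in range(num_steps)]


def weighted_sum(terms: List[float], weights: List[float]) -> float:
    """Combine scalar per-step gradient terms with the given weights."""
    return sum(w * g for w, g in zip(weights, terms))


if __name__ == "__main__":
    terms = [1.0, 1.0, 1.0]  # hypothetical scalar per-step terms
    print(weighted_sum(terms, step_weights(3, 0.5, discounted=True)))
    print(weighted_sum(terms, step_weights(3, 0.5, discounted=False)))
```

With identical per-step terms, the two weightings give different totals, which is why an estimator that omits the discount factor is in general a biased estimate of the discounted objective's gradient.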

artificial intelligence, gradient, machine learning, (14 more...)

arXiv:2503.22244

Country: Asia (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)

CTR-Driven Advertising Image Generation with Multimodal Large Language Models

Chen, Xingye, Feng, Wei, Du, Zhenbang, Wang, Weizhen, Chen, Yanyin, Wang, Haohan, Liu, Linkai, Li, Yaoyu, Zhao, Jinyuan, Li, Yu, Zhang, Zheng, Lv, Jingjing, Shen, Junjie, Lin, Zhangang, Shao, Jingping, Shao, Yuanjie, You, Xinge, Gao, Changxin, Sang, Nong

arXiv.org Artificial Intelligence, Feb-5-2025

In web data, advertising images are crucial for capturing user attention and improving advertising effectiveness. Most existing methods that generate backgrounds for products focus primarily on aesthetic quality, which may fail to achieve satisfactory online performance. To address this limitation, we explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images by optimizing for Click-Through Rate (CTR) as the primary objective. First, we build targeted pre-training tasks and leverage a large-scale e-commerce multimodal dataset to equip MLLMs with initial capabilities for advertising image generation tasks. To further improve the CTR of generated images, we propose a novel reward model to fine-tune pre-trained MLLMs through Reinforcement Learning (RL), which can jointly utilize multimodal features and accurately reflect user click preferences. Meanwhile, a product-centric preference optimization strategy is developed to ensure that the generated background content aligns with the product characteristics after fine-tuning, enhancing the overall relevance and effectiveness of the advertising images. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both online and offline metrics. Our code and pre-trained models are publicly available at: https://github.com/Chenguoz/CAIG.

background, large language model, machine learning, (16 more...)

arXiv:2502.06823

Country:

Asia > China (0.49)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.88)

Industry:

Marketing (1.00)
Information Technology > Services (0.51)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Embodied Scene Understanding for Vision Language Models via MetaVQA

Wang, Weizhen, Duan, Chenda, Peng, Zhenghao, Liu, Yuxin, Zhou, Bolei

arXiv.org Artificial Intelligence, Jan-15-2025

Vision Language Models (VLMs) demonstrate significant potential as embodied AI agents for various mobility applications. However, a standardized, closed-loop benchmark for evaluating their spatial reasoning and sequential decision-making capabilities is lacking. To address this, we present MetaVQA: a comprehensive benchmark designed to assess and enhance VLMs' understanding of spatial relationships and scene dynamics through Visual Question Answering (VQA) and closed-loop simulations. MetaVQA leverages Set-of-Mark prompting and top-down view ground-truth annotations from nuScenes and Waymo datasets to automatically generate extensive question-answer pairs based on diverse real-world traffic scenarios, ensuring object-centric and context-rich instructions. Our experiments show that fine-tuning VLMs with the MetaVQA dataset significantly improves their spatial reasoning and embodied scene comprehension in safety-critical simulations, evident not only in improved VQA accuracies but also in emerging safety-aware driving maneuvers. In addition, the learning demonstrates strong transferability from simulation to real-world observation. Code and data will be publicly available at https://metadriverse.github.io/metavqa.

explanation, large language model, machine learning, (21 more...)

arXiv:2501.09167

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (0.94)
Automobiles & Trucks (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)
