AITopics | Wang, Haochen

Collaborating Authors

Wang, Haochen

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Reconstructive Visual Instruction Tuning

Wang, Haochen, Zheng, Anlin, Zhao, Yucheng, Wang, Tiancai, Ge, Zheng, Zhang, Xiangyu, Zhang, Zhaoxiang

arXiv.org Artificial IntelligenceDec-30-2024

This paper introduces reconstructive visual instruction tuning (ROSS), a family of Large Multimodal Models (LMMs) that exploit vision-centric supervision signals. In contrast to conventional visual instruction tuning approaches that exclusively supervise text outputs, ROSS prompts LMMs to supervise visual outputs via reconstructing input images. By doing so, it capitalizes on the inherent richness and detail present within input images themselves, which are often lost in pure text supervision. However, producing meaningful feedback from natural images is challenging due to the heavy spatial redundancy of visual signals. To address this issue, ROSS employs a denoising objective to reconstruct latent representations of input images, avoiding directly regressing exact raw RGB values. This intrinsic activation design inherently encourages LMMs to maintain image detail, thereby enhancing their fine-grained comprehension capabilities and reducing hallucinations. Empirically, ROSS consistently brings significant improvements across different visual encoders and language models. In comparison with extrinsic assistance state-of-the-art alternatives that aggregate multiple visual experts, ROSS delivers competitive performance with a single SigLIP visual encoder, demonstrating the efficacy of our vision-centric supervision tailored for visual outputs.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2410.09575

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

CJEval: A Benchmark for Assessing Large Language Models Using Chinese Junior High School Exam Data

Zhang, Qian-Wen, Wang, Haochen, Li, Fang, An, Siyu, Qiao, Lingfeng, Gao, Liangcai, Yin, Di, Sun, Xing

arXiv.org Artificial IntelligenceSep-24-2024

Online education platforms have significantly transformed the dissemination of educational resources by providing a dynamic and digital infrastructure. With the further enhancement of this transformation, the advent of Large Language Models (LLMs) has elevated the intelligence levels of these platforms. However, current academic benchmarks provide limited guidance for real-world industry scenarios. This limitation arises because educational applications require more than mere test question responses. To bridge this gap, we introduce CJEval, a benchmark based on Chinese Junior High School Exam Evaluations. CJEval consists of 26,136 samples across four application-level educational tasks covering ten subjects. These samples include not only questions and answers but also detailed annotations such as question types, difficulty levels, knowledge concepts, and answer explanations. By utilizing this benchmark, we assessed LLMs' potential applications and conducted a comprehensive analysis of their performance by fine-tuning on various educational tasks. Extensive experiments and discussions have highlighted the opportunities and challenges of applying LLMs in the field of education.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2409.16202

Genre: Research Report (0.82)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.89)
Education > Educational Setting > K-12 Education > Secondary School (0.71)
Education > Educational Setting > Online (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Towards Open-Vocabulary Video Instance Segmentation

Wang, Haochen, Yan, Cilin, Wang, Shuai, Jiang, Xiaolong, Tang, XU, Hu, Yao, Xie, Weidi, Gavves, Efstratios

arXiv.org Artificial IntelligenceAug-6-2023

Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories, lacking the generalization ability to handle novel categories in real-world videos. To address this limitation, we make the following three contributions. First, we introduce the novel task of Open-Vocabulary Video Instance Segmentation, which aims to simultaneously segment, track, and classify objects in videos from open-set categories, including novel categories unseen during training. Second, to benchmark Open-Vocabulary VIS, we collect a Large-Vocabulary Video Instance Segmentation dataset (LV-VIS), that contains well-annotated objects from 1,196 diverse categories, significantly surpassing the category size of existing datasets by more than one order of magnitude. Third, we propose an efficient Memory-Induced Transformer architecture, OV2Seg, to first achieve Open-Vocabulary VIS in an end-to-end manner with near real-time inference speed. Extensive experiments on LV-VIS and four existing VIS datasets demonstrate the strong zero-shot generalization ability of OV2Seg on novel categories. The dataset and code are released here https://github.com/haochenheheda/LVVIS.

artificial intelligence, category, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2304.01715

Genre: Research Report (0.64)

Industry:

Transportation (0.46)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation

Wang, Haochen, Du, Xiaodan, Li, Jiahao, Yeh, Raymond A., Shakhnarovich, Greg

arXiv.org Artificial IntelligenceDec-1-2022

A diffusion model learns to predict a vector field of gradients. We propose to apply chain rule on the learned gradients, and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate to be a voxel radiance field. This setup aggregates 2D scores at multiple camera viewpoints into a 3D score, and repurposes a pretrained 2D model for 3D data generation. We identify a technical challenge of distribution mismatch that arises in this application, and propose a novel estimation mechanism to resolve it. We run our algorithm on several off-the-shelf diffusion image generative models, including the recently released Stable Diffusion trained on the large-scale LAION dataset.

artificial intelligence, diffusion model, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2212.00774

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback