AITopics | jiaheng liu

Collaborating Authors

jiaheng liu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs

Peng, Tianhao, Wang, Haochen, Zhang, Yuanxing, Wang, Zekun, Wang, Zili, Chang, Gavin, Yang, Jian, Li, Shihao, Wang, Yanghai, Wang, Xintao, Li, Houyi, Ji, Wei, Wan, Pengfei, Huang, Steven, Zhang, Zhaoxiang, Liu, Jiaheng

arXiv.org Artificial IntelligenceNov-14-2025

The advent of Multimodal Large Language Models (MLLMs) has expanded AI capabilities to visual modalities, yet existing evaluation benchmarks remain limited to single-video understanding, overlooking the critical need for multi-video understanding in real-world scenarios (e.g., sports analytics and autonomous driving). To address this significant gap, we introduce MVU-Eval, the first comprehensive benchmark for evaluating Multi-Video Understanding for MLLMs. Specifically, our MVU-Eval mainly assesses eight core competencies through 1,824 meticulously curated question-answer pairs spanning 4,959 videos from diverse domains, addressing both fundamental perception tasks and high-order reasoning tasks. These capabilities are rigorously aligned with real-world applications such as multi-sensor synthesis in autonomous systems and cross-angle sports analytics. Through extensive evaluation of state-of-the-art open-source and closed-source models, we reveal significant performance discrepancies and limitations in current MLLMs' ability to perform understanding across multiple videos. The benchmark will be made publicly available to foster future research.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.0725

Country:

Asia (0.46)
Europe > Austria (0.28)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (0.66)
Information Technology (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Huang, Siming, Cheng, Tianhao, Liu, J. K., Hao, Jiaran, Song, Liuyihan, Xu, Yang, Yang, J., Liu, J. H., Zhang, Chenchen, Chai, Linzheng, Yuan, Ruifeng, Zhang, Zhaoxiang, Fu, Jie, Liu, Qian, Zhang, Ge, Wang, Zili, Qi, Yuan, Xu, Yinghui, Chu, Wei

arXiv.org Artificial IntelligenceNov-9-2024

Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs suitable for rigorous scientific investigation, particularly those with reproducible data processing pipelines and transparent training protocols, remain limited. The scarcity is due to various challenges, including resource constraints, ethical considerations, and the competitive advantages of keeping models advanced. To address the gap, we introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community. Unlike most prior efforts, we release not only model weights and inference code, but also the reproducible training data, complete data processing pipeline, rigorous experimental ablation results, and detailed training protocols for open scientific research. Through this comprehensive release, we identify the key ingredients for building a top-tier code LLM: (1) code optimized heuristic rules for data cleaning and methods for data deduplication, (2) recall of text corpus related to code and (3) high-quality synthetic data in both annealing and supervised fine-tuning stages. By offering this level of openness, we aim to broaden access to all aspects of a top-tier code LLM, with OpenCoder serving as both a powerful model and an open foundation to accelerate research, and enable reproducible advancements in code AI.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.04905

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Information Technology > Software (0.86)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models

Wu, Yanan, Liu, Jie, Bu, Xingyuan, Liu, Jiaheng, Zhou, Zhanhui, Zhang, Yuanxing, Zhang, Chenchen, Bai, Zhiqi, Chen, Haibin, Ge, Tiezheng, Ouyang, Wanli, Su, Wenbo, Zheng, Bo

arXiv.org Artificial IntelligenceFeb-23-2024

This paper introduces ConceptMath, a bilingual (English and Chinese), fine-grained benchmark that evaluates concept-wise mathematical reasoning of Large Language Models (LLMs). Unlike traditional benchmarks that evaluate general mathematical reasoning with an average accuracy, ConceptMath systematically organizes math problems under a hierarchy of math concepts, so that mathematical reasoning can be evaluated at different granularity with concept-wise accuracies. Based on our ConcepthMath, we evaluate a broad range of LLMs, and we observe existing LLMs, though achieving high average accuracies on traditional benchmarks, exhibit significant performance variations across different math concepts and may even fail catastrophically on the most basic ones. Besides, we also introduce an efficient fine-tuning strategy to enhance the weaknesses of existing LLMs. Finally, we hope ConceptMath could guide the developers to understand the fine-grained mathematical abilities of their models and facilitate the growth of foundation models.

accuracy, conceptmath, language model, (16 more...)

arXiv.org Artificial Intelligence

2402.1466

Country:

North America > United States > Maryland > Baltimore (0.04)
Europe > Belgium (0.04)
Asia > China > Shanghai > Shanghai (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry:

Education (1.00)
Government > Tax (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

M2C: Towards Automatic Multimodal Manga Complement

Guo, Hongcheng, Wang, Boyang, Bai, Jiaqi, Liu, Jiaheng, Yang, Jian, Li, Zhoujun

arXiv.org Artificial IntelligenceOct-26-2023

Multimodal manga analysis focuses on enhancing manga understanding with visual and textual features, which has attracted considerable attention from both natural language processing and computer vision communities. Currently, most comics are hand-drawn and prone to problems such as missing pages, text contamination, and aging, resulting in missing comic text content and seriously hindering human comprehension. In other words, the Multimodal Manga Complement (M2C) task has not been investigated, which aims to handle the aforementioned issues by providing a shared semantic space for vision and language understanding. To this end, we first propose the Multimodal Manga Complement task by establishing a new M2C benchmark dataset covering two languages. First, we design a manga argumentation method called MCoT to mine event knowledge in comics with large language models. Then, an effective baseline FVP-M$^{2}$ using fine-grained visual prompts is proposed to support manga complement. Extensive experimental results show the effectiveness of FVP-M$^{2}$ method for Multimodal Mange Complement.

complement, fvp-m 2, jiaheng liu, (12 more...)

arXiv.org Artificial Intelligence

2310.1713

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > China (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback