
Collaborating Authors

Hongming Zhang


WebEvolver: Enhancing Web Agent Self-Improvement with Coevolving World Model

Fang, Tianqing, Zhang, Hongming, Zhang, Zhisong, Ma, Kaixin, Yu, Wenhao, Mi, Haitao, Yu, Dong

arXiv.org Artificial Intelligence

Agent self-improvement, where the backbone Large Language Model (LLM) of the agent is trained on trajectories sampled autonomously under its own policy, has emerged as a promising approach for enhancing performance. Recent advances, particularly in web environments, face a critical limitation: performance stagnates during autonomous learning cycles, hindering further improvement. We argue that this stems from limited exploration of the web environment and insufficient exploitation of the pre-trained web knowledge in LLMs. To push past this plateau, we propose a novel framework that introduces a co-evolving World Model LLM. This world model predicts the next observation given the current observation and action within the web environment. Leveraging LLMs' pre-trained knowledge of abundant web content, the World Model serves dual roles: (1) as a virtual web server generating self-instructed training data to continuously refine the agent's policy, and (2) as an imagination engine during inference, enabling look-ahead simulation to guide action selection for the agent LLM. Experiments in real-world web environments (Mind2Web-Live, WebVoyager, and GAIA-web) show a 10% performance gain over existing self-evolving agents, demonstrating the efficacy and generalizability of our approach without any distillation from more powerful closed-source models. Our work establishes the necessity of integrating world models into autonomous agent frameworks to unlock sustained adaptability. Code is available at https://github.com/Tencent/SelfEvolvingAgent
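
The look-ahead simulation described above can be pictured as a simple one-step rollout loop: imagine each candidate action's outcome with the world model, then pick the action whose imagined observation looks most promising. The sketch below illustrates this idea; all three callables are hypothetical stand-ins for the paper's LLM-based components, not its actual interfaces.

```python
# Minimal sketch of world-model look-ahead for action selection.
# `propose_actions`, `world_model_predict`, and `score_observation`
# are hypothetical stand-ins for LLM-backed components.
from typing import Callable, List

def select_action(
    observation: str,
    propose_actions: Callable[[str], List[str]],
    world_model_predict: Callable[[str, str], str],
    score_observation: Callable[[str], float],
) -> str:
    """Pick the candidate action whose imagined next observation scores best."""
    best_action, best_score = None, float("-inf")
    for action in propose_actions(observation):
        imagined = world_model_predict(observation, action)  # simulated next page
        score = score_observation(imagined)                  # task-progress estimate
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```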


Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Fang, Tianqing, Zhang, Zhisong, Wang, Xiaoyang, Wang, Rui, Qin, Can, Wan, Yuxuan, Ma, Jun-Yu, Zhang, Ce, Chen, Jiaqi, Li, Xiyun, Zhang, Hongming, Mi, Haitao, Yu, Dong

arXiv.org Artificial Intelligence

General AI agents are increasingly recognized as foundational frameworks for the next generation of artificial intelligence, enabling complex reasoning, web interaction, coding, and autonomous research capabilities. However, current agent systems are either closed-source or heavily reliant on paid APIs and proprietary tools, limiting accessibility and reproducibility for the research community. In this work, we present Cognitive Kernel-Pro, a fully open-source and, to the maximum extent possible, free multi-module agent framework designed to democratize the development and evaluation of advanced AI agents. Within Cognitive Kernel-Pro, we systematically investigate the curation of high-quality training data for Agent Foundation Models, focusing on the construction of queries, trajectories, and verifiable answers across four key domains: web, file, code, and general reasoning. Furthermore, we explore novel strategies for agent test-time reflection and voting to enhance agent robustness and performance. We evaluate Cognitive Kernel-Pro on GAIA, achieving state-of-the-art results among open-source and free agents. Notably, our 8B-parameter open-source model surpasses previous leading systems such as WebDancer and WebSailor, establishing a new performance standard for accessible, high-capability AI agents. Code is available at https://github.com/Tencent/CognitiveKernel-Pro
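
The test-time voting mentioned in the abstract can be sketched very simply: run the agent several times on the same task and return the plurality answer. This is a hedged illustration of the general technique, assuming a hypothetical `run_agent` callable rather than the framework's real API.

```python
# Sketch of test-time voting: sample several independent agent runs
# and return the most common final answer. `run_agent` is hypothetical.
from collections import Counter
from typing import Callable

def vote(task: str, run_agent: Callable[[str], str], n_samples: int = 5) -> str:
    answers = [run_agent(task) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # plurality answer
```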


Are All Steps Equally Important? Benchmarking Essentiality Detection of Events

Wang, Haoyu, Zhang, Hongming, Wang, Yueguan, Deng, Yuqian, Chen, Muhao, Roth, Dan

arXiv.org Artificial Intelligence

Natural language expresses events at varying granularities, where coarse-grained events (goals) can be broken down into finer-grained event sequences (steps). A critical yet overlooked aspect of understanding event processes is recognizing that not all step events are equally important to the completion of a goal. In this paper, we address this gap by examining the extent to which current models comprehend the essentiality of step events in relation to a goal event. Cognitive studies suggest that this capability enables machines to emulate human commonsense reasoning about the preconditions and necessary effort of everyday tasks. We contribute a high-quality corpus of (goal, step) pairs gathered from the community guideline website WikiHow, with steps manually annotated by experts for their essentiality to the goal. The high inter-annotator agreement demonstrates that humans have a consistent understanding of event essentiality. However, after evaluating multiple statistical and large-scale pre-trained language models, we find that existing approaches considerably underperform humans. This observation highlights the need for further exploration of this critical and challenging task. The dataset and code are available at http://cogcomp.org/page/publication_view/1023.
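
One natural way to frame the benchmark task is binary classification over (goal, step) pairs. The sketch below is an illustrative framing only, not the paper's evaluated models; `essentiality_score` is a hypothetical model interface returning the probability that a step is essential given the goal.

```python
# Illustrative framing of essentiality detection as binary classification
# over (goal, step) pairs. `essentiality_score` is a hypothetical model.
from typing import Callable, List, Tuple

def label_steps(
    goal: str,
    steps: List[str],
    essentiality_score: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[Tuple[str, bool]]:
    """Return each step paired with a predicted essential/non-essential label."""
    return [(step, essentiality_score(goal, step) >= threshold) for step in steps]
```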


TILFA: A Unified Framework for Text, Image, and Layout Fusion in Argument Mining

Zong, Qing, Wang, Zhaowei, Xu, Baixuan, Zheng, Tianshi, Shi, Haochen, Wang, Weiqi, Song, Yangqiu, Wong, Ginny Y., See, Simon

arXiv.org Artificial Intelligence

A main goal of Argument Mining (AM) is to analyze an author's stance. Unlike previous AM datasets, which focus only on text, the shared task at the 10th Workshop on Argument Mining introduces a dataset that includes both text and images. Importantly, these images contain both visual elements and optical characters (text rendered within the image). Our new framework, TILFA (A Unified Framework for Text, Image, and Layout Fusion in Argument Mining), is designed to handle this mixed data. It excels not only at understanding text but also at detecting optical characters and recognizing layout details in images. Our model significantly outperforms existing baselines, earning our team, KnowComp, first place on the leaderboard of the Argumentative Stance Classification subtask of this shared task.
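
To make the fusion idea concrete, here is a minimal late-fusion sketch: concatenate text, OCR, and layout embeddings and classify stance from the joint vector. This is an assumption-laden illustration, not the authors' actual architecture; the embedding dimensions and upstream encoders are made up for the example.

```python
# Late-fusion sketch (not the TILFA architecture): concatenate text, OCR,
# and layout embeddings, then classify stance. Dimensions are assumptions.
import torch
import torch.nn as nn

class FusionStanceClassifier(nn.Module):
    def __init__(self, text_dim=768, ocr_dim=768, layout_dim=128, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + ocr_dim + layout_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, text_emb, ocr_emb, layout_emb):
        fused = torch.cat([text_emb, ocr_emb, layout_emb], dim=-1)
        return self.head(fused)  # stance logits
```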


COLA: Contextualized Commonsense Causal Reasoning from the Causal Inference Perspective

Wang, Zhaowei, Do, Quyet V., Zhang, Hongming, Zhang, Jiayao, Wang, Weiqi, Fang, Tianqing, Song, Yangqiu, Wong, Ginny Y., See, Simon

arXiv.org Artificial Intelligence

Detecting commonsense causal relations (causation) between events has long been an essential yet challenging task. Because events are complicated, an event may have different causes in different contexts; exploiting context therefore plays an essential role in detecting causal relations. Meanwhile, previous work on commonsense causation considers only two events and ignores their context, oversimplifying the task formulation. This paper proposes a new task, contextualized commonsense causal reasoning, which detects commonsense causation between two events within an event sequence (i.e., a context). We also design a zero-shot framework, COLA (Contextualized Commonsense Causality Reasoner), to solve the task from the causal inference perspective. The framework obtains rich incidental supervision from temporality and balances covariates across multiple timestamps to remove confounding effects. Our extensive experiments show that COLA detects commonsense causality more accurately than the baselines.
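
The intuition behind the causal-inference framing can be sketched as a treated-versus-control contrast: compare the effect's likelihood given the candidate cause against a baseline built from the other context events. This is an illustrative simplification only; `event_prob` is a hypothetical LM-based estimator and not COLA's actual interface, which additionally balances covariates across timestamps.

```python
# Illustrative treated-vs-control contrast for contextualized causality.
# `event_prob(effect, given)` is a hypothetical likelihood estimator.
from typing import Callable, List

def causal_strength(
    cause: str,
    effect: str,
    context_events: List[str],
    event_prob: Callable[[str, str], float],
) -> float:
    treated = event_prob(effect, cause)  # P(effect | candidate cause)
    controls = [event_prob(effect, e) for e in context_events if e != cause]
    baseline = sum(controls) / len(controls) if controls else 0.0
    return treated - baseline  # larger values suggest causation
```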


Efficient Zero-shot Event Extraction with Context-Definition Alignment

Zhang, Hongming, Yao, Wenlin, Yu, Dong

arXiv.org Artificial Intelligence

Event extraction (EE) is the task of identifying event mentions of interest in text. Conventional efforts mainly focus on the supervised setting. However, these supervised models cannot generalize to event types outside the pre-defined ontology. To fill this gap, many efforts have been devoted to the zero-shot EE problem. This paper follows the trend of modeling event-type semantics but moves one step further. We argue that a static embedding of the event type name may not be enough, because a single word can be ambiguous; a full sentence is needed to define the type semantics accurately. To model the definition semantics, we use two separate transformer models to project contextualized event mentions and their corresponding definitions into the same embedding space, and then minimize their embedding distance via contrastive learning. On top of that, we also propose a warming phase to help the model learn the minor differences between similar definitions. We name our approach Zero-shot Event extraction with Definition (ZED). Experiments on the MAVEN dataset show that our model significantly outperforms all previous zero-shot EE methods, with fast inference thanks to the disjoint design. Further experiments also show that ZED can easily be applied to the few-shot setting when annotations are available and consistently outperforms supervised baseline methods.
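
The contrastive alignment of mentions and definitions described in the abstract can be sketched with a standard InfoNCE-style objective: embed both sides, normalize, and treat matched pairs as positives against in-batch negatives. This is a minimal sketch of the general technique; the encoders, batching, and the temperature value are assumptions rather than ZED's exact training setup.

```python
# InfoNCE-style sketch of mention/definition alignment. Encoders producing
# the two embedding batches are assumed; row i of each is a positive pair.
import torch
import torch.nn.functional as F

def contrastive_loss(
    mention_emb: torch.Tensor,     # (batch, dim), from the mention encoder
    definition_emb: torch.Tensor,  # (batch, dim), from the definition encoder
    temperature: float = 0.07,
) -> torch.Tensor:
    m = F.normalize(mention_emb, dim=-1)
    d = F.normalize(definition_emb, dim=-1)
    logits = m @ d.t() / temperature                    # pairwise similarities
    targets = torch.arange(m.size(0), device=m.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)
```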