AITopics | Zhang, Hanchen

Collaborating Authors

Zhang, Hanchen


AutoGLM: Autonomous Foundation Agents for GUIs

Liu, Xiao, Qin, Bo, Liang, Dongzhu, Dong, Guang, Lai, Hanyu, Zhang, Hanchen, Zhao, Hanlin, Iong, Iat Long, Sun, Jiadai, Wang, Jiaqi, Gao, Junjie, Shan, Junjun, Liu, Kangning, Zhang, Shudan, Yao, Shuntian, Cheng, Siyi, Yao, Wentao, Zhao, Wenyi, Liu, Xinghan, Liu, Xinyi, Chen, Xinying, Yang, Xinyue, Yang, Yang, Xu, Yifan, Yang, Yu, Wang, Yujia, Xu, Yulin, Qi, Zehan, Dong, Yuxiao, Tang, Jie

arXiv.org Artificial Intelligence, Oct-28-2024

We present AutoGLM, a new series in the ChatGLM family, designed to serve as foundation agents for autonomous control of digital devices through Graphical User Interfaces (GUIs). While foundation models excel at acquiring human knowledge, they often struggle with decision-making in dynamic real-world environments, limiting their progress toward artificial general intelligence. This limitation underscores the importance of developing foundation agents capable of learning through autonomous environmental interactions by reinforcing existing models. Focusing on Web Browser and Phone as representative GUI scenarios, we have developed AutoGLM as a practical foundation agent system for real-world GUI interactions. Our approach integrates a comprehensive suite of techniques and infrastructures to create deployable agent systems suitable for user delivery. Through this development, we have derived two key insights: First, the design of an appropriate "intermediate interface" for GUI control is crucial, enabling the separation of planning and grounding behaviors, which require distinct optimization for flexibility and accuracy respectively. Second, we have developed a novel progressive training framework that enables self-evolving online curriculum reinforcement learning for AutoGLM. Our evaluations demonstrate AutoGLM's effectiveness across multiple domains. For web browsing, AutoGLM achieves a 55.2% success rate on VAB-WebArena-Lite (improving to 59.1% with a second attempt) and 96.2% on OpenTable evaluation tasks. In Android device control, AutoGLM attains a 36.2% success rate on AndroidLab (VAB-Mobile) and 89.7% on common tasks in popular Chinese apps.
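The abstract reports 55.2% on VAB-WebArena-Lite and 59.1% "with a second attempt" but does not spell out the scoring rule. The following is a minimal Python sketch, assuming a task counts as solved if any of its first k attempts succeeds; the function name and the toy outcomes are hypothetical and are not taken from the paper or its evaluation code.

```python
from typing import Sequence


def success_rate_within_k(attempts: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of tasks solved by at least one of their first k attempts.

    attempts[i] lists the pass/fail outcomes of task i's attempts, in order.
    Assumption (not from the paper): a task counts as solved if any of its
    first k attempts succeeds.
    """
    if not attempts:
        raise ValueError("need at least one task")
    if k < 1:
        raise ValueError("k must be at least 1")
    # any() over the first k outcomes: one success within the budget suffices
    solved = sum(1 for outcomes in attempts if any(outcomes[:k]))
    return solved / len(attempts)


# Hypothetical outcomes for four tasks (illustrative only, not real results)
toy = [[True], [False, True], [False, False], [True]]
print(success_rate_within_k(toy, 1))  # 0.5
print(success_rate_within_k(toy, 2))  # 0.75
```

Under this assumption the second-attempt figure can only be equal to or higher than the single-attempt figure, which is consistent with the direction of the reported numbers.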

arxiv preprint arxiv, large language model, machine learning

2411.0082

Country: Asia > China > Guangdong Province (0.14)

Genre:

Research Report (0.50)
Instructional Material (0.35)

Industry: Information Technology > Software (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

Lai, Hanyu, Liu, Xiao, Iong, Iat Long, Yao, Shuntian, Chen, Yuxuan, Shen, Pengbo, Yu, Hao, Zhang, Hanchen, Zhang, Xiaohan, Dong, Yuxiao, Tang, Jie

arXiv.org Artificial Intelligence, Apr-4-2024

Large language models (LLMs) have fueled many intelligent agent tasks, such as web navigation, but most existing agents perform far from satisfactorily on real-world webpages due to three factors: (1) the versatility of actions on webpages, (2) HTML text exceeding model processing capacity, and (3) the complexity of decision-making due to the open-domain nature of the web. In light of these challenges, we develop AutoWebGLM, a GPT-4-outperforming automated web navigation agent built upon ChatGLM3-6B. Inspired by human browsing patterns, we design an HTML simplification algorithm to represent webpages, preserving vital information succinctly. We employ a hybrid human-AI method to build web browsing data for curriculum training. Then, we bootstrap the model with reinforcement learning and rejection sampling to further improve its webpage comprehension, its browser operations, and its ability to decompose tasks efficiently on its own. For testing, we establish a bilingual benchmark, AutoWebBench, for real-world web browsing tasks. We evaluate AutoWebGLM across diverse web navigation benchmarks, revealing its improvements but also the underlying challenges of tackling real environments. Related code, model, and data will be released at https://github.com/THUDM/AutoWebGLM.

large language model, machine learning, natural language

2404.03648

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry:

Education (0.93)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

AgentBench: Evaluating LLMs as Agents

Liu, Xiao, Yu, Hao, Zhang, Hanchen, Xu, Yifan, Lei, Xuanyu, Lai, Hanyu, Gu, Yu, Ding, Hangliang, Men, Kaiwen, Yang, Kejuan, Zhang, Shudan, Deng, Xiang, Zeng, Aohan, Du, Zhengxiao, Zhang, Chenhui, Shen, Sheng, Zhang, Tianjun, Su, Yu, Sun, Huan, Huang, Minlie, Dong, Yuxiao, Tang, Jie

arXiv.org Artificial Intelligence, Oct-25-2023

Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there is an urgent need to evaluate LLMs as agents on challenging tasks in interactive environments. We present AgentBench, a multi-dimensional evolving benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities in a multi-turn open-ended generation setting. Our extensive test over 27 API-based and open-source (OSS) LLMs shows that, while top commercial LLMs present a strong ability to act as agents in complex environments, there is a significant disparity in performance between them and their OSS competitors. We identify the typical reasons for failure across environments and LLMs, showing that poor long-term reasoning, decision-making, and instruction-following abilities are the main obstacles to developing usable LLM agents. Training on code and high-quality multi-turn alignment data could improve agent performance. Datasets, environments, and an integrated evaluation package for AgentBench are released at https://github.com/THUDM/AgentBench.

agentbench, artificial intelligence, large language model

2308.03688

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)