AITopics | Gu, Pengjie

Collaborating Authors

Gu, Pengjie

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Cradle: Empowering Foundation Agents Towards General Computer Control

Tan, Weihao, Zhang, Wentao, Xu, Xinrun, Xia, Haochong, Ding, Ziluo, Li, Boyu, Zhou, Bohan, Yue, Junpeng, Jiang, Jiechuan, Li, Yewen, An, Ruyi, Qin, Molei, Zong, Chuqiao, Zheng, Longtao, Wu, Yujie, Chai, Xiaoqiang, Bi, Yifei, Xie, Tianbao, Gu, Pengjie, Li, Xiyun, Zhang, Ceyao, Tian, Long, Wang, Chaojie, Wang, Xinrun, Karlsson, Börje F., An, Bo, Yan, Shuicheng, Lu, Zongqing

arXiv.org Artificial IntelligenceJul-2-2024

Despite the success in specific scenarios, existing foundation agents still struggle to generalize across various virtual scenarios, mainly due to the dramatically different encapsulations of environments with manually designed observation and action spaces. To handle this issue, we propose the General Computer Control (GCC) setting to restrict foundation agents to interact with software through the most unified and standardized interface, i.e., using screenshots as input and keyboard and mouse actions as output. We introduce Cradle, a modular and flexible LMM-powered framework, as a preliminary attempt towards GCC. Enhanced by six key modules, Cradle can understand input screenshots and output executable code for low-level keyboard and mouse control after high-level planning, so that Cradle can interact with any software and complete long-horizon complex tasks without relying on any built-in APIs. Experimental results show that Cradle exhibits remarkable generalizability and impressive performance across four previously unexplored commercial video games, five software applications, and a comprehensive benchmark, OSWorld. Cradle is the first to enable foundation agents to follow the main storyline and complete 40-minute-long real missions in the complex AAA game Red Dead Redemption 2 (RDR2). Cradle can also create a city of a thousand people in Cities: Skylines, farm and harvest parsnips in Stardew Valley, and trade and bargain with a maximal weekly total profit of 87% in Dealer's Life 2. Cradle can not only operate daily software, like Chrome, Outlook, and Feishu, but also edit images and videos using Meitu and CapCut. Cradle greatly extends the reach of foundation agents by enabling the easy conversion of any software, especially complex games, into benchmarks to evaluate agents' various abilities and facilitate further data collection, thus paving the way for generalist agents.

large language model, machine learning, reinforcement learning, (21 more...)

arXiv.org Artificial Intelligence

2403.03186

Country: Asia > China (0.45)

Genre:

Workflow (1.00)
Research Report > New Finding (0.47)
Personal > Interview (0.45)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology > Software (1.00)
Energy > Renewable (0.92)
Energy > Power Industry (0.92)

Technology:

Information Technology > Software (1.00)
Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(9 more...)

Add feedback

Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree

Feng, Lang, Gu, Pengjie, An, Bo, Pan, Gang

arXiv.org Artificial IntelligenceJun-7-2024

Diffusion planners have shown promise in handling long-horizon and sparse-reward tasks due to the non-autoregressive plan generation. However, their inherent stochastic risk of generating infeasible trajectories presents significant challenges to their reliability and stability. We introduce a novel approach, the Trajectory Aggregation Tree (TAT), to address this issue in diffusion planners. Compared to prior methods that rely solely on raw trajectory predictions, TAT aggregates information from both historical and current trajectories, forming a dynamic tree-like structure. Each trajectory is conceptualized as a branch and individual states as nodes. As the structure evolves with the integration of new trajectories, unreliable states are marginalized, and the most impactful nodes are prioritized for decision-making. TAT can be deployed without modifying the original training and sampling pipelines of diffusion planners, making it a training-free, ready-to-deploy solution. We provide both theoretical analysis and empirical evidence to support TAT's effectiveness. Our results highlight its remarkable ability to resist the risk from unreliable trajectories, guarantee the performance boosting of diffusion planners in $100\%$ of tasks, and exhibit an appreciable tolerance margin for sample quality, thereby enabling planning with a more than $3\times$ acceleration.

machine learning, natural language, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2405.17879

Country: Europe > Austria > Vienna (0.14)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Add feedback

Neuromorphic Auditory Perception by Neural Spiketrum

Tang, Huajin, Gu, Pengjie, Wijekoon, Jayawan, Alsakkal, MHD Anas, Wang, Ziming, Shen, Jiangrong, Yan, Rui

arXiv.org Artificial IntelligenceSep-11-2023

Neuromorphic computing holds the promise to achieve the energy efficiency and robust learning performance of biological neural systems. To realize the promised brain-like intelligence, it needs to solve the challenges of the neuromorphic hardware architecture design of biological neural substrate and the hardware amicable algorithms with spike-based encoding and learning. Here we introduce a neural spike coding model termed spiketrum, to characterize and transform the time-varying analog signals, typically auditory signals, into computationally efficient spatiotemporal spike patterns. It minimizes the information loss occurring at the analog-to-spike transformation and possesses informational robustness to neural fluctuations and spike losses. The model provides a sparse and efficient coding scheme with precisely controllable spike rate that facilitates training of spiking neural networks in various auditory perception tasks. We further investigate the algorithm-hardware co-designs through a neuromorphic cochlear prototype which demonstrates that our approach can provide a systematic solution for spike-based artificial intelligence by fully exploiting its advantages with spike-based computation.

artificial intelligence, machine learning, spiketrum, (18 more...)

arXiv.org Artificial Intelligence

2309.0543

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Consumer Health (0.70)
Health & Medicine > Therapeutic Area > Neurology (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Controlling Type Confounding in Ad Hoc Teamwork with Instance-wise Teammate Feedback Rectification

Xing, Dong, Gu, Pengjie, Zheng, Qian, Wang, Xinrun, Liu, Shanqi, Zheng, Longtao, An, Bo, Pan, Gang

arXiv.org Artificial IntelligenceJun-19-2023

Ad hoc teamwork requires an agent to cooperate with unknown teammates without prior coordination. Many works propose to abstract teammate instances into high-level representation of types and then pre-train the best response for each type. However, most of them do not consider the distribution of teammate instances within a type. This could expose the agent to the hidden risk of \emph{type confounding}. In the worst case, the best response for an abstract teammate type could be the worst response for all specific instances of that type. This work addresses the issue from the lens of causal inference. We first theoretically demonstrate that this phenomenon is due to the spurious correlation brought by uncontrolled teammate distribution. Then, we propose our solution, CTCAT, which disentangles such correlation through an instance-wise teammate feedback rectification. This operation reweights the interaction of teammate instances within a shared type to reduce the influence of type confounding. The effect of CTCAT is evaluated in multiple domains, including classic ad hoc teamwork tasks and real-world scenarios. Results show that CTCAT is robust to the influence of type confounding, a practical issue that directly hazards the robustness of our trained agents but was unnoticed in previous works.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2306.10944

Country:

Asia > China > Zhejiang Province (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Add feedback