AITopics | Song, Yufan

Song, Yufan

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Xu, Frank F., Song, Yufan, Li, Boxuan, Tang, Yuxuan, Jain, Kritanjali, Bao, Mengxue, Wang, Zora Z., Zhou, Xuhui, Guo, Zhitong, Cao, Murong, Yang, Mingyang, Lu, Hao Yang, Martin, Amaad, Su, Zhe, Maben, Leander, Mehta, Raj, Chi, Wayne, Jang, Lawrence, Xie, Yiqing, Zhou, Shuyan, Neubig, Graham

arXiv.org Artificial Intelligence, Dec-18-2024

We interact with computers daily, both in everyday life and at work, and many aspects of work can be done entirely with access to a computer and the Internet. At the same time, thanks to improvements in large language models (LLMs), there has been rapid development in AI agents that interact with and effect change in their surrounding environments. But how well do AI agents help accelerate, or even autonomously perform, work-related tasks? The answer to this question has important implications both for industry looking to adopt AI into its workflows and for economic policy seeking to understand the effects that adoption of AI may have on the labor market. To measure the progress of LLM agents on real-world professional tasks, in this paper we introduce TheAgentCompany, an extensible benchmark for evaluating AI agents that interact with the world in ways similar to those of a digital worker: by browsing the Web, writing code, running programs, and communicating with coworkers. We build a self-contained environment with internal web sites and data that mimics a small software company, and create a variety of tasks that may be performed by workers in such a company. We test baseline agents powered by both closed API-based and open-weights language models (LMs), and find that with the most competitive agent, 24% of the tasks can be completed autonomously. This paints a nuanced picture of task automation with LM agents: in a setting simulating a real workplace, a good portion of simpler tasks can be solved autonomously, but more difficult long-horizon tasks are still beyond the reach of current systems.

large language model, machine learning, natural language

2412.14161

Country: North America > United States (0.93)

Genre:

Workflow (0.88)
Research Report (0.64)

Industry:

Information Technology > Software (1.00)
Government (0.93)
Banking & Finance > Economy (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

Weng, Jiayi, Lin, Min, Huang, Shengyi, Liu, Bo, Makoviichuk, Denys, Makoviychuk, Viktor, Liu, Zichen, Song, Yufan, Luo, Ting, Jiang, Yukun, Xu, Zhongwen, Yan, Shuicheng

arXiv.org Artificial Intelligence, Oct-12-2022

There has been significant progress in developing reinforcement learning (RL) training systems. Past works such as IMPALA, Apex, Seed RL, Sample Factory, and others aim to improve the system's overall throughput. In this paper, we address a common bottleneck in RL training systems, namely parallel environment execution, which is often the slowest part of the whole system but receives little attention. With a curated design for parallelizing RL environments, we have improved RL environment simulation speed across different hardware setups, ranging from a laptop and a modest workstation to a high-end machine such as the NVIDIA DGX-A100. On a high-end machine, EnvPool achieves one million frames per second for environment execution on Atari environments and three million frames per second on MuJoCo environments. When running EnvPool on a laptop, the speed is 2.8x that of the Python subprocess. Moreover, great compatibility with existing RL training libraries, including CleanRL, rl_games, and DeepMind Acme, has been demonstrated in the open-source community. Finally, EnvPool allows researchers to iterate on their ideas at a much faster pace and has great potential to become the de facto RL environment execution engine. Example runs show that it takes only five minutes to train agents to play Atari Pong and MuJoCo Ant on a laptop. EnvPool is open-sourced at https://github.com/sail-sg/envpool.

envpool, machine learning, reinforcement learning

2206.10558

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Leisure & Entertainment > Games > Computer Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
