Can LLM Agents Simulate Multi-Turn Human Behavior? Evidence from Real Online Customer Behavior Data
Lu, Yuxuan, Huang, Jing, Han, Yan, Yao, Bingsheng, Bei, Sisong, Gesi, Jiri, Xie, Yaochen, Wang, Zheshen, He, Qi, Wang, Dakuo
Recent research shows that LLM Agents can generate "believable" human behaviors via prompt-only methods, and such agents have been increasingly adopted in downstream applications. However, existing evaluation of these agents only focuses on qualitative believability (whether human raters think they are accurate), leaving open questions of whether LLM agents can accurately generate step-by-step actions mimicking a particular human's behavior in a multi-turn interaction task. In this work, we take shopping as a case study and present the first large-scale quantitative evaluation of state-of-the-art LLMs' ability to accurately simulate human behavior. Using real-world data from 31,865 online shopping sessions containing 230,965 user actions, our evaluation reveals that prompt-based LLMs (DeepSeek-R1, Llama, Claude) achieve only 11.86% accuracy in generating human actions, highlighting a substantial gap in actual behavioral accuracy. Through experiments, we also showcase that strategies as simple as fine-tuning LLMs on real human click-through data augmented with synthesized reasoning traces can greatly enhance models' performance. The fine-tuned Qwen2.5-7B achieves 17.26% action generation accuracy and 33.86% F1 score on final purchase prediction, representing substantial improvements of 5.4% and 13.85% over prompt-only baselines. This work establishes the first rigorous benchmark for human behavior simulation and provides actionable insights for developing more accurate LLM agents for future downstream applications.
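The two headline metrics are simple to compute; here is a minimal sketch assuming exact-match scoring per action and standard binary F1 on the final purchase label (the function names are ours, not the paper's):

```python
def action_accuracy(pred, gold):
    """Fraction of steps where the generated action exactly matches the human's."""
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def purchase_f1(pred, gold):
    """Binary F1 for final purchase prediction (1 = purchased)."""
    tp = sum(p == 1 and g == 1 for p, g in zip(pred, gold))
    fp = sum(p == 1 and g == 0 for p, g in zip(pred, gold))
    fn = sum(p == 0 and g == 1 for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With metrics this strict, a single mismatched action in a session counts against the model, which is part of why exact-match accuracy numbers are so low.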
Dynamic Planning for LLM-based Graphical User Interface Automation
Zhang, Shaoqing, Zhang, Zhuosheng, Chen, Kehai, Ma, Xinbei, Yang, Muyun, Zhao, Tiejun, Zhang, Min
The advent of large language models (LLMs) has spurred considerable interest in advancing autonomous LLM-based agents, particularly in intriguing applications within smartphone graphical user interfaces (GUIs). When presented with a task goal, these agents typically emulate human actions within a GUI environment until the task is completed. However, a key challenge lies in devising effective plans to guide action prediction in GUI tasks, even though planning has been widely recognized as effective for decomposing complex tasks into a series of steps. Specifically, given the dynamic nature of environmental GUIs following action execution, it is crucial to dynamically adapt plans based on environmental feedback and action history. We show that the widely used ReAct approach fails due to excessively long historical dialogues. To address this challenge, we propose a novel approach called Dynamic Planning of Thoughts (D-PoT) for LLM-based GUI agents. D-PoT dynamically adjusts planning based on environmental feedback and execution history. Experimental results reveal that the proposed D-PoT significantly surpasses the strong GPT-4V baseline by +12.7% (34.66% → 47.36%) in accuracy. The analysis highlights the generality of dynamic planning across different backbone LLMs, as well as its benefits in mitigating hallucinations and adapting to unseen tasks. Code is available at https://github.com/sqzhang-lazy/D-PoT.
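The core idea of revising a plan from environmental feedback can be illustrated with a toy loop. This is our own sketch of the general pattern, not the D-PoT algorithm itself:

```python
def revise_plan(plan, feedback):
    """Toy plan update: advance on success, prepend a recovery step on failure."""
    if feedback == "ok":
        return plan[1:]                   # current step succeeded; move on
    return ["re-locate target"] + plan    # step failed; adapt before retrying

plan = ["open app", "tap search", "type query"]
history = []
for fb in ["ok", "fail", "ok", "ok", "ok"]:   # simulated environment feedback
    history.append((plan[0], fb))             # execution history drives revision
    plan = revise_plan(plan, fb)
```

The point of the sketch is that the plan is a mutable object updated after every action, rather than a fixed sequence committed to up front.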
ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data
Shen, Junhong, Jain, Atishay, Xiao, Zedian, Amlekar, Ishan, Hadji, Mouad, Podolny, Aaron, Talwalkar, Ameet
Large Language Model (LLM) agents are rapidly improving to handle increasingly complex web-based tasks. Most of these agents rely on general-purpose, proprietary models like GPT-4 and focus on designing better prompts to improve their planning abilities. However, general-purpose LLMs are not specifically trained to understand specialized web contexts such as HTML, and they often struggle with long-horizon planning. We explore an alternative approach that fine-tunes open-source LLMs using production-scale workflow data collected from over 250 domains corresponding to 6 billion tokens. This simple yet effective approach shows substantial gains over prompting-based agents on existing benchmarks -- ScribeAgent achieves state-of-the-art direct generation performance on Mind2Web and improves the task success rate by 7.3% over the previous best text-only web agents on WebArena. We further perform detailed ablation studies on various fine-tuning design choices and provide insights into LLM selection, training recipes, context window optimization, and effect of dataset sizes.
POS-tagging to highlight the skeletal structure of sentences
The article describes the development of a model for applying partial annotation to text via transfer learning with BERT, covering data preparation and evaluation of the obtained results. The proposed method is found to achieve good results in marking up text.
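Sequence-labeling work of this kind is typically evaluated with entity-level (span-based) F1, as implemented by libraries such as seqeval. A minimal stdlib sketch of that metric for BIO-tagged sequences (our own illustration):

```python
def bio_spans(tags):
    """Extract (start, end, type) entity spans from a BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        if start is not None and tag != f"I-{label}":
            spans.append((start, i, label))
            start = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans

def span_f1(gold_tags, pred_tags):
    """Entity-level F1: a span counts only if boundaries and type both match."""
    g, p = set(bio_spans(gold_tags)), set(bio_spans(pred_tags))
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)
```

Unlike token-level accuracy, this metric gives no credit for partially overlapping spans, which makes it the stricter and more common choice for NER-style tasks.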
ODRL: A Benchmark for Off-Dynamics Reinforcement Learning
Lyu, Jiafei, Xu, Kang, Xu, Jiacheng, Yan, Mengbei, Yang, Jingwen, Zhang, Zongzhang, Bai, Chenjia, Lu, Zongqing, Li, Xiu
We consider off-dynamics reinforcement learning (RL) where one needs to transfer policies across different domains with dynamics mismatch. Despite the focus on developing dynamics-aware algorithms, this field is hindered due to the lack of a standard benchmark. To bridge this gap, we introduce ODRL, the first benchmark tailored for evaluating off-dynamics RL methods. ODRL contains four experimental settings where the source and target domains can be either online or offline, and provides diverse tasks and a broad spectrum of dynamics shifts, making it a reliable platform to comprehensively evaluate the agent's adaptation ability to the target domain. Furthermore, ODRL includes recent off-dynamics RL algorithms in a unified framework and introduces some extra baselines for different settings, all implemented in a single-file manner. To unpack the true adaptation capability of existing methods, we conduct extensive benchmarking experiments, which show that no method has universal advantages across varied dynamics shifts. We hope this benchmark can serve as a cornerstone for future research endeavors.
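What a dynamics mismatch means in practice can be shown with a toy example (ours, not an ODRL task): the same policy produces different trajectories when a single transition parameter changes between source and target domains.

```python
class LineWorld:
    """Toy 1-D environment; `push` controls the transition dynamics."""
    def __init__(self, push=1.0):
        self.push, self.x = push, 0.0

    def step(self, action):               # action in {-1, +1}
        self.x += self.push * action      # a different `push` = dynamics mismatch
        return self.x, -abs(self.x - 5.0)  # reward: negative distance to goal x = 5

source, target = LineWorld(push=1.0), LineWorld(push=0.5)
for _ in range(5):        # a policy tuned for the source domain: always push right
    source.step(+1)
    target.step(+1)
```

After five steps the source agent sits exactly at the goal while the target agent is only halfway there, which is the adaptation gap off-dynamics methods aim to close.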
MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis
Chen, Lei, Yan, Feng, Zhong, Yujie, Chen, Shaoxiang, Jie, Zequn, Ma, Lin
Multimodal Large Language Models (MLLM) have made significant progress in the field of document analysis. Despite this, existing benchmarks typically focus only on extracting text and simple layout information, neglecting the complex interactions between elements in structured documents such as mind maps and flowcharts. To address this issue, we introduce the new benchmark named MindBench, which not only includes meticulously constructed bilingual authentic or synthetic images, detailed annotations, evaluation metrics and baseline models, but also specifically designs five types of structured understanding and parsing tasks. These tasks include full parsing, partial parsing, position-related parsing, structured Visual Question Answering (VQA), and position-related VQA, covering key areas such as text recognition, spatial awareness, relationship discernment, and structured parsing. Extensive experimental results demonstrate the substantial potential and significant room for improvement in current models' ability to handle structured document information. We anticipate that the launch of MindBench will significantly advance research and application development in structured document analysis technology. MindBench is available at: https://miasanlei.github.io/MindBench.github.io/.
A Multimodal Foundation Agent for Financial Trading: Tool-Augmented, Diversified, and Generalist
Zhang, Wentao, Zhao, Lingxuan, Xia, Haochong, Sun, Shuo, Sun, Jiaze, Qin, Molei, Li, Xinyi, Zhao, Yuqing, Zhao, Yilei, Cai, Xinyu, Zheng, Longtao, Wang, Xinrun, An, Bo
Financial trading is a crucial component of the markets, informed by a multimodal information landscape encompassing news, prices, and Kline charts, and spans diverse tasks such as quantitative trading and high-frequency trading with various assets. While advanced AI techniques like deep learning and reinforcement learning are extensively utilized in finance, their application in financial trading tasks often faces challenges due to inadequate handling of multimodal data and limited generalizability across various tasks. To address these challenges, we present FinAgent, a multimodal foundational agent with tool augmentation for financial trading. FinAgent's market intelligence module processes a diverse range of data (numerical, textual, and visual) to accurately analyze the financial market. Its unique dual-level reflection module not only enables rapid adaptation to market dynamics but also incorporates a diversified memory retrieval system, enhancing the agent's ability to learn from historical data and improve decision-making processes. The agent's emphasis on reasoning for actions fosters trust in its financial decisions. Moreover, FinAgent integrates established trading strategies and expert insights, ensuring that its trading approaches are both data-driven and rooted in sound financial principles. With comprehensive experiments on 6 financial datasets, including stocks and crypto, FinAgent significantly outperforms 9 state-of-the-art baselines in terms of 6 financial metrics, with over 36% average improvement on profit. Specifically, a 92.27% return (an 84.39% relative improvement) is achieved on one dataset. Notably, FinAgent is the first advanced multimodal foundation agent designed for financial trading tasks.
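For readers checking the numbers: assuming the standard definition of relative improvement, the reported figures imply a baseline return of roughly 50% on that dataset.

```python
reported_return = 92.27   # % return achieved by FinAgent on the best dataset
relative_gain = 0.8439    # the stated 84.39% relative improvement

# relative improvement = (improved - baseline) / baseline, so:
implied_baseline = reported_return / (1 + relative_gain)
```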
The Good Robot Podcast: Featuring Heather Zheng
Hosted by Eleanor Drage and Kerry Mackereth, The Good Robot is a podcast which explores the many complex intersections between gender, feminism and technology. In this episode, we talk to Heather Zheng, who makes technologies that stop everyday surveillance. These range from bracelets that stop devices from listening in on you to more secure biometric technologies that can protect us by identifying us by, for example, our dance moves. Most famously, Zheng is one of the computer scientists behind Nightshade, which helps artists protect their work by 'poisoning' AI training data sets. Heather is the Neubauer Professor of Computer Science at the University of Chicago.
Machine learning LEGO image recognition: Using virtual data and YOLOv3
I have been working a lot with LEGO and 3D models lately. For my current project I am looking to build a LEGO image recognition program. My ideal scenario is to grab a handful of LEGO, toss them on the table, take a picture, and have the program catalog the pieces. The biggest challenge I encounter with any machine learning project is collecting and formatting the training data. I am pretty sure this is the biggest challenge everyone encounters with machine learning.
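One practical detail for the synthetic-data route: YOLO-family detectors expect one label file per image with normalized bounding boxes, and a 3D renderer can emit these directly because it already knows each brick's pixel box. A minimal sketch of that conversion (the function name is ours):

```python
def yolo_line(cls_id, x0, y0, x1, y1, img_w, img_h):
    """Convert a pixel box (x0, y0)-(x1, y1) to a YOLO label line:
    'class x_center y_center width height', all coordinates normalized to [0, 1]."""
    xc = (x0 + x1) / 2 / img_w
    yc = (y0 + y1) / 2 / img_h
    w = (x1 - x0) / img_w
    h = (y1 - y0) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For example, a 200×100-pixel brick rendered in a 640×480 frame becomes a single line in the image's `.txt` label file, so generating thousands of annotated training images is just a loop over renders.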