RISK: A Framework for GUI Agents in E-commerce Risk Management

Chen, Renqi, Tao, Zeyin, Guo, Jianming, Zhu, Jingzhe, Peng, Yiheng, Sun, Qingqing, Zhang, Tianyi, Chen, Shuai

arXiv.org Artificial Intelligence 

E-commerce risk management requires aggregating diverse, deeply embedded web data through multi-step, stateful interactions, which traditional scraping methods and most existing Graphical User Interface (GUI) agents cannot handle. These agents are typically limited to single-step tasks and lack the ability to manage dynamic, interactive content critical for effective risk assessment. To address this challenge, we introduce RISK, a novel framework designed to build and deploy GUI agents for this domain. RISK integrates three components: (1) RISK-Data, a dataset of 8,492 single-step and 2,386 multi-step interaction trajectories, collected through a high-fidelity browser framework and a meticulous data curation process; (2) RISK-Bench, a benchmark with 802 single-step and 320 multi-step trajectories across three difficulty levels for standardized evaluation; and (3) RISK-R1, an R1-style reinforcement fine-tuning framework considering four aspects: (i) Output Format: Updated format reward to enhance output syntactic correctness and task comprehension, (ii) Single-step Level: Stepwise accuracy reward to provide granular feedback during early training stages, (iii) Multi-step Level: Process reweight to emphasize critical later steps in interaction sequences, and (iv) Task Level: Level reweight to focus on tasks of varying difficulty. Experiments show that RISK-R1 outperforms existing baselines, achieving a 6.8% improvement in the offline single-step setting and an 8.8% improvement in the offline multi-step setting. RISK provides a scalable, domain-specific solution for automating complex web interactions, advancing the state of the art in e-commerce risk management.

In e-commerce transaction scenarios, stringent compliance and risk control mechanisms are essential to mitigate operational, regulatory, and reputational risks.
Decision-making in this context requires the aggregation of heterogeneous information from multiple external sources, many of which exist as unstructured or semi-structured data on the public web. While broad web search can identify relevant sources, truly actionable intelligence often resides deep within specific websites, sometimes on dynamically loaded subpages, behind interactive elements, or embedded within complex Document Object Model (DOM) structures. Traditional scraping APIs or static crawlers fail to retrieve such deeply embedded content, as they lack the ability to engage in stateful, event-driven interactions (Petrova et al., 2025).

This work was done when the first author was an intern at Ant International.
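
The contrast drawn above between static crawling and stateful, event-driven interaction can be illustrated with a toy model. The sketch below is not part of RISK: `ToyPage`, its two click targets, and the strings it renders are hypothetical stand-ins for a real browser page, chosen only to show that content hidden behind a sequence of events is invisible to a one-shot fetch and reachable only when actions are applied in the right order.

```python
from dataclasses import dataclass


@dataclass
class ToyPage:
    """Hypothetical stand-in for a page whose visible content depends on past events."""

    tab: str = "overview"
    details_expanded: bool = False

    def render(self) -> str:
        # Only the content reachable in the current state is visible.
        if self.tab == "overview":
            return "Overview: Example Store"
        if not self.details_expanded:
            return "Compliance: [details collapsed]"
        return "Compliance: 2 open complaints"

    def click(self, target: str) -> None:
        # Each event mutates the page state, so the order of events matters.
        if target == "compliance_tab":
            self.tab = "compliance"
            self.details_expanded = False
        elif target == "expand_details" and self.tab == "compliance":
            self.details_expanded = True


def static_fetch(page: ToyPage) -> str:
    """A one-shot crawler sees only the initial render."""
    return page.render()


def run_trajectory(page: ToyPage, actions: list[str]) -> str:
    """A multi-step agent applies actions in sequence, then reads the page."""
    for target in actions:
        page.click(target)
    return page.render()
```

In this toy setting, `static_fetch(ToyPage())` never reaches the compliance details, while `run_trajectory(ToyPage(), ["compliance_tab", "expand_details"])` does. Reversing the two actions leaves the details collapsed, which is the sense in which the interaction is stateful.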