 trading decision



When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents

Qian, Lingfei, Peng, Xueqing, Wang, Yan, Zhang, Vincent Jim, He, Huan, Smith, Hanley, Han, Yi, He, Yueru, Li, Haohang, Cao, Yupeng, Yu, Yangyang, Lopez-Lira, Alejandro, Lu, Peng, Nie, Jian-Yun, Xiong, Guojun, Huang, Jimin, Ananiadou, Sophia

arXiv.org Artificial Intelligence

Although Large Language Model (LLM)-based agents are increasingly used in financial trading, it remains unclear whether they can reason and adapt in live markets, as most studies test models instead of agents, cover limited periods and assets, and rely on unverified data. To address these gaps, we introduce Agent Market Arena (AMA), the first lifelong, real-time benchmark for evaluating LLM-based trading agents across multiple markets. AMA integrates verified trading data, expert-checked news, and diverse agent architectures within a unified trading framework, enabling fair and continuous comparison under real conditions. It implements four agents, including InvestorAgent as a single-agent baseline, TradeAgent and HedgeFundAgent with different risk styles, and DeepFundAgent with memory-based reasoning, and evaluates them across GPT-4o, GPT-4.1, Claude-3.5-haiku, Claude-sonnet-4, and Gemini-2.0-flash. Live experiments on both cryptocurrency and stock markets demonstrate that agent frameworks display markedly distinct behavioral patterns, spanning from aggressive risk-taking to conservative decision-making, whereas model backbones contribute less to outcome variation. AMA thus establishes a foundation for rigorous, reproducible, and continuously evolving evaluation of financial reasoning and trading intelligence in LLM-based agents.



StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?

Chen, Yanxu, Yao, Zijun, Liu, Yantao, Ye, Jin, Yu, Jianing, Hou, Lei, Li, Juanzi

arXiv.org Artificial Intelligence

Large language models (LLMs) have recently demonstrated strong capabilities as autonomous agents, showing promise in reasoning, tool use, and sequential decision-making. While prior benchmarks have evaluated LLM agents in domains such as software engineering and scientific discovery, the finance domain remains underexplored, despite its direct relevance to economic value and high-stakes decision-making. Existing financial benchmarks primarily test static knowledge through question answering, but they fall short of capturing the dynamic and iterative nature of trading. To address this gap, we introduce StockBench, a contamination-free benchmark designed to evaluate LLM agents in realistic, multi-month stock trading environments. Agents receive daily market signals -- including prices, fundamentals, and news -- and must make sequential buy, sell, or hold decisions. Performance is assessed using financial metrics such as cumulative return, maximum drawdown, and the Sortino ratio. Our evaluation of state-of-the-art proprietary (e.g., GPT-5, Claude-4) and open-weight (e.g., Qwen3, Kimi-K2, GLM-4.5) models shows that while most LLM agents struggle to outperform the simple buy-and-hold baseline, several models demonstrate the potential to deliver higher returns and manage risk more effectively. These findings highlight both the challenges and opportunities in developing LLM-powered financial agents, showing that excelling at static financial knowledge tasks does not necessarily translate into successful trading strategies. We release StockBench as an open-source resource to support reproducibility and advance future research in this domain.
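The three metrics named in the abstract above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not StockBench's evaluation code: the function names, the per-period (non-annualised) Sortino ratio, the zero target return, and the sample portfolio values are all assumptions made here.

```python
import math

def simple_returns(values):
    """Period-over-period simple returns from a series of portfolio values."""
    return [curr / prev - 1.0 for prev, curr in zip(values, values[1:])]

def cumulative_return(values):
    """Total return from the first to the last portfolio value."""
    return values[-1] / values[0] - 1.0

def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst

def sortino_ratio(returns, target=0.0):
    """Mean excess return divided by downside deviation (assumed: not annualised)."""
    excess = [r - target for r in returns]
    mean_excess = sum(excess) / len(excess)
    # Downside deviation only penalises returns below the target.
    downside = math.sqrt(sum(min(e, 0.0) ** 2 for e in excess) / len(excess))
    if downside == 0.0:
        return float("inf") if mean_excess > 0 else 0.0
    return mean_excess / downside

# Hypothetical portfolio values over four trading days.
portfolio = [100.0, 110.0, 99.0, 120.0]
daily = simple_returns(portfolio)
print(cumulative_return(portfolio), max_drawdown(portfolio), sortino_ratio(daily))
```

Unlike the Sharpe ratio, the Sortino ratio divides by downside deviation only, so upside volatility does not lower an agent's score.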


To Trade or Not to Trade: An Agentic Approach to Estimating Market Risk Improves Trading Decisions

Emmanoulopoulos, Dimitrios, Olby, Ollie, Lyon, Justin, Stillman, Namid R.

arXiv.org Artificial Intelligence

Applications range from technical analysis of a company's fundamental value to wider market sentiment, factor analysis, and most tasks involving some form of natural language processing (NLP) [1, 2]. The implications for trading systems will likely be a dramatic increase in the rate and volume of market insights that can be generated to inform decisions. The overall capabilities of LLMs have dramatically increased over the last five years [3]. This has led to an increase in the number of LLMs available, both as proprietary models from frontier labs and as smaller open-weight models that can be run locally. Given this, the influence of LLMs on trading decisions is expected to be varied and highly model-specific. Early work is starting to compare and benchmark these models in tasks specific to financial applications, such as trading decisions, portfolio optimisation, and market analysis [4-10]. As the number of models increases, and their underlying strengths and weaknesses become more apparent, it is expected that different classes of pre-trained models will be more regularly deployed to achieve certain objectives [11, 12]. While these objectives are likely to be significantly linked to NLP-based tasks, such as text summarisation, analysis, and generation, recent LLM architectures give early evidence that more complex tasks can also be automated. These LLMs, such as the 'o' series from OpenAI or 'R1' from DeepSeek, generate 'reasoning' tokens, which result in the model performing more in-context analysis of the generated output and have led to improved performance on a number of key evaluation measures [13, 14].


FinVision: A Multi-Agent Framework for Stock Market Prediction

Fatemi, Sorouralsadat, Hu, Yuheng

arXiv.org Artificial Intelligence

Financial trading has been a challenging task, as it requires the integration of vast amounts of data from various modalities. Traditional deep learning and reinforcement learning methods require large training data and often involve encoding various data types into numerical formats for model input, which limits the explainability of model behavior. Recently, LLM-based agents have demonstrated remarkable advancements in handling multi-modal data, enabling them to execute complex, multi-step decision-making tasks while providing insights into their thought processes. This research introduces a multi-modal multi-agent system designed specifically for financial trading tasks. Our framework employs a team of specialized LLM-based agents, each adept at processing and interpreting various forms of financial data, such as textual news reports, candlestick charts, and trading signal charts. A key feature of our approach is the integration of a reflection module, which conducts analyses of historical trading signals and their outcomes. This reflective process is instrumental in enhancing the decision-making capabilities of the system for future trading scenarios. Furthermore, the ablation studies indicate that the visual reflection module plays a crucial role in enhancing the decision-making capabilities of our framework.


Enhancing LLM Trading Performance with Fact-Subjectivity Aware Reasoning

Wang, Qian, Gao, Yuchen, Tang, Zhenheng, Luo, Bingqiao, He, Bingsheng

arXiv.org Artificial Intelligence

While many studies show that more advanced LLMs perform better on tasks such as math and coding, we notice that in cryptocurrency trading, stronger LLMs often perform worse than weaker ones. To study how this counter-intuitive phenomenon occurs, we examine the LLM reasoning processes behind trading decisions. We find that separating the reasoning process into factual and subjective components can lead to higher profits. Building on this insight, we introduce a multi-agent framework, FS-ReasoningAgent, which enables LLMs to recognize and learn from both factual and subjective reasoning. Extensive experiments demonstrate that this framework enhances LLM trading performance in cryptocurrency markets. Additionally, an ablation study reveals that relying on subjective news tends to generate higher returns in bull markets, whereas focusing on factual information yields better results in bear markets. Our code and data are available at https://anonymous.4open.science/r/FS-ReasoningAgent-B55F/.


Combining supervised and unsupervised learning methods to predict financial market movements

Palma, Gabriel Rodrigues, Skoczeń, Mariusz, Maguire, Phil

arXiv.org Artificial Intelligence

The decisions traders make to buy or sell an asset depend on various analyses, with expertise required to identify patterns that can be exploited for profit. In this paper we identify novel features extracted from emergent and well-established financial markets using linear models and Gaussian Mixture Models (GMM) with the aim of finding profitable opportunities. We used approximately six months of data consisting of minute candles from the Bitcoin, Pepecoin, and Nasdaq markets to derive and compare the proposed novel features with commonly used ones. These features were extracted based on the previous 59 minutes for each market and used to identify predictions for the hour ahead. We explored the performance of various machine learning strategies, such as Random Forests (RF) and K-Nearest Neighbours (KNN) to classify market movements. A naive random approach to selecting trading decisions was used as a benchmark, with outcomes assumed to be equally likely. We used a temporal cross-validation approach using test sets of 40%, 30% and 20% of total hours to evaluate the learning algorithms' performances. Our results showed that filtering the time series facilitates algorithms' generalisation. The GMM filtering approach revealed that the KNN and RF algorithms produced higher average returns than the random algorithm.
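The temporal evaluation described above, with test sets of 40%, 30% and 20% of total hours, can be illustrated with a chronological hold-out split. This is a minimal sketch, not the authors' procedure: `temporal_split` is a hypothetical helper, and the assumption is that the most recent hours form the test set so that no future data leaks into training.

```python
def temporal_split(samples, test_fraction):
    """Split chronologically ordered samples into (train, test).

    The last `test_fraction` of the samples becomes the test set, so every
    training sample precedes every test sample in time.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    n_test = round(len(samples) * test_fraction)
    cut = len(samples) - n_test
    return samples[:cut], samples[cut:]

# Hypothetical example: 100 hourly samples, indexed in time order.
hours = list(range(100))
for fraction in (0.4, 0.3, 0.2):
    train, test = temporal_split(hours, fraction)
    print(fraction, len(train), len(test))
```

A shuffled split would mix later hours into the training set, which tends to overstate out-of-sample performance on time series.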


Gradient Reduction Convolutional Neural Network Policy for Financial Deep Reinforcement Learning

Montazeri, Sina, Jumakhan, Haseebullah, Abrasiabian, Sonia, Mirzaeinia, Amir

arXiv.org Artificial Intelligence

Building on our prior explorations of convolutional neural networks (CNNs) for financial data processing, this paper introduces two significant enhancements to refine our CNN model's predictive performance and robustness for financial tabular data. Firstly, we integrate a normalization layer at the input stage to ensure consistent feature scaling, addressing the issue of disparate feature magnitudes that can skew the learning process. This modification is hypothesized to aid in stabilizing the training dynamics and improving the model's generalization across diverse financial datasets. Secondly, we employ a Gradient Reduction Architecture, where earlier layers are wider and subsequent layers are progressively narrower. This enhancement is designed to enable the model to capture more complex and subtle patterns within the data, a crucial factor in accurately predicting financial outcomes. These advancements directly respond to the limitations identified in previous studies, where simpler models struggled with the complexity and variability inherent in financial applications. Initial tests confirm that these changes improve accuracy and model stability, suggesting that deeper and more nuanced network architectures can significantly benefit financial predictive tasks. This paper details the implementation of these enhancements and evaluates their impact on the model's performance in a controlled experimental setting.
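The two enhancements described above, input normalization and layers that narrow with depth, can be pictured with a small framework-free sketch. Everything here is an assumption for illustration: the abstract does not say which normalization is used (z-scoring per feature is assumed), nor how quickly the layers narrow (halving is assumed), and both function names are hypothetical.

```python
import statistics

def zscore_columns(rows):
    """Scale each feature column to zero mean and unit variance.

    Assumed stand-in for the paper's input normalization layer.
    Constant columns are left at zero rather than divided by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

def narrowing_widths(first_width, num_layers, factor=0.5, min_width=1):
    """Layer widths that shrink with depth (wide early, narrow late)."""
    widths = []
    width = float(first_width)
    for _ in range(num_layers):
        widths.append(max(min_width, int(width)))
        width *= factor
    return widths

# Hypothetical features on very different scales (e.g. a ratio and a volume).
features = [[1.0, 100.0], [3.0, 300.0]]
print(zscore_columns(features))
print(narrowing_widths(256, 4))
```

After scaling, both columns span the same range, which is the disparity in feature magnitudes that the abstract says the normalization step is meant to address.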


Large Language Model Agent in Financial Trading: A Survey

Ding, Han, Li, Yinheng, Wang, Junhao, Chen, Hang

arXiv.org Artificial Intelligence

Trading is a highly competitive task that requires a combination of strategy, knowledge, and psychological fortitude. With the recent success of large language models (LLMs), it is appealing to apply the emerging intelligence of LLM agents in this competitive arena and understand whether they can outperform professional traders. In this survey, we provide a comprehensive review of the current research on using LLMs as agents in financial trading. We summarize the common architectures used in these agents, the data inputs, and the performance of LLM trading agents in backtesting, as well as the challenges presented in this research. This survey aims to provide insights into the current state of LLM-based financial trading agents and outline future research directions in this field.