AITopics

2510.04168

Country:

Europe (0.46)
North America > United States (0.46)

Genre: Research Report > New Finding (0.66)

Industry:

Materials > Metals & Mining (1.00)
Construction & Engineering (1.00)
Energy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceOct-20-2025

ACON: Optimizing Context Compression for Long-horizon LLM Agents

Kang, Minki, Chen, Wei-Ning, Han, Dongge, Inan, Huseyin A., Wutschitz, Lukas, Chen, Yanzhi, Sim, Robert, Rajmohan, Saravan

Large language models (LLMs) are increasingly deployed as agents in dynamic, real-world environments, where success requires both reasoning and effective tool use. A central challenge for agentic tasks is the growing context length, as agents must accumulate long histories of actions and observations. This expansion raises costs and reduces efficiency in long-horizon tasks, yet prior work on context compression has mostly focused on single-step tasks or narrow applications. We introduce Agent Context Optimization (ACON), a unified framework that optimally compresses both environment observations and interaction histories into concise yet informative condensations. ACON leverages compression guideline optimization in natural language space: given paired trajectories where full context succeeds but compressed context fails, capable LLMs analyze the causes of failure, and the compression guideline is updated accordingly. Furthermore, we propose distilling the optimized LLM compressor into smaller models to reduce the overhead of the additional module. Experiments on AppWorld, OfficeBench, and Multi-objective QA show that ACON reduces memory usage by 26-54% (peak tokens) while largely preserving task performance, preserves over 95% of accuracy when distilled into smaller compressors, and enhances smaller LMs as long-horizon agents with up to 46% performance improvement. Our code is available at https://github.com/microsoft/acon.

large language model, machine learning, natural language, (21 more...)

2510.00615

Country:

North America > United States (1.00)
Asia (0.93)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.47)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

arXiv.org Artificial IntelligenceOct-20-2025

VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

He, Wei, Sun, Yueqing, Hao, Hongyan, Hao, Xueyuan, Xia, Zhikang, Gu, Qi, Han, Chengcheng, Zhao, Dengchang, Su, Hui, Zhang, Kefeng, Gao, Man, Su, Xi, Cai, Xiaodong, Cai, Xunliang, Yang, Yu, Zhao, Yunke

As LLM-based agents are increasingly deployed in real-life scenarios, existing benchmarks fail to capture their inherent complexity of handling extensive information, leveraging diverse resources, and managing dynamic user interactions. To address this gap, we introduce VitaBench, a challenging benchmark that evaluates agents on versatile interactive tasks grounded in real-world settings. Drawing from daily applications in food delivery, in-store consumption, and online travel services, VitaBench presents agents with the most complex life-serving simulation environment to date, comprising 66 tools. Through a framework that eliminates domain-specific policies, we enable flexible composition of these scenarios and tools, yielding 100 cross-scenario tasks (main results) and 300 single-scenario tasks. Each task is derived from multiple real user requests and requires agents to reason across temporal and spatial dimensions, utilize complex tool sets, proactively clarify ambiguous instructions, and track shifting user intent throughout multi-turn conversations. Moreover, we propose a rubric-based sliding window evaluator, enabling robust assessment of diverse solution pathways in complex environments and stochastic interactions. Our comprehensive evaluation reveals that even the most advanced models achieve only 30% success rate on cross-scenario tasks, and less than 50% success rate on others. Overall, we believe VitaBench will serve as a valuable resource for advancing the development of AI agents in practical real-world applications. The code, dataset, and leaderboard are available at https://vitabench.github.io/

large language model, machine learning, natural language, (18 more...)

2509.2649

Country:

North America > United States (0.46)
Asia > China > Liaoning Province (0.18)
Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (1.00)
Health & Medicine (1.00)
Transportation > Ground > Rail (0.94)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

arXiv.org Machine LearningOct-17-2025

deFOREST: Fusing Optical and Radar satellite data for Enhanced Sensing of Tree-loss

Castrillon-Candas, Julio Enrique, Gu, Hanfeng, Meredith, Caleb, Li, Yulin, Tang, Xiaojing, Olofsson, Pontus, Kon, Mark

In this paper we develop a deforestation detection pipeline that incorporates optical and Synthetic Aperture Radar (SAR) data. A crucial component of the pipeline is the construction of anomaly maps of the optical data, which is done using the residual space of a discrete Karhunen-Loève (KL) expansion. Anomalies are quantified using a concentration bound on the distribution of the residual components for the nominal state of the forest. This bound does not require prior knowledge on the distribution of the data. This is in contrast to statistical parametric methods that assume knowledge of the data distribution, an impractical assumption that is especially infeasible for high dimensional data such as ours. Once the optical anomaly maps are computed they are combined with SAR data, and the state of the forest is classified by using a Hidden Markov Model (HMM). We test our approach with Sentinel-1 (SAR) and Sentinel-2 (Optical) data on a $92.19\,km \times 91.80\,km$ region in the Amazon forest. The results show that both the hybrid optical-radar and optical only methods achieve high accuracy that is superior to the recent state-of-the-art hybrid method. Moreover, the hybrid method is significantly more robust in the case of sparse optical data that are common in highly cloudy regions.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2510.14092

Country:

Asia > Southeast Asia (0.04)
South America > Brazil (0.04)
North America > United States > Massachusetts > Middlesex County > Natick (0.04)
(8 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy (0.71)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Verraest, Aderik, Bahnam, Stavrow, Ferede, Robin, de Croon, Guido, De Wagter, Christophe

SkyDreamer: Interpretable End-to-End Vision-Based Drone Racing with Model-Based Reinforcement Learning

Autonomous drone racing (ADR) systems have recently achieved champion-level performance, yet remain highly specific to drone racing. While end-to-end vision-based methods promise broader applicability, no system to date simultaneously achieves full sim-to-real transfer, onboard execution, and champion-level performance. In this work, we present SkyDreamer, to the best of our knowledge, the first end-to-end vision-based ADR policy that maps directly from pixel-level representations to motor commands. SkyDreamer builds on informed Dreamer, a model-based reinforcement learning approach where the world model decodes to privileged information only available during training. By extending this concept to end-to-end vision-based ADR, the world model effectively functions as an implicit state and parameter estimator, greatly improving interpretability. SkyDreamer runs fully onboard without external aid, resolves visual ambiguities by tracking progress using the state decoded from the world model's hidden state, and requires no extrinsic camera calibration, enabling rapid deployment across different drones without retraining. Real-world experiments show that SkyDreamer achieves robust, high-speed flight, executing tight maneuvers such as an inverted loop, a split-S and a ladder, reaching speeds of up to 21 m/s and accelerations of up to 6 g. It further demonstrates a non-trivial visual sim-to-real transfer by operating on poor-quality segmentation masks, and exhibits robustness to battery depletion by accurately estimating the maximum attainable motor RPM and adjusting its flight path in real-time. These results highlight SkyDreamer's adaptability to important aspects of the reality gap, bringing robustness while still achieving extremely high-speed, agile flight.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2510.14783

Country: Europe > Austria (0.28)

Genre: Research Report (0.64)

Industry: Transportation > Air (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

The Bidding Games: Reinforcement Learning for MEV Extraction on Polygon Blockchain

Seoev, Andrei, Gremyachikh, Leonid, Smirnova, Anastasiia, Madhwal, Yash, Kalacheva, Alisa, Belousov, Dmitry, Zubov, Ilia, Smirnov, Aleksei, Fedyanin, Denis, Gorgadze, Vladimir, Yanovich, Yury

In blockchain networks, the strategic ordering of transactions within blocks has emerged as a significant source of profit extraction, known as Maximal Extractable Value (MEV). The transition from spam-based Priority Gas Auctions to structured auction mechanisms like Polygon Atlas has transformed MEV extraction from public bidding wars into sealed-bid competitions under extreme time constraints. While this shift reduces network congestion, it introduces complex strategic challenges where searchers must make optimal bidding decisions within a sub-second window without knowledge of competitor behavior or presence. Traditional game-theoretic approaches struggle in this high-frequency, partially observable environment due to their reliance on complete information and static equilibrium assumptions. We present a reinforcement learning framework for MEV extraction on Polygon Atlas and make three contributions: (1) A novel simulation environment that accurately models the stochastic arrival of arbitrage opportunities and probabilistic competition in Atlas auctions; (2) A PPO-based bidding agent optimized for real-time constraints, capable of adaptive strategy formulation in continuous action spaces while maintaining production-ready inference speeds; (3) Empirical validation demonstrating our history-conditioned agent captures 49\% of available profits when deployed alongside existing searchers and 81\% when replacing the market leader, significantly outperforming static bidding strategies. Our work establishes that reinforcement learning provides a critical advantage in high-frequency MEV environments where traditional optimization methods fail, offering immediate value for industrial participants and protocol designers alike.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2510.14642

Country:

North America > United States (0.28)
Europe > Russia (0.18)

Genre: Research Report (0.82)

Industry:

Information Technology (1.00)
Banking & Finance > Trading (1.00)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Park, Jaewan, Cho, Solbee, Lee, Jay-Yoon

Stop-RAG: Value-Based Retrieval Control for Iterative RAG

Iterative retrieval-augmented generation (RAG) enables large language models to answer complex multi-hop questions, but each additional loop increases latency, costs, and the risk of introducing distracting evidence, motivating the need for an efficient stopping strategy. Existing methods either use a predetermined number of iterations or rely on confidence proxies that poorly reflect whether more retrieval will actually help. We cast iterative RAG as a finite-horizon Markov decision process and introduce Stop-RAG, a value-based controller that adaptively decides when to stop retrieving. Trained with full-width forward-view Q($λ$) targets from complete trajectories, Stop-RAG learns effective stopping policies while remaining compatible with black-box APIs and existing pipelines. On multi-hop question-answering benchmarks, Stop-RAG consistently outperforms both fixed-iteration baselines and prompting-based stopping with LLMs. These results highlight adaptive stopping as a key missing component in current agentic systems, and demonstrate that value-based control can improve the accuracy of RAG systems.

computational linguistic, large language model, machine learning, (19 more...)

2510.14337

Country:

North America > United States (0.67)
Europe (0.67)
Asia > Middle East > UAE (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Mangannavar, Rajesh, Tadepalli, Prasad

GammaZero: Learning To Guide POMDP Belief Space Search With Graph Representations

We introduce an action-centric graph representation framework for learning to guide planning in Partially Observable Markov Decision Processes (POMDPs). Unlike existing approaches that require domain-specific neural architectures and struggle with scalability, GammaZero leverages a unified graph-based belief representation that enables generalization across problem sizes within a domain. Our key insight is that belief states can be systematically transformed into action-centric graphs where structural patterns learned on small problems transfer to larger instances. We employ a graph neural network with a decoder architecture to learn value functions and policies from expert demonstrations on computationally tractable problems, then apply these learned heuristics to guide Monte Carlo tree search on larger problems. Experimental results on standard POMDP benchmarks demonstrate that GammaZero achieves comparable performance to BetaZero when trained and tested on the same-sized problems, while uniquely enabling zero-shot generalization to problems 2-4 times larger than those seen during training, maintaining solution quality with reduced search requirements. Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential decision-making under uncertainty, where agents must act based on incomplete information about the true state of the environment Kaelbling et al. (1998). This partial observability arises naturally in many real-world applications, from autonomous driving where sensors provide limited field-of-view Hoel et al. (2019), to robotic manipulation where object properties must be inferred through interaction Lauri et al. (2022), to subsurface exploration where underground structures can only be observed at sparse drilling locations Mern & Caers (2023).

artificial intelligence, machine learning, representation, (17 more...)

2510.14035

Country: North America > United States > Oregon (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine LearningOct-17-2025

Active Measuring in Reinforcement Learning With Delayed Negative Effects

Gao, Daiqi, Xu, Ziping, Rawashdeh, Aseel, Klasnja, Predrag, Murphy, Susan A.

Measuring states in reinforcement learning (RL) can be costly in real-world settings and may negatively influence future outcomes. We introduce the Actively Observable Markov Decision Process (AOMDP), where an agent not only selects control actions but also decides whether to measure the latent state. The measurement action reveals the true latent state but may have a negative delayed effect on the environment. We show that this reduced uncertainty may provably improve sample efficiency and increase the value of the optimal policy despite these costs. We formulate an AOMDP as a periodic partially observable MDP and propose an online RL algorithm based on belief states. To approximate the belief states, we further propose a sequential Monte Carlo method to jointly approximate the posterior of unknown static environment parameters and unobserved latent states. We evaluate the proposed algorithm in a digital health application, where the agent decides when to deliver digital interventions and when to assess users' health status through surveys.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

2510.14315

Genre: Research Report (1.00)

Industry: Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine LearningOct-17-2025

Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation

Gu, Jingwen, He, Yiting, Liu, Zhishuai, Xu, Pan

Decision-making under distribution shift is a central challenge in reinforcement learning (RL), where training and deployment environments differ. We study this problem through the lens of robust Markov decision processes (RMDPs), which optimize performance against adversarial transition dynamics. Our focus is the online setting, where the agent has only limited interaction with the environment, making sample efficiency and exploration especially critical. Policy optimization, despite its success in standard RL, remains theoretically and empirically underexplored in robust RL. To bridge this gap, we propose \textbf{D}istributionally \textbf{R}obust \textbf{R}egularized \textbf{P}olicy \textbf{O}ptimization algorithm (DR-RPO), a model-free online policy optimization method that learns robust policies with sublinear regret. To enable tractable optimization within the softmax policy class, DR-RPO incorporates reference-policy regularization, yielding RMDP variants that are doubly constrained in both transitions and policies. To scale to large state-action spaces, we adopt the $d$-rectangular linear MDP formulation and combine linear function approximation with an upper confidence bonus for optimistic exploration. We provide theoretical guarantees showing that policy optimization can achieve polynomial suboptimality bounds and sample efficiency in robust RL, matching the performance of value-based approaches. Finally, empirical results across diverse domains corroborate our theory and demonstrate the robustness of DR-RPO.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2510.14246

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.60)