Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers

Neural Information Processing Systems

Decision Transformers have recently emerged as a new and compelling paradigm for offline Reinforcement Learning (RL), completing a trajectory in an autoregressive way. While improvements have been made to overcome initial shortcomings, online finetuning of decision transformers has been surprisingly under-explored. The widely adopted state-of-the-art Online Decision Transformer (ODT) still struggles when pretrained with low-reward offline data. In this paper, we theoretically analyze the online finetuning of the decision transformer, showing that a commonly used Return-To-Go (RTG) far from the expected return hampers the online finetuning process. This problem, however, is well addressed by the value functions and advantages of standard RL algorithms. As suggested by our analysis, we find in our experiments that simply adding TD3 gradients to the finetuning process of ODT effectively improves its online finetuning performance, especially when ODT is pretrained with low-reward offline data. These findings suggest new directions for further improving decision transformers.
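The combined update the abstract describes (a supervised sequence-modeling loss plus a TD3-style policy gradient) can be sketched in miniature. The snippet below is a hedged illustration, not the paper's method: it replaces the decision transformer with a linear policy head and TD3's learned critic with a fixed linear one, so the "TD3 gradient" reduces to a closed-form term; all names, shapes, and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, act_dim, batch = 4, 2, 32

# Toy stand-ins: a linear policy head in place of the transformer, and a
# fixed linear critic in place of TD3's learned Q-network.
W = rng.normal(scale=0.1, size=(act_dim, state_dim))   # policy: a = W @ s
q_a = rng.normal(size=act_dim)                         # dQ/da for the linear critic

S = rng.normal(size=(batch, state_dim))                # replay-buffer states
W_true = rng.normal(size=(act_dim, state_dim))
A_data = S @ W_true.T + 0.1 * rng.normal(size=(batch, act_dim))  # dataset actions

init_loss = np.mean((S @ W.T - A_data) ** 2)
for _ in range(200):
    A_pred = S @ W.T
    # Supervised (ODT-style) gradient: match the dataset actions.
    sup_grad = 2.0 / batch * (A_pred - A_data).T @ S
    # TD3-style policy gradient: ascend Q(s, pi(s)); for a linear critic,
    # dQ/da = q_a, so dQ/dW averaged over the batch is outer(q_a, mean s).
    rl_grad = -np.outer(q_a, S.mean(axis=0))
    W -= 0.05 * (sup_grad + 0.1 * rl_grad)             # 0.1 weights the RL term

final_loss = np.mean((S @ W.T - A_data) ** 2)
```

The point of the sketch is the single gradient step that sums both terms: the supervised term keeps the policy anchored to the data, while the RL term pushes it toward higher critic value even when the offline data is low-reward.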


A Scalable MIP-based Method for Learning Optimal Multivariate Decision Trees

Neural Information Processing Systems

Several recent publications report advances in training optimal decision trees (ODTs) using mixed-integer programs (MIPs), owing to algorithmic advances in integer programming and a growing interest in addressing the inherent suboptimality of heuristic approaches such as CART. In this paper, we propose a novel MIP formulation, based on the 1-norm support vector machine model, to train a binary oblique ODT for classification problems. We further present techniques, such as cutting planes, that tighten its linear relaxation and improve run times to reach optimality. Using 36 datasets from the University of California Irvine Machine Learning Repository, we demonstrate that our training approach outperforms its counterparts from the literature in terms of out-of-sample performance (around 10% improvement in mean out-of-sample testing accuracy). Towards our goal of developing a scalable framework to train multivariate ODTs on large datasets, we propose a new linear-programming-based data selection method to choose a subset of the data, and use it to train a decision tree through our proposed MIP model. We conclude with extensive numerical results that showcase the generalization performance of our new MIP formulation and the improvement in mean out-of-sample accuracy on large datasets.
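The building block of the formulation, a 1-norm SVM separating hyperplane at each branch node, can be illustrated without any MIP solver. The sketch below minimizes the same objective (mean hinge loss plus an L1 penalty) by plain subgradient descent on toy data; this is an assumption-laden stand-in for the paper's exact MIP/LP machinery, and every constant in it is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 2
X = rng.normal(size=(n, d))
y = np.where(X @ np.array([1.0, -2.0]) + 0.5 > 0, 1.0, -1.0)  # toy labels

# 1-norm SVM objective: mean hinge loss + C * ||w||_1, minimized here by
# subgradient descent instead of exact optimization.
w, b, C, lr = np.zeros(d), 0.0, 0.01, 0.05
for _ in range(500):
    margins = y * (X @ w + b)
    active = margins < 1.0                       # points violating the margin
    gw = -(y[active, None] * X[active]).sum(axis=0) / n + C * np.sign(w)
    gb = -y[active].sum() / n
    w -= lr * gw
    b -= lr * gb

accuracy = np.mean(np.sign(X @ w + b) == y)
```

In the paper's setting the same hyperplane model sits at each internal node of an oblique tree, and the MIP jointly chooses all node hyperplanes and leaf labels to optimality; the L1 penalty is what keeps the splits sparse.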


A Prerequisite Random Variables

Neural Information Processing Systems

Many commonly used distributions satisfy this assumption, e.g., Bernoulli, uniform, and Gaussian. We introduce a standard concentration bound for subgaussian random variables: for any ε > 0, the tail probability is bounded. Take the Bernoulli distribution as an example; this will not change the argument. Let E be the event that the output is h. Then the claim follows by Jensen's inequality. To show Proposition 5, we need a standard concept: stopping time.
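The standard subgaussian concentration bound referred to here typically takes the following form (the notation, with σ denoting the subgaussian parameter, is assumed rather than taken from this appendix):

```latex
% X is \sigma-subgaussian if its moment generating function satisfies
%   \mathbb{E}\!\left[e^{\lambda (X - \mathbb{E}X)}\right] \le e^{\lambda^2 \sigma^2 / 2}
%   \quad \text{for all } \lambda \in \mathbb{R}.
% The standard two-sided tail bound: for any \epsilon > 0,
\Pr\!\left[\,|X - \mathbb{E}X| \ge \epsilon\,\right]
  \le 2\exp\!\left(-\frac{\epsilon^2}{2\sigma^2}\right).
```

A Bernoulli variable is subgaussian with σ = 1/2 (by Hoeffding's lemma, as for any bounded variable on [0, 1]), which is why it serves as the running example.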


A Simple Approximation Algorithm for Optimal Decision Tree

Zhuo, Zhengjia, Nagarajan, Viswanath

arXiv.org Artificial Intelligence

Optimal decision tree (\odt) is a fundamental problem arising in applications such as active learning, entity identification, and medical diagnosis. An instance of \odt is given by $m$ hypotheses, out of which an unknown ``true'' hypothesis is drawn according to some probability distribution. An algorithm needs to identify the true hypothesis by making queries: each query incurs a cost and has a known response for each hypothesis. The goal is to minimize the expected query cost to identify the true hypothesis. We consider the most general setting with arbitrary costs, probabilities and responses. \odt is NP-hard to approximate better than $\ln m$ and there are $O(\ln m)$ approximation algorithms known for it. However, these algorithms and/or their analyses are quite complex. Moreover, the leading constant factors are large. We provide a simple algorithm and analysis for \odt, proving an approximation ratio of $8 \ln m$.
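A common greedy strategy for such instances, shown here purely as an illustration of the problem setup rather than as the paper's algorithm, repeatedly asks the query that eliminates the most probability mass in the worst case. The instance below (response matrix, prior, and all values) is hypothetical.

```python
import numpy as np

# Toy ODT instance: R[h, q] is hypothesis h's known response to query q;
# the true hypothesis is drawn with probability prior[h]. Unit query costs.
R = np.array([
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
])
prior = np.array([0.4, 0.3, 0.2, 0.1])

def identify(true_h):
    """Greedily query until one hypothesis remains; returns (guess, #queries)."""
    alive = np.ones(len(R), dtype=bool)
    unused = set(range(R.shape[1]))
    n_queries = 0
    while alive.sum() > 1:
        def score(q):
            # Mass guaranteed to be eliminated: total alive mass minus the
            # heaviest response branch.
            branch = [prior[alive & (R[:, q] == v)].sum()
                      for v in np.unique(R[alive, q])]
            return prior[alive].sum() - max(branch)
        q = max(unused, key=score)
        alive &= R[:, q] == R[true_h, q]   # keep hypotheses agreeing with truth
        unused.remove(q)
        n_queries += 1
    return int(np.flatnonzero(alive)[0]), n_queries
```

The paper's contribution is precisely that a simple algorithm in this greedy spirit admits a short analysis with an $8 \ln m$ approximation guarantee, even with arbitrary costs, probabilities, and responses.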


OpenMSD: Towards Multilingual Scientific Documents Similarity Measurement

Gao, Yang, Ma, Ji, Korotkov, Ivan, Hall, Keith, Alon, Dana, Metzler, Don

arXiv.org Artificial Intelligence

We develop and evaluate multilingual scientific documents similarity measurement models in this work. Such models can be used to find related works in different languages, which can help multilingual researchers find and explore papers more efficiently. We propose the first multilingual scientific documents dataset, Open-access Multilingual Scientific Documents (OpenMSD), which has 74M papers in 103 languages and 778M citation pairs. With OpenMSD, we pretrain science-specialized language models, and explore different strategies to derive "related" paper pairs to fine-tune the models, including using a mixture of citation, co-citation, and bibliographic-coupling pairs. To further improve the models' performance for non-English papers, we explore the use of generative language models to enrich the non-English papers with English summaries. This allows us to leverage the models' English capabilities to create better representations for non-English papers. Our best model significantly outperforms strong baselines by 7-16% (in mean average precision).
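The three pair-derivation strategies the abstract mentions have simple graph definitions: a citation pair links a paper to one it cites, a co-citation pair links two papers cited together by some third paper, and a bibliographic-coupling pair links two papers sharing a reference. A minimal sketch on a hypothetical toy citation graph (the paper IDs and edges are made up, not OpenMSD data):

```python
from itertools import combinations

# Toy citation graph: paper -> set of papers it cites.
cites = {
    "A": {"C", "D"},
    "B": {"C", "E"},
    "C": {"E"},
    "D": set(),
    "E": set(),
}

def citation_pairs(cites):
    """Direct citation: (citing, cited) edges."""
    return {(p, q) for p, refs in cites.items() for q in refs}

def cocitation_pairs(cites):
    """Two papers are co-cited if some third paper cites both."""
    pairs = set()
    for refs in cites.values():
        pairs |= set(combinations(sorted(refs), 2))
    return pairs

def coupling_pairs(cites):
    """Bibliographic coupling: two papers share at least one reference."""
    return {(p, q) for p, q in combinations(sorted(cites), 2)
            if cites[p] & cites[q]}
```

Mixing pairs from all three sources, as the abstract describes, gives "related" training pairs that go beyond what direct citations alone can supply.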


Online Decision Transformer

Zheng, Qinqing, Zhang, Amy, Grover, Aditya

arXiv.org Artificial Intelligence

Generative pretraining for sequence modeling has emerged as a unifying paradigm for machine learning in a number of domains and modalities, notably in language and vision (Radford et al., 2018; Chen et al., 2020; Brown et al., 2020; Lu et al., 2022). Recently, such a pretraining paradigm has been extended to offline reinforcement learning (RL) (Chen et al., 2021; Janner et al., 2021), wherein an agent is trained to autoregressively maximize the likelihood of trajectories in the offline dataset. During training, this paradigm essentially converts offline RL to a supervised learning problem (Schmidhuber, 2019; Srivastava et al., 2019; Emmons et al., 2021). However, these works present an incomplete picture, as policies learned via offline RL are limited by the quality of the training dataset and need to be finetuned to the task of interest via online interactions. It remains an open question whether such a supervised learning paradigm can be extended to online settings. Unlike language and perception, online finetuning for RL is fundamentally different from the pretraining phase, as it involves data acquisition via exploration. The need for exploration renders traditional supervised learning objectives (e.g., mean squared error) for offline RL insufficient in the online setting. Moreover, it has been observed that for standard online algorithms, access to offline data can often have zero or even negative effect on online performance (Nair et al., 2020). Hence, the overall pipeline of offline pretraining followed by online finetuning of RL policies requires careful consideration of training objectives and protocols.


Online Local Boosting: improving performance in online decision trees

da Costa, Victor G. Turrisi, Mastelini, Saulo Martiello, de Carvalho, André C. Ponce de Leon Ferreira, Barbon, Sylvio Jr

arXiv.org Machine Learning

As more data are produced each day, and faster, data stream mining is growing in importance, making clear the need for algorithms able to process these data quickly. Data stream mining algorithms are meant to extract knowledge online and are specially tailored to continuous-data problems. Many current algorithms for data stream mining have high processing and memory costs; often, the higher the predictive performance, the higher these costs. To increase predictive performance without greatly increasing memory and time costs, this paper introduces a novel algorithm, named Online Local Boosting (OLBoost), which can be combined with online decision tree algorithms to improve their predictive performance without modifying the structure of the induced decision trees. To do so, OLBoost applies boosting to small, separate regions of the instance space. Experimental results presented in this paper show that, by using OLBoost, online decision tree learning algorithms can significantly improve their predictive performance. Additionally, it can make smaller trees perform as well as or better than larger trees.
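The core idea, boosting residuals locally per region without touching the tree structure, can be sketched in a few lines. The snippet below is a loose illustration under stated assumptions, not OLBoost itself: the "tree" is a fixed 1-D partition with running-mean leaves, and each leaf gets a tiny online linear corrector trained on that leaf's residuals; all boundaries, targets, and rates are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for an online decision tree: a fixed 1-D split into four leaf
# regions, each predicting a running mean of its observed targets.
edges = np.array([-1.0, 0.0, 1.0])
n_leaves = len(edges) + 1
leaf_sum = np.zeros(n_leaves)
leaf_cnt = np.zeros(n_leaves)
# Local boosters: one small online linear model per leaf, trained on the
# leaf's residuals; the tree structure itself is never modified.
w = np.zeros(n_leaves)
b = np.zeros(n_leaves)
lr = 0.05

err_base = err_boost = 0.0
for _ in range(2000):
    x = rng.uniform(-2.0, 2.0)
    y = x if x > 0 else -2.0 * x          # piecewise-linear toy target
    leaf = np.searchsorted(edges, x)      # route the instance to its region
    base = leaf_sum[leaf] / leaf_cnt[leaf] if leaf_cnt[leaf] else 0.0
    corrected = base + w[leaf] * x + b[leaf]
    err_base += (y - base) ** 2
    err_boost += (y - corrected) ** 2
    # Online updates: leaf mean first, then the booster on the residual.
    leaf_sum[leaf] += y
    leaf_cnt[leaf] += 1
    residual = y - base
    e = (w[leaf] * x + b[leaf]) - residual
    w[leaf] -= lr * e * x
    b[leaf] -= lr * e
```

Because the corrector lives entirely inside each leaf, the memory overhead is a couple of scalars per region, which matches the paper's goal of improving accuracy without growing the tree.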