Diverse Transformer Decoding for Offline Reinforcement Learning Using Financial Algorithmic Approaches

arXiv.org Artificial Intelligence

Offline Reinforcement Learning (RL) algorithms learn a policy using a fixed training dataset, which is then deployed online to interact with the environment and make decisions. Transformers, a standard choice for modeling time-series data, are gaining popularity in offline RL. In this context, Beam Search (BS), an approximate inference algorithm, is the go-to decoding method. Offline RL eliminates the need for costly or risky online data collection. However, the restricted dataset induces uncertainty as the agent may encounter unfamiliar sequences of states and actions during execution that were not covered in the training data. In this context, BS lacks two important properties essential for offline RL: It does not account for the aforementioned uncertainty, and its greedy left-right search approach often results in sequences with minimal variations, failing to explore potentially better alternatives. To address these limitations, we propose Portfolio Beam Search (PBS), a simple-yet-effective alternative to BS that balances exploration and exploitation within a Transformer model during decoding. We draw inspiration from financial economics and apply these principles to develop an uncertainty-aware diversification mechanism, which we integrate into a sequential decoding algorithm at inference time. We empirically demonstrate the effectiveness of PBS on the D4RL locomotion benchmark, where it achieves higher returns and significantly reduces outcome variability.
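For reference, the baseline that the abstract criticizes can be sketched in a few lines. This is a minimal, generic beam search over a toy next-token model, not the PBS algorithm and not the authors' code; `beam_search`, `toy_model`, and the token names are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Beam = Tuple[float, List[str]]  # (cumulative log-probability, token sequence)


def beam_search(
    step_log_probs: Callable[[List[str]], Dict[str, float]],
    start: List[str],
    beam_width: int,
    horizon: int,
) -> List[Beam]:
    """Plain left-to-right beam search: keep the top `beam_width` prefixes per step."""
    beams: List[Beam] = [(0.0, list(start))]
    for _ in range(horizon):
        candidates: List[Beam] = []
        for score, seq in beams:
            # Expand every surviving prefix by every possible next token.
            for token, log_p in step_log_probs(seq).items():
                candidates.append((score + log_p, seq + [token]))
        # Greedy pruning: only the highest-scoring prefixes survive.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_model(seq: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a Transformer: fixed next-token distribution."""
    return {"a": math.log(0.6), "b": math.log(0.4)}


beams = beam_search(toy_model, [], beam_width=2, horizon=2)
best_score, best_seq = beams[0]
```

In this toy run the surviving sequences share the same first token, which illustrates the low-variation behaviour the abstract describes; the scores are point estimates and carry no notion of uncertainty.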


Dual Formulation for Non-Rectangular Lp Robust Markov Decision Processes

arXiv.org Artificial Intelligence

We study robust Markov decision processes (RMDPs) with non-rectangular uncertainty sets, which capture interdependencies across states unlike traditional rectangular models. While non-rectangular robust policy evaluation is generally NP-hard, even in approximation, we identify a powerful class of $L_p$-bounded uncertainty sets that avoid these complexity barriers due to their structural simplicity. We further show that this class can be decomposed into infinitely many sa-rectangular $L_p$-bounded sets and leverage its structural properties to derive a novel dual formulation for $L_p$ RMDPs. This formulation provides key insights into the adversary's strategy and enables the development of the first robust policy evaluation algorithms for non-rectangular RMDPs. Empirical results demonstrate that our approach significantly outperforms brute-force methods, establishing a promising foundation for future investigation into non-rectangular robust MDPs.


Inference-time sparse attention with asymmetric indexing

arXiv.org Artificial Intelligence

Self-attention in transformer models is an incremental associative memory that maps key vectors to value vectors. One way to speed up self-attention is to employ GPU-compliant vector search algorithms, yet the standard partitioning methods yield poor results in this context, because (1) keys and queries follow different distributions and (2) RoPE positional encoding affects the vectors. In this paper, we introduce SAAP (Self-Attention with Asymmetric Partitions), which overcomes these problems. It is an asymmetrical indexing technique that employs distinct partitions for keys and queries, thereby approximating self-attention with a data-adaptive sparsity pattern. It works on pretrained language models without finetuning, as it only requires training a small query classifier offline. On a long-context Llama 3.1-8b model, with sequences ranging from 100k to 500k tokens, our method typically reduces the fraction of memory that needs to be looked up by a factor of 20, which translates to a time saving of 60% compared to FlashAttention-v2.
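The idea of approximating attention by looking up only a subset of keys can be illustrated with a small sketch. This is not SAAP itself: the partitioning and the query classifier are omitted, the subset is passed in by hand, and the usual scaling of scores is left out for brevity. The function name `attention` and the `indices` parameter are hypothetical.

```python
import math
from typing import Iterable, List, Optional


def attention(
    query: List[float],
    keys: List[List[float]],
    values: List[List[float]],
    indices: Optional[Iterable[int]] = None,
) -> List[float]:
    """Softmax attention for one query, restricted to `indices` if given.

    With indices=None every key is read (dense attention). Passing a subset
    models a sparse lookup that only touches the selected key/value pairs.
    """
    selected = list(range(len(keys))) if indices is None else list(indices)
    scores = [sum(q * k for q, k in zip(query, keys[i])) for i in selected]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    dim = len(values[0])
    return [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / norm
        for d in range(dim)
    ]


keys = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
values = [[1.0], [2.0], [3.0]]
query = [1.0, 0.0]

dense = attention(query, keys, values)
sparse = attention(query, keys, values, indices=[0])  # read one key only
```

When the selected subset contains the keys that dominate the softmax, the sparse output stays close to the dense one while reading far less memory; choosing that subset well is the hard part that the paper addresses.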


Strong bounds for large-scale Minimum Sum-of-Squares Clustering

arXiv.org Artificial Intelligence

Clustering is a fundamental technique in data analysis and machine learning, used to group similar data points together. Among various clustering methods, the Minimum Sum-of-Squares Clustering (MSSC) is one of the most widely used. MSSC aims to minimize the total squared Euclidean distance between data points and their corresponding cluster centroids. Due to the unsupervised nature of clustering, achieving global optimality is crucial, yet computationally challenging. The complexity of finding the global solution increases exponentially with the number of data points, making exact methods impractical for large-scale datasets. Even obtaining strong lower bounds on the optimal MSSC objective value is computationally prohibitive, making it difficult to assess the quality of heuristic solutions. We address this challenge by introducing a novel method to validate heuristic MSSC solutions through optimality gaps. Our approach employs a divide-and-conquer strategy, decomposing the problem into smaller instances that can be handled by an exact solver. The decomposition is guided by an auxiliary optimization problem, the "anticlustering problem", for which we design an efficient heuristic. Computational experiments demonstrate the effectiveness of the method for large-scale instances, achieving optimality gaps below 3% in most cases while maintaining reasonable computational times. These results highlight the practicality of our approach in assessing feasible clustering solutions for large datasets, bridging a critical gap in MSSC evaluation.
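As a concrete reading of the quantities involved, the sketch below computes the MSSC objective for a given assignment and a relative optimality gap between a heuristic value (upper bound) and a lower bound. The gap formula `(upper - lower) / upper` is a common convention and an assumption here, since the abstract does not state which definition is used; the function names are hypothetical.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def mssc_objective(points: List[Point], labels: List[int]) -> float:
    """Total squared Euclidean distance from each point to its cluster centroid."""
    clusters: Dict[int, List[Point]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        center = centroid(members)
        for p in members:
            total += sum((x - c) ** 2 for x, c in zip(p, center))
    return total


def optimality_gap(upper: float, lower: float) -> float:
    """Relative gap between a feasible (heuristic) value and a lower bound."""
    return (upper - lower) / upper


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
value = mssc_objective(points, labels)      # heuristic solution value
gap = optimality_gap(value, lower=3.9)      # lower bound chosen for illustration
```

A gap below 0.03 in this convention corresponds to the "below 3%" figure quoted in the abstract; the lower bound of 3.9 is made up for the example.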


LLM4GNAS: A Large Language Model Based Toolkit for Graph Neural Architecture Search

arXiv.org Artificial Intelligence

Graph Neural Architecture Search (GNAS) facilitates the automatic design of Graph Neural Networks (GNNs) tailored to specific downstream graph learning tasks. However, existing GNAS approaches often require manual adaptation to new graph search spaces, necessitating substantial code optimization and domain-specific knowledge. To address this challenge, we present LLM4GNAS, a toolkit for GNAS that leverages the generative capabilities of Large Language Models (LLMs). LLM4GNAS includes an algorithm library for graph neural architecture search algorithms based on LLMs, enabling the adaptation of GNAS methods to new search spaces through the modification of LLM prompts. This approach reduces the need for manual intervention in algorithm adaptation and code modification. The LLM4GNAS toolkit is extensible and robust, incorporating LLM-enhanced graph feature engineering, LLM-enhanced graph neural architecture search, and LLM-enhanced hyperparameter optimization. Experimental results indicate that LLM4GNAS outperforms existing GNAS methods on tasks involving both homogeneous and heterogeneous graphs.


Review for NeurIPS paper: Optimal visual search based on a model of target detectability in natural images

Neural Information Processing Systems

This paper presents a method to measure target detectability in natural images. It provides a visual search model (based on features extracted from a pre-trained CNN) that estimates target detectability as a function of retinal eccentricity for human vision. Reviewers, including myself, appreciate that this paper tackles a topic that has not been well investigated in the visual search literature. The approach is well motivated, the paper is well written, and the comparison with human data is a nice validation of the approach. There were issues concerning correctness of the approach, along with minor points, but the authors' rebuttal has done an adequate job of addressing the concerns, and I expect the camera-ready version of the paper to use the extra page to incorporate improvements that will improve its clarity (especially with regard to the reviewers' main concerns). I think this will be a nice addition to the NeurIPS 2020 conference, encouraging the community to look at a fresh topic, so I'm going to recommend we accept this work as a poster.


Review for NeurIPS paper: Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

Neural Information Processing Systems

Weaknesses: The study of the bias issue is important, but I am not fully convinced by the motivation for this so-called "confidence interval". Normally, a confidence interval is designed for uncertainty quantification and is thus of great practical interest. However, the authors explicitly point out that they do not consider uncertainties, which rules out the important applications that a typical CI supports (safe RL, for example), since this CI will not be valid in practice due to estimation error. Thus, I can only view the contribution of this paper as a sort of additional guarantee for the algorithm proposed in "Minimax Weight and Q-Function Learning for Off-Policy Evaluation", since the algorithms are the same. Solely quantifying the bias of an existing estimator may not be viewed as sufficiently significant.


Review for NeurIPS paper: Minimax Value Interval for Off-Policy Evaluation and Policy Optimization

Neural Information Processing Systems

The paper provides a very general minimax framework for quantifying the bias/approximation error in off-policy evaluation, and the results apply to a range of OPE methods. Reviewers generally agree that this is a good paper and that it makes a contribution. One direction for improvement would be to quantify the statistical noise in off-policy evaluation, which is nontrivial but extremely important. Reviewers, AC and SAC also agree that such analysis could be left for future work. We would also like to strongly suggest that the authors consider rephrasing or explaining the wording "confidence interval".


Review for NeurIPS paper: Minimax Classification with 0-1 Loss and Performance Guarantees

Neural Information Processing Systems

Summary and Contributions: This paper presents minimax risk classifiers (MRCs) that do not rely on a choice of surrogate loss and family of rules. The goal of an MRC is to find a classification rule that minimizes the worst-case expected 0-1 loss with respect to a class of possible distributions. It first represents data, probability distributions and classification rules by matrices. Estimating the classifier is then cast as a linear optimization problem in which the uncertainty set is expressed as linear constraints. Some performance guarantees are proved, and numerical comparisons are conducted.


Review for NeurIPS paper: Minimax Classification with 0-1 Loss and Performance Guarantees

Neural Information Processing Systems

This paper presents an interesting new perspective on the design of learning methods: the idea is to choose a classifier that minimizes the risk function uniformly over a family of distributions, constructed based on an iid data set, with the guarantee that (with high probability) the true data-generating distribution is contained in the family. This inherently supplies an upper bound on the risk of the chosen classifier. The family of distributions is generated by constraints on the expectation of a function Phi of (x,y), using data-dependent confidence bounds on its true expectation to set the constraints. Thus, the method is highly dependent on the choice of the function Phi. One significant concern noted by the reviewers is that the paper doesn't seem to explore this dependence in much depth, such as providing an array of illustrative examples and design principles for Phi, discussion of how choices of Phi for a given sample size may relate to notions of expressiveness and overfitting, or checking whether the technique can provide guarantees competitive with known results obtained by more traditional approaches (e.g., kernel methods, or ERM guarantees from uniform convergence).