
Collaborating Authors

Mehta, Aashay


On the Query Complexity of Verifier-Assisted Language Generation

arXiv.org Artificial Intelligence

In the simplest form, called best-of-N, the language model generates N candidate responses, which are then scored by the verifier, and the highest-scored candidate response is chosen as the output of the inference process (Cobbe et al., 2021; Nakano et al., 2022). If the verifier can score partial generations (sometimes called a process reward), the space of inference-time algorithms gets much richer: e.g., the final answer can be generated incrementally, using the verifier to guide the process (e.g., by incremental (blockwise) best-of-N, or more complicated strategies like Monte Carlo Tree Search (Browne et al., 2012; Hao et al., 2023)). Importantly, though a flurry of recent papers consider "scaling laws" of natural strategies, the algorithm design space of verifier-aided inference-time algorithms is still opaque. In particular, the value of a verifier--and the relationship it needs to have with the generator--is not well understood. In this paper, we show that a good verifier can substantially (both in theory and in practice) decrease the computational cost of natural generation tasks, using a pre-trained language model as an oracle. In particular, we show that: Even simple constrained generation tasks--where we are trying to generate a string in the support of a language oracle, subject to some structural constraint (e.g.
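To make the two strategies in the abstract concrete, here is a minimal Python sketch of best-of-N and blockwise (process-reward) best-of-N. The callables `generate`, `generate_block`, and `verify` are hypothetical stand-ins for the language-model oracle and the verifier; this is an illustrative sketch, not the paper's implementation.

```python
# Minimal sketch of verifier-assisted generation, assuming hypothetical
# helpers: `generate(prompt)` samples a full response, `generate_block(text)`
# samples the next block of tokens, and `verify(prompt, text)` scores a
# (possibly partial) generation.

def best_of_n(prompt, generate, verify, n=8):
    """Sample n full responses; return the one the verifier scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verify(prompt, c))

def blockwise_best_of_n(prompt, generate_block, verify, n=8, num_blocks=4):
    """Grow the output block by block, keeping the partial generation that
    the process-reward verifier scores highest at each step."""
    text = prompt
    for _ in range(num_blocks):
        blocks = [generate_block(text) for _ in range(n)]
        text += max(blocks, key=lambda b: verify(prompt, text + b))
    return text[len(prompt):]
```

The blockwise variant illustrates why a process reward enriches the design space: each verifier call prunes the search after only a block's worth of generation, rather than after a full response.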


Data Cross-Segmentation for Improved Generalization in Reinforcement Learning Based Algorithmic Trading

arXiv.org Artificial Intelligence

The use of machine learning in algorithmic trading systems is increasingly common. In a typical set-up, supervised learning is used to predict the future prices of assets, and those predictions drive a simple trading and execution strategy. This is quite effective when the predictions have sufficient signal, markets are liquid, and transaction costs are low. However, those conditions often do not hold in thinly traded financial markets and markets for differentiated assets such as real estate or vehicles. In these markets, the trading strategy must consider the long-term effects of taking positions that are relatively more difficult to change. In this work, we propose a Reinforcement Learning (RL) algorithm that trades based on signals from a learned predictive model and addresses these challenges. We test our algorithm on 20+ years of equity data from Bursa Malaysia.
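As a rough illustration of the two-stage setup described above (a learned predictive model supplying signals, and an RL policy choosing positions with transaction costs in the reward), here is a minimal tabular Q-learning sketch. All names, the discretization, and the synthetic data are illustrative assumptions, not the paper's algorithm; in particular, the data cross-segmentation scheme itself is not shown.

```python
import numpy as np

class SignalTrader:
    """Toy tabular Q-learner over (discretized signal, current position).
    Reward nets out transaction costs, which is what makes hard-to-reverse
    positions matter over the long term."""
    def __init__(self, n_bins=10, lr=0.1, gamma=0.99, cost=0.001, eps=0.1):
        self.positions = (-1, 0, 1)            # short / flat / long
        self.q = np.zeros((n_bins, 3, 3))      # signal bin x position x action
        self.n_bins, self.lr, self.gamma = n_bins, lr, gamma
        self.cost, self.eps = cost, eps

    def _bin(self, signal):
        # Map a predicted-return signal in [-1, 1] to a discrete bin.
        return int(np.clip((signal + 1) / 2 * self.n_bins, 0, self.n_bins - 1))

    def act(self, signal, pos_idx, rng):
        if rng.random() < self.eps:             # epsilon-greedy exploration
            return int(rng.integers(3))
        return int(np.argmax(self.q[self._bin(signal), pos_idx]))

    def update(self, signal, pos_idx, action, ret, next_signal):
        # Reward: P&L of the new position minus the cost of changing position.
        new_pos = self.positions[action]
        reward = new_pos * ret - self.cost * abs(new_pos - self.positions[pos_idx])
        s, s2 = self._bin(signal), self._bin(next_signal)
        target = reward + self.gamma * self.q[s2, action].max()
        self.q[s, pos_idx, action] += self.lr * (target - self.q[s, pos_idx, action])

# Tiny synthetic usage: a noisy signal that weakly predicts next-step returns.
rng = np.random.default_rng(0)
trader = SignalTrader()
signals = np.tanh(rng.normal(size=1001))
returns = 0.01 * signals[:-1] + 0.005 * rng.normal(size=1000)
pos = 1                                         # index of the flat position
for t in range(999):
    a = trader.act(signals[t], pos, rng)
    trader.update(signals[t], pos, a, returns[t], signals[t + 1])
    pos = a
```

Including the current position in the state, and charging a cost for changing it, is the minimal way to capture the abstract's point that in thin markets the strategy must weigh the long-term effects of positions that are relatively difficult to unwind.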