Constrained Meta Agnostic Reinforcement Learning
Karam Daaboul, Florian Kuhm, Tim Joseph, J. Marius Zoellner
Meta-Reinforcement Learning (Meta-RL) aims to acquire meta-knowledge for quick adaptation to diverse tasks. However, applying these policies in real-world environments presents a significant challenge: balancing rapid adaptability with adherence to environmental constraints. Our novel approach, Constraint Model Agnostic Meta Learning (C-MAML), merges meta-learning with constrained optimization to address this challenge. C-MAML enables rapid and efficient task adaptation by incorporating task-specific constraints directly into its meta-algorithm framework during the training phase. This fusion yields safer initial parameters for learning new tasks. We demonstrate the effectiveness of C-MAML on simulated wheeled-robot locomotion tasks of varying complexity, highlighting its practicality and robustness in dynamic environments.
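The constrained meta-update described above can be illustrated with a toy first-order sketch: task adaptation and the meta-step both descend a Lagrangian that folds a task cost into the loss. The quadratic tasks, the fixed multiplier `lam`, and all function names below are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy task family: minimize loss_t(theta) = ||theta - goal||^2 subject to
# cost(theta) = ||theta||^2 staying small, handled via a fixed-multiplier
# Lagrangian (illustrative only; not the authors' algorithm).

def loss(theta, goal):
    return np.sum((theta - goal) ** 2)

def cost(theta):
    return np.sum(theta ** 2)

def lagrangian_grad(theta, goal, lam):
    # Gradient of loss + lam * cost.
    return 2 * (theta - goal) + lam * 2 * theta

def inner_step(theta, goal, lam, lr=0.1):
    """One task-specific adaptation step on the constrained objective."""
    return theta - lr * lagrangian_grad(theta, goal, lam)

def meta_update(theta, goals, lam, meta_lr=0.05):
    """First-order meta-update: average the post-adaptation gradients."""
    grads = [lagrangian_grad(inner_step(theta, g, lam), g, lam) for g in goals]
    return theta - meta_lr * np.mean(grads, axis=0)

theta = np.zeros(2)
tasks = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
for _ in range(100):
    theta = meta_update(theta, tasks, lam=0.5)
```

The meta-parameters settle between the task goals but are pulled toward the constraint-feasible region by the `lam * cost` term, which is the intuition behind "safer initial parameters."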
THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report
KBTG Labs, Danupat Khamnuansin, Atthakorn Petchsod, Anuruth Lertpiya, Pornchanan Balee, Thanawat Lodkaew, Tawunrat Chalothorn, Thadpong Pongthawornkamol, Monchai Lertsutthiwong
Large Language Models (LLMs) have emerged as leading tools in Natural Language Processing (NLP) due to their exceptional performance across various tasks. The advent of open-source models such as Llama [1] from Meta, Gemma [2] from Google, and Qwen [3] from Alibaba has significantly enhanced public access to advanced LLMs. Additionally, low-cost techniques for LLM fine-tuning, such as Low-rank Adaptation (LoRA) [4], have enabled the fine-tuning of these models on consumer-grade hardware, thereby accelerating their development and adoption. LLMs are now utilized in a wide array of applications, ranging from personal assistants, e.g., ChatGPT, to specialized tasks in diverse domains. In the financial sector, BloombergGPT [5], a proprietary LLM trained from the ground up with an infusion of financial data, has demonstrated superior performance on financial benchmarks compared to other models in the market.
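LoRA, mentioned above as the low-cost fine-tuning technique, adapts a frozen weight matrix W by learning only a low-rank update: W' = W + (alpha/r) * B @ A, with A of shape (r, d_in) and B of shape (d_out, r). A minimal numpy sketch of the forward pass (the shapes and alpha/r scaling follow the LoRA paper's convention; the specific dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8

W = rng.normal(0, 0.02, (d_out, d_in))   # frozen pretrained weight
A = rng.normal(0, 0.02, (r, d_in))       # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init

def lora_forward(x, W, A, B, alpha, r):
    """Base projection plus the scaled low-rank update."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.normal(0, 1, (2, d_in))
# With B = 0 at initialization, the adapted model exactly matches the
# frozen base model; only A and B (512 params vs. 4096 in W here) are trained.
```

The parameter saving is what makes consumer-grade fine-tuning feasible: only A and B receive gradients, so optimizer state scales with the rank r rather than the full weight matrix.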
Challenges for Reinforcement Learning in Quantum Computing
Philipp Altmann, Adelina Bärligea, Jonas Stein, Michael Kölle, Thomas Gabor, Thomy Phan, Claudia Linnhoff-Popien
Quantum computing (QC) in the current NISQ era is still limited. To gain early insights and advantages, hybrid applications are widely considered as a way to mitigate these shortcomings. Hybrid quantum machine learning (QML) comprises both the application of QC to improve machine learning (ML) and the application of ML to improve QC architectures. This work considers the latter, focusing on leveraging reinforcement learning (RL) to improve current QC approaches. We therefore introduce various generic challenges arising from quantum architecture search and quantum circuit optimization that RL algorithms need to solve to provide benefits for more complex applications and combinations thereof. Building upon these challenges, we propose a concrete framework, formalized as a Markov decision process, that enables learning policies capable of controlling a universal set of quantum gates. Furthermore, we provide benchmark results to assess the shortcomings and strengths of current state-of-the-art algorithms.
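A circuit-construction MDP of the kind described above can be sketched in a few lines: the agent appends gates from a universal single-qubit set to a circuit, the episode ends at a depth limit, and the reward is fidelity to a target state. This toy formulation (gate set {H, T}, sparse terminal reward, the `CircuitEnv` class) is our own illustration, not the paper's framework.

```python
import numpy as np

# Universal single-qubit gate set {H, T} acting on a statevector.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]])
GATES = {0: H, 1: T}

class CircuitEnv:
    """MDP: state = current statevector, action = gate index,
    reward = fidelity to the target state at the depth limit."""

    def __init__(self, target, max_depth=4):
        self.target = target
        self.max_depth = max_depth
        self.reset()

    def reset(self):
        self.state = np.array([1.0 + 0j, 0.0])  # start in |0>
        self.depth = 0
        return self.state

    def step(self, action):
        self.state = GATES[action] @ self.state
        self.depth += 1
        done = self.depth >= self.max_depth
        fidelity = abs(np.vdot(self.target, self.state)) ** 2
        reward = fidelity if done else 0.0  # sparse terminal reward
        return self.state, reward, done

# Example: preparing |+> from |0> takes a single H gate.
plus = np.array([1.0, 1.0]) / np.sqrt(2)
env = CircuitEnv(target=plus, max_depth=1)
env.reset()
_, r, done = env.step(0)
```

The sparse terminal reward and the rapidly growing action-sequence space are exactly the kinds of challenges the abstract argues RL algorithms must handle in circuit optimization.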
Scaling Laws for Imitation Learning in NetHack
Jens Tuyls, Dhruv Madeka, Kari Torkkola, Dean Foster, Karthik Narasimhan, Sham Kakade
Imitation Learning (IL) is one of the most widely used methods in machine learning. Yet, while powerful, many works find it is often unable to fully recover the underlying expert behavior [1-3], and none of these works deeply investigate the role of scaling up model and data size. Inspired by recent work in Natural Language Processing (NLP) [4, 5], where "scaling up" has resulted in increasingly capable LLMs, we investigate whether carefully scaling up model and data size can bring similar improvements in the imitation learning setting. To demonstrate our findings, we focus on the game of NetHack, a challenging environment featuring procedural generation, stochasticity, long-term dependencies, and partial observability. We find that IL loss and mean return scale smoothly with the compute budget and are strongly correlated, resulting in power laws for training compute-optimal IL agents with respect to model size and number of samples. We forecast and train several NetHack agents with IL and find they outperform the prior state of the art by at least 2x in all settings. Our work demonstrates both the scaling behavior of imitation learning in a challenging domain and the viability of scaling up current approaches for increasingly capable agents in NetHack, a game that remains elusively hard for current AI systems.
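A power law of the kind reported above, L(C) = a * C^(-b), is typically estimated by a linear fit in log-log space. The sketch below does this on synthetic data; the exponent 0.25 and coefficient 5.0 are illustrative placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (compute, loss) pairs following L(C) = 5.0 * C^-0.25
# with small multiplicative noise.
compute = np.logspace(15, 20, 12)  # compute budgets in FLOPs (synthetic)
loss = 5.0 * compute ** -0.25 * np.exp(rng.normal(0, 0.01, 12))

# Fit log(loss) = log(a) - b * log(C) by least squares.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
b_hat, a_hat = -slope, np.exp(intercept)
```

Once a and b are estimated from small runs, the fitted line is what allows forecasting the loss (and, via the reported loss-return correlation, the mean return) of larger compute-optimal agents before training them.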
DIRECT: Learning from Sparse and Shifting Rewards using Discriminative Reward Co-Training
Philipp Altmann, Thomy Phan, Fabian Ritz, Thomas Gabor, Claudia Linnhoff-Popien
We propose discriminative reward co-training (DIRECT) as an extension to deep reinforcement learning algorithms. Building upon the concept of self-imitation learning (SIL), we introduce an imitation buffer that stores beneficial trajectories generated by the policy, selected according to their return. A discriminator network is trained concurrently with the policy to distinguish between trajectories generated by the current policy and beneficial trajectories generated by previous policies. The discriminator's verdict is used to construct a reward signal for optimizing the policy. By interpolating prior experience, DIRECT is able to act as a surrogate, steering policy optimization towards more valuable regions of the reward landscape and thus learning an optimal policy. Our results show that DIRECT outperforms state-of-the-art algorithms in sparse- and shifting-reward environments by providing a surrogate reward to the policy and directing the optimization towards valuable areas.
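The discriminator-to-reward step described above can be sketched as follows: a logistic discriminator is trained to output 1 on buffer (beneficial) transitions and 0 on current-policy transitions, and its log-odds then serve as a dense surrogate reward that is positive for buffer-like behavior. The 1-D features, training loop, and reward form are our own illustrative choices, not the paper's exact recipe.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_discriminator(buffer_x, policy_x, steps=500, lr=0.1):
    """Logistic regression: label 1 = imitation buffer, 0 = current policy."""
    x = np.vstack([buffer_x, policy_x])
    y = np.concatenate([np.ones(len(buffer_x)), np.zeros(len(policy_x))])
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        g = p - y                      # gradient of the logistic loss
        w -= lr * x.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def surrogate_reward(x, w, b):
    # log D(x) - log(1 - D(x)): positive when x resembles buffer data.
    return x @ w + b

rng = np.random.default_rng(1)
buffer_x = rng.normal(2.0, 1.0, size=(200, 1))   # "beneficial" transitions
policy_x = rng.normal(-2.0, 1.0, size=(200, 1))  # current-policy transitions
w, b = train_discriminator(buffer_x, policy_x)
```

Because the discriminator is retrained as both the policy and the buffer evolve, the surrogate reward stays informative even when the environment reward is sparse or shifts over time.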
Deep Learning Statistical Arbitrage
Jorge Guijarro-Ordonez, Markus Pelger, Greg Zanotti
Statistical arbitrage exploits temporal price differences between similar assets. We develop a unifying conceptual framework for statistical arbitrage and a novel data-driven solution. First, we construct arbitrage portfolios of similar assets as residual portfolios from conditional latent asset pricing factors. Second, we extract their time-series signals with a powerful machine-learning time-series solution, a convolutional transformer. Lastly, we use these signals to form an optimal trading policy that maximizes risk-adjusted returns under constraints. Our comprehensive empirical study on daily US equities shows a high compensation for arbitrageurs to enforce the law of one price. Our arbitrage strategies obtain consistently high out-of-sample mean returns and Sharpe ratios, and substantially outperform all benchmark approaches.
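The first step above, residual portfolios from pricing factors, can be sketched as a time-series regression: each asset's return is regressed on factor returns, and the regression residual is the factor-neutral arbitrage-portfolio return. The single observed factor and the naive mean-reversion rule below are illustrative simplifications of the paper's conditional latent-factor and learned-policy machinery.

```python
import numpy as np

rng = np.random.default_rng(2)
n_days = 500
factor = rng.normal(0, 0.01, n_days)             # one pricing factor
beta_true = np.array([0.8, 1.2])
noise = rng.normal(0, 0.002, (n_days, 2))
returns = factor[:, None] * beta_true + noise    # two similar assets

# OLS beta per asset, then residual (factor-neutral) returns.
X = factor[:, None]
betas = np.linalg.lstsq(X, returns, rcond=None)[0].ravel()
residuals = returns - factor[:, None] * betas

# Toy mean-reversion signal on cumulative residuals: short when the
# cumulative residual sits above its mean, long when below.
cum = residuals.cumsum(axis=0)
signal = -np.sign(cum - cum.mean(axis=0))
```

In the paper this hand-made sign rule is replaced by a convolutional transformer that reads the residual time series, and the final weights come from a constrained risk-adjusted-return optimization rather than unit long/short positions.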