Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning

Neural Information Processing Systems

However, existing algorithms and theories for learning near-optimal policies in these two settings are rather different and disconnected. Towards bridging this gap, this paper initiates the theoretical study of policy finetuning, that is, online RL where the learner has additional access to a "reference policy" µ close to the optimal policy π*.


The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle
Fang Kong, Yueran Yang, Wei Chen, Shuai Li

Neural Information Processing Systems

Thompson sampling (TS) has attracted considerable interest in the bandit literature. Although it was introduced in the 1930s, its theoretical guarantees were established only in recent years. All existing analyses of TS in the combinatorial multi-armed bandit (CMAB) setting require an exact oracle that returns an optimal solution for any input. However, such an oracle is usually not available, since many combinatorial optimization problems are NP-hard and only approximation oracles exist. A counterexample [30] shows that TS can fail to learn with an approximation oracle, but that oracle is uncommon and designed for one specific problem instance. It therefore remains an open question whether the convergence analysis of TS can be extended beyond the exact oracle in CMAB. In this paper, we study this question under the greedy oracle, a common approximation oracle with theoretical guarantees for many (offline) combinatorial optimization problems.
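
To make the setting concrete, below is a minimal sketch of Thompson sampling with a greedy oracle on a simple combinatorial semi-bandit: selecting k of n Bernoulli base arms, with a Beta posterior per arm. The instance, the top-k constraint, and all parameters are illustrative assumptions, not the paper's construction or its hardness analysis.

```python
# Minimal sketch: Thompson sampling + greedy oracle for a k-of-n semi-bandit.
import numpy as np

rng = np.random.default_rng(0)
n_arms, k, horizon = 10, 3, 5000
true_means = rng.uniform(0.1, 0.9, size=n_arms)   # hypothetical instance

alpha = np.ones(n_arms)   # Beta posterior parameters per base arm
beta = np.ones(n_arms)

def greedy_oracle(scores, k):
    """Greedily add the highest-scoring feasible arm until k arms are chosen."""
    chosen, remaining = [], set(range(len(scores)))
    while len(chosen) < k:
        best = max(remaining, key=lambda i: scores[i])
        chosen.append(best)
        remaining.remove(best)
    return chosen

total_reward = 0.0
for t in range(horizon):
    theta = rng.beta(alpha, beta)            # posterior sample per arm
    super_arm = greedy_oracle(theta, k)      # approximate combinatorial solver
    feedback = rng.random(len(super_arm)) < true_means[super_arm]  # semi-bandit feedback
    alpha[super_arm] += feedback             # Bayesian update per observed arm
    beta[super_arm] += 1 - feedback
    total_reward += feedback.sum()

print(f"average per-round reward: {total_reward / horizon:.3f}")
```

For top-k selection the greedy oracle happens to be exact; the interesting case studied in the paper is when greedy is only an approximation to the offline problem.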


Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

Neural Information Processing Systems

The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that capture real-world complexity and practical applications. However, existing datasets are often overly simplistic and lack realism. To address this gap, we propose Hokoff, a comprehensive set of pre-collected datasets covering both offline RL and offline MARL, accompanied by a robust framework to facilitate further research. The data are derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game whose intricate gameplay closely resembles real-life situations.


Evolving Standardization for Continual Domain Generalization over Temporal Drift

Neural Information Processing Systems

The capability of generalizing to out-of-distribution data is crucial for deploying machine learning models in the real world. Existing domain generalization (DG) work mainly focuses on offline and discrete scenarios, where multiple source domains are simultaneously accessible and the distribution shift among domains is abrupt and severe. Nevertheless, such a setting may not apply to all real-world applications, as there are cases where the data distribution changes gradually over time due to factors such as aging. Additionally, as the domain constantly evolves, new domains will continually emerge. Re-training and updating models on both new and previous domains with existing DG methods can be resource-intensive and inefficient.


Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search
Department of Computer Science, Aalto University

Neural Information Processing Systems

In this work we consider Code World Models, world models generated by a Large Language Model (LLM) in the form of Python code for model-based Reinforcement Learning (RL). Calling code instead of LLMs for planning has the potential to be more precise, reliable, interpretable, and extremely efficient. However, writing appropriate Code World Models requires the ability to understand complex instructions, to generate exact code with non-trivial logic, and to self-debug a long program using feedback from unit tests and environment trajectories. To address these challenges, we propose Generate, Improve and Fix with Monte Carlo Tree Search (GIF-MCTS), a new code generation strategy for LLMs. To test our approach in an offline RL setting, we introduce the Code World Models Benchmark (CWMB), a suite of program synthesis and planning tasks comprising 18 diverse RL environments paired with corresponding textual descriptions and curated trajectories. GIF-MCTS surpasses all baselines on the CWMB and two other benchmarks, and we show that the Code World Models synthesized with it can be successfully used for planning, resulting in model-based RL agents with greatly improved sample efficiency and inference speed.
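
For intuition, here is a minimal sketch of how a generated Code World Model could be consumed for planning: a hand-written Python transition function stands in for LLM-generated code, and a simple random-shooting planner queries it. The toy dynamics, the step(state, action) interface, and the planner are illustrative assumptions, not GIF-MCTS or the paper's environments.

```python
# Minimal sketch: planning against a (stand-in) Code World Model.
import random

def code_world_model_step(state, action):
    """Toy 1-D navigation dynamics, as an LLM-generated model might define them."""
    position = state + (1 if action == 1 else -1)
    reward = 1.0 if position == 5 else -0.1   # goal at position 5
    done = position == 5 or abs(position) > 10
    return position, reward, done

def plan(state, num_rollouts=200, horizon=20):
    """Return the first action of the best random rollout under the code model."""
    best_action, best_return = 0, float("-inf")
    for _ in range(num_rollouts):
        s, total, first_action = state, 0.0, None
        for t in range(horizon):
            a = random.choice([0, 1])
            if t == 0:
                first_action = a
            s, r, done = code_world_model_step(s, a)
            total += r
            if done:
                break
        if total > best_return:
            best_return, best_action = total, first_action
    return best_action

print(plan(state=0))  # should tend to pick action 1 (move toward the goal)
```

Because the model is plain Python, thousands of such rollouts cost almost nothing compared with querying an LLM at every planning step, which is the efficiency argument made in the abstract.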


This AI photo editor is like Photoshop without the learning curve -- save 86%

Mashable

TL;DR: Save 86% on the award-winning AI photo editor Luminar Neo, which comes with a video training course and six packs of preset photo filters. You've heard of Photoshop, but what is Luminar Neo? Luminar Neo is an easy-to-use, award-winning AI photo editor that lets anyone edit their photos with both beginner and advanced tools. Instead of spending years mastering Photoshop, spend minutes learning to use this AI photo editing tool. Now, Mashable readers can get a lifetime pass to this AI Photoshop alternative for just $69.58 (reg. First, the Luminar Neo photo editor, which is available for Windows and Mac, or as a plugin for Photoshop or Lightroom.


Symbolic Regression via Neural-Guided Genetic Programming Population Seeding

Neural Information Processing Systems

Symbolic regression is the process of identifying mathematical expressions that fit observed output from a black-box process. It is a discrete optimization problem generally believed to be NP-hard. Prior approaches to solving the problem include neural-guided search (e.g. using reinforcement learning) and genetic programming. In this work, we introduce a hybrid neural-guided/genetic programming approach to symbolic regression and other combinatorial optimization problems. We propose a neural-guided component used to seed the starting population of a random restart genetic programming component, gradually learning better starting populations. On a number of common benchmark tasks to recover underlying expressions from a dataset, our method recovers 65% more expressions than a recently published top-performing model using the same experimental setup. We demonstrate that running many genetic programming generations without interdependence on the neural-guided component performs better for symbolic regression than alternative formulations where the two are more strongly coupled. Finally, we introduce a new set of 22 symbolic regression benchmark problems with increased difficulty over existing benchmarks.
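
The hybrid loop described above can be pictured with a small sketch: a learned sampler seeds each genetic programming restart, and operators that appear in the best discovered expressions are upweighted for the next restart. Everything here, including the stubbed "neural" sampler (a categorical over operators), the toy target function, and the crude mutation, is an illustrative assumption rather than the authors' implementation.

```python
# Minimal sketch: neural-guided population seeding for symbolic regression.
import random
import numpy as np

X = np.linspace(-2, 2, 50)
y_target = X**2 + X                        # hypothetical ground-truth expression

OPS = {"add": np.add, "mul": np.multiply, "sub": np.subtract}
op_weights = {name: 1.0 for name in OPS}   # stand-in for a neural sampler's logits

def sample_expr(depth=2):
    """Sample a small expression tree, biased by the learned operator weights."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", round(random.uniform(-2, 2), 2)])
    names, w = zip(*op_weights.items())
    op = random.choices(names, weights=w)[0]
    return (op, sample_expr(depth - 1), sample_expr(depth - 1))

def evaluate(expr, x):
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return np.full_like(x, float(expr))
    op, a, b = expr
    return OPS[op](evaluate(a, x), evaluate(b, x))

def fitness(expr):
    return -np.mean((evaluate(expr, X) - y_target) ** 2)   # higher is better

def mutate(expr):
    return sample_expr()   # crude mutation: resample a whole tree

def gp_restart(pop_size=50, generations=30):
    population = [sample_expr() for _ in range(pop_size)]   # neural-guided seed
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=fitness)

def collect_ops(e):
    return [e[0]] + collect_ops(e[1]) + collect_ops(e[2]) if isinstance(e, tuple) else []

for restart in range(5):
    best = gp_restart()
    for name in collect_ops(best):   # feedback: upweight operators in the best expression
        op_weights[name] += 0.5
    print(f"restart {restart}: best fitness {fitness(best):.4f}")
```

The key design point mirrored here is the loose coupling: each restart runs many GP generations on its own, and the sampler only influences the seed population between restarts.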


Slide2Text: Leveraging LLMs for Personalized Textbook Generation from PowerPoint Presentations

arXiv.org Artificial Intelligence

The rapid advancements in Large Language Models (LLMs) have revolutionized educational technology, enabling innovative approaches to automated and personalized content creation. This paper introduces Slide2Text, a system that leverages LLMs to transform PowerPoint presentations into customized textbooks. By extracting slide content using OCR, organizing it into a coherent structure, and generating tailored materials such as explanations, exercises, and references, Slide2Text streamlines the textbook creation process. Flexible customization options further enhance its adaptability to diverse educational needs. The system highlights the potential of LLMs in modernizing textbook creation and improving educational accessibility. Future developments will explore multimedia inputs and advanced user customization features.
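
As a rough illustration of such a pipeline, the sketch below pulls text out of a .pptx deck with python-pptx and prompts an LLM to expand each slide into a textbook-style section. The call_llm helper is a hypothetical stand-in for whatever model API is used, and the paper's OCR-based extraction and customization options are not reproduced here.

```python
# Minimal sketch of a slides-to-textbook pipeline (assumed, not Slide2Text itself).
from pptx import Presentation  # pip install python-pptx

def extract_slide_text(path):
    """Return one text blob per slide from a PowerPoint file."""
    slides = []
    for slide in Presentation(path).slides:
        chunks = [shape.text_frame.text
                  for shape in slide.shapes if shape.has_text_frame]
        slides.append("\n".join(chunks))
    return slides

def call_llm(prompt):
    """Hypothetical LLM client; swap in a real API call here."""
    return f"[LLM output for a prompt of {len(prompt)} characters]"

def slides_to_textbook(path, audience="undergraduate students"):
    sections = []
    for i, slide_text in enumerate(extract_slide_text(path), start=1):
        prompt = (f"Rewrite the following lecture slide as a textbook section "
                  f"for {audience}, adding an explanation and one exercise:\n\n"
                  f"{slide_text}")
        sections.append(f"Section {i}\n\n{call_llm(prompt)}")
    return "\n\n".join(sections)
```

The audience parameter hints at how the customization described in the abstract could be threaded through the prompts.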


Lifelong Evolution of Swarms

arXiv.org Artificial Intelligence

Adapting to task changes without forgetting previous knowledge is a key skill for intelligent systems, and a crucial aspect of lifelong learning. Swarm controllers, however, are typically designed for specific tasks, lacking the ability to retain knowledge across changing tasks. Lifelong learning, on the other hand, focuses on individual agents with limited insights into the emergent abilities of a collective like a swarm. To address this gap, we introduce a lifelong evolutionary framework for swarms, where a population of swarm controllers is evolved in a dynamic environment that incrementally presents novel tasks. This requires evolution to find controllers that quickly adapt to new tasks while retaining knowledge of previous ones, as they may reappear in the future. We discover that the population inherently preserves information about previous tasks, and it can reuse it to foster adaptation and mitigate forgetting. In contrast, the top-performing individual for a given task catastrophically forgets previous tasks. To mitigate this phenomenon, we design a regularization process for the evolutionary algorithm, reducing forgetting in top-performing individuals. Evolving swarms in a lifelong fashion raises fundamental questions on the current state of deep lifelong learning and on the robustness of swarm controllers in dynamic environments.
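
A minimal sketch of the lifelong-evolution idea, under strong simplifying assumptions: controllers are plain parameter vectors, tasks arrive sequentially, and the fitness used for selection adds a penalty that discourages losing performance on previously seen tasks. The representation, the task sequence, and the exact regularizer are illustrative assumptions, not the paper's swarm controllers or its regularization process.

```python
# Minimal sketch: lifelong evolution with a forgetting-regularized fitness.
import numpy as np

rng = np.random.default_rng(1)
dim, pop_size, generations, reg_strength = 8, 40, 60, 0.5

tasks = [rng.normal(size=dim) for _ in range(4)]   # hypothetical task sequence

def task_fitness(controller, task):
    return -np.linalg.norm(controller - task)      # closer to the target is better

population = rng.normal(size=(pop_size, dim))
seen_tasks = []

for task in tasks:
    def fitness(c):
        # Current-task fitness plus a penalty for forgetting earlier tasks.
        old = sum(task_fitness(c, t) for t in seen_tasks)
        return task_fitness(c, task) + reg_strength * old

    for _ in range(generations):
        scores = np.array([fitness(c) for c in population])
        elite = population[np.argsort(scores)[-pop_size // 4:]]            # keep top 25%
        children = elite[rng.integers(len(elite), size=pop_size - len(elite))]
        children = children + rng.normal(scale=0.1, size=children.shape)   # mutate
        population = np.vstack([elite, children])

    best = population[np.argmax([fitness(c) for c in population])]
    seen_tasks.append(task)
    retained = [task_fitness(best, t) for t in seen_tasks]
    print("fitness on all tasks seen so far:", np.round(retained, 2))
```

Dropping reg_strength to zero recovers plain task-by-task evolution, which is where the abstract reports catastrophic forgetting in the top-performing individual.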