Jeong, Jihwan
Factual and Personalized Recommendations using Language Models and Reinforcement Learning
Jeong, Jihwan, Chow, Yinlam, Tennenholtz, Guy, Hsu, Chih-Wei, Tulepbergenov, Azamat, Ghavamzadeh, Mohammad, Boutilier, Craig
Recommender systems (RSs) play a central role in connecting users to content, products, and services, matching candidate items to users based on their preferences. While traditional RSs rely on implicit user feedback signals, conversational RSs interact with users in natural language. In this work, we develop a comPelling, Precise, Personalized, Preference-relevant language model (P4LM) that recommends items to users while putting emphasis on explaining item characteristics and their relevance. P4LM uses the embedding space representation of a user's preferences to generate compelling responses that are factually-grounded and relevant w.r.t. the user's preferences. Moreover, we develop a joint reward function that measures precision, appeal, and personalization, which we use as AI-based feedback in a reinforcement learning-based language model framework. Using the MovieLens 25M dataset, we demonstrate that P4LM delivers compelling, personalized movie narratives to users.
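The abstract describes a joint reward measuring precision, appeal, and personalization that is used as AI-based feedback for RL fine-tuning, but does not specify how the components are combined. The sketch below assumes a simple weighted sum of three pluggable scorers; the scorer names, weights, and linear combination are illustrative assumptions, not the paper's actual reward model.

```python
# Minimal sketch of a joint reward of the kind described above: separate
# scorers for precision (factual grounding), appeal, and personalization
# are combined into one scalar used as AI-based feedback for RL fine-tuning.
# The weighted-sum composition is an assumption for illustration only.
from dataclasses import dataclass
from typing import Callable


@dataclass
class JointReward:
    precision_scorer: Callable[[str, dict], float]        # factual consistency with item metadata
    appeal_scorer: Callable[[str], float]                  # how compelling the narrative reads
    personalization_scorer: Callable[[str, list], float]   # relevance to the user's preference embedding
    w_precision: float = 1.0
    w_appeal: float = 1.0
    w_personalization: float = 1.0

    def __call__(self, response: str, item_facts: dict, user_pref_embedding: list) -> float:
        """Score a candidate recommendation narrative as a weighted sum of components."""
        r_p = self.precision_scorer(response, item_facts)
        r_a = self.appeal_scorer(response)
        r_u = self.personalization_scorer(response, user_pref_embedding)
        return (self.w_precision * r_p
                + self.w_appeal * r_a
                + self.w_personalization * r_u)
```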
Demystifying Embedding Spaces using Large Language Models
Tennenholtz, Guy, Chow, Yinlam, Hsu, Chih-Wei, Jeong, Jihwan, Shani, Lior, Tulepbergenov, Azamat, Ramachandran, Deepak, Mladenov, Martin, Boutilier, Craig
Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing Large Language Models (LLMs) to directly interact with embeddings -- transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of diverse tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.
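One plausible reading of "injecting embeddings into LLMs" is to project a domain embedding into the model's token-embedding space and prepend it as a soft token before the text prompt. The adapter below is a hedged sketch of that idea under stated assumptions (module structure, dimensions, and the soft-prompt mechanism are illustrative), not the paper's actual architecture.

```python
# Hedged sketch: map a domain embedding (e.g., an item or user vector) into
# the LLM's hidden space with a small adapter and prepend it as a "soft token"
# ahead of the prompt's token embeddings. One plausible mechanism consistent
# with the abstract; not necessarily the paper's exact design.
import torch
import torch.nn as nn


class EmbeddingAdapter(nn.Module):
    def __init__(self, domain_dim: int, llm_hidden_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(domain_dim, llm_hidden_dim),
            nn.GELU(),
            nn.Linear(llm_hidden_dim, llm_hidden_dim),
        )

    def forward(self, domain_embedding: torch.Tensor) -> torch.Tensor:
        # (batch, domain_dim) -> (batch, 1, llm_hidden_dim): one soft token per entity
        return self.proj(domain_embedding).unsqueeze(1)


# Usage: concatenate the soft token with the prompt's token embeddings before
# passing them to the language model (the model-specific call is omitted here).
adapter = EmbeddingAdapter(domain_dim=64, llm_hidden_dim=768)
soft_token = adapter(torch.randn(2, 64))    # (2, 1, 768)
prompt_embeds = torch.randn(2, 10, 768)     # embeddings of 10 prompt tokens
inputs_embeds = torch.cat([soft_token, prompt_embeds], dim=1)
```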
pyRDDLGym: From RDDL to Gym Environments
Taitler, Ayal, Gimelfarb, Michael, Jeong, Jihwan, Gopalakrishnan, Sriram, Mladenov, Martin, Liu, Xiaotian, Sanner, Scott
Reinforcement Learning (RL) Sutton and Barto [2018] and probabilistic planning Puterman [2014] are two research branches that address stochastic problems, often under the Markov assumption for state dynamics. The planning approach requires a given model, while the learning approach improves through repeated interaction with an environment, which can be viewed as a black box. Thus, the tools and benchmarks for these two branches have grown apart. Learning agents do not need to simulate model-based transitions, and thus frameworks such as OpenAI Gym Brockman et al. [2016] have become a standard, serving also as an interface for third-party benchmarks such as Todorov et al. [2012], Bellemare et al. [2013], and more. As the model is not necessary for solving the learning problem, the environments are hard-coded in a programming language. This has several downsides: if one does wish to see the model describing the environment, it has to be reverse-engineered from the environment framework; complex problems can require a significant development period; code bugs may make their way into the environment; and finally, there is no clean way to verify the model or reuse it directly. Thus, the creation of a verified, acceptable benchmark is a challenging task. Planning agents, on the other hand, can interact with an environment Sanner [2010a], but in many cases simulate the model within the planning agent in order to solve the problem Keller and Eyerich [2012]. The planning community has also come up with formal description languages for various types of problems; these include the Planning Domain Definition Language (PDDL) Aeronautiques et al. [1998] for classical planning problems, PDDL2.1 Fox and Long [2003] for problems involving time and continuous variables, PPDDL Bryce and Buffet [2008] for classical planning problems with probabilistic action effects and rewards, and the Relational Dynamic Influence Diagram Language (RDDL) for stochastic planning problems modeled as Markov decision processes.
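The Gym-style workflow such a framework targets looks roughly like the loop below. `make_rddl_env` is a hypothetical placeholder for however pyRDDLGym constructs an environment from RDDL domain and instance files; only the generic Gym reset/step protocol is assumed, so consult the library's documentation for the actual constructor and step signature.

```python
# Illustrative Gym-style interaction loop. `make_rddl_env` is a hypothetical
# stand-in for constructing an environment from RDDL domain/instance files;
# only the classic Gym reset/step/sample protocol is assumed here.
def run_random_episode(make_rddl_env, domain_file: str, instance_file: str, horizon: int = 100):
    env = make_rddl_env(domain=domain_file, instance=instance_file)
    state = env.reset()
    total_reward = 0.0
    for _ in range(horizon):
        action = env.action_space.sample()          # random policy for illustration
        state, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```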
Conservative Bayesian Model-Based Value Expansion for Offline Policy Optimization
Jeong, Jihwan, Wang, Xiaoyu, Gimelfarb, Michael, Kim, Hyunwoo, Abdulhai, Baher, Sanner, Scott
Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fixed batch of data collected by following some behavior policy. Model-based approaches are particularly appealing in the offline setting since they can extract more learning signals from the logged dataset by learning a model of the environment. However, the performance of existing model-based approaches falls short of their model-free counterparts, due to the compounding of estimation errors in the learned model. Driven by this observation, we argue that it is critical for a model-based method to understand when to trust the model and when to rely on model-free estimates, and how to act conservatively w.r.t. both. To this end, we derive an elegant and simple methodology called conservative Bayesian model-based value expansion for offline policy optimization (CBOP), which trades off model-free and model-based estimates during the policy evaluation step according to their epistemic uncertainties, and facilitates conservatism by taking a lower bound on the Bayesian posterior value estimate. On the standard D4RL continuous control tasks, we find that our method significantly outperforms previous model-based approaches: e.g., MOPO by $116.4$%, MOReL by $23.2$% and COMBO by $23.7$%. Further, CBOP achieves state-of-the-art performance on $11$ out of $18$ benchmark datasets while performing on par on the remaining datasets.
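The trade-off described above can be illustrated with a small numerical sketch: treat the value-expansion targets at different rollout horizons (with h = 0 being the purely model-free bootstrap) as noisy estimates, weight them by inverse variance to form a posterior over the target value, and take a lower confidence bound of that posterior as the conservative target. The weighting scheme and the LCB form below are a simplification for illustration, not the paper's exact Bayesian derivation.

```python
# Simplified sketch of the idea behind CBOP: combine value-expansion targets
# of different rollout horizons by inverse-variance weighting, then act
# conservatively by taking a lower confidence bound of the resulting
# posterior. Illustrative approximation only.
import numpy as np

def conservative_target(target_means: np.ndarray,
                        target_vars: np.ndarray,
                        lcb_coef: float = 1.0) -> float:
    """target_means[h], target_vars[h]: mean/variance of the h-step expansion
    target, estimated from an ensemble of dynamics models and Q-functions."""
    precisions = 1.0 / (target_vars + 1e-8)
    weights = precisions / precisions.sum()          # trust low-variance horizons more
    posterior_mean = float(np.dot(weights, target_means))
    posterior_var = float(1.0 / precisions.sum())    # variance of the precision-weighted mean
    return posterior_mean - lcb_coef * np.sqrt(posterior_var)

# Example: a confident model-free bootstrap vs. a noisier multi-step model rollout.
print(conservative_target(np.array([1.0, 1.3]), np.array([0.05, 0.4])))
```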
Adversarial Shapley Value Experience Replay for Task-Free Continual Learning
Mai, Zheda, Shim, Dongsub, Jeong, Jihwan, Sanner, Scott, Kim, Hyunwoo, Jang, Jongseong
Continual learning is a branch of deep learning that seeks to strike a balance between learning stability and plasticity. In this paper, we specifically focus on the task-free setting where data are streamed online without task metadata and clear task boundaries. A simple and highly effective class of algorithms for this setting is Experience Replay (ER), which selectively stores data samples from previous experience and leverages them to interleave memory-based and online batch learning updates. Recent advances in ER have proposed novel methods for scoring which samples to store in memory and which memory samples to interleave with online data during learning updates. In this paper, we contribute a novel Adversarial Shapley value ER (ASER) method that scores memory data samples according to their ability to preserve latent decision boundaries for previously observed classes (to maintain learning stability and avoid forgetting) while interfering with latent decision boundaries of current classes being learned (to encourage plasticity and optimal learning of new class boundaries). Overall, we observe that ASER provides competitive or improved performance on a variety of datasets compared to state-of-the-art ER-based continual learning methods.
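A minimal experience-replay skeleton of the kind ASER builds on is sketched below: a bounded memory filled by reservoir sampling, with a pluggable scoring function that decides which stored samples are interleaved with each online batch. The storage and retrieval choices shown are generic ER conventions; ASER's actual adversarial Shapley-value scoring is not reproduced here.

```python
# Generic experience-replay skeleton (not ASER itself): bounded memory with
# reservoir-style insertion and a pluggable scoring function for retrieval.
# ASER replaces the scoring function with its adversarial Shapley-value criterion.
import random
from typing import Callable, List, Tuple

Sample = Tuple[object, int]  # (input, label)

class ReplayBuffer:
    def __init__(self, capacity: int, score_fn: Callable[[List[Sample]], List[float]]):
        self.capacity = capacity
        self.score_fn = score_fn
        self.memory: List[Sample] = []
        self.seen = 0

    def store(self, sample: Sample) -> None:
        # Reservoir sampling keeps a uniform subset of the stream in bounded memory.
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.memory[j] = sample

    def retrieve(self, k: int) -> List[Sample]:
        # Pick the k highest-scoring memory samples to replay with the online batch.
        if not self.memory:
            return []
        scores = self.score_fn(self.memory)
        ranked = sorted(zip(scores, range(len(self.memory))), reverse=True)
        return [self.memory[i] for _, i in ranked[:k]]
```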
Batch-level Experience Replay with Review for Continual Learning
Mai, Zheda, Kim, Hyunwoo, Jeong, Jihwan, Sanner, Scott
Continual learning is a branch of deep learning that seeks to strike a balance between learning stability and plasticity. The CVPR 2020 CLVision Continual Learning for Computer Vision challenge is dedicated to evaluating and advancing the current state-of-the-art continual learning methods using the CORe50 dataset with three different continual learning scenarios. Current CL methods can be taxonomized into three major categories: regularization-based, parameter isolation, and memory-based methods [14]. Some regularization-based methods encode the knowledge from past tasks into a prior and utilize the prior to regularize the update of parameters that were important to past tasks [8, 18, 13], while others leverage knowledge distillation from the model trained on previous tasks to the model being trained on the new task.
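The distillation-style regularization mentioned above can be written as a small auxiliary loss: the model being trained is penalized for drifting from the frozen previous model's predictions. The temperature-scaled KL form below is a standard choice, shown purely to illustrate the idea in the taxonomy rather than any specific method from the challenge.

```python
# Illustrative knowledge-distillation regularizer for continual learning: the
# current model is penalized for diverging from a frozen copy trained on
# previous tasks. Standard temperature-scaled KL form, for illustration only.
import torch
import torch.nn.functional as F

def distillation_loss(new_logits: torch.Tensor,
                      old_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between the softened predictions of the frozen old model
    (target) and the model currently being trained."""
    old_probs = F.softmax(old_logits / temperature, dim=-1)
    new_log_probs = F.log_softmax(new_logits / temperature, dim=-1)
    return F.kl_div(new_log_probs, old_probs, reduction="batchmean") * temperature ** 2
```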