Mladenov, Martin
Minimizing Live Experiments in Recommender Systems: User Simulation to Evaluate Preference Elicitation Policies
Hsu, Chih-Wei, Mladenov, Martin, Meshi, Ofer, Pine, James, Pham, Hubert, Li, Shane, Liang, Xujian, Polishko, Anton, Yang, Li, Scheetz, Ben, Boutilier, Craig
Evaluation of policies in recommender systems typically involves A/B testing using live experiments on real users to assess a new policy's impact on relevant metrics. This "gold standard" comes at a high cost, however, in terms of cycle time, user cost, and potential impact on user retention. In developing policies for "onboarding" new users, these costs can be especially problematic, since onboarding occurs only once. In this work, we describe a simulation methodology used to augment (and reduce) the use of live experiments. We illustrate its deployment for the evaluation of "preference elicitation" algorithms used to onboard new users of the YouTube Music platform. By developing counterfactually robust user behavior models, and a simulation service that couples such models with production infrastructure, we are able to test new algorithms in a way that reliably predicts their performance on key metrics when deployed live. We describe our domain, our simulation models and platform, our experimental and deployment results, and the further steps needed to establish realistic simulation as a powerful complement to live experiments.
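As a rough illustration of this kind of offline evaluation, the sketch below compares two hypothetical elicitation policies against a toy simulated-user model. The user model, the policies, and the regret metric are illustrative stand-ins, not the counterfactually robust production models described above.

    # Sketch: comparing preference-elicitation policies against a simulated user model.
    import numpy as np

    rng = np.random.default_rng(0)
    N_TOPICS, N_QUESTIONS, N_USERS = 20, 5, 2000

    def simulate(policy):
        """Run one onboarding session per simulated user and return the mean regret."""
        regrets = []
        for _ in range(N_USERS):
            true_pref = rng.dirichlet(np.ones(N_TOPICS))   # latent user interests
            estimate = np.ones(N_TOPICS) / N_TOPICS        # recommender's belief
            asked = set()
            for _ in range(N_QUESTIONS):
                topic = policy(estimate, asked)
                asked.add(topic)
                liked = rng.random() < true_pref[topic] * N_TOPICS / 2  # noisy response
                estimate[topic] *= 2.0 if liked else 0.5
                estimate /= estimate.sum()
            # regret: gap between the best topic and the topic the recommender would pick
            regrets.append(true_pref.max() - true_pref[np.argmax(estimate)])
        return float(np.mean(regrets))

    def random_policy(estimate, asked):
        return rng.choice([t for t in range(N_TOPICS) if t not in asked])

    def greedy_policy(estimate, asked):
        order = np.argsort(-estimate)                      # ask about the most promising topics
        return next(t for t in order if t not in asked)

    print("random:", simulate(random_policy))
    print("greedy:", simulate(greedy_policy))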
Demystifying Embedding Spaces using Large Language Models
Tennenholtz, Guy, Chow, Yinlam, Hsu, Chih-Wei, Jeong, Jihwan, Shani, Lior, Tulepbergenov, Azamat, Ramachandran, Deepak, Mladenov, Martin, Boutilier, Craig
Embeddings have become a pivotal means to represent complex, multi-faceted information about entities, concepts, and relationships in a condensed and useful format. Nevertheless, they often preclude direct interpretation. While downstream tasks make use of these compressed representations, meaningful interpretation usually requires visualization using dimensionality reduction or specialized machine learning interpretability methods. This paper addresses the challenge of making such embeddings more interpretable and broadly useful, by employing Large Language Models (LLMs) to directly interact with embeddings -- transforming abstract vectors into understandable narratives. By injecting embeddings into LLMs, we enable querying and exploration of complex embedding data. We demonstrate our approach on a variety of tasks, including: enhancing concept activation vectors (CAVs), communicating novel embedded entities, and decoding user preferences in recommender systems. Our work couples the immense information potential of embeddings with the interpretative power of LLMs.
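To make the idea of injecting embeddings into LLMs concrete, here is a minimal sketch of one plausible mechanism: a small learned adapter maps a domain embedding (e.g., an item or user vector) to a few "soft tokens" that are prepended to the prompt's token embeddings before a frozen LLM processes them. The dimensions and module names are assumptions for illustration, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class EmbeddingAdapter(nn.Module):
        """Maps a domain embedding to n_soft_tokens pseudo-token embeddings."""
        def __init__(self, domain_dim=64, model_dim=768, n_soft_tokens=4):
            super().__init__()
            self.n_soft_tokens, self.model_dim = n_soft_tokens, model_dim
            self.proj = nn.Sequential(
                nn.Linear(domain_dim, model_dim * n_soft_tokens),
                nn.Tanh(),
            )

        def forward(self, domain_emb, prompt_token_embs):
            # domain_emb: [batch, domain_dim]; prompt_token_embs: [batch, seq, model_dim]
            soft = self.proj(domain_emb).view(-1, self.n_soft_tokens, self.model_dim)
            return torch.cat([soft, prompt_token_embs], dim=1)  # fed to the frozen LLM

    adapter = EmbeddingAdapter()
    fused = adapter(torch.randn(2, 64), torch.randn(2, 10, 768))
    print(fused.shape)  # torch.Size([2, 14, 768])

In this sketch only the adapter would be trained, keeping the LLM itself frozen while it learns to "read" the foreign embedding space.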
Modeling Recommender Ecosystems: Research Challenges at the Intersection of Mechanism Design, Reinforcement Learning and Generative Models
Boutilier, Craig, Mladenov, Martin, Tennenholtz, Guy
Modern recommender systems lie at the heart of complex ecosystems that couple the behavior of users, content providers, advertisers, and other actors. Despite this, the focus of the majority of recommender research -- and most practical recommenders of any import -- is on the local, myopic optimization of the recommendations made to individual users. This comes at a significant cost to the long-term utility that recommenders could generate for their users. We argue that explicitly modeling the incentives and behaviors of all actors in the system -- and the interactions among them induced by the recommender's policy -- is strictly necessary if one is to maximize the value the system brings to these actors and improve overall ecosystem "health". Doing so requires: optimization over long horizons using techniques such as reinforcement learning; making inevitable tradeoffs in the utility that can be generated for different actors using the methods of social choice; reducing information asymmetry, while accounting for incentives and strategic behavior, using the tools of mechanism design; better modeling of both user and item-provider behaviors by incorporating notions from behavioral economics and psychology; and exploiting recent advances in generative and foundation models to make these mechanisms interpretable and actionable. We propose a conceptual framework that encompasses these elements, and articulate a number of research challenges that emerge at the intersection of these different disciplines.
Content Prompting: Modeling Content Provider Dynamics to Improve User Welfare in Recommender Ecosystems
Prasad, Siddharth, Mladenov, Martin, Boutilier, Craig
Users derive value from a recommender system (RS) only to the extent that it is able to surface content (or items) that meet their needs/preferences. While RSs often have a comprehensive view of user preferences across the entire user base, content providers, by contrast, generally have only a local view of the preferences of users that have interacted with their content. This limits a provider's ability to offer new content to best serve the broader population. In this work, we tackle this information asymmetry with content prompting policies. A content prompt is a hint or suggestion to a provider to make available novel content for which the RS predicts unmet user demand. A prompting policy is a sequence of such prompts that is responsive to the dynamics of a provider's beliefs, skills and incentives. We aim to determine a joint prompting policy that induces a set of providers to make content available that optimizes user social welfare in equilibrium, while respecting the incentives of the providers themselves. Our contributions include: (i) an abstract model of the RS ecosystem, including content provider behaviors, that supports such prompting; (ii) the design and theoretical analysis of sequential prompting policies for individual providers; (iii) a mixed integer programming formulation for optimal joint prompting using path planning in content space; and (iv) simple, proof-of-concept experiments illustrating how such policies improve ecosystem health and user welfare.
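The toy sketch below illustrates why sequential prompting is delicate: a myopic prompting policy that respects the provider's incentive (never asking it to move to lower-engagement content) can stall before reaching the high-demand region of content space. All names and dynamics here are illustrative only.

    import numpy as np

    demand = np.array([0.1, 0.2, 0.3, 0.2, 0.9, 0.4])  # unmet user demand per content niche
    provider_pos = 0                                    # provider currently serves niche 0
    MAX_STEP = 1                                        # providers accept only small content shifts

    def myopic_prompt(pos):
        """Prompt one step toward the highest-demand niche."""
        target = int(np.argmax(demand))
        return pos + int(np.clip(target - pos, -MAX_STEP, MAX_STEP))

    for t in range(6):
        proposal = myopic_prompt(provider_pos)
        # incentive check: the provider moves only if its served demand does not drop
        if demand[proposal] >= demand[provider_pos]:
            provider_pos = proposal
        print(f"round {t}: niche {provider_pos}, demand served {demand[provider_pos]:.2f}")

The provider stalls at niche 2 even though niche 4 has far greater unmet demand; this is precisely the failure mode that motivates planning entire prompting paths subject to provider incentives, as in the mixed integer programming formulation above.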
pyRDDLGym: From RDDL to Gym Environments
Taitler, Ayal, Gimelfarb, Michael, Jeong, Jihwan, Gopalakrishnan, Sriram, Mladenov, Martin, Liu, Xiaotian, Sanner, Scott
Reinforcement Learning (RL) Sutton and Barto [2018] and probabilistic planning Puterman [2014] are two research branches that address stochastic problems, often under the Markov assumption for state dynamics. The planning approach requires a given model, while the learning approach improves through repeated interaction with an environment, which can be viewed as a black box. Thus, the tools and the benchmarks for these two branches have grown apart. Learning agents do not need to simulate model-based transitions, and thus frameworks such as OpenAI Gym Brockman et al. [2016] have become a standard, serving also as an interface for third-party benchmarks such as those of Todorov et al. [2012], Bellemare et al. [2013], and more. As the model is not necessary for solving the learning problem, the environments are hard-coded in a programming language. This has several downsides: if one does wish to see the model describing the environment, it has to be reverse-engineered from the environment framework; complex problems can require a significant development period; code bugs may make their way into the environment; and there is no clean way to verify the model or reuse it directly. Thus, the creation of a verified, acceptable benchmark is a challenging task. Planning agents, on the other hand, can interact with an environment Sanner [2010a], but in many cases simulate the model within the planning agent itself in order to solve the problem Keller and Eyerich [2012]. The planning community has also come up with formal description languages for various types of problems; these include the Planning Domain Definition Language (PDDL) Aeronautiques et al. [1998] for classical planning problems, PDDL2.1 Fox and Long [2003] for problems involving time and continuous variables, PPDDL Bryce and Buet [2008] for classical planning problems with probabilistic action effects and rewards, and the Relational Dynamic Influence Diagram Language (RDDL)
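A hedged usage sketch of the intended workflow follows: compile an RDDL domain and instance into a Gym-style environment and run a random agent. The exact constructor and the reset/step signatures vary across pyRDDLGym and Gym/Gymnasium versions, so treat the calls below as assumptions and consult the package documentation.

    from pyRDDLGym import RDDLEnv

    # assumed constructor; newer releases expose pyRDDLGym.make(domain, instance) instead
    env = RDDLEnv.RDDLEnv(domain='domain.rddl', instance='instance.rddl')

    state, _ = env.reset()              # older Gym versions return only the state
    total_reward = 0.0
    for _ in range(100):                # or iterate up to the horizon declared in the RDDL instance
        action = env.action_space.sample()          # random agent; any Gym-compatible agent works
        state, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    print('episode return:', total_reward)
    env.close()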
Reinforcement Learning with History-Dependent Dynamic Contexts
Tennenholtz, Guy, Merlis, Nadav, Shani, Lior, Mladenov, Martin, Boutilier, Craig
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel reinforcement learning framework for history-dependent environments that generalizes the contextual MDP framework to handle non-Markov environments, where contexts change over time. We consider special cases of the model, with a focus on logistic DCMDPs, which break the exponential dependence on history length by leveraging aggregation functions to determine context transitions. This special structure allows us to derive an upper-confidence-bound style algorithm for which we establish regret bounds. Motivated by our theoretical results, we introduce a practical model-based algorithm for logistic DCMDPs that plans in a latent space and uses optimism over history-dependent features. We demonstrate the efficacy of our approach on a recommendation task (using MovieLens data) where user behavior dynamics evolve in response to recommendations.
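A minimal sketch of the logistic transition idea: the next latent context is sampled from a softmax over an aggregation of history features, so the dependence on the full history collapses into a fixed-size statistic. The notation and the choice of aggregation below are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    N_CONTEXTS, FEAT_DIM = 3, 4
    W = rng.normal(size=(N_CONTEXTS, FEAT_DIM))   # per-context logit weights

    def next_context(history_features):
        """Sample the next context from a softmax over aggregated history features."""
        agg = history_features.sum(axis=0)        # aggregation function (here: a sum)
        logits = W @ agg
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(N_CONTEXTS, p=probs)

    # e.g., features of the last few (recommendation, user response) interactions
    history = rng.normal(size=(5, FEAT_DIM))
    print("sampled next context:", next_context(history))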
RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems
Mladenov, Martin, Hsu, Chih-Wei, Jain, Vihan, Ie, Eugene, Colby, Christopher, Mayoraz, Nicolas, Pham, Hubert, Tran, Dustin, Vendrov, Ivan, Boutilier, Craig
The development of recommender systems that optimize multi-turn interaction with users, and that model the interactions of different agents (e.g., users, content providers, vendors) in the recommender ecosystem, has drawn increasing attention in recent years. Developing and training models and algorithms for such recommenders can be especially difficult using static datasets, which often fail to offer the types of counterfactual predictions needed to evaluate policies over extended horizons. To address this, we develop RecSim NG, a probabilistic platform for the simulation of multi-agent recommender systems. RecSim NG is a scalable, modular, differentiable simulator implemented in Edward2 and TensorFlow. It offers: a powerful, general probabilistic programming language for agent-behavior specification; tools for probabilistic inference and latent-variable model learning, backed by automatic differentiation and tracing; and a TensorFlow-based runtime for running simulations on accelerated hardware. We describe RecSim NG and illustrate how it can be used to create transparent, configurable, end-to-end models of a recommender ecosystem, complemented by a small set of simple use cases that demonstrate how RecSim NG can help both researchers and practitioners easily develop and train novel algorithms for recommender systems.
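For flavor, the sketch below shows the kind of differentiable user-choice model one writes when building such simulations; it uses plain TensorFlow Probability and is not the RecSim NG API itself (see the recsim_ng package for its actual Variable/Network abstractions).

    import tensorflow as tf
    import tensorflow_probability as tfp

    def user_choice(user_interest, slate_item_features, temperature=1.0):
        """Sample which slate position a simulated user clicks, via a softmax choice model."""
        affinities = tf.linalg.matvec(slate_item_features, user_interest)  # [slate_size]
        return tfp.distributions.Categorical(logits=affinities / temperature).sample()

    user_interest = tf.constant([0.2, 0.9, -0.3])
    slate = tf.constant([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])
    print(int(user_choice(user_interest, slate)))

Expressing user behavior with explicit probability distributions like this is what allows rollouts to be traced and log-probabilities differentiated, the property that RecSim NG exploits for the inference and model-learning tools mentioned above.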
Meta-Thompson Sampling
Kveton, Branislav, Konobeev, Mikhail, Zaheer, Manzil, Hsu, Chih-wei, Mladenov, Martin, Boutilier, Craig, Szepesvari, Csaba
Efficient exploration in multi-armed bandits is a fundamental online learning problem. In this work, we propose a variant of Thompson sampling that learns to explore better as it interacts with problem instances drawn from an unknown prior distribution. Our algorithm meta-learns the prior, and thus we call it Meta-TS. We propose efficient implementations of Meta-TS and analyze it in Gaussian bandits. Our analysis shows the benefit of meta-learning the prior and is of broader interest, because we derive the first prior-dependent upper bound on the Bayes regret of Thompson sampling. This result is complemented by an empirical evaluation, which shows that Meta-TS quickly adapts to the unknown prior.
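A hedged sketch of the Meta-TS idea on K-armed Gaussian bandits: a prior over arm means is meta-learned from earlier problem instances and then used by ordinary Thompson sampling on new instances. The updates below are illustrative simplifications, not the paper's exact algorithm or analysis.

    import numpy as np

    rng = np.random.default_rng(0)
    K, HORIZON, NOISE = 5, 200, 1.0
    true_prior_mean = rng.normal(size=K)              # unknown meta-prior over arm means

    def run_instance(prior_mean, prior_var=1.0):
        """Thompson sampling on one bandit instance drawn from the meta-prior."""
        arm_means = true_prior_mean + rng.normal(size=K)
        post_mean, post_var = prior_mean.copy(), np.full(K, prior_var)
        pulls, reward_sums = np.zeros(K), np.zeros(K)
        for _ in range(HORIZON):
            a = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
            r = arm_means[a] + rng.normal(scale=NOISE)
            pulls[a] += 1
            reward_sums[a] += r
            # conjugate Gaussian posterior update for the pulled arm
            post_var[a] = 1.0 / (1.0 / prior_var + pulls[a] / NOISE**2)
            post_mean[a] = post_var[a] * (prior_mean[a] / prior_var + reward_sums[a] / NOISE**2)
        return post_mean

    # meta-learning loop: fold each instance's posterior into a running prior estimate
    prior_estimate = np.zeros(K)
    for task in range(1, 51):
        posterior = run_instance(prior_estimate)
        prior_estimate += (posterior - prior_estimate) / task    # incremental mean
    print("learned prior:", np.round(prior_estimate, 2))
    print("true prior:   ", np.round(true_prior_mean, 2))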
Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach
Mladenov, Martin, Creager, Elliot, Ben-Porat, Omer, Swersky, Kevin, Zemel, Richard, Boutilier, Craig
Most recommender systems (RS) research assumes that a user's utility can be maximized independently of the utility of the other agents (e.g., other users, content providers). In realistic settings, this is often not true---the dynamics of an RS ecosystem couple the long-term utility of all agents. In this work, we explore settings in which content providers cannot remain viable unless they receive a certain level of user engagement. We formulate the recommendation problem in this setting as one of equilibrium selection in the induced dynamical system, and show that it can be solved as an optimal constrained matching problem. Our model ensures the system reaches an equilibrium with maximal social welfare supported by a sufficiently diverse set of viable providers. We demonstrate that even in a simple, stylized dynamical RS model, the standard myopic approach to recommendation---always matching a user to the best provider---performs poorly. We develop several scalable techniques to solve the matching problem, and also draw connections to various notions of user regret and fairness, arguing that these outcomes are fairer in a utilitarian sense.
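As a toy illustration of the constrained-matching view, the linear program below maximizes total user utility while guaranteeing each provider a minimum level of engagement. It covers only the LP relaxation with every provider kept viable; provider selection and the equilibrium analysis in the paper are richer than this sketch.

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    N_USERS, N_PROVIDERS, MIN_LOAD = 6, 3, 1.5
    U = rng.random((N_USERS, N_PROVIDERS))       # utility of user i consuming provider j

    c = -U.flatten()                             # linprog minimizes, so negate utility
    # each user allocates exactly one unit of attention
    A_eq = np.zeros((N_USERS, N_USERS * N_PROVIDERS))
    for i in range(N_USERS):
        A_eq[i, i * N_PROVIDERS:(i + 1) * N_PROVIDERS] = 1.0
    b_eq = np.ones(N_USERS)
    # each provider receives at least MIN_LOAD units of attention (viability constraint)
    A_ub = np.zeros((N_PROVIDERS, N_USERS * N_PROVIDERS))
    for j in range(N_PROVIDERS):
        A_ub[j, j::N_PROVIDERS] = -1.0
    b_ub = -MIN_LOAD * np.ones(N_PROVIDERS)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    matching = res.x.reshape(N_USERS, N_PROVIDERS)
    print("social welfare:", -res.fun)
    print(np.round(matching, 2))

Dropping the viability constraint recovers the myopic policy (each user simply matched to their best provider), which is exactly the baseline the abstract argues performs poorly once provider viability matters.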
Differentiable Meta-Learning in Contextual Bandits
Kveton, Branislav, Mladenov, Martin, Hsu, Chih-Wei, Zaheer, Manzil, Szepesvari, Csaba, Boutilier, Craig
We study a contextual bandit setting where the learning agent has access to sampled bandit instances from an unknown prior distribution $\mathcal{P}$. The goal of the agent is to achieve high reward on average over the instances drawn from $\mathcal{P}$. This setting is of particular importance because it formalizes the offline optimization of bandit policies to perform well on average over anticipated bandit instances. The main idea in our work is to optimize differentiable bandit policies by policy gradients. We derive reward gradients that reflect the structure of our problem, and propose contextual policies that are parameterized in a differentiable way and have low regret. Our algorithmic and theoretical contributions are supported by extensive experiments that show the importance of baseline subtraction and learned biases, and demonstrate the practicality of our approach on a range of classification tasks.
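A minimal sketch of the approach: a softmax contextual policy trained by score-function (REINFORCE) policy gradients over bandit instances sampled from a prior, with a running baseline subtracted to reduce gradient variance. All modeling choices are illustrative simplifications.

    import numpy as np

    rng = np.random.default_rng(0)
    D, K, LR, STEPS = 4, 3, 0.2, 5000
    prior_mean = rng.normal(size=(K, D))          # mean arm weights of the (unknown) prior
    theta = np.zeros((K, D))                      # differentiable softmax policy parameters
    baseline = 0.0

    for _ in range(STEPS):
        W = prior_mean + 0.1 * rng.normal(size=(K, D))   # bandit instance drawn from the prior
        x = rng.normal(size=D)                           # context
        logits = theta @ x
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(K, p=probs)
        r = W[a] @ x + 0.1 * rng.normal()                # observed reward for the pulled arm
        # score-function gradient of expected reward, with a running baseline
        grad_logp = -np.outer(probs, x)
        grad_logp[a] += x
        theta += LR * (r - baseline) * grad_logp
        baseline += 0.05 * (r - baseline)

    # sanity check: how often the learned policy agrees with the prior-optimal arm
    xs = rng.normal(size=(1000, D))
    agree = np.mean(np.argmax(xs @ theta.T, axis=1) == np.argmax(xs @ prior_mean.T, axis=1))
    print(f"agreement with prior-optimal arm: {agree:.2f}")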