AITopics | Slivkins, Aleksandrs

Greedy Algorithm for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure

Slivkins, Aleksandrs, Xu, Yunzong, Zuo, Shiliang

arXiv.org Artificial Intelligence, Mar-5-2025

We study the greedy (exploitation-only) algorithm in bandit problems with a known reward structure. We allow arbitrary finite reward structures, while prior work focused on a few specific ones. We fully characterize when the greedy algorithm asymptotically succeeds or fails, in the sense of sublinear vs. linear regret as a function of time. Our characterization identifies a partial identifiability property of the problem instance as the necessary and sufficient condition for the asymptotic success. Notably, once this property holds, the problem becomes easy: any algorithm will succeed (in the same sense as above), provided it satisfies a mild non-degeneracy condition. We further extend our characterization to contextual bandits and interactive decision-making with arbitrary feedback, and demonstrate its broad applicability across various examples. Keywords: Multi-armed bandits, contextual bandits, structured bandits, greedy algorithm, regret.
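As a rough illustration of exploitation-only play over a finite reward structure, the sketch below fits the best candidate model by least squares each round and plays that model's best arm. The function name, the least-squares fit, the Bernoulli rewards, and the first-candidate tie-breaking are all assumptions made for this example; it is not the paper's formal setup.

```python
import random


def _squared_error(model, counts, sums):
    """Least-squares misfit of a candidate mean vector on the arms pulled so far."""
    return sum(
        counts[a] * (model[a] - sums[a] / counts[a]) ** 2
        for a in range(len(model))
        if counts[a] > 0
    )


def greedy_structured_bandit(models, true_means, horizon, seed=0):
    """Exploitation-only play over a finite class of candidate mean vectors.

    models: list of candidate mean-reward vectors (the known structure).
    true_means: the actual Bernoulli means (assumed to be one of the candidates).
    Ties between equally good candidates go to the earliest one (an assumption).
    Returns the list of arms pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    pulls = []
    for _ in range(horizon):
        # Pick the candidate that best explains the data, then exploit it.
        best_model = min(models, key=lambda m: _squared_error(m, counts, sums))
        arm = max(range(n_arms), key=lambda a: best_model[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pulls.append(arm)
    return pulls


# Candidates disagree on arm 0, so one pull of arm 0 reveals the truth.
print(greedy_structured_bandit([(1.0, 0.0), (0.0, 1.0)], (0.0, 1.0), 5))

# Candidates agree on arm 0, which is the only arm greedy ever plays here,
# so the data never separates them and greedy stays on the worse arm.
print(greedy_structured_bandit([(0.5, 0.0), (0.5, 1.0)], (0.5, 1.0), 5))
```

In the second toy instance the two candidates are indistinguishable from the arm that greedy keeps playing, which gives the flavor of how a lack of identifiability can lead to linear regret.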

bandit, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.0401

Country:

North America > United States > Illinois (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.81)

Should You Use Your Large Language Model to Explore or Exploit?

Harris, Keegan, Slivkins, Aleksandrs

arXiv.org Artificial Intelligence, Jan-31-2025

We evaluate the ability of the current generation of large language models (LLMs) to help a decision-making agent facing an exploration-exploitation tradeoff. We use LLMs to explore and exploit in silos in various (contextual) bandit tasks. We find that while the current LLMs often struggle to exploit, in-context mitigations may be used to substantially improve performance for small-scale tasks. However, even then, LLMs perform worse than a simple linear regression. On the other hand, we find that LLMs do help at exploring large action spaces with inherent semantics, by suggesting suitable candidates to explore.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2502.00225

Country: North America > United States (0.92)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Banking & Finance > Trading (1.00)
Energy > Oil & Gas > Upstream (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Exploration and Persuasion

Slivkins, Aleksandrs

arXiv.org Artificial Intelligence, Oct-22-2024

How to incentivize self-interested agents to explore when they prefer to exploit? Consider a population of self-interested agents that make decisions under uncertainty. They "explore" to acquire new information and "exploit" this information to make good decisions. Collectively they need to balance these two objectives, but their incentives are skewed toward exploitation. This is because exploration is costly, but its benefits are spread over many agents in the future. "Incentivized Exploration" addresses this issue via strategic communication. Consider a benign "principal" which can communicate with the agents and make recommendations, but cannot force the agents to comply. Moreover, suppose the principal can observe the agents' decisions and the outcomes of these decisions. The goal is to design a communication and recommendation policy which (i) achieves a desirable balance between exploration and exploitation, and (ii) incentivizes the agents to follow recommendations. What makes it feasible is "information asymmetry": the principal knows more than any one agent, as it collects information from many. It is essential that the principal does not fully reveal all its knowledge to the agents. Incentivized exploration combines two important problems in machine learning and theoretical economics, respectively. First, if agents always follow recommendations, the principal faces a multi-armed bandit problem: essentially, design an algorithm that balances exploration and exploitation. Second, interaction with a single agent corresponds to "Bayesian persuasion", where a principal leverages information asymmetry to convince an agent to take a particular action. We provide a brief but self-contained introduction to each problem through the lens of incentivized exploration, solving a key special case of the former as a sub-problem of the latter.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2410.17086

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report (0.40)

Industry:

Health & Medicine (1.00)
Information Technology > Services (0.67)
Energy > Oil & Gas > Upstream (0.54)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.34)

Can large language models explore in-context?

Krishnamurthy, Akshay, Harris, Keegan, Foster, Dylan J., Zhang, Cyril, Slivkins, Aleksandrs

arXiv.org Artificial Intelligence, Jul-12-2024

We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt. We experiment with GPT-3.5, GPT-4, and Llama2, using a variety of prompt designs, and find that the models do not robustly engage in exploration without substantial interventions: i) Across all of our experiments, only one configuration resulted in satisfactory exploratory behavior: GPT-4 with chain-of-thought reasoning and an externally summarized interaction history, presented as sufficient statistics; ii) All other configurations did not result in robust exploratory behavior, including those with chain-of-thought reasoning but unsummarized history. Although these findings can be interpreted positively, they suggest that external summarization -- which may not be possible in more complex settings -- is important for obtaining desirable behavior from LLM agents. We conclude that non-trivial algorithmic interventions, such as fine-tuning or dataset curation, may be required to empower LLM-based decision making agents in complex settings.
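To make "externally summarized interaction history, presented as sufficient statistics" concrete, here is a minimal sketch that collapses a raw list of (arm, reward) pairs into per-arm pull counts and average rewards. The function name and the text format are hypothetical and are not taken from the paper's prompts.

```python
def summarize_history(history, n_arms):
    """Collapse raw (arm, reward) pairs into per-arm counts and mean rewards.

    The output is a short text block that could be placed in a prompt
    instead of the full round-by-round history (format is an assumption).
    """
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for arm, reward in history:
        counts[arm] += 1
        totals[arm] += reward

    lines = []
    for arm in range(n_arms):
        if counts[arm] == 0:
            lines.append(f"arm {arm}: not yet pulled")
        else:
            mean = totals[arm] / counts[arm]
            lines.append(f"arm {arm}: pulls={counts[arm]}, mean_reward={mean:.2f}")
    return "\n".join(lines)


# Three rounds of raw history become three short lines.
print(summarize_history([(0, 1), (0, 0), (1, 1)], n_arms=3))
```

For a multi-armed bandit with independent arms, these counts and means carry the information a standard algorithm would use, which is one way to read why the summary can help where the raw history does not.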

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2403.15371

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Impact of Decentralized Learning on Player Utilities in Stackelberg Games

Donahue, Kate, Immorlica, Nicole, Jagadeesan, Meena, Lucier, Brendan, Slivkins, Aleksandrs

arXiv.org Artificial Intelligence, Jun-21-2024

When deployed in the world, a learning agent such as a recommender system or a chatbot often repeatedly interacts with another learning agent (such as a user) over time. In many such two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned. To better understand such cases, we examine the learning dynamics of the two-agent system and the implications for each agent's objective. We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks (such as Stackelberg equilibrium payoffs) result in worst-case linear regret for at least one player. To better capture these systems, we construct a relaxed regret benchmark that is tolerant to small learning errors by agents. We show that standard learning algorithms fail to provide sublinear regret, and we develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks. We further design relaxed environments under which faster learning ($O(\sqrt{T})$) is possible. Altogether, our results take a step towards assessing how two-agent interactions in sequential and decentralized learning environments affect the utility of both agents.

artificial intelligence, follower, game theory, (17 more...)

arXiv.org Artificial Intelligence

2403.00188

Country:

Europe (1.00)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.70)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Incentivized Exploration via Filtered Posterior Sampling

Kalvit, Anand, Slivkins, Aleksandrs, Gur, Yonatan

arXiv.org Artificial Intelligence, Feb-20-2024

A principal (social planner) interacts sequentially with a flow of self-interested agents that each take actions, consume information, and produce information over time. The planner's goal is to maximize the aggregate utility of all agents it interacts with, which requires agents to occasionally take exploratory actions that might otherwise be deemed inferior from an empirical standpoint. While such exploratory actions are the cornerstone of online learning as they help the principal learn the best actions over time, they also represent misaligned incentives between the principal and individual agents. How can a welfare-maximizing principal achieve her goal in the presence of such misaligned incentives? This is the essence of the incentivized exploration problem.

artificial intelligence, bandit, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2402.13338

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.86)

Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents

Esmaeili, Seyed A., Shin, Suho, Slivkins, Aleksandrs

arXiv.org Artificial Intelligence, Dec-13-2023

We consider a variant of the stochastic multi-armed bandit problem. Specifically, the arms are strategic agents who can improve their rewards or absorb them. The utility of an agent increases if she is pulled more or absorbs more of her rewards but decreases if she spends more effort improving her rewards. Agents have heterogeneous properties: they have different means and can improve their rewards up to different levels. Further, a non-empty subset of agents are "honest" and in the worst case always give their rewards without absorbing any part. The principal wishes to obtain a high revenue (cumulative reward) by designing a mechanism that incentivizes top-level performance at equilibrium. At the same time, the principal wishes to be robust and obtain revenue at least at the level of the honest agent with the highest mean in case of non-equilibrium behaviour. We identify a class of MAB algorithms, which we call performance incentivizing, that satisfy a collection of properties, and show that they lead to mechanisms that incentivize top-level performance at equilibrium and are robust under any strategy profile. Interestingly, we show that UCB is an example of such a MAB algorithm. Further, in the case where the top performance level is unknown, we show that combining second-price auction ideas with performance-incentivizing algorithms achieves performance at least at the second top level while also being robust.
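For readers unfamiliar with UCB, the following is a minimal sketch of the textbook UCB1 index rule (empirical mean plus a sqrt(2 ln t / n) bonus). It covers only the plain, non-strategic bandit setting; the strategic-agent mechanism from the abstract is not modeled, and the `pull` callback is a hypothetical stand-in for the environment.

```python
import math


def ucb1(pull, n_arms, horizon):
    """Textbook UCB1 on a plain (non-strategic) bandit.

    pull: callable mapping an arm index to a reward in [0, 1] (hypothetical).
    Assumes horizon >= n_arms so every arm gets its initial pull.
    Returns the number of times each arm was pulled.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: try every arm once.
            arm = t - 1
        else:
            # Optimism: empirical mean plus a confidence bonus that
            # shrinks as an arm accumulates pulls.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Deterministic toy environment: only arm 2 pays off.
counts = ucb1(lambda arm: 1.0 if arm == 2 else 0.0, n_arms=3, horizon=200)
print(counts)
```

Because the bonus of a rarely pulled arm grows with log t, every arm is revisited occasionally, while most pulls concentrate on the arm with the highest observed mean.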

artificial intelligence, big data, data mining, (19 more...)

arXiv.org Artificial Intelligence

2312.07929

Country: North America > United States > Maryland (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.36)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Algorithmic Persuasion Through Simulation: Information Design in the Age of Generative AI

Harris, Keegan, Immorlica, Nicole, Lucier, Brendan, Slivkins, Aleksandrs

arXiv.org Artificial Intelligence, Nov-29-2023

How can an informed sender persuade a receiver, having only limited information about the receiver's beliefs? Motivated by research showing generative AI can simulate economic agents, we initiate the study of information design with an oracle. We assume the sender can learn more about the receiver by querying this oracle, e.g., by simulating the receiver's behavior. Aside from AI motivations such as general-purpose Large Language Models (LLMs) and problem-specific machine learning models, alternate motivations include customer surveys and querying a small pool of live users. Specifically, we study Bayesian Persuasion where the sender has a second-order prior over the receiver's beliefs. After a fixed number of queries to an oracle to refine this prior, the sender commits to an information structure. Upon receiving the message, the receiver takes a payoff-relevant action maximizing her expected utility given her posterior beliefs. We design polynomial-time querying algorithms that optimize the sender's expected utility in this Bayesian Persuasion game. As a technical contribution, we show that queries form partitions of the space of receiver beliefs that can be used to quantify the sender's knowledge.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2311.18138

Country:

North America > United States > Hawaii (0.14)
Europe > Middle East > Cyprus (0.14)

Genre:

Research Report (0.82)
Questionnaire & Opinion Survey (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.60)

Bandit Social Learning: Exploration under Myopic Behavior

Banihashem, Kiarash, Hajiaghayi, MohammadTaghi, Shin, Suho, Slivkins, Aleksandrs

arXiv.org Artificial Intelligence, Nov-3-2023

Reviews and ratings are pervasive in many online platforms. A customer consults reviews/ratings, then chooses a product and then (often) leaves feedback, which is aggregated by the platform and served to future customers. Collectively, customers face a tradeoff between exploration and exploitation, i.e., between acquiring new information while making potentially suboptimal decisions and making optimal decisions using available information. However, individual customers tend to act myopically and favor exploitation, without regard to exploration for the sake of others. At a high level, we ask whether/how the myopic behavior interferes with efficient exploration. We are particularly interested in learning failures when only a few agents choose an optimal action.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2302.07425

Country:

North America > United States (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Industry:

Education > Curriculum (0.41)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression

Slivkins, Aleksandrs, Sankararaman, Karthik Abinav, Foster, Dylan J.

arXiv.org Artificial Intelligence, Jun-11-2023

We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm for CBwLC (or CBwK) that is based on regression oracles. The algorithm is simple, computationally efficient, and admits vanishing regret. It is statistically optimal for the variant of CBwK in which the algorithm must stop once some constraint is violated. Further, we provide the first vanishing-regret guarantees for CBwLC (or CBwK) that extend beyond the stochastic environment. We side-step strong impossibility results from prior work by identifying a weaker (and, arguably, fairer) benchmark to compare against. Our algorithm builds on LagrangeBwK (Immorlica et al., FOCS 2019), a Lagrangian-based technique for CBwK, and SquareCB (Foster and Rakhlin, ICML 2020), a regression-based technique for contextual bandits. Our analysis leverages the inherent modularity of both techniques.
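The regression-based component mentioned above, SquareCB, is usually described as turning a regression oracle's reward predictions into an action distribution via inverse gap weighting. The sketch below shows that sampling rule only, under the assumption of reward (not loss) predictions and a hypothetical exploration parameter `gamma`; the Lagrangian handling of constraints from the abstract is not shown.

```python
def inverse_gap_weighting(predicted_rewards, gamma):
    """Inverse gap weighting over K actions (sketch, reward-maximization form).

    Each non-greedy action a gets probability 1 / (K + gamma * gap(a)),
    where gap(a) is how far its predicted reward falls below the best
    prediction; the greedy action receives the remaining mass.
    """
    k = len(predicted_rewards)
    best = max(range(k), key=lambda a: predicted_rewards[a])
    probs = [0.0] * k
    for a in range(k):
        if a != best:
            gap = predicted_rewards[best] - predicted_rewards[a]
            probs[a] = 1.0 / (k + gamma * gap)
    # Each non-greedy probability is at most 1/K, so the remainder is positive.
    probs[best] = 1.0 - sum(probs)
    return probs


# Larger gamma concentrates more probability on the greedy action.
print(inverse_gap_weighting([0.9, 0.5, 0.1], gamma=10.0))
print(inverse_gap_weighting([0.9, 0.5, 0.1], gamma=100.0))
```

Actions whose predicted reward is close to the best keep a meaningful probability of being tried, while clearly worse actions are sampled rarely, which is how the rule trades exploration against exploitation.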

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2211.07484

Country:

North America > United States > Massachusetts (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Game Theory (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Data Science > Data Mining > Big Data (0.46)
