Lee, Dabeen
Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span
Chae, Woojin, Hong, Kihyuk, Zhang, Yufan, Tewari, Ambuj, Lee, Dabeen
This paper proposes a computationally tractable algorithm for learning infinite-horizon average-reward linear mixture Markov decision processes (MDPs) under the Bellman optimality condition. Our algorithm for linear mixture MDPs achieves a nearly minimax optimal regret upper bound of $\widetilde{\mathcal{O}}(d\sqrt{\mathrm{sp}(v^*)T})$ over $T$ time steps where $\mathrm{sp}(v^*)$ is the span of the optimal bias function $v^*$ and $d$ is the dimension of the feature mapping. Our algorithm applies the recently developed technique of running value iteration on a discounted-reward MDP approximation with clipping by the span. We prove that the value iteration procedure, even with the clipping operation, converges. Moreover, we show that the associated variance term due to random transitions can be bounded even under clipping. Combined with the weighted ridge regression-based parameter estimation scheme, this leads to the nearly minimax optimal regret guarantee.
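To make the clipping idea concrete, here is a minimal sketch of discounted value iteration with span clipping on a small tabular MDP with a known model; the function name, toy transition tensor, and fixed span bound are illustrative assumptions and do not reproduce the paper's linear mixture construction or its weighted ridge regression-based estimation.

```python
import numpy as np

def clipped_discounted_vi(P, r, gamma, span_bound, num_iters=200):
    """Discounted value iteration with span clipping on a toy tabular MDP.

    P: transition tensor of shape (S, A, S); r: rewards of shape (S, A).
    After each Bellman backup, the value vector is clipped so that its
    span (max - min) does not exceed `span_bound`.
    """
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(num_iters):
        q = r + gamma * P @ v                          # one-step lookahead, shape (S, A)
        v_new = q.max(axis=1)                          # greedy Bellman backup
        v_new = np.minimum(v_new, v_new.min() + span_bound)  # clipping by the span
        v = v_new
    return v

# toy example: 2 states, 2 actions
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
r = np.array([[1.0, 0.0], [0.0, 0.5]])
print(clipped_discounted_vi(P, r, gamma=0.99, span_bound=2.0))
```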
Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism
Yu, Kihyun, Lee, Duksang, Overman, William, Lee, Dabeen
This paper studies the safe reinforcement learning problem formulated as an episodic finite-horizon tabular constrained Markov decision process with an unknown transition kernel and stochastic reward and cost functions. We propose a model-based algorithm based on novel cost and reward function estimators that provide tighter cost pessimism and reward optimism. While guaranteeing no constraint violation in every episode, our algorithm achieves a regret upper bound of $\widetilde{\mathcal{O}}((\bar C - \bar C_b)^{-1}H^{2.5} S\sqrt{AK})$ where $\bar C$ is the cost budget for an episode, $\bar C_b$ is the expected cost under a safe baseline policy over an episode, $H$ is the horizon, and $S$, $A$ and $K$ are the number of states, actions, and episodes, respectively. This improves upon the best-known regret upper bound, and when $\bar C- \bar C_b=\Omega(H)$, it nearly matches the regret lower bound of $\Omega(H^{1.5}\sqrt{SAK})$. We deduce our cost and reward function estimators via a Bellman-type law of total variance to obtain tight bounds on the expected sum of the variances of value function estimates. This leads to a tighter dependence on the horizon in the function estimators. We also present numerical results to demonstrate the computational effectiveness of our proposed framework.
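The sketch below schematically illustrates reward optimism and cost pessimism via count-based bonuses on empirical estimates; the generic Hoeffding-style bonus shown is an assumption for illustration and does not reproduce the paper's tighter variance-aware estimators obtained from the Bellman-type law of total variance.

```python
import numpy as np

def optimistic_pessimistic_estimates(r_hat, c_hat, counts, delta=0.05):
    """Toy construction of optimistic rewards and pessimistic costs.

    r_hat, c_hat: empirical means per (state, action); counts: visit counts.
    The bonus shrinks as counts grow; rewards are inflated (optimism) and
    costs are inflated as well (pessimism about feasibility).
    """
    bonus = np.sqrt(2.0 * np.log(2.0 / delta) / np.maximum(counts, 1))
    r_opt = np.clip(r_hat + bonus, 0.0, 1.0)   # reward optimism
    c_pes = np.clip(c_hat + bonus, 0.0, 1.0)   # cost pessimism
    return r_opt, c_pes

counts = np.array([[1, 10], [100, 5]])
r_hat = np.array([[0.6, 0.4], [0.2, 0.9]])
c_hat = np.array([[0.1, 0.3], [0.5, 0.2]])
print(optimistic_pessimistic_estimates(r_hat, c_hat, counts))
```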
Reinforcement Learning for Infinite-Horizon Average-Reward MDPs with Multinomial Logistic Function Approximation
Park, Jaehyun, Lee, Dabeen
We study model-based reinforcement learning with non-linear function approximation where the transition function of the underlying Markov decision process (MDP) is given by a multinomial logistic (MNL) model. In this paper, we develop two algorithms for the infinite-horizon average reward setting. Our first algorithm \texttt{UCRL2-MNL} applies to the class of communicating MDPs and achieves an $\tilde{\mathcal{O}}(dD\sqrt{T})$ regret, where $d$ is the dimension of the feature mapping, $D$ is the diameter of the underlying MDP, and $T$ is the number of time steps. The second algorithm \texttt{OVIFH-MNL} is computationally more efficient and applies to the more general class of weakly communicating MDPs, for which we show a regret guarantee of $\tilde{\mathcal{O}}(d^{2/5} \mathrm{sp}(v^*)T^{4/5})$ where $\mathrm{sp}(v^*)$ is the span of the associated optimal bias function. We also prove a lower bound of $\Omega(d\sqrt{DT})$ for learning communicating MDPs with MNL transitions of diameter at most $D$. Furthermore, we show a regret lower bound of $\Omega(dH^{3/2}\sqrt{K})$ for learning $H$-horizon episodic MDPs with MNL function approximation where $K$ is the number of episodes, which improves upon the best-known lower bound for the finite-horizon setting.
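For concreteness, here is a minimal sketch of the multinomial logistic (MNL) transition model together with a plain maximum-likelihood gradient step; the random features and step size are illustrative assumptions, and the optimistic exploration machinery of \texttt{UCRL2-MNL} and \texttt{OVIFH-MNL} is not shown.

```python
import numpy as np

def mnl_transition_probs(phi, theta):
    """MNL transitions: P(s' | s, a) proportional to exp(phi(s, a, s') . theta).

    phi: feature array of shape (num_next_states, d) for a fixed (s, a);
    theta: parameter vector of shape (d,).
    """
    logits = phi @ theta
    logits -= logits.max()               # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def nll_grad(phi, theta, next_state):
    """Gradient of the negative log-likelihood for one observed transition."""
    p = mnl_transition_probs(phi, theta)
    return phi.T @ p - phi[next_state]

# illustrative usage with random features (d = 3, four candidate next states)
rng = np.random.default_rng(0)
phi = rng.normal(size=(4, 3))
theta = np.zeros(3)
for _ in range(100):                     # plain gradient descent on the NLL
    theta -= 0.1 * nll_grad(phi, theta, next_state=2)
print(mnl_transition_probs(phi, theta))  # mass concentrates on next_state = 2
```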
Parameter-Free Algorithms for Performative Regret Minimization under Decision-Dependent Distributions
Park, Sungwoo, Kwon, Junyeop, Kim, Byeongnoh, Chae, Suhyun, Lee, Jeeyong, Lee, Dabeen
We consider the general case where the performative risk can be non-convex, for which we develop efficient parameter-free optimistic optimization-based methods. Our algorithms significantly improve upon the existing Lipschitz bandit-based method in many aspects. In particular, our framework does not require knowledge of the sensitivity parameter of the distribution map or the Lipschitz constant of the loss function. Together with the efficient optimistic optimization-based tree-search mechanism, this makes our framework practically favorable. We provide experimental results that demonstrate the numerical superiority of our algorithms over the existing method and other black-box optimistic optimization methods.
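As an illustration of optimistic optimization-based tree search, the following is a minimal SOO-style sketch that maximizes a black-box function on $[0,1]$ by hierarchical partitioning without any Lipschitz constant; it only conveys the parameter-free flavor and is not the paper's algorithm.

```python
import math

def soo(f, budget=100, max_depth=10):
    """Minimal SOO-style optimistic tree search maximizing f on [0, 1].

    Leaves are intervals; expanding a leaf splits it into three children and
    evaluates f at their midpoints. No smoothness constant is required,
    which is the parameter-free aspect illustrated here.
    """
    leaves = [(0.0, 1.0, f(0.5), 0)]               # (left, right, value, depth)
    evals, best = 1, (0.5, leaves[0][2])
    while evals < budget:
        v_max = -math.inf
        for h in range(max_depth + 1):
            at_h = [l for l in leaves if l[3] == h]
            if not at_h:
                continue
            leaf = max(at_h, key=lambda l: l[2])   # best leaf at this depth
            if leaf[2] <= v_max:
                continue
            v_max = leaf[2]
            leaves.remove(leaf)
            a, b, _, _ = leaf
            w = (b - a) / 3.0
            for k in range(3):                     # split into three children
                lo, hi = a + k * w, a + (k + 1) * w
                x = (lo + hi) / 2.0
                val = f(x)
                evals += 1
                leaves.append((lo, hi, val, h + 1))
                if val > best[1]:
                    best = (x, val)
    return best

print(soo(lambda x: -(x - 0.3) ** 2))              # maximizer near x = 0.3
```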
Stochastic-Constrained Stochastic Optimization with Markovian Data
Kim, Yeongjong, Lee, Dabeen
This paper considers stochastic-constrained stochastic optimization where the stochastic constraint is to satisfy that the expectation of a random function is below a certain threshold. In particular, we study the setting where data samples are drawn from a Markov chain and thus are not independent and identically distributed. We generalize the drift-plus-penalty framework, a primal-dual stochastic gradient method developed for the i.i.d. case, to the Markov chain sampling setting. We propose two variants of drift-plus-penalty; one is for the case when the mixing time of the underlying Markov chain is known while the other is for the case of unknown mixing time. In fact, our algorithms apply to a more general setting of constrained online convex optimization where the sequence of constraint functions follows a Markov chain. Both algorithms are adaptive in that the first works without knowledge of the time horizon while the second uses AdaGrad-style algorithm parameters, which is of independent interest. We demonstrate the effectiveness of our proposed methods through numerical experiments on classification with fairness constraints.
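Below is a minimal sketch of a drift-plus-penalty loop driven by a Markovian data stream; the box projection, step sizes, and toy two-state chain are illustrative assumptions, and the mixing-time-aware and AdaGrad-style parameter choices analyzed in the paper are not reproduced.

```python
import numpy as np

def drift_plus_penalty(loss_grad, cons, cons_grad, sample, x0, T, eta, V):
    """Minimal drift-plus-penalty loop with samples from a Markovian stream.

    A virtual queue Q accumulates constraint violation; the primal step
    descends V * (loss gradient) + Q * (constraint gradient).
    """
    x, Q, state = np.array(x0, dtype=float), 0.0, None
    for _ in range(T):
        xi, state = sample(state)                         # Markovian data sample
        g = cons(x, xi)
        grad = V * loss_grad(x, xi) + Q * cons_grad(x, xi)
        x = np.clip(x - eta * grad, -1.0, 1.0)            # projection onto a box
        Q = max(Q + g, 0.0)                               # virtual queue update
    return x

# toy usage: minimize E[(x - xi)^2] subject to x <= 0.2, xi driven by a two-state chain
def sample(state):
    if state is None:
        state = 0
    p = [0.95, 0.05] if state == 0 else [0.5, 0.5]
    state = int(np.random.choice(2, p=p))
    return (0.5 if state == 0 else -0.5), state

x = drift_plus_penalty(
    loss_grad=lambda x, xi: 2 * (x - xi),
    cons=lambda x, xi: x[0] - 0.2,                        # constraint x <= 0.2 (binds here)
    cons_grad=lambda x, xi: np.array([1.0]),
    sample=sample, x0=[0.0], T=5000, eta=0.01, V=10.0)
print(x)                                                  # settles near the boundary 0.2
```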
Online Resource Allocation in Episodic Markov Decision Processes
Lee, Duksang, Overman, William, Lee, Dabeen
This paper studies a long-term resource allocation problem over multiple periods where each period requires a multi-stage decision-making process. We formulate the problem as an online allocation problem in an episodic finite-horizon constrained Markov decision process with an unknown non-stationary transition function and stochastic non-stationary reward and resource consumption functions. We propose the observe-then-decide regime and also improve upon results for the existing decide-then-observe regime, where the two settings differ in how the observations and feedback about the reward and resource consumption functions are given to the decision-maker. We develop an online dual mirror descent algorithm that achieves near-optimal regret bounds for both settings. For the observe-then-decide regime, we prove that the expected regret against the dynamic clairvoyant optimal policy is bounded by $\tilde O(\rho^{-1}{H^{3/2}}S\sqrt{AT})$ where $\rho\in(0,1)$ is the budget parameter, $H$ is the length of the horizon, $S$ and $A$ are the numbers of states and actions, and $T$ is the number of episodes. For the decide-then-observe regime, we show that the regret against the static optimal policy that has access to the mean reward and mean resource consumption functions is bounded by $\tilde O(\rho^{-1}{H^{3/2}}S\sqrt{AT})$ with high probability. We test the numerical efficiency of our method for a variant of the resource-constrained inventory management problem.
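The skeleton below illustrates an online dual mirror descent loop in which a single dual price is updated by projected gradient ascent on the budget violation; the `solve_episode` routine and the toy numbers are placeholders for the episodic MDP planning step and are not the paper's implementation.

```python
import numpy as np

def run_online_allocation(solve_episode, budget_per_episode, K, eta, lam_max):
    """Skeleton of an online dual mirror descent loop for episodic allocation.

    `solve_episode(lam)` should run one episode with respect to the
    price-adjusted reward r - lam * c and return (reward, consumption).
    The dual price is updated by projected gradient ascent on the violation.
    """
    lam, total_reward = 0.0, 0.0
    for _ in range(K):
        reward, consumption = solve_episode(lam)
        total_reward += reward
        lam = float(np.clip(lam + eta * (consumption - budget_per_episode),
                            0.0, lam_max))
    return total_reward, lam

# toy episode: consuming more yields more reward; the price lam discourages it
def solve_episode(lam):
    use = 1.0 if 1.0 - lam > 0 else 0.0        # greedy w.r.t. reward - lam * cost
    return use, use

print(run_online_allocation(solve_episode, budget_per_episode=0.5, K=1000,
                            eta=0.05, lam_max=5.0))
```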
Online Convex Optimization with Stochastic Constraints: Zero Constraint Violation and Bandit Feedback
Kim, Yeongjong, Lee, Dabeen
This paper studies online convex optimization with stochastic constraints. We propose a variant of the drift-plus-penalty algorithm that guarantees $O(\sqrt{T})$ expected regret and zero constraint violation after a fixed number of iterations, which improves upon the vanilla drift-plus-penalty method and its $O(\sqrt{T})$ constraint violation. Our algorithm is oblivious to the length of the time horizon $T$, in contrast to the vanilla drift-plus-penalty method. This is based on our novel drift lemma that provides time-varying bounds on the virtual queue drift and, as a result, leads to time-varying bounds on the expected virtual queue length. Moreover, we extend our framework to stochastic-constrained online convex optimization under two-point bandit feedback. We show that by adapting our algorithmic framework to the bandit feedback setting, we can still achieve $O(\sqrt{T})$ expected regret and zero constraint violation, improving upon the previous work for the case of identical constraint functions. Numerical experiments corroborate our theoretical results.
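The two-point bandit feedback extension relies on a standard sphere-sampling gradient estimator; a minimal sketch of such an estimator (not the paper's exact construction) is given below.

```python
import numpy as np

def two_point_gradient_estimate(f, x, delta, rng):
    """Two-point bandit gradient estimator: query f at x + delta*u and
    x - delta*u for a random unit direction u, and return
    (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u.
    """
    d = x.shape[0]
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)                     # uniform direction on the sphere
    return (d / (2.0 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

rng = np.random.default_rng(0)
x = np.array([1.0, -2.0])
est = two_point_gradient_estimate(lambda z: np.sum(z ** 2), x, delta=1e-3, rng=rng)
print(est)   # a single draw is noisy; its expectation approximates 2*x = [2, -4]
```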
Projection-Free Online Convex Optimization with Stochastic Constraints
Lee, Duksang, Ho-Nguyen, Nam, Lee, Dabeen
This paper develops projection-free algorithms for online convex optimization with stochastic constraints. We design an online primal-dual projection-free framework that can take any projection-free algorithm developed for online convex optimization without long-term constraints. With this general template, we deduce sublinear regret and constraint violation bounds for various settings. Moreover, for the case where the loss and constraint functions are smooth, we develop a primal-dual conditional gradient method that achieves $O(\sqrt{T})$ regret and $O(T^{3/4})$ constraint violation. Furthermore, for the setting where the loss and constraint functions are stochastic and strong duality holds for the associated offline stochastic optimization problem, we prove that the constraint violation can be reduced to have the same asymptotic growth as the regret.
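To illustrate how a linear minimization oracle replaces projections, here is a minimal primal-dual conditional gradient sketch on a box-constrained toy problem; the step sizes and the simple dual update are illustrative assumptions and do not reproduce the paper's template or its guarantees.

```python
import numpy as np

def primal_dual_conditional_gradient(grad_f, g, grad_g, lmo, x0, T, eta):
    """Sketch of a projection-free primal-dual iteration: a linear minimization
    oracle (LMO) over the feasible set replaces the projection step.
    """
    x, lam = np.array(x0, dtype=float), 0.0
    for t in range(1, T + 1):
        d = grad_f(x) + lam * grad_g(x)        # gradient of the Lagrangian
        s = lmo(d)                             # argmin over the feasible set of <d, s>
        gamma = 2.0 / (t + 2.0)                # standard Frank-Wolfe step size
        x = (1.0 - gamma) * x + gamma * s      # convex combination stays feasible
        lam = max(lam + eta * g(x), 0.0)       # dual ascent on the constraint
    return x

# toy problem on the box [-1, 1]^2: minimize ||x - (2, 0)||^2 s.t. x1 + x2 <= 0.5
lmo = lambda d: -np.sign(d)                    # LMO for the unit box
x = primal_dual_conditional_gradient(
    grad_f=lambda x: 2 * (x - np.array([2.0, 0.0])),
    g=lambda x: x[0] + x[1] - 0.5,
    grad_g=lambda x: np.array([1.0, 1.0]),
    lmo=lmo, x0=[0.0, 0.0], T=2000, eta=0.01)
print(x)
```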
Learning to Schedule
Lee, Dabeen, Vojnovic, Milan
We consider the following algorithmic question: given a list of jobs, each of which requires a certain number of time steps to be completed while incurring a random cost in every time step until finished, learn the relative priorities of the jobs and make scheduling decisions of which job to process in each time step, with the objective of minimizing the expected total cumulative cost. Here, we need an algorithm that seamlessly integrates learning and scheduling. This question is motivated by several applications. Modern data processing platforms handle complex jobs whose characteristics are often unknown in advance, in which case it is difficult to judge which jobs have higher priority than others before accumulating enough information about them [11]. There is significant uncertainty in determining the relative importance of jobs or tasks, and moreover, the population of jobs may be highly dynamic, e.g., they may have all distinct features. Nevertheless, these platforms need to start processing jobs in a sequence based on partial information, which potentially results in undesired delays. However, as a system learns more about the jobs' features, it may flexibly adjust its scheduling decisions to serve high-priority jobs first.
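As a toy illustration of integrating learning with scheduling, the sketch below simulates jobs that incur random holding costs each step and processes the job with the highest empirical cost-to-remaining-work index; this plug-in priority rule and the Poisson cost model are purely illustrative assumptions, not the algorithm studied in the paper.

```python
import numpy as np

def schedule_by_empirical_cost(remaining, sample_costs, rng):
    """Toy simulation: every unfinished job incurs a random holding cost each
    step; the scheduler processes the job with the highest empirical mean cost
    per remaining step (a plug-in priority index).
    """
    remaining = np.array(remaining, dtype=int)
    n = len(remaining)
    cost_sum, cost_cnt = np.zeros(n), np.zeros(n)
    total = 0.0
    while remaining.sum() > 0:
        costs = sample_costs(rng)                  # holding costs this step
        active = remaining > 0
        total += costs[active].sum()
        cost_sum[active] += costs[active]
        cost_cnt[active] += 1
        mean = cost_sum / np.maximum(cost_cnt, 1)  # empirical mean cost per job
        index = np.full(n, -np.inf)
        index[active] = mean[active] / remaining[active]
        j = int(np.argmax(index))                  # highest-priority unfinished job
        remaining[j] -= 1                          # process it for one step
    return total

rng = np.random.default_rng(0)
true_means = np.array([0.2, 1.0, 0.5])             # unknown to the scheduler
print(schedule_by_empirical_cost(
    remaining=[3, 4, 2],
    sample_costs=lambda rng: rng.poisson(true_means).astype(float),
    rng=rng))
```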