AITopics | tfw-ucrl2

Information about AI from the News, Publications, and Conferences

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives

Wang Chi Cheung

Neural Information Processing Systems, Feb-13-2026, 07:22:20 GMT

Neural Information Processing Systems http://nips.cc/

agent, proceedings, tfw-ucrl2 (and 11 more)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(and 5 more)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives

Wang Chi Cheung

Neural Information Processing Systems, Oct-3-2025, 08:38:07 GMT

Due to state transitions, it is challenging to balance the contribution from each dimension to achieve near-optimality.
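To make the balancing issue concrete, here is a minimal Python sketch that deliberately drops the state transitions mentioned above: a single state, two hypothetical actions `a1` and `a2` with outcome vectors (1, 0) and (0, 1), and the concave objective g(v) = min over dimensions (a max-min choice assumed for illustration, not taken from the paper). Committing to either action leaves one dimension at zero, while an even mix balances both.

```python
from __future__ import annotations

# Hypothetical single-state example (no transitions): each action yields a 2-dimensional outcome.
ACTIONS: dict[str, tuple[float, float]] = {"a1": (1.0, 0.0), "a2": (0.0, 1.0)}


def average(plays: list[str]) -> tuple[float, float]:
    """Component-wise empirical average of the outcome vectors of the played actions."""
    T = len(plays)
    return (
        sum(ACTIONS[a][0] for a in plays) / T,
        sum(ACTIONS[a][1] for a in plays) / T,
    )


def g(v: tuple[float, float]) -> float:
    """Concave objective assumed for illustration: the worst-performing dimension."""
    return min(v)


def best_fixed_mixture(grid_size: int = 10) -> tuple[float, float]:
    """Grid-search the fraction p of rounds spent on a1 (the rest on a2) that maximizes g."""
    best_p, best_val = 0.0, float("-inf")
    for i in range(grid_size + 1):
        p = i / grid_size
        val = g((p, 1.0 - p))
        if val > best_val:
            best_p, best_val = p, val
    return best_p, best_val


always_a1 = ["a1"] * 10
alternate = ["a1", "a2"] * 5
print(g(average(always_a1)))  # 0.0
print(g(average(alternate)))  # 0.5
print(best_fixed_mixture())   # (0.5, 0.5)
```

With state transitions, the agent can no longer switch freely between the actions that feed each dimension, which is what makes this balance hard in the MDP setting.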

agent, proceedings, tfw-ucrl2 (and 11 more)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(and 5 more)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Respond to the major points raised by the reviewers (for each point, we refer to the particular reviewers who raised it)

Neural Information Processing Systems, Aug-16-2025, 03:22:52 GMT

We thank all reviewers for their thoughtful feedback, which can help enhance the presentation of our results. We will clarify this decision (as the reviewer recommends). … PAC bound by taking the resulting mixture policy. We will add a note in the final version. The knapsack solver is provided in Appendix A.3 and is a linear program with … We will discuss the additional challenges that arise in these settings and explicitly state them as future directions.

major point, particular reviewer, reviewer (and 14 more)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.35)

Reviews: Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives

Neural Information Processing Systems, Jan-26-2025, 03:15:57 GMT

Summary: This paper studies a generalization of online reinforcement learning (in the infinite-horizon undiscounted setting, with finite state and action spaces and a communicating MDP) where the agent aims to maximize a certain type of concave function of the rewards (extended to global concave functions in the appendix). More precisely, every time an action "a" is played in state "s", the agent receives a vector of rewards V(s,a) (instead of a scalar reward r(s,a)) and tries to maximize a concave function of the empirical average of the vectorial outcomes. This problem is very general and models a wide variety of settings, ranging from multi-objective optimization in MDPs to maximum entropy exploration and online learning in MDPs with knapsack constraints. In Section 2, the authors introduce the necessary background and formalize the notions of "optimal gain" and "regret" in this setting. Defining the "optimal gain" (called the "offline benchmark" in the paper) is not straightforward.
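A minimal Python sketch of the objective summarized above, assuming outcomes are recorded as plain lists of floats: it averages the observed vectors V(s_t, a_t) component-wise and evaluates a concave g at that average. For the maximum entropy exploration case, it uses one-hot state indicators as outcome vectors and Shannon entropy as g (a standard choice assumed here, not quoted from the paper). The helper `optimality_gap` is hypothetical: it reports benchmark minus g(average) for a benchmark value supplied by the caller, since the paper's offline benchmark and exact regret normalization are not reproduced in this summary.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence

Vector = Sequence[float]


def empirical_average(outcomes: Sequence[Vector]) -> list[float]:
    """Component-wise mean of the outcome vectors observed over T steps."""
    if not outcomes:
        raise ValueError("need at least one outcome vector")
    T, dim = len(outcomes), len(outcomes[0])
    return [sum(v[k] for v in outcomes) / T for k in range(dim)]


def entropy(v: Vector) -> float:
    """Shannon entropy in nats; concave on the probability simplex, with 0 * log(0) taken as 0."""
    return -sum(x * math.log(x) for x in v if x > 0)


def one_hot(state: int, num_states: int) -> list[float]:
    """Outcome vector for maximum-entropy exploration: indicator of the visited state."""
    return [1.0 if s == state else 0.0 for s in range(num_states)]


def optimality_gap(g: Callable[[Vector], float], outcomes: Sequence[Vector], benchmark: float) -> float:
    """Hypothetical helper: benchmark minus g evaluated at the empirical average."""
    return benchmark - g(empirical_average(outcomes))


visits = [0, 0, 1, 2]  # hypothetical states visited over T = 4 steps
outcomes = [one_hot(s, 3) for s in visits]
print(empirical_average(outcomes))  # [0.5, 0.25, 0.25]
print(optimality_gap(entropy, outcomes, benchmark=math.log(3)))
```

Here log 3 serves only as a stand-in benchmark: it is the largest entropy any visitation distribution over three states can reach, but whether an MDP can actually achieve uniform visitation depends on its transitions, which is one reason defining the offline benchmark is not straightforward.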

regret minimization, reinforcement learning, vectorial feedback and complex objective (and 8 more)

Genre: Summary/Review (0.76)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.62)