soft margin
- North America > United States (0.14)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
- North America > Canada (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > Strength High (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
In offline reinforcement learning (RL) we have no opportunity to explore, so we must make assumptions that the data are sufficient to guide the selection of a good policy, typically in the form of coverage, realizability, Bellman completeness, and/or hard-margin (gap) assumptions. In this work we propose value-based algorithms for offline RL with PAC guarantees under only partial coverage, namely coverage of a single comparator policy, together with realizability of the soft (entropy-regularized) Q-function of that policy and of a related function defined as the saddle point of a certain minimax optimization problem. This offers refined and generally more relaxed conditions for offline RL. We further show an analogous result for vanilla Q-functions under a soft-margin condition. To attain these guarantees, we leverage novel minimax learning algorithms that accurately estimate soft or vanilla Q-functions with $L^2$-convergence guarantees. Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
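To make the "soft" part concrete: the soft Q-function is the fixed point of an entropy-regularized Bellman operator in which the hard max over actions is replaced by a temperature-scaled log-sum-exp. The sketch below is only a tabular illustration of that operator, not the paper's minimax estimator; the function names, the random MDP, and the tabular setting are all assumptions made for this example.

```python
import numpy as np

def soft_bellman_backup(Q, R, P, gamma=0.99, tau=1.0):
    """One application of the soft (entropy-regularized) Bellman operator:
    (TQ)(s, a) = r(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s'),
    with V(s') = tau * log(sum_{a'} exp(Q(s', a') / tau)).
    As tau -> 0, V(s') recovers the hard max over actions."""
    m = Q.max(axis=1)  # shift for a numerically stable log-sum-exp
    V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    return R + gamma * P @ V  # P: (S, A, S), V: (S,) -> backup: (S, A)

# The operator is a gamma-contraction, so fixed-point iteration on a toy
# random MDP converges to the soft-optimal Q-function.
rng = np.random.default_rng(0)
S, A = 5, 3
R = rng.uniform(size=(S, A))                # rewards r(s, a)
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition kernel P(s' | s, a)
Q = np.zeros((S, A))
for _ in range(300):
    Q = soft_bellman_backup(Q, R, P)
```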
Boosting Algorithms for Maximizing the Soft Margin
We present a novel boosting algorithm, called SoftBoost, designed for sets of binary labeled examples that are not necessarily separable by convex combinations of base hypotheses. Our algorithm achieves robustness by capping the distributions on the examples. Our update of the distribution is motivated by minimizing a relative entropy subject to the capping constraints and constraints on the edges of the obtained base hypotheses. The capping constraints imply a soft margin in the dual optimization problem. Our algorithm produces a convex combination of hypotheses whose soft margin is within δ of its maximum.
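A minimal sketch of the capping idea, assuming an exponentiated-gradient step in place of SoftBoost's exact relative-entropy projection (which additionally enforces the edge constraints), a cap of 1/(nu*N) as is common in soft-margin boosting, and hypothetical function names:

```python
import numpy as np

def cap_and_renormalize(w, cap):
    """Scale positive weights w to a distribution while enforcing d_i <= cap:
    entries that would exceed the cap are pinned at the cap and the remaining
    probability mass is spread proportionally over the rest.
    Assumes feasibility, i.e., cap * len(w) >= 1."""
    d = w / w.sum()
    capped = np.zeros(len(d), dtype=bool)
    while True:
        over = (d > cap) & ~capped
        if not over.any():
            return d
        capped |= over
        d[capped] = cap
        free = ~capped
        d[free] = w[free] / w[free].sum() * (1.0 - cap * capped.sum())

def softboost_step(d, margins, eta, cap):
    """Simplified SoftBoost-style round: exponentiated-gradient step on the
    per-example margins y_i * h_t(x_i), then re-impose the capping
    constraint that yields the soft margin in the dual."""
    return cap_and_renormalize(d * np.exp(-eta * margins), cap)

# Toy usage: N examples, nu controls the cap (cap = 1/(nu*N), so up to a
# nu-fraction of the weight may sit at the cap).
N, nu = 100, 0.2
cap = 1.0 / (nu * N)
rng = np.random.default_rng(1)
d = np.full(N, 1.0 / N)
margins = rng.normal(size=N)  # stand-in for y_i * h_t(x_i)
d = softboost_step(d, margins, eta=0.5, cap=cap)
assert d.max() <= cap + 1e-9 and abs(d.sum() - 1.0) < 1e-9
```

Pinning up to a nu-fraction of the weight at the cap is what lets the booster stop over-concentrating on hard or noisy examples, which is the mechanism by which the capping constraints turn the hard margin into a soft margin in the dual.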