soft margin
- North America > United States (0.14)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
- North America > Canada (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Research Report > Experimental Study (0.68)
- Research Report > Strength High (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
Masatoshi Uehara, Nathan Kallus, Jason D. Lee, Wen Sun
In offline reinforcement learning (RL) we have no opportunity to explore, so we must make assumptions that the data are sufficient to guide the selection of a good policy, typically in the form of coverage, realizability, Bellman completeness, and/or hard-margin (gap) assumptions. In this work we propose value-based algorithms for offline RL with PAC guarantees under only partial coverage, namely coverage of a single comparator policy, together with realizability of the soft (entropy-regularized) Q-function of that policy and of a related function defined as the saddle point of a certain minimax optimization problem. This offers refined and generally more relaxed conditions for offline RL. We further show an analogous result for vanilla Q-functions under a soft-margin condition. To attain these guarantees, we leverage novel minimax learning algorithms that accurately estimate soft or vanilla Q-functions with $L^2$-convergence guarantees. Our algorithms' loss functions arise from casting the estimation problems as nonlinear convex optimization problems and Lagrangifying.
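To make the "soft" part concrete: the soft Q-function is the fixed point of an entropy-regularized Bellman operator in which the hard max over actions is replaced by a temperature-scaled log-sum-exp. The sketch below is only a tabular illustration of that operator, not the paper's minimax estimator; the function names, the random MDP, and the tabular setting are all assumptions made for this example.

```python
import numpy as np

def soft_bellman_backup(Q, R, P, gamma=0.99, tau=1.0):
    """One application of the soft (entropy-regularized) Bellman operator:
    (TQ)(s, a) = r(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s'),
    with V(s') = tau * log(sum_{a'} exp(Q(s', a') / tau)).
    As tau -> 0, V(s') recovers the hard max over actions."""
    m = Q.max(axis=1)  # shift for a numerically stable log-sum-exp
    V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    return R + gamma * P @ V  # P: (S, A, S), V: (S,) -> backup: (S, A)

# The operator is a gamma-contraction, so fixed-point iteration on a toy
# random MDP converges to the soft-optimal Q-function.
rng = np.random.default_rng(0)
S, A = 5, 3
R = rng.uniform(size=(S, A))                # rewards r(s, a)
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition kernel P(s' | s, a)
Q = np.zeros((S, A))
for _ in range(300):
    Q = soft_bellman_backup(Q, R, P)
```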
Boosting Algorithms for Maximizing the Soft Margin
We present a novel boosting algorithm, called SoftBoost, designed for sets of binary labeled examples that are not necessarily separable by convex combinations of base hypotheses. Our algorithm achieves robustness by capping the distributions on the examples. Our update of the distribution is motivated by minimizing a relative entropy subject to the capping constraints and constraints on the edges of the obtained base hypotheses. The capping constraints imply a soft margin in the dual optimization problem. Our algorithm produces a convex combination of hypotheses whose soft margin is within δ of its maximum.
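A minimal sketch of the capping idea, assuming an exponentiated-gradient step in place of SoftBoost's exact relative-entropy projection (which additionally enforces the edge constraints), a cap of 1/(nu*N) as is common in soft-margin boosting, and hypothetical function names:

```python
import numpy as np

def cap_and_renormalize(w, cap):
    """Scale positive weights w to a distribution while enforcing d_i <= cap:
    entries that would exceed the cap are pinned at the cap and the remaining
    probability mass is spread proportionally over the rest.
    Assumes feasibility, i.e., cap * len(w) >= 1."""
    d = w / w.sum()
    capped = np.zeros(len(d), dtype=bool)
    while True:
        over = (d > cap) & ~capped
        if not over.any():
            return d
        capped |= over
        d[capped] = cap
        free = ~capped
        d[free] = w[free] / w[free].sum() * (1.0 - cap * capped.sum())

def softboost_step(d, margins, eta, cap):
    """Simplified SoftBoost-style round: exponentiated-gradient step on the
    per-example margins y_i * h_t(x_i), then re-impose the capping
    constraint that yields the soft margin in the dual."""
    return cap_and_renormalize(d * np.exp(-eta * margins), cap)

# Toy usage: N examples, nu controls the cap (cap = 1/(nu*N), so up to a
# nu-fraction of the weight may sit at the cap).
N, nu = 100, 0.2
cap = 1.0 / (nu * N)
rng = np.random.default_rng(1)
d = np.full(N, 1.0 / N)
margins = rng.normal(size=N)  # stand-in for y_i * h_t(x_i)
d = softboost_step(d, margins, eta=0.5, cap=cap)
assert d.max() <= cap + 1e-9 and abs(d.sum() - 1.0) < 1e-9
```

Pinning up to a nu-fraction of the weight at the cap is what lets the booster stop over-concentrating on hard or noisy examples, which is the mechanism by which the capping constraints turn the hard margin into a soft margin in the dual.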