AITopics | dual formulation

968b15768f3d19770471e9436d97913c-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-9-2026, 10:34:17 GMT

complexity, experiment, reviewer, (13 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.50)

Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction

He, Yiting, Liu, Zhishuai, Wang, Weixin, Xu, Pan

arXiv.org Machine Learning | Nov-10-2025

Off-dynamics reinforcement learning (RL), where training and deployment transition dynamics are different, can be formulated as learning in a robust Markov decision process (RMDP) where uncertainties in transition dynamics are imposed. Existing literature mostly assumes access to generative models allowing arbitrary state-action queries or pre-collected datasets with good state coverage of the deployment environment, bypassing the challenge of exploration. In this work, we study a more realistic and challenging setting where the agent is limited to online interaction with the training environment. To capture the intrinsic difficulty of exploration in online RMDPs, we introduce the supremal visitation ratio, a novel quantity that measures the mismatch between the training dynamics and the deployment dynamics. We show that if this ratio is unbounded, online learning becomes exponentially hard. We propose the first computationally efficient algorithm that achieves sublinear regret in online RMDPs with $f$-divergence-based transition uncertainties. We also establish matching regret lower bounds, demonstrating that our algorithm achieves optimal dependence on both the supremal visitation ratio and the number of interaction episodes. Finally, we validate our theoretical results through comprehensive numerical experiments.
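As a rough illustration of how a dual formulation enters a robust MDP, the sketch below evaluates a worst-case expected next-state value over a KL ball around a nominal transition distribution, using the standard scalar dual $\sup_{\beta > 0} \{-\beta \log \mathbb{E}_P[e^{-V/\beta}] - \beta\rho\}$. This is an assumption-laden example, not the algorithm of the paper above: KL is only one instance of an $f$-divergence, and the function name `kl_robust_expectation`, the radius `rho`, and the grid over `beta` are all hypothetical choices made for illustration.

```python
import numpy as np

def kl_robust_expectation(values, probs, rho, betas=None):
    """Approximate inf over {Q : KL(Q || P) <= rho} of E_Q[values] via the scalar dual.

    Every grid point gives a valid lower bound (weak duality), so the
    maximum over the grid is a lower bound on the true robust value.
    """
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    # Restrict to the support of P; Q must be absolutely continuous w.r.t. P.
    support = probs > 0
    values, probs = values[support], probs[support]
    if betas is None:
        betas = np.logspace(-3, 3, 2000)  # illustrative grid, not tuned
    v_min = values.min()
    best = -np.inf
    for beta in betas:
        # Shift by v_min so every exponent is <= 0 (no overflow).
        log_mgf = np.log(np.sum(probs * np.exp(-(values - v_min) / beta)))
        dual = v_min - beta * log_mgf - beta * rho
        best = max(best, dual)
    return best

# Hypothetical next-state values and nominal transition probabilities.
V_next = [1.0, 0.5, 0.0]
P_nominal = [0.5, 0.3, 0.2]
robust_value = kl_robust_expectation(V_next, P_nominal, rho=0.1)
```

In a robust Bellman backup, a quantity of this kind would take the place of the nominal expectation $\mathbb{E}_P[V]$; a one-dimensional solver could replace the grid search.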

artificial intelligence, distributionally robust off-dynamic reinforcement learning, machine learning, (13 more...)

arXiv.org Machine Learning

2511.05396

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.62)

5ea1649a31336092c05438df996a3e59-AuthorFeedback.pdf

Neural Information Processing Systems | Oct-2-2025, 20:08:16 GMT

algorithm, artificial intelligence, experiment, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.50)

Checklist 1. For all authors (a)

Neural Information Processing Systems | Aug-17-2025, 06:53:26 GMT

Do the main claims made in the abstract and introduction accurately reflect the paper's ... If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they ... Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? Did you include the total amount of compute and the type of resources used (e.g., type ... Did you include any new assets either in the supplemental material or as a URL? [N/A] Did you discuss whether and how consent was obtained from people whose data you're ... If you used crowdsourcing or conducted research with human subjects... (a) The full version of Table 1 is given in Table 3. That is, the following relationships hold: $2g(u) = \sup \ldots$ This formulation can be found in Lemma 3.1 of Jenatton et al. First we compute the gradient $g(u) = \sum \ldots$ A.7 Log Sum. First, we compute the derivative $g(u) = \log(\sqrt{u} + \varepsilon) \implies g = 0$, (46) which gives the inverse mapping ... However, it is separable, and in one dimension we have $g(u) = \mathbb{1}\{u > 0\}$. ... $au \ge mu)$, (69) where $\mathrm{Conv}(A)$ is the convex hull of the set $A$. Similarly define $\bar{S}$ ... Running for 1000 epochs, for example, gets the fraction of nonzeros down to around 0.1, at a slight expense of accuracy.

artificial intelligence, machine learning, penalty, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

e22312179bf43e61576081a2f250f845-Paper.pdf

Neural Information Processing Systems | Aug-16-2025, 23:01:18 GMT

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

e22312179bf43e61576081a2f250f845-AuthorFeedback.pdf

Neural Information Processing Systems | Aug-16-2025, 23:01:06 GMT

We would like to thank the reviewers for their positive and helpful feedback. Typo: The result of Theorem 1 is indeed on the expected norm of θ; thank you for pointing this out. Derivations can be found in Appendix A. Yet, although it leads to a related algorithm, our approach is different. We will make sure to emphasize the points discussed in this rebuttal in a revised version of the paper. Fastest rates for stochastic mirror descent methods.

algorithm, artificial intelligence, machine learning, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.51)

968b15768f3d19770471e9436d97913c-Paper.pdf

Neural Information Processing Systems | Aug-15-2025, 05:49:49 GMT

formulation, linear model, non-negative function, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
North America > United States (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

968b15768f3d19770471e9436d97913c-AuthorFeedback.pdf

Neural Information Processing Systems | Aug-15-2025, 05:36:28 GMT

complexity, experiment, reviewer, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.54)

Scalable and adaptive prediction bands with kernel sum-of-squares

Allain, Louis, da Veiga, Sébastien, Staber, Brian

arXiv.org Artificial Intelligence | May-28-2025

Conformal Prediction (CP) is a popular framework for constructing prediction bands with valid coverage in finite samples, while being free of any distributional assumption. A well-known limitation of conformal prediction is the lack of adaptivity, although several works have introduced practically efficient alternative procedures. In this work, we build upon recent ideas that rely on recasting the CP problem as a statistical learning problem, directly targeting coverage and adaptivity. This statistical learning problem is based on reproducing kernel Hilbert spaces (RKHS) and kernel sum-of-squares (SoS) methods. First, we extend previous results with a general representer theorem and exhibit the dual formulation of the learning problem. Crucially, this dual formulation can be solved efficiently by accelerated gradient methods with several hundred or several thousand samples, unlike previous strategies based on off-the-shelf semidefinite programming algorithms. Second, we introduce a new hyperparameter tuning strategy tailored specifically to target adaptivity through bounds on test-conditional coverage. This strategy, based on the Hilbert-Schmidt Independence Criterion (HSIC), is introduced here to tune kernel lengthscales in our framework, but has broader applicability since it could be used in any CP algorithm where the score function is learned. Finally, extensive experiments are conducted to show how our method compares to related work. All figures can be reproduced with the accompanying code.
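For context on the baseline that the abstract above builds on, here is a minimal sketch of plain split conformal prediction with absolute-residual scores. It yields a constant-width band, which is exactly the lack of adaptivity the abstract mentions. It is not the kernel sum-of-squares method of the paper, and the function name `split_conformal_interval` and its arguments are hypothetical.

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    """Constant-width prediction band from held-out calibration residuals.

    residuals_cal: y - prediction on a calibration set not used for fitting.
    y_pred_test:   point predictions at the test inputs.
    alpha:         target miscoverage level (band aims at 1 - alpha coverage).
    """
    scores = np.sort(np.abs(np.asarray(residuals_cal, dtype=float)))
    n = scores.size
    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # If the rank exceeds n, the only valid band is the whole real line.
    q = scores[k - 1] if k <= n else np.inf
    y_pred_test = np.asarray(y_pred_test, dtype=float)
    return y_pred_test - q, y_pred_test + q

# Illustrative calibration residuals and test predictions (made-up numbers).
cal_residuals = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9]
lower, upper = split_conformal_interval(cal_residuals, [1.0, 2.0], alpha=0.5)
```

Because the half-width `q` is the same at every test point, any adaptivity has to come from a learned score function, which is the direction the abstract describes.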

artificial intelligence, machine learning, prediction band, (18 more...)

arXiv.org Artificial Intelligence

2505.21039

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry: Education > Focused Education > Special Education (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Nested Stochastic Gradient Descent for (Generalized) Sinkhorn Distance-Regularized Distributionally Robust Optimization

Yang, Yufeng, Zhou, Yi, Lu, Zhaosong

arXiv.org Machine Learning | Mar-28-2025

Distributionally robust optimization (DRO) is a powerful technique to train robust models against data distribution shift. This paper aims to solve regularized nonconvex DRO problems, where the uncertainty set is modeled by a so-called generalized Sinkhorn distance and the loss function is nonconvex and possibly unbounded. Such a distance makes it possible to model uncertainty over distributions with different probability supports and divergence functions. For this class of regularized DRO problems, we derive a novel dual formulation taking the form of nested stochastic programming, where the dual variable depends on the data sample. To solve the dual problem, we provide theoretical grounds for designing a nested stochastic gradient descent (SGD) algorithm, which leverages stochastic approximation to estimate the nested stochastic gradients. We study the convergence rate of nested SGD and establish polynomial iteration and sample complexities that are independent of the data size and parameter dimension, indicating its potential for solving large-scale DRO problems. We conduct numerical experiments to demonstrate the efficiency and robustness of the proposed algorithm.
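To make the idea of a DRO dual formulation concrete, the sketch below checks a much simpler, well-known special case numerically: for KL-penalized DRO on a finite sample, $\sup_Q \{\mathbb{E}_Q[\ell] - \lambda\,\mathrm{KL}(Q \| P)\} = \lambda \log \mathbb{E}_P[e^{\ell/\lambda}]$, attained at $q_i \propto p_i e^{\ell_i/\lambda}$. This is not the generalized Sinkhorn dual or the nested SGD algorithm of the paper above; the function names and the toy losses are assumptions made for illustration.

```python
import numpy as np

def kl_penalized_dual(losses, probs, lam):
    """Dual value lam * log E_P[exp(loss / lam)], computed stably."""
    losses = np.asarray(losses, dtype=float)
    probs = np.asarray(probs, dtype=float)
    m = losses.max()  # shift so every exponent is <= 0
    return m + lam * np.log(np.sum(probs * np.exp((losses - m) / lam)))

def kl_penalized_primal(losses, probs, lam):
    """Primal objective E_Q[loss] - lam * KL(Q || P) at q_i ∝ p_i * exp(loss_i / lam)."""
    losses = np.asarray(losses, dtype=float)
    probs = np.asarray(probs, dtype=float)
    weights = probs * np.exp((losses - losses.max()) / lam)
    q = weights / weights.sum()
    support = q > 0  # q > 0 implies probs > 0, so the log ratio is finite
    kl = np.sum(q[support] * np.log(q[support] / probs[support]))
    return np.sum(q * losses) - lam * kl, q

# Toy per-sample losses under a uniform empirical distribution.
sample_losses = np.array([0.2, 1.5, 0.7, 3.0])
empirical = np.full(4, 0.25)
dual_value = kl_penalized_dual(sample_losses, empirical, lam=0.5)
primal_value, worst_case_q = kl_penalized_primal(sample_losses, empirical, lam=0.5)
```

The two values agree up to floating-point error, and the worst-case distribution puts the most weight on the highest-loss sample. In the paper's setting the dual variable additionally depends on the data sample, which is what leads to the nested structure.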

artificial intelligence, machine learning, sinkhorn dro, (17 more...)

arXiv.org Machine Learning

2503.22923

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > Canada > Ontario > Toronto (0.14)
North America > United States > Minnesota (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
