A Mathematical Details
This appendix provides the proofs and derivations referenced in the main text: the expansion of the performance difference between two joint policies stated in Section 3.1; the proof of the claim in Section 3.1, which represents each policy by its parameters and applies Proposition 4.7 of Levin and Peres (2017) to the two resulting distributions; the induction argument for the risk of high variance in the policy-gradient estimate noted in Section 3.2; and the reduction, in Section 3.3, of the difference between CoPPO and MAPPO to a simpler form, where, as in Appendix A.5, the decentralized policies can be treated independently. The details of our CoPPO algorithm are given in Algorithm 1.
You May Not Need Ratio Clipping in PPO
Sun, Mingfei, Kurin, Vitaly, Liu, Guoqing, Devlin, Sam, Qin, Tao, Hofmann, Katja, Whiteson, Shimon
Proximal Policy Optimization (PPO) methods learn a policy by iteratively performing multiple mini-batch optimization epochs of a surrogate objective with one set of sampled data. Ratio clipping PPO is a popular variant that clips the probability ratios between the target policy and the policy used to collect samples. Ratio clipping yields a pessimistic estimate of the original surrogate objective, and has been shown to be crucial for strong performance. We show in this paper that such ratio clipping may not be a good option as it can fail to effectively bound the ratios. Instead, one can directly optimize the original surrogate objective for multiple epochs; the key is to find a proper condition to early stop the optimization epoch in each iteration. Our theoretical analysis sheds light on how to determine when to stop the optimization epoch, and we call the resulting algorithm Early Stopping Policy Optimization (ESPO). We compare ESPO with PPO across many continuous control tasks and show that ESPO significantly outperforms PPO. Furthermore, we show that ESPO can be easily scaled up to distributed training with many workers, delivering strong performance as well.
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
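The abstract above sketches the idea of ESPO but does not state the stopping rule. As a rough illustration only, the Python sketch below assumes the optimization epochs are cut short once the probability ratios drift too far from 1; the threshold `delta`, the deviation measure, and the `policy(obs).log_prob(actions)` interface are all assumptions for illustration, not the paper's actual condition or code.

```python
# Hypothetical sketch of ESPO-style early stopping (not the authors' code).
# Assumes: a stochastic policy whose forward pass returns a distribution with
# log_prob(), one batch of on-policy data, and a stopping rule based on how far
# the ratios have drifted from 1 -- `delta` and the deviation measure are
# illustrative assumptions.
import torch

def espo_update(policy, optimizer, obs, actions, advantages,
                old_log_probs, max_epochs=10, delta=0.25):
    """Optimize the *unclipped* surrogate for several epochs, stopping early
    once the probability ratios deviate too much from 1."""
    for epoch in range(max_epochs):
        log_probs = policy(obs).log_prob(actions)
        ratios = torch.exp(log_probs - old_log_probs)

        # Early-stopping condition (illustrative): average absolute deviation
        # of the ratios from 1 exceeds the tolerance delta.
        if (ratios - 1.0).abs().mean() > delta:
            break

        # Original (unclipped) surrogate objective, maximized by gradient ascent.
        loss = -(ratios * advantages).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return epoch
```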
Post-Hoc Domain Adaptation via Guided Data Homogenization
Addressing shifts in data distributions is an important prerequisite for the deployment of deep learning models to real-world settings. A general approach to this problem involves the adjustment of models to a new domain through transfer learning. However, in many cases, this is not applicable in a post-hoc manner to deployed models and further parameter adjustments jeopardize safety certifications that were established beforehand. In such a context, we propose to deal with changes in the data distribution via guided data homogenization which shifts the burden of adaptation from the model to the data. This approach makes use of information about the training data contained implicitly in the deep learning model to learn a domain transfer function. This allows for a targeted deployment of models to unknown scenarios without changing the model itself. We demonstrate the potential of data homogenization through experiments on the CIFAR-10 and MNIST data sets.
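The abstract keeps the homogenization mechanism abstract, so the following Python sketch is only one way to picture it: it assumes the "implicitly contained" training-data information is the frozen model's BatchNorm running statistics, and it learns a tiny input-space transform (a 1x1 convolution, assumed here) that pulls target-domain data toward those statistics while leaving the deployed model untouched. The function name `fit_homogenizer` and every detail of the objective are illustrative assumptions, not the authors' method.

```python
# Illustrative sketch only: the paper's transfer function and guidance signal are
# not specified in the abstract. We *assume* the implicit training-data
# information is the frozen model's BatchNorm running statistics and learn a
# small input-space transform so that target-domain activations match them.
import torch
import torch.nn as nn

def fit_homogenizer(frozen_model, target_loader, steps=1000, lr=1e-3):
    frozen_model.eval()
    for p in frozen_model.parameters():
        p.requires_grad_(False)                  # the deployed model is never changed

    transfer = nn.Conv2d(3, 3, kernel_size=1)    # minimal transfer function (assumed form)
    opt = torch.optim.Adam(transfer.parameters(), lr=lr)
    bns = [m for m in frozen_model.modules() if isinstance(m, nn.BatchNorm2d)]

    for step, (x, _) in zip(range(steps), target_loader):
        feats = []
        hooks = [bn.register_forward_hook(
                     lambda m, inp, out, fs=feats: fs.append(inp[0]))
                 for bn in bns]
        frozen_model(transfer(x))                # run homogenized data through the frozen model
        for h in hooks:
            h.remove()

        # Guide the transform toward the statistics stored during training.
        loss = sum((f.mean(dim=(0, 2, 3)) - bn.running_mean).pow(2).mean()
                   + (f.var(dim=(0, 2, 3), unbiased=False) - bn.running_var).pow(2).mean()
                   for bn, f in zip(bns, feats))

        opt.zero_grad()
        loss.backward()
        opt.step()
    return transfer
```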
A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms
Zhang, Shangtong, Laroche, Romain, van Seijen, Harm, Whiteson, Shimon, Combes, Remi Tachet des
We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $\gamma^t$ term in the actor update for the transition observed at time $t$ in a trajectory and the critic is a discounted value function. Practitioners, however, usually ignore the discounting ($\gamma^t$) for the actor while using a discounted critic. We investigate this mismatch in two scenarios. In the first scenario, we consider optimizing an undiscounted objective $(\gamma = 1)$ where $\gamma^t$ disappears naturally $(1^t = 1)$. We then propose to interpret the discounting in the critic in terms of a bias-variance-representation trade-off and provide supporting empirical results. In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
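The mismatch investigated above is concrete enough to write down: in the theoretically prescribed actor update, the policy-gradient term for the transition observed at time $t$ carries a $\gamma^t$ weight, while common implementations drop that weight but still train a discounted critic. A minimal Python sketch contrasting the two actor losses (variable and function names are illustrative):

```python
# Minimal sketch of the two actor updates discussed above (illustrative names).
# `log_probs[t]` is log pi(a_t | s_t) and `advantages[t]` comes from a
# discounted critic in both cases; only the gamma**t weighting differs.
import torch

def actor_loss_theory(log_probs, advantages, gamma):
    # Discounted objective: the transition at time t is weighted by gamma**t.
    t = torch.arange(len(log_probs), dtype=torch.float32)
    return -(gamma ** t * log_probs * advantages).mean()

def actor_loss_practice(log_probs, advantages, gamma):
    # Common implementation: the gamma**t factor on the actor is dropped,
    # even though the critic (and hence `advantages`) is still discounted.
    return -(log_probs * advantages).mean()
```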