Fast Efficient Hyperparameter Tuning for Policy Gradient Methods

Neural Information Processing Systems

The performance of policy gradient methods is sensitive to hyperparameter settings that must be tuned for any new application. Widely used grid search methods for tuning hyperparameters are sample inefficient and computationally expensive. More advanced methods like Population Based Training that learn optimal schedules for hyperparameters instead of fixed settings can yield better results, but are also sample inefficient and computationally expensive. In this paper, we propose Hyperparameter Optimisation on the Fly (HOOF), a gradient-free algorithm that requires no more than one training run to automatically adapt the hyperparameters that affect the policy update directly through the gradient. The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement. Our experimental results across multiple domains and algorithms show that using HOOF to learn these hyperparameter schedules leads to faster learning with improved performance.
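The core idea described in the abstract — scoring candidate hyperparameter settings against trajectories already sampled by the policy gradient method, rather than running fresh rollouts for each candidate — can be sketched with weighted importance sampling. The helper names below (`hoof_step`, `log_prob`) are hypothetical and the code is a minimal illustration of the idea, not the paper's implementation; it assumes the hyperparameter being tuned is the learning rate of a plain gradient-ascent update.

```python
import numpy as np

def hoof_step(theta, grad, candidate_lrs, trajectories, log_prob):
    """Pick the learning rate whose one-step update maximises a
    weighted-importance-sampling (WIS) estimate of policy value.

    trajectories: list of (states, actions, total_return) tuples
        collected under the current policy parameters `theta`.
    log_prob(params, states, actions): summed log-likelihood of the
        actions under the policy with the given parameters.
    (Hypothetical interface -- a sketch of the one-step improvement
    objective, assuming gradient ascent with learning rate `lr`.)
    """
    best_lr, best_value = None, -np.inf
    for lr in candidate_lrs:
        theta_new = theta + lr * grad  # candidate one-step update
        # Importance weights of the old trajectories under the
        # candidate policy (no new samples are drawn).
        log_w = np.array([log_prob(theta_new, s, a) - log_prob(theta, s, a)
                          for s, a, _ in trajectories])
        w = np.exp(log_w - log_w.max())   # subtract max for stability
        w /= w.sum()                      # self-normalised (WIS) weights
        returns = np.array([r for _, _, r in trajectories])
        value = float(w @ returns)        # WIS estimate of candidate value
        if value > best_value:
            best_lr, best_value = lr, value
    return best_lr, theta + best_lr * grad
```

Because every candidate is evaluated on the same batch of trajectories, the search over hyperparameters costs no additional environment samples — which is what makes the approach sample efficient relative to grid search.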


Dinosaur 'mummies' prove some dinos had hooves

Popular Science

'Edmontosaurus annectens' stormed around North America during the Late Cretaceous. For the first time, paleontologists have successfully reconstructed the profiles of two massive, duck-billed dinosaurs, right down to their pebbled skin and unexpected hooves. Based in part on remains recovered decades ago in the badlands of Wyoming, the pair of specimens were preserved only thanks to an extremely rare, delicate "mummification" process. At around 39 feet long and weighing about 6.2 tons, Edmontosaurus annectens was one of the largest and most common dinosaurs in present-day North America during the Late Cretaceous period.



743c41a921516b04afde48bb48e28ce6-AuthorFeedback.pdf

Neural Information Processing Systems

HOOF is robust to settings within this range. We could not present results for Ant and Walker due to space constraints. Thus we are restricted to zero-order optimisers. For natural gradients like TNPG, HOOF does not add any new hyperparameters beyond those used by grid search. Other methods like PBT introduce more hyperparameters than these.




As a Data Scientist, You Should Know About a Clever Horse Called Hans

#artificialintelligence

In 1904, math teacher Wilhelm von Osten presented his horse Hans to the public in Berlin, Germany. Von Osten claimed that Hans was smart enough to answer complex questions. For example, Hans could read the time. He could also identify the composers of music and the painters of famous paintings. Additionally, Hans could solve math problems. As shown in Figure 1, Hans did this by tapping a certain number of times with his front hoof on a footstep.


How Do You Make a Robot Walk on Mars? It's a Steep Challenge

WIRED

From the Sojourner rover, which landed on Mars in 1997, to Perseverance, which touched down in February, the robots of the Red Planet share a defining feature: wheels. Rolling is far more stable and energy efficient than walking, which even robots on Earth still struggle to master. After all, NASA would hate for its very expensive Martian explorer to topple over and flail around like a turtle on its back. The problem with wheels, though, is that they limit where rovers can go: To explore complicated Martian terrain like steep hills, you need the kinds of legs that evolution gave animals on Earth. So a team of scientists from ETH Zurich in Switzerland and the Max Planck Institute for Solar System Research in Germany has been playing around with a small quadrupedal robot called SpaceBok, designed to mimic an antelope known as a springbok.


Fast Efficient Hyperparameter Tuning for Policy Gradient Methods

Paul, Supratik, Kurin, Vitaly, Whiteson, Shimon

Neural Information Processing Systems

The performance of policy gradient methods is sensitive to hyperparameter settings that must be tuned for any new application. Widely used grid search methods for tuning hyperparameters are sample inefficient and computationally expensive. More advanced methods like Population Based Training that learn optimal schedules for hyperparameters instead of fixed settings can yield better results, but are also sample inefficient and computationally expensive. In this paper, we propose Hyperparameter Optimisation on the Fly (HOOF), a gradient-free algorithm that requires no more than one training run to automatically adapt the hyperparameters that affect the policy update directly through the gradient. The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.


Fast Efficient Hyperparameter Tuning for Policy Gradients

Paul, Supratik, Kurin, Vitaly, Whiteson, Shimon

arXiv.org Machine Learning

The performance of policy gradient methods is sensitive to hyperparameter settings that must be tuned for any new application. Widely used grid search methods for tuning hyperparameters are sample inefficient and computationally expensive. More advanced methods like Population Based Training (Jaderberg et al., 2017) that learn optimal schedules for hyperparameters instead of fixed settings can yield better results, but are also sample inefficient and computationally expensive. In this paper, we propose Hyperparameter Optimisation on the Fly (HOOF), a gradient-free meta-learning algorithm that can automatically learn an optimal schedule for hyperparameters that affect the policy update directly through the gradient. The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement. Our experimental results across multiple domains and algorithms show that using HOOF to learn these hyperparameter schedules leads to faster learning with improved performance.