HotEncodings
–Neural Information Processing Systems
Landmarks were distributed atdistance 200 from the origin, everyπ3 radians, starting at0. The listener particle had amassof0.1;itsforceactions Episodeslasted50timesteps. The 9-points environment used the same physics engine astriangle and episodes also lasted 50 timesteps. We found that additional time enabled more convergent behaviors: that is, agents hadtolearn toreach alocation andstopformaximal reward,asopposed totraveling atfull speed in a fixed direction. Whenexperimenting with environmental noise, allagents were trained using identical noise models and evaluated with zeronoise.
Neural Information Processing Systems
Feb-8-2026, 18:50:55 GMT
- Technology: