optimistic
Bigger, Regularized, Optimistic: scaling for compute and sample efficient continuous control
Sample efficiency in Reinforcement Learning (RL) has traditionally been driven by algorithmic enhancements. In this work, we demonstrate that scaling can also lead to substantial improvements. We conduct a thorough investigation into the interplay of scaling model capacity and domain-specific RL enhancements. These empirical findings inform the design choices underlying our proposed BRO (Bigger, Regularized, Optimistic) algorithm. The key innovation behind BRO is that strong regularization allows for effective scaling of the critic networks, which, paired with optimistic exploration, leads to superior performance.
Tesla's Humanoid Robot 'Optimus' Looks Far Less Optimistic
In September this year, Tesla presented a glimpse of their general-purpose humanoid robot 'Optimus'. Unfortunately, as many humorously remarked, the launch was "suboptimal". The demo couldn't do much to excite the audience for what's coming next. However, many roboticists and experts in the field believe that if sufficient resources are thrown into it, the vision of an "all-purpose" robot is practical. I am aware of critics who say that the prototype had nothing new that they haven't seen elsewhere, and that there are other more impressive humanoids. There are also people who have doubts about the aggressive timeline Elon had proposed, and I do not necessarily disagree with them.
Decision Making with Dynamic Uncertain Events
Kalech, Meir, Reches, Shulamit
When to make a decision is a key question in decision-making problems characterized by uncertainty. In this paper we deal with decision making in environments where information arrives dynamically. We address the tradeoff between waiting and stopping strategies. On the one hand, waiting to obtain more information reduces uncertainty, but it comes with a cost. On the other hand, stopping and making a decision based on expected utility reduces the cost of waiting, but the decision is based on uncertain information. We propose an optimal algorithm and two approximation algorithms. We prove that one approximation is optimistic, in that it waits at least as long as the optimal algorithm, while the other is pessimistic, in that it stops no later than the optimal algorithm. We evaluate our algorithms theoretically and empirically and show that both approximations yield near-optimal decision quality while running much faster than the optimal algorithm. We also conclude from the experiments that the cost function is a key factor in choosing the most effective algorithm.
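The waiting-versus-stopping tradeoff described in this abstract can be illustrated with a toy sketch. This is not the paper's optimal or approximation algorithm; it is a one-step comparison under assumed conditions: a single uncertain binary event with probability `p`, a fixed `wait_cost` paid to observe the event's outcome, and a hypothetical `utilities` table mapping each option to its payoff in each outcome. All function and parameter names are illustrative.

```python
from typing import Dict, Tuple

# Hypothetical payoff table: option -> (utility if event occurs, utility if it does not)
Utilities = Dict[str, Tuple[float, float]]


def expected_utility_now(p: float, utilities: Utilities) -> float:
    """Best expected utility when committing to one option before the event resolves."""
    return max(p * u_event + (1 - p) * u_no_event
               for u_event, u_no_event in utilities.values())


def expected_utility_after_waiting(p: float, utilities: Utilities,
                                   wait_cost: float) -> float:
    """Expected utility when paying wait_cost to see the outcome, then picking the best option."""
    best_if_event = max(u_event for u_event, _ in utilities.values())
    best_if_no_event = max(u_no_event for _, u_no_event in utilities.values())
    return p * best_if_event + (1 - p) * best_if_no_event - wait_cost


def should_wait(p: float, utilities: Utilities, wait_cost: float) -> bool:
    """Wait only if the information gained is worth more than the cost of waiting."""
    return expected_utility_after_waiting(p, utilities, wait_cost) > \
        expected_utility_now(p, utilities)


if __name__ == "__main__":
    options = {"a": (10.0, 0.0), "b": (0.0, 10.0)}
    print(should_wait(0.5, options, wait_cost=2.0))
    print(should_wait(0.5, options, wait_cost=6.0))
```

In this sketch, raising `wait_cost` eventually makes stopping preferable, which mirrors the abstract's remark that the cost function strongly influences which strategy is most effective.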