sim
e464656edca5e58850f8cec98cbb979b-Supplemental.pdf
To be consistent with accuracy definition, we denote the correctness ofstj for instance t as sim(stj,rt) = ( 2 distance(stj,rt))/ 2 where sim(stj,rt) is in the range [0,1] and distance(stj,rt) is in range [0, 2], 2 is the largest Euclidean distance in the probability simplex. Given a test dataset I, the correctness of a learner SLj on I can be denoted as 2 corrSLj = 1n Pn t=1sim(stj,rt). In this section, we define multiple metrics for consistency, accuracy, and correct-consistency in detail. Figure 1 shows the metrics computation in our experiments. We have created a git repository for this work and will be posted upon the acceptance and publicationofthiswork.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- Europe > Austria (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- (8 more...)
Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL Andrew Wagenmaker
Such direct sim2real transfer is not guaranteed to succeed, however, and in cases where it fails, it is unclear how to best utilize the simulator. In this work, we show that in many regimes, while direct sim2real transfer may fail, we can utilize the simulator to learn a set of exploratory policies which enable efficient exploration in the real world.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > Alameda County > Livermore (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Information Technology (0.67)
- Leisure & Entertainment > Games > Computer Games (0.40)
- Asia > Afghanistan > Parwan Province > Charikar (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- Europe > Spain (0.04)
- Asia > India > Gujarat > Gandhinagar (0.04)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Germany > Berlin (0.04)
- Asia > Middle East > Jordan (0.04)
- Leisure & Entertainment (0.46)
- Education (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
- (4 more...)
Appendix
Weheldoutavalidation setfromthetraining set,andusedthisvalidation settoselecttheL2 regularization hyperparameter,which weselected from 45logarithmically spaced values between 10 6 and 105, applied to the sum of the per-example losses. Because the optimization problem is convex, we used the previous weights as a warm start as we increased theL2 regularization hyperparameter. Wemeasured eithertop-1ormean per-class accuracy, depending on which was suggested by the dataset creators. A.3 Fine-tuning In our fine-tuning experiments in Table 2, we used standard ImageNet-style data augmentationand trained for 20,000 steps with SGD with momentum of0.9 and cosine annealing [ 20]without restarts. Each curve represents a different model.