Gradient Based Data Subset Selection for Compute-Efficient Hyper-parameter Tuning
Neural Information Processing Systems
If we naively perform 1000 training runs with grid search for hyper-parameter tuning (which is not uncommon today), it will take 4000 GPU hours, that is, about 4 GPU hours per run.
Aug-18-2025, 04:33:59 GMT
- Country:
- Asia
- India (0.04)
- Middle East > Qatar
- Europe
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Italy > Tuscany
- Florence (0.04)
- Spain > Andalusia
- Cádiz Province > Cadiz (0.04)
- North America
- Canada > Ontario
- Toronto (0.14)
- United States
- California > Santa Clara County
- Palo Alto (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- Texas (0.04)
- Washington > King County
- Seattle (0.04)
- Genre:
- Research Report (0.93)
- Technology: