Learning Robust Policies via Interpretable Hamilton-Jacobi Reachability-Guided Disturbances