Achieving Statistical Optimality of Federated Learning: Beyond Stationary Points