Fake Runs, Real Fixes -- Analyzing xPU Performance Through Simulation