Inferring Functionality of Attention Heads from their Parameters
