Do Language Models Use Their Depth Efficiently?
