The Download: the mystery of LLMs, and the EU's Big Tech crackdown
Two years ago, Yuri Burda and Harri Edwards, researchers at OpenAI, were trying to find out what it would take to get a large language model to do basic arithmetic. The models memorized the sums they saw but failed to solve new ones. By accident, Burda and Edwards left some of their experiments running for days rather than hours. The models were shown the example sums over and over again, and eventually they learned to add two numbers; it had just taken far more time than anyone thought it should. In certain cases, models could seemingly fail to learn a task and then all of a sudden just get it, as if a lightbulb had switched on, a behavior the researchers called grokking.
March 4, 2024