Visualising the loss landscape
When we plot and monitor an architecture's loss function during training, we are looking at the loss landscape through a toilet paper tube. The loss goes on the y-axis and the epochs on the x-axis. That gives us only a one-dimensional view of the loss surface: the values the optimizer happened to visit along its trajectory through parameter space. What if we could see, say, the 175-billion-dimensional loss surface of GPT-3 across a range of values of those billions of parameters? Well, let's not kid ourselves.
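One way to peek a little further down the tube is to evaluate the loss along a single random direction through a point in parameter space, which yields a 1D slice of the landscape rather than just the training trajectory. Below is a minimal sketch of that idea, using a toy least-squares loss as a stand-in for a real network's loss; the data, the loss function, and all variable names here are illustrative assumptions, not anything from a real GPT-scale model.

```python
import numpy as np

# Toy stand-in for a network's loss: least-squares regression on
# synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

def loss(w):
    """Mean squared error of linear weights w on the toy data."""
    return float(np.mean((X @ w - y) ** 2))

# 1D slice of the landscape: f(alpha) = loss(w0 + alpha * d), where d is a
# random unit direction through the point w0. Each choice of d gives one
# "tube" view of the high-dimensional surface.
w0 = np.zeros(5)
d = rng.normal(size=5)
d /= np.linalg.norm(d)

alphas = np.linspace(-2.0, 2.0, 41)
slice_vals = [loss(w0 + a * d) for a in alphas]
```

Plotting `slice_vals` against `alphas` (e.g. with matplotlib) shows the cross-section of the loss along that one direction; using two random directions instead of one gives the familiar 2D contour plots of loss landscapes.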
Sep-28-2021, 18:36:21 GMT
- Technology: