BGeneraltrade-offs

Neural Information Processing Systems 

However, we make no serious efforts to find the optimal architecture. In fact, we use the same 13 architecture for allour experiments, across the scales. Webelievethe performance onaparticular task can be further improved by carefully curating the neural architecture.