GPT-3, the latest state-of-the-art model in deep learning, achieved impressive results on a range of language tasks without task-specific fine-tuning. The main difference between it and its predecessor, GPT-2, is size: GPT-3 was trained on hundreds of billions of words -- nearly the whole public Internet -- yielding a wildly compute-heavy, 175-billion-parameter model. Even so, OpenAI's authors caution that we can't scale models forever: "A more fundamental limitation of the general approach described in this paper -- scaling up any LM-like model, whether autoregressive or bidirectional -- is that it may eventually run into (or could already be running into) the limits of the pretraining objective." This is the law of diminishing returns in action.
Aug-1-2020, 13:55:09 GMT