Note: The entire model is trained in a purely supervised fashion, not with any form of reinforcement learning. The first question you may ask is how a single model can take such different types of input: tabular data, images, audio, video, and so on. The answer is that everything is first converted to the same format, i.e. a sequence of tokens. After converting the data into tokens, the authors arrange them in a canonical sequence ordering. The goal is to put every input in the same format, with a particular ordering that depends on the task.
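To make the idea concrete, here is a minimal sketch of how inputs of different kinds could be mapped into one shared token space and then concatenated in a fixed order. Everything in it is an assumption made for illustration and is not taken from the model described above: the vocabulary sizes, the byte-level text tokenizer, the uniform binning of continuous values, the separator token, and the ordering (text, then observation, then separator, then action) are all hypothetical.

```python
from typing import List, Sequence

# All constants below are hypothetical choices for this sketch.
TEXT_VOCAB = 256                    # one token id per UTF-8 byte
NUM_BINS = 1024                     # number of bins for continuous values
SEPARATOR = TEXT_VOCAB + NUM_BINS   # token id placed before the action


def tokenize_text(text: str) -> List[int]:
    """Map text to token ids in [0, TEXT_VOCAB) using its UTF-8 bytes."""
    return list(text.encode("utf-8"))


def tokenize_continuous(
    values: Sequence[float], low: float = -1.0, high: float = 1.0
) -> List[int]:
    """Clip each value to [low, high], bin it uniformly, and shift the
    bin index past the text vocabulary so the id ranges do not overlap."""
    tokens = []
    for v in values:
        clipped = min(max(v, low), high)
        bin_index = int((clipped - low) / (high - low) * (NUM_BINS - 1))
        tokens.append(TEXT_VOCAB + bin_index)
    return tokens


def build_sequence(
    text: str, observation: Sequence[float], action: Sequence[float]
) -> List[int]:
    """Concatenate the token groups in one fixed (assumed) order."""
    return (
        tokenize_text(text)
        + tokenize_continuous(observation)
        + [SEPARATOR]
        + tokenize_continuous(action)
    )


if __name__ == "__main__":
    print(build_sequence("hi", [-1.0], [1.0]))
```

Because every modality ends up as integers in a single id space, one sequence model can be trained on all of them with an ordinary supervised next-token objective; only the tokenizers and the ordering differ from task to task.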
May-22-2022, 20:04:58 GMT