The capabilities of GPT -3 has led to a debate between some as to whether or not GPT-3 and its underlying architecture will enable Artificial General Intelligence (AGI) in the future against those (many being from the school of logic and symbolic AI) who believe that without some form of logic there can be no AGI. The truth of the matter is that we don't know as we don't really fully understand the human brain. With science and engineering we work upon the basis of observation and testing. This section also addresses points raised by Esaú Flores. Gary Grossman in an article entitled Are we entering the AI Twilight Zone between AI and AGI? observed that in February 2020, Geoffrey Hinton, the University of Toronto professor who is a pioneer of Deep Learning, noted: "There are one trillion synapses in a cubic centimeter of the brain. If there is such a thing as general AI, [the system] would probably require one trillion synapses." The human brain has a huge number of synapses. Each of the 1011 (one hundred billion) neurons has on average 7,000 synaptic connections (synapses) to other neurons. It has been estimated that the brain of a three-year-old child has about 1015 synapses (1 quadrillion).
Transformer models have become the go-to model in most of the NLP tasks. Many transformer-based models like BERT, ROBERTa, GPT series, etc are considered as the state-of-the-art models in NLP. While NLP is overwhelming with all these models, Transformers are gaining popularity in Computer vision also. Transformers are now used for recognizing and constructing images, image encoding, and many more. While transformer models are taking over the AI field, it is also important to have a low-level understanding of these models.
This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI. In recent years, the transformer model has become one of the main highlights of advances in deep learning and deep neural networks. It is mainly used for advanced applications in natural language processing. Google is using it to enhance its search engine results. OpenAI has used transformers to create its famous GPT-2 and GPT-3 models.
This is arguably the most important architecture for natural language processing (NLP) today. Specifically, we look at modeling frameworks such as the generative pretrained transformer (GPT), bidirectional encoder representations from transformers (BERT) and multilingual BERT (mBERT). These methods employ neural networks with more parameters than most deep convolutional and recurrent neural network models. Despite the larger size, they've exploded in popularity because they scale comparatively more effectively on parallel computing architecture. This enables even larger and more sophisticated models to be developed in practice. Until the arrival of the transformer, the dominant NLP models relied on recurrent and convolutional components. Additionally, the best sequence modeling and transduction problems, such as machine translation, rely on an encoder-decoder architecture with an attention mechanism to detect which parts of the input influence each part of the output. The transformer aims to replace the recurrent and convolutional components entirely with attention.
Day by day the number of machine learning models is increasing at a pace. With this increasing rate, it is hard for beginners to choose an effective model to perform Natural Language Understanding (NLU) and Natural Language Generation (NLG) mechanisms. Researchers across the globe are working around the clock to achieve more progress in artificial intelligence to build agile and intuitive sequence-to-sequence learning models. And in recent times transformers are one such model which gained more prominence in the field of machine learning to perform speech-to-text activities. The wide availability of other sequence-to-sequence learning models like RNNs, LSTMs, and GRU always raises a challenge for beginners when they think about transformers.