The introduction of transfer learning and pretrained language models in natural language processing (NLP) pushed forward the limits of language understanding and generation. Transfer learning and the application of transformers to downstream NLP tasks have become the main trend of recent research. At the same time, there is controversy in the NLP community over the research value of the huge pretrained language models occupying the leaderboards. While many AI experts agree with Anna Rogers's argument that achieving state-of-the-art results merely by using more data and computing power is not research news, other NLP opinion leaders point to upsides of the current trend, such as the possibility of exposing the fundamental limitations of the current paradigm. In any case, the latest improvements in NLP language models appear to be driven not only by massive boosts in computing capacity but also by the discovery of ingenious ways to make models lighter while maintaining high performance.
AI systems that understand and generate text, known as language models, are the hot new thing in the enterprise. A recent survey found that 60% of tech leaders said their budgets for AI language technologies increased by at least 10% in 2020, while 33% reported a 30% increase. But not all language models are created equal. Several types are emerging as dominant, including large, general-purpose models like OpenAI's GPT-3 and models fine-tuned for particular tasks (think answering IT help desk questions). At the edge sits a third category of model -- one that tends to be highly compressed and limited in capability, designed specifically to run on Internet of Things devices and workstations.
Large language models (LLMs) are computer programs that can analyze and generate text. They are trained on massive amounts of text data, which makes them better at tasks like text generation. Language models are the foundation of many natural language processing (NLP) applications, such as speech-to-text and sentiment analysis. Given a passage of text, these models predict the most likely next word. Examples of LLMs include ChatGPT, LaMDA, and PaLM. An LLM's parameters encode the relationships the model has learned from text, which lets it estimate the likelihood of word sequences.
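The core task described above -- predicting the next word from observed word sequences -- can be illustrated with a toy bigram model. This is a deliberately minimal sketch (the corpus and helper names are illustrative): real LLMs learn billions of parameters over long contexts, but the underlying objective of assigning likelihoods to word sequences is the same.

```python
from collections import Counter, defaultdict

# Toy training corpus; real language models train on billions of words.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count, for each word, which words follow it and how often.
# These counts play the role that learned parameters play in an LLM.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word observed after `word`, or None."""
    candidates = follows.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))  # -> "cat" ("cat" follows "the" twice in the corpus)
```

A one-word context is of course far too short for fluent text; transformer-based LLMs condition on thousands of preceding tokens, which is why they need so many parameters.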
Swiss data services company Unit8 highlights the key analytics trends it expects to accelerate in 2022 in its "Advanced Analytics Trends Report". The report compiles feedback from industry leaders at Merck, Credit Suisse, and Swiss Re on the use of mega models in top-tier companies. Mega models (e.g., GPT-3, Wu Dao 2.0) show impressive performance yet are extremely costly to train. Only a few companies are able to compete in this space; nonetheless, the availability of these mega models opens up new applications. Quality control remains a major challenge before they are broadly adopted in business environments, but they already assist developers in writing snippets of code.
I have been spending a significant amount of time learning AI-related topics, with ChatGPT as my companion and mentor. I asked ChatGPT to summarize each of the top 5 AI papers published in 2020, ordered by number of citations. I thought these summaries would be useful for others, so I am sharing them. The 2020 paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" proposed a new approach for image recognition using a variant of the transformer neural network architecture.
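The key idea of that paper (the Vision Transformer) is to split an image into fixed-size patches -- hence "16x16 words" -- flatten each patch into a vector, and feed the resulting sequence to a transformer as if the patches were word tokens. A minimal NumPy sketch of the patching step, with illustrative shapes (the function name and the 224x224 example are my own, not from the paper's code):

```python
import numpy as np

def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened patch vectors.

    Each patch becomes one "token" of length patch_size*patch_size*C,
    mirroring how ViT turns a 224x224 RGB image into a sequence of
    14*14 = 196 tokens of dimension 768 before the transformer layers.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    patches = (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)          # group the two patch-grid axes
             .reshape(-1, patch_size * patch_size * c)
    )
    return patches

img = np.zeros((224, 224, 3))
tokens = image_to_patches(img)
print(tokens.shape)  # -> (196, 768)
```

In the full model these flattened patches are then linearly projected and given positional embeddings, but the reshaping above is the part that lets an unmodified transformer "read" an image as a sequence.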