Large Language Model
Why ChatGPT is not a threat to Google Search – TechTalks
Since OpenAI released ChatGPT, there has been a lot of speculation about what its killer app will be. And perhaps topping the list is online search. According to The New York Times, Google's management has declared a "code red" and is scrambling to protect its online search monopoly against the disruption that ChatGPT will bring. ChatGPT is a wonderful technology, one that has a great chance of redefining the way we create and interact with digital information. It can have many interesting applications, including for online search.
[2301.00704] Muse: Text-To-Image Generation via Masked Generative Transformers
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
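The masked modeling task described in this abstract can be illustrated with a minimal sketch. It assumes image tokens are plain integers drawn from a discrete codebook and that a reserved id marks a masked position. The names `MASK_ID` and `mask_tokens`, the fixed masking ratio, and the token values are hypothetical choices for illustration, not Muse's implementation, which also conditions on a text embedding and uses a Transformer to predict the masked tokens.

```python
import random

MASK_ID = -1  # hypothetical reserved id marking a masked position


def mask_tokens(tokens, mask_ratio, rng):
    """Randomly replace a fraction of discrete tokens with MASK_ID.

    Returns the masked sequence and a dict mapping each masked position
    to the original token a model would be trained to predict there.
    """
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK_ID
    return masked, targets


rng = random.Random(0)
image_tokens = [17, 4, 250, 93, 8, 41, 77, 3]  # made-up codebook indices
masked, targets = mask_tokens(image_tokens, mask_ratio=0.5, rng=rng)
print(masked)   # input sequence with half of the positions masked
print(targets)  # position -> original token, i.e. the prediction targets
```

Because every masked position has its own target, a model trained this way can in principle predict many positions in one forward pass, which is the property the abstract refers to as parallel decoding.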
Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task
Benefiting from large-scale datasets and pre-trained models, the field of generative models has recently gained significant momentum. However, most datasets for symbolic music are very small, which potentially limits the performance of data-driven multimodal models. An intuitive solution to this problem is to leverage pre-trained models from other modalities (e.g., natural language) to improve the performance of symbolic music-related multimodal tasks. In this paper, we carry out the first study of generating complete and semantically consistent symbolic music scores from text descriptions, and explore the efficacy of using publicly available checkpoints (i.e., BERT, GPT-2, and BART) for natural language processing in the task of text-to-music generation. Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity. We analyse the capabilities and limitations of our model to better understand the potential of language-music models.
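The abstract does not define its edit distance similarity metric, so the following is only a sketch of one common normalization: Levenshtein distance divided by the length of the longer sequence, subtracted from one. The function names and the note-name tokens are hypothetical and chosen for illustration.

```python
def levenshtein(a, b):
    """Levenshtein edit distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_distance_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


generated = "C D E F G".split()  # made-up token sequences
reference = "C D E G".split()
print(edit_distance_similarity(generated, reference))
```

A score of 1.0 means the two token sequences are identical, and 0.0 means every position would have to change.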
Language Models are Drummers: Drum Composition with Natural Language Pre-Training
Zhang, Li, Callison-Burch, Chris
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves is even more so, given how little precedent there is in the literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
A Survey On Few-shot Knowledge Graph Completion with Structural and Commonsense Knowledge
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly added relations which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
Vocabulary-informed Zero-shot and Open-set Learning
Fu, Yanwei, Wang, Xiaomei, Dong, Hanze, Jiang, Yu-Gang, Wang, Meng, Xue, Xiangyang, Sigal, Leonid
Despite significant progress in object categorization in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and to recognize object classes within a large, potentially open, set of labels. Zero-shot learning is one way of addressing these challenges, but it has only been shown to work with limited-size class vocabularies and typically requires separation between supervised and unsupervised classes, allowing the former to inform the latter but not vice versa. We propose the notion of vocabulary-informed learning to alleviate the above-mentioned challenges and address problems of supervised, zero-shot, generalized zero-shot and open set recognition using a unified framework. Specifically, we propose a weighted maximum margin framework for semantic manifold-based recognition that incorporates distance constraints from (both supervised and unsupervised) vocabulary atoms. Distance constraints ensure that labeled samples are projected closer to their correct prototypes, in the embedding space, than to others. We illustrate that the resulting model shows improvements in supervised, zero-shot, generalized zero-shot, and large open set recognition, with up to a 310K class vocabulary on the Animals with Attributes and ImageNet datasets.
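The distance constraint mentioned in this abstract, that a labeled sample should land closer to its correct prototype than to any other, can be sketched as a generic hinge-style penalty. This is not the paper's weighted maximum margin formulation; the function names, the Euclidean distance, the margin value, and the toy vectors are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def margin_violation(embedding, correct_prototype, other_prototypes, margin=1.0):
    """Sum of hinge penalties over all competing prototypes.

    A term is zero when the sample is closer to its correct prototype
    than to the competitor by at least `margin` (an assumed value).
    """
    d_correct = euclidean(embedding, correct_prototype)
    return sum(
        max(0.0, margin + d_correct - euclidean(embedding, p))
        for p in other_prototypes
    )


# Toy 2-D example with made-up vectors.
sample = [0.0, 0.0]
correct = [0.0, 0.0]
others = [[3.0, 4.0]]
print(margin_violation(sample, correct, others))     # constraint satisfied
print(margin_violation([3.0, 4.0], correct, others)) # constraint violated
```

Minimizing a penalty of this kind over the training set pushes each projected sample toward its own prototype and away from the rest of the vocabulary.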
Google's End Is Near Because of ChatGPT?
Currently, using ChatGPT is free, but since AI is somewhat expensive, there may be a fee for using it in the future. When I tried asking ChatGPT questions, I was blown away by its answers. There is not the slightest chance that Google can answer in the human manner we have all needed for decades. I asked ChatGPT tons of questions and it answered all of them pretty easily. I started asking ChatGPT illogical questions to see whether it would behave like a normal chatbot and give me some old "can't be found" response, but ChatGPT worked like a charm, answering my stupid questions. My first question was "Will We Survive in 2023?"
10 Ways You Can Use ChatGPT for Your Content Marketing
As a solo or small business owner, you know how important content marketing is for your business. But coming up with ideas, writing and editing content, and promoting it can be time-consuming and overwhelming. That's where ChatGPT comes in. This powerful artificial intelligence (AI) tool can help you save time and effort on your content marketing by generating ideas, writing and editing content, and promoting it. In this blog post, you'll learn ten ways to use ChatGPT for your content marketing.
2023 could be the year for large language models - Jack Of All Techs
The launch of OpenAI's ChatGPT has the world abuzz about the advanced capabilities of artificial intelligence (AI). How will it transform industries? What does it mean for Google Search? These are just a small sampling of the questions many have been asking about the possibilities.