Large Language Model
FREE: Feature Refinement for Generalized Zero-Shot Learning
Chen, Shiming, Wang, Wenjie, Xia, Beihao, Peng, Qinmu, You, Xinge, Zheng, Feng, Shao, Ling
Generalized zero-shot learning (GZSL) has achieved significant progress, with many efforts dedicated to overcoming the problems of visual-semantic domain gap and seen-unseen bias. However, most existing methods directly use feature extraction models trained on ImageNet alone, ignoring the cross-dataset bias between ImageNet and GZSL benchmarks. Such a bias inevitably results in poor-quality visual features for GZSL tasks, which potentially limits the recognition performance on both seen and unseen classes. In this paper, we propose a simple yet effective GZSL method, termed feature refinement for generalized zero-shot learning (FREE), to tackle the above problem. FREE employs a feature refinement (FR) module that incorporates \textit{semantic$\rightarrow$visual} mapping into a unified generative model to refine the visual features of seen and unseen class samples. Furthermore, we propose a self-adaptive margin center loss (SAMC-loss) that cooperates with a semantic cycle-consistency loss to guide FR to learn class- and semantically-relevant representations, and concatenate the features in FR to extract the fully refined features. Extensive experiments on five benchmark datasets demonstrate the significant performance gain of FREE over its baseline and current state-of-the-art methods. Our codes are available at https://github.com/shiming-chen/FREE .
GitHub Copilot and the threat of developers losing their jobs.
GitHub recently launched GitHub Copilot, an artificial intelligence tool that suggests lines of code, and even complete functions, to make programmers more efficient. I will not go into detail about the tool here. The launch has reignited the controversy over the job losses that artificial intelligence can cause, this time in the programming community. Certainly, the tool's capabilities are impressive, and it joins the many spectacular achievements of narrow artificial intelligence, but can it replace a programmer? It has been almost 30 years since Frederick P. Brooks wrote "The Mythical Man-Month", a book of essays related to software engineering.
The Rise of the Transformers: Explaining the Tech Underlying GPT-3
The capabilities of GPT-3 have led to a debate between those who believe that GPT-3 and its underlying architecture will enable Artificial General Intelligence (AGI) in the future and those (many from the school of logic and symbolic AI) who believe that without some form of logic there can be no AGI. The truth of the matter is that we don't know, as we don't yet fully understand the human brain. In science and engineering we work on the basis of observation and testing. This section also addresses points raised by Esaú Flores. Gary Grossman, in an article entitled "Are we entering the AI Twilight Zone between AI and AGI?", observed that in February 2020 Geoffrey Hinton, the University of Toronto professor who is a pioneer of Deep Learning, noted: "There are one trillion synapses in a cubic centimeter of the brain. If there is such a thing as general AI, [the system] would probably require one trillion synapses." The human brain has a huge number of synapses. Each of the 10^11 (one hundred billion) neurons has on average 7,000 synaptic connections (synapses) to other neurons. It has been estimated that the brain of a three-year-old child has about 10^15 synapses (1 quadrillion).
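The figures quoted above can be put side by side with a rough back-of-the-envelope sketch. The variable names are illustrative, and the product simply combines the two averages from the text; it is an order-of-magnitude check, not a measurement.

```python
# Figures quoted in the text above; variable names are illustrative.
NEURONS = 10**11               # one hundred billion neurons
SYNAPSES_PER_NEURON = 7_000    # average synaptic connections per neuron
CHILD_ESTIMATE = 10**15        # estimated synapses in a three-year-old's brain
HINTON_FIGURE = 10**12         # "one trillion synapses" from the Hinton quote

# Product of the two averages: a rough total for the whole brain.
total_synapses = NEURONS * SYNAPSES_PER_NEURON

print(f"neurons x synapses per neuron: {total_synapses:.1e}")
print(f"ratio to the three-year-old estimate: {total_synapses / CHILD_ESTIMATE:.2f}")
print(f"ratio to the one-trillion figure: {total_synapses / HINTON_FIGURE:.0f}")
```

Under these averages the product is 7 x 10^14, the same order of magnitude as the 10^15 estimate and several hundred times the one-trillion figure in the quote.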
Artificial intelligence in structural biology is here to stay
"I didn't think we would get to this point in my lifetime." That's how one research leader in structural biology responded to last week's publication of research in which artificial intelligence (AI) was used to predict the structure of more than 20,000 human proteins, as well as that of nearly all the known proteins produced by 20 model organisms such as Escherichia coli, fruit flies and yeast, but also soya bean and Asian rice. That is a combined total of around 365,000 predictions. The data, publicly accessible for the first time (see https://alphafold.ebi.ac.uk), were released online on 22 July by researchers at DeepMind, a London-based AI company owned by Google's parent company, Alphabet, and the European Bioinformatics Institute, based at the European Molecular Biology Laboratory (EBI-EMBL) near Cambridge, UK. The DeepMind team developed a machine-learning tool called AlphaFold.
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
Liu, Pengfei, Yuan, Weizhe, Fu, Jinlan, Jiang, Zhengbao, Hayashi, Hiroaki, Neubig, Graham
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning". Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x' that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x̂, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g. the choice of pre-trained models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only make a systematic review of existing works and a highly structured typology of prompt-based concepts, but also release other resources, e.g., a website http://pretrain.nlpedia.ai/ including a constantly updated survey and paper list.
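The three steps described in the abstract (apply a template to the input, let a language model fill the open slot, map the filled string to an output) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the template, the answer words, the label mapping, and above all `toy_lm_score`, which is a hand-written stand-in for a pre-trained language model and not part of the surveyed work.

```python
from typing import Callable, Dict

# Hypothetical template with an input slot [X] and an answer slot [Z].
TEMPLATE = "[X] Overall, it was a [Z] movie."

# Hypothetical mapping from answer words to output labels y.
ANSWER_TO_LABEL: Dict[str, str] = {"great": "positive", "terrible": "negative"}


def make_prompt(x: str) -> str:
    """Apply the template to input x, leaving the answer slot [Z] unfilled."""
    return TEMPLATE.replace("[X]", x)


def toy_lm_score(text: str) -> float:
    """Stand-in for a language model's score of a complete string.

    A real system would use a pre-trained model's probability of the text.
    This toy version only rewards strings whose sentiment words agree.
    """
    positive = {"love", "loved", "great", "wonderful"}
    negative = {"hate", "hated", "terrible", "boring"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return float(pos * pos + neg * neg)  # higher when the words agree


def predict(x: str, score: Callable[[str], float] = toy_lm_score) -> str:
    """Build the prompt, pick the best-scoring filled string, map it to y."""
    prompt = make_prompt(x)
    # One candidate filled string per answer word.
    filled = {z: prompt.replace("[Z]", z) for z in ANSWER_TO_LABEL}
    # Choose the answer whose filled string the scorer rates highest.
    best_answer = max(filled, key=lambda z: score(filled[z]))
    return ANSWER_TO_LABEL[best_answer]


print(predict("I love this movie."))
print(predict("I hated this boring film."))
```

Swapping `toy_lm_score` for a function backed by an actual pre-trained model would leave `make_prompt` and `predict` unchanged, which is the sense in which a new prompting function adapts one model to a new task without labeled data.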
Everything you need to know about GitHub Copilot
I was fortunate enough to be given early access to GitHub's new "AI pair programmer," Copilot, which has generated quite a stir. My early ideas and experiences with this tool are shared in this blog post. It's made me shout "wow" a couple of times in the last few hours, which isn't something you'd expect from your developer tools! However, there are some real-world limits to this tool right now, which I'll go through in this article. In summary: Copilot appears out of nowhere, interrupting my flow.
Generally capable agents emerge from open-ended play
In recent years, artificial intelligence agents have succeeded in a range of complex game environments. For instance, AlphaZero beat world-champion programs in chess, shogi, and Go after starting out with knowing no more than the basic rules of how to play. But AlphaZero still trained separately on each game -- unable to simply learn another game or task without repeating the RL process from scratch. The same is true for other successes of RL, such as Atari, Capture the Flag, StarCraft II, Dota 2, and Hide-and-Seek. DeepMind's mission of solving intelligence to advance science and humanity led us to explore how we could overcome this limitation to create AI agents with more general and adaptive behaviour.
Artificial intelligence predicts the shapes of molecules to come
Now, any biochemist can speed their work in much the same way. On Thursday, DeepMind released the predicted shapes of more than 350,000 proteins -- the microscopic mechanisms that drive the behavior of bacteria, viruses, the human body and all other living things. This new database includes the 3D structures for all proteins expressed by the human genome, as well as those for proteins that appear in 20 other organisms, including the mouse, fruit fly and E. coli bacterium.
Core Challenges in Embodied Vision-Language Planning
Francis, Jonathan, Kitamura, Nariaki, Labelle, Felix, Lu, Xiaopeng, Navarro, Ingrid, Oh, Jean
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI) have led to the development of challenging tasks at the intersection of Computer Vision, Natural Language Processing, and Embodied AI. Whereas many approaches and previous survey pursuits have characterised one or two of these dimensions, there has not been a holistic analysis at the center of all three. Moreover, even when combinations of these topics are considered, more focus is placed on describing, e.g., current architectural methods, as opposed to also illustrating high-level challenges and opportunities for the field. In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language. We propose a taxonomy to unify these tasks and provide an in-depth analysis and comparison of the new and current algorithmic approaches, metrics, simulated environments, as well as the datasets used for EVLP tasks. Finally, we present the core challenges that we believe new EVLP works should seek to address, and we advocate for task construction that enables model generalizability and furthers real-world deployment.
Don't Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers
Rothermel, Danielle, Li, Margaret, Rocktäschel, Tim, Foerster, Jakob
Self-supervised pre-training of large-scale transformer models on text corpora followed by finetuning has achieved state-of-the-art on a number of natural language processing tasks. Recently, Lu et al. (2021, arXiv:2103.05247) claimed that frozen pretrained transformers (FPTs) match or outperform training from scratch as well as unfrozen (fine-tuned) pretrained transformers in a set of transfer tasks to other modalities. In our work, we find that this result is, in fact, an artifact of not tuning the learning rates. After carefully redesigning the empirical setup, we find that when tuning learning rates properly, pretrained transformers do outperform or match training from scratch in all of our tasks, but only as long as the entire model is finetuned. Thus, while transfer from pretrained language models to other modalities does indeed provide gains and hints at exciting possibilities for future work, properly tuning hyperparameters is important for arriving at robust findings.