On the Importance of Attention in Meta-Learning for Few-Shot Text Classification

Current deep-learning-based text classification methods are limited in their ability to learn quickly and generalize when data is scarce. We address this problem by integrating a meta-learning procedure that uses the knowledge learned across many tasks as an inductive bias towards better natural language understanding. Based on the Model-Agnostic Meta-Learning (MAML) framework, we introduce the Attentive Task-Agnostic Meta-Learning (ATAML) algorithm for text classification. The essential difference between MAML and ATAML lies in the separation of task-agnostic representation learning from task-specific attentive adaptation. ATAML is designed to encourage task-agnostic representation learning through task-agnostic parameterization and to facilitate task-specific adaptation via attention mechanisms. We provide evidence that the attention mechanism in ATAML has a synergistic effect on learning performance. Compared with models trained from random initialization, pretrained models, and meta-trained MAML, ATAML generalizes better on single-label and multi-label classification tasks in the miniRCV1 and miniReuters-21578 datasets.
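To make the parameter split concrete, here is a toy, pure-Python sketch of one inner-loop adaptation in the spirit the abstract describes: a shared encoder `theta` is held fixed while only a task-specific attention query `q` and logistic classifier `w` are updated on a task's support set. This is not the authors' ATAML. The encoder, the attention form, the hand-derived gradients, and every name (`encode`, `forward`, `adapt`, the toy data) are assumptions for illustration, and the outer meta-update of `theta` is omitted.

```python
import math
from typing import List, Tuple

Vec = List[float]
Example = Tuple[List[Vec], int]  # (token vectors, binary label)


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def encode(theta: List[Vec], token: Vec) -> Vec:
    # Shared, task-agnostic encoder e = tanh(theta @ token); never touched by adapt().
    return [math.tanh(dot(row, token)) for row in theta]


def forward(theta: List[Vec], q: Vec, w: Vec, tokens: List[Vec]):
    feats = [encode(theta, t) for t in tokens]
    scores = [dot(q, e) for e in feats]            # task-specific attention scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]       # numerically stable softmax
    z = sum(exps)
    alpha = [x / z for x in exps]
    pooled = [sum(a * e[k] for a, e in zip(alpha, feats)) for k in range(len(feats[0]))]
    prob = 1.0 / (1.0 + math.exp(-dot(w, pooled)))  # task-specific logistic head
    return prob, feats, alpha, pooled


def loss(theta: List[Vec], q: Vec, w: Vec, data: List[Example]) -> float:
    total = 0.0
    for tokens, y in data:
        p = forward(theta, q, w, tokens)[0]
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def adapt(theta, q, w, support: List[Example], lr: float = 0.5, steps: int = 5):
    """Inner loop: gradient descent on q and w only; theta is left unchanged."""
    q, w = list(q), list(w)
    for _ in range(steps):
        gq, gw = [0.0] * len(q), [0.0] * len(w)
        for tokens, y in support:
            p, feats, alpha, pooled = forward(theta, q, w, tokens)
            err = p - y  # dL/dlogit for sigmoid + cross-entropy
            for k in range(len(w)):
                gw[k] += err * pooled[k]
            for a, e in zip(alpha, feats):
                # dlogit/dscore_j = alpha_j * w . (e_j - pooled); dscore_j/dq = e_j
                coeff = err * a * dot(w, [ek - ck for ek, ck in zip(e, pooled)])
                for k in range(len(q)):
                    gq[k] += coeff * e[k]
        n = len(support)
        q = [qk - lr * g / n for qk, g in zip(q, gq)]
        w = [wk - lr * g / n for wk, g in zip(w, gw)]
    return q, w


theta = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical meta-learned encoder weights
support = [([[1.0, 0.0], [0.0, 0.2]], 1), ([[-1.0, 0.0], [0.0, 0.2]], 0)]
q_task, w_task = adapt(theta, [0.0, 0.0], [0.0, 0.0], support)
print(loss(theta, [0.0, 0.0], [0.0, 0.0], support), loss(theta, q_task, w_task, support))
```

Keeping `theta` out of `adapt` is the point of the sketch: a few support examples move only the small attention and classifier parameters, which is one plausible reading of "task-agnostic representation, task-specific attentive adaptation."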

Reviving and Improving Recurrent Back-Propagation

In this paper, we revisit the recurrent back-propagation (RBP) algorithm, discussing the conditions under which it applies and how to satisfy them in deep neural networks. We show that RBP can be unstable and propose two variants based on conjugate gradient on the normal equations (CG-RBP) and on Neumann series (Neumann-RBP). We further investigate the relationship between Neumann-RBP and back-propagation through time (BPTT) and its truncated version (TBPTT). Neumann-RBP has the same time complexity as TBPTT but requires only constant memory, whereas TBPTT's memory cost scales linearly with the number of truncation steps. We examine all RBP variants, along with BPTT and TBPTT, in three application domains: associative memory with continuous Hopfield networks, document classification in citation networks using graph neural networks, and hyperparameter optimization for fully connected networks. All experiments demonstrate that the RBP variants, especially Neumann-RBP, are efficient and effective for optimizing convergent recurrent neural networks.
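In RBP, a convergent recurrent network is run to a fixed point h* = F(h*, w). The gradient of the loss with respect to w requires the adjoint vector z = (I - J^T)^{-1} g, where J = dF/dh at h* and g = dL/dh*; then dL/dw = (dF/dw)^T z. A Neumann-series variant approximates the inverse by the truncated sum g + J^T g + (J^T)^2 g + ... , which only needs the current term and a running total, hence the constant memory. The minimal sketch below assumes a small, explicitly materialized Jacobian (a real implementation would use autodiff vector-Jacobian products instead); `neumann_adjoint` and `exact_adjoint_2x2` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def neumann_adjoint(jac: Matrix, grad_h: Vector, steps: int) -> Vector:
    """Approximate (I - J^T)^{-1} g by g + J^T g + ... + (J^T)^steps g.
    Only the current term and the running sum are stored, so memory does not
    grow with `steps`. Converges when the spectral radius of J is below 1."""
    jt = transpose(jac)
    term = list(grad_h)
    total = list(grad_h)
    for _ in range(steps):
        term = matvec(jt, term)
        total = [t + u for t, u in zip(total, term)]
    return total


def exact_adjoint_2x2(jac: Matrix, grad_h: Vector) -> Vector:
    """Reference: solve (I - J^T) z = g exactly for a 2x2 J via Cramer's rule."""
    a = [[1.0 - jac[0][0], -jac[1][0]],
         [-jac[0][1], 1.0 - jac[1][1]]]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(grad_h[0] * a[1][1] - a[0][1] * grad_h[1]) / det,
            (a[0][0] * grad_h[1] - grad_h[0] * a[1][0]) / det]


jac = [[0.5, 0.1], [0.2, 0.3]]  # toy Jacobian at the fixed point (eigenvalues ~0.57, ~0.23)
g = [1.0, 2.0]                  # toy dL/dh*
print(neumann_adjoint(jac, g, 60), exact_adjoint_2x2(jac, g))
```

With a contractive J the truncated sum approaches the exact solve geometrically, so a modest number of steps suffices; the truncation length plays a role analogous to the number of TBPTT steps.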

Meta Reinforcement Learning with Latent Variable Gaussian Processes

Data efficiency, i.e., learning from small data sets, is critical in many practical applications where data collection is time-consuming or expensive, e.g., robotics, animal experiments, or drug design. Meta-learning is one way to increase the data efficiency of learning algorithms by generalizing learned concepts from a set of training tasks to unseen but related tasks. Often, this relationship between tasks is hard-coded or relies in some other way on human expertise. In this paper, we propose to learn the relationship between tasks automatically using a latent variable model. Our approach finds a variational posterior over tasks and averages over all plausible tasks (according to this posterior) when making predictions. We apply this framework within a model-based reinforcement learning setting for learning dynamics models and controllers of many related tasks, and show that our model generalizes effectively to novel tasks and reduces the average interaction time needed to solve tasks by up to 60% compared to strong baselines.
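"Averaging over all plausible tasks" can be illustrated with a Monte Carlo estimate: sample latent task variables h from the variational posterior and average the model's predictions. The sketch below assumes a one-dimensional Gaussian posterior N(mu, sigma^2) and replaces the paper's Gaussian-process dynamics model with a hypothetical deterministic function `toy_dynamics`; `predict_averaged` is likewise an illustrative name.

```python
import math
import random
from typing import Callable


def predict_averaged(model: Callable[[float, float], float], x: float,
                     mu: float, sigma: float, n_samples: int = 2000,
                     seed: int = 0) -> float:
    """Monte Carlo estimate of E_{h ~ N(mu, sigma^2)}[model(x, h)]:
    average predictions over task variables sampled from the posterior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        h = rng.gauss(mu, sigma)  # one plausible task under the assumed posterior
        total += model(x, h)
    return total / n_samples


def toy_dynamics(x: float, h: float) -> float:
    # Hypothetical stand-in for a learned dynamics model conditioned on latent task h.
    return math.tanh(h * x)


plug_in = toy_dynamics(2.0, 1.0)                          # posterior mean only
averaged = predict_averaged(toy_dynamics, 2.0, 1.0, 1.0)  # average over plausible tasks
print(plug_in, averaged)
```

When the posterior is uncertain and the model is nonlinear, the averaged prediction differs from plugging in the posterior mean; here it is noticeably less extreme, reflecting uncertainty about which task is being solved.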

AI Is Changing Our Brains – argodesign – Medium


In 1976, philosopher Julian Jaynes put forward the provocative theory that our recent ancestors lacked self-awareness. Instead, they mistook their inner voices for outside sources: the voice of God, say, or the ghosts of their ancestors. Jaynes called his theory "bicameralism" (Westworld fans will recall an episode from the last season called "The Bicameral Mind"), and in his telling, it persisted in early humans until about 3,000 years ago.

Meta Multi-Task Learning for Sequence Modeling

AAAI Conferences

Semantic composition functions have played a pivotal role in neural representation learning of text sequences. Despite their success, most existing models suffer from underfitting: they apply the same shared composition function at every position in the sequence and therefore lack the expressive power to capture the richness of compositionality. Moreover, the composition functions of different tasks are independent and learned from scratch. In this paper, we propose a new scheme for sharing composition functions across multiple tasks. Specifically, we use a shared meta-network to capture the meta-knowledge of semantic composition and to generate the parameters of the task-specific semantic composition models. Extensive experiments on two types of tasks, text classification and sequence tagging, demonstrate the benefits of our approach. We also show that the shared meta-knowledge learned by our model can be regarded as off-the-shelf knowledge and easily transferred to new tasks.
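One way to picture a shared meta-network that generates task-specific composition parameters is a small hypernetwork. In the toy sketch below, a single shared matrix `meta_w` maps a task embedding together with the current input to per-position recurrence weights, so different tasks (and different positions) get different composition functions from the same meta-knowledge. The diagonal tanh recurrence, the equal hidden and input sizes, and all names (`meta_network`, `compose`, the embeddings) are assumptions for illustration, not the paper's architecture.

```python
import math
from typing import List

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def meta_network(meta_w: Mat, task_emb: Vec, x_t: Vec) -> Vec:
    """Shared meta-network: maps (task embedding, current input) to the
    recurrence weights used at this position by the task's composition function."""
    return [math.tanh(s) for s in matvec(meta_w, task_emb + x_t)]


def compose(meta_w: Mat, task_emb: Vec, seq: List[Vec]) -> Vec:
    """Toy task-specific composition h_t = tanh(a_t * h_{t-1} + x_t) (elementwise),
    where a_t is generated per position; assumes hidden size == input size."""
    h = [0.0] * len(seq[0])
    for x_t in seq:
        a_t = meta_network(meta_w, task_emb, x_t)
        h = [math.tanh(a * hk + xk) for a, hk, xk in zip(a_t, h, x_t)]
    return h


meta_w = [[0.5, -0.3, 0.2, 0.1],   # shared across tasks: 2 hidden dims x (2 task + 2 input)
          [-0.4, 0.6, 0.0, 0.3]]
seq = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
h_task_a = compose(meta_w, [1.0, 0.0], seq)  # hypothetical task embeddings
h_task_b = compose(meta_w, [0.0, 1.0], seq)
print(h_task_a, h_task_b)
```

Only `meta_w` and the task embeddings would be learned here; transferring to a new task would then amount to learning a new embedding while reusing the shared meta-network.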

META: A Unifying Framework for the Management and Analysis of Text Data


Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. This has led to an increasing demand for powerful software tools to help people manage and analyze vast amounts of text data effectively and efficiently. Unlike data generated by computer systems or sensors, text data are usually generated directly by humans, for humans.

Storage will continue to play a role in the advancement of AI: Pure Storage


Storage is an important component underpinning artificial intelligence (AI) and other emerging technologies with similar infrastructure demands, according to Robert Lee, VP and chief architect at Pure Storage, and therefore needs to be included in discussions about such technologies. Lee told ZDNet that significant advancements in technology, particularly around parallelisation, compute, and networking, enable new algorithms to apply more compute power to data. "Historically, the limit to how much data has been able to be processed, the limit to how much insight we've been able to garner from data has been bottlenecked by storage's ability to keep the compute fed," said Lee, who previously worked at Oracle before joining Pure Storage in 2013. "Somewhere around the early 2000s, the hardware part of compute, CPUs started getting more parallel. It started doing multi-socket architectures, hyper-threading, multi-core.

AI Is Changing Our Brains Co.Design


We are in a similar pre-conscious state now, but the voice we hear is not the other side of our brains. It's our digital self: a version of us that is quickly becoming inseparable from our physical self. I call this commingled digital and analog self our "Meta Me." The more the Meta Me uses digital tools, the more conscious it will become, a development that will have tremendous social, ethical, and legal implications. Some are already coming to light.

New tools aim to automate the hunt for the latest research


Originally posted on The Horizons Tracker. Automation is reaching into a vast range of professions, and mine is no exception. Through this blog and various other channels, I try to find interesting research and practices from around the world and bring them together into some kind of narrative. With so much going on around the world, it stands to reason that a computer could be trained to do a similar task, and that is precisely the aim of Semantic Scholar, a new tool launched by the Allen Institute for Artificial Intelligence. The tool lets users search for papers in specific fields and then filter the results by date, publication, and so on.