One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks

Atish Agarwala, Abhimanyu Das, Brendan Juba, Rina Panigrahy, Vatsal Sharan, Xin Wang, Qiuyi Zhang

arXiv.org Artificial Intelligence 

Can deep learning solve multiple tasks simultaneously, even when they are unrelated and very different? We investigate how the representations of the underlying tasks affect the ability of a single neural network to learn them jointly. We present theoretical and empirical findings that a single neural network is capable of simultaneously learning multiple tasks from a combined data set, for a variety of methods for representing tasks--for example, when the distinct tasks are encoded by well-separated clusters or decision trees over certain task-code attributes. More concretely, we present a novel analysis that shows that families of simple programming-like constructs for the codes encoding the tasks are learnable by two-layer neural networks with standard training. We study more generally how the complexity of learning such combined tasks grows with the complexity of the task codes; we find that combining many tasks may incur a sample complexity penalty, even though the individual tasks are easy to learn. We provide empirical support for the usefulness of the learning bounds by training networks on clusters, decision trees, and SQL-style aggregation.

Standard practice in machine learning has long been to only address carefully circumscribed, often very related tasks. For example, we might train a single classifier to label an image as containing objects from a certain predefined set, or to label the words of a sentence with their semantic roles. Indeed, when working with relatively simple classes of functions like linear classifiers, it would be unreasonable to expect to train a classifier that handles more than such a carefully scoped task (or related tasks in standard multitask learning). As techniques for learning with relatively rich classes such as neural networks have been developed, it is natural to ask whether or not such scoping of tasks is inherently necessary. Indeed, many recent works (see Section 1.2) have proposed eschewing this careful scoping of tasks, and instead training a single, "monolithic" function spanning many tasks. Large, deep neural networks can, in principle, represent multiple classifiers in such a monolithic learned function (Hornik, 1991), giving rise to the field of multitask learning. This combined function might be learned by combining all of the training data for all of the tasks into one large batch; see Section 1.2 for some examples. Taken to an extreme, we could consider seeking to learn a universal circuit--that is, a circuit that interprets arbitrary programs in a programming language which can encode various tasks. But the ability to represent such a monolithic combined function does not necessarily entail that such a function can be efficiently learned by existing methods.
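To make the monolithic formulation concrete, the following sketch illustrates the kind of setup the abstract describes: several unrelated tasks are pooled into one training set, each example carries a task code (here, a well-separated cluster center appended to the input), and a single two-layer network is trained on the combined data with standard gradient descent. This is an illustrative assumption of the setup, not the authors' code; the dimensions, number of tasks, and per-task labeling rules are made up for the example.

```python
# Illustrative sketch (not the paper's code): one two-layer network trained on a
# combined dataset where a "task code" -- a well-separated cluster center -- is
# concatenated to each input, so a single monolithic model must learn several
# unrelated tasks jointly.
import torch
import torch.nn as nn

torch.manual_seed(0)
num_tasks, d, n_per_task = 4, 10, 2000

# Each task t has its own random linear labeling rule w_t (assumed for illustration).
task_weights = torch.randn(num_tasks, d)
# Task codes: well-separated cluster centers in a separate code space.
task_codes = 5.0 * torch.randn(num_tasks, d)

xs, codes, ys = [], [], []
for t in range(num_tasks):
    x = torch.randn(n_per_task, d)
    xs.append(x)
    # Code attributes cluster tightly around the task's center.
    codes.append(task_codes[t].expand(n_per_task, d) + 0.1 * torch.randn(n_per_task, d))
    ys.append((x @ task_weights[t] > 0).float())

# Monolithic formulation: concatenate the task code with the task input,
# then pool all tasks into one large training batch.
X = torch.cat([torch.cat([c, x], dim=1) for c, x in zip(codes, xs)])
y = torch.cat(ys)

# A single two-layer (one-hidden-layer) ReLU network, trained with standard SGD.
model = nn.Sequential(nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)
    loss.backward()
    opt.step()

acc = ((model(X).squeeze(-1) > 0).float() == y).float().mean().item()
print(f"combined training accuracy over {num_tasks} tasks: {acc:.3f}")
```

The cluster-center codes here stand in for one of the task representations studied in the paper; decision-tree or SQL-style task codes would replace the `task_codes` construction while the training loop stays the same.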
