A Dirty Model for Multi-task Learning

Jalali, Ali, Sanghavi, Sujay, Ruan, Chao, Ravikumar, Pradeep K.

Neural Information Processing Systems 

We consider multi-task learning in the setting of multiple linear regression, where some of the relevant features may be shared across tasks. Recent work has studied block-regularized ($\ell_1/\ell_q$-norm, $q > 1$) methods for such block-sparse structured problems. However, these papers also caution that the performance of such block-regularized methods is highly dependent on the {\em extent} to which the features are shared across tasks. This is far from a realistic multi-task setting: for block regularization to help, not only does the set of relevant features have to be exactly the same across tasks, but their parameter values have to be as well. Here, we ask the question: can we leverage support and parameter overlap when it exists, but not pay a penalty when it does not? Indeed, this falls under a more general question of whether we can model such \emph{dirty data}, which may not fall into a single neat structural bracket (all block-sparse, all low-rank, and so on).
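
For concreteness, the multi-task linear regression setting and the two regularization schemes being contrasted can be written as follows (a minimal sketch in our own notation; the paper's exact formulation and choice of $q$ are not reproduced here):
\[
  y^{(k)} = X^{(k)} \bar{\theta}^{(k)} + w^{(k)}, \qquad k = 1, \dots, r,
\]
where task $k$ has design matrix $X^{(k)} \in \mathbb{R}^{n \times p}$, response $y^{(k)} \in \mathbb{R}^{n}$, noise $w^{(k)}$, and a sparse parameter vector $\bar{\theta}^{(k)} \in \mathbb{R}^{p}$. Collecting the parameters into $\Theta = [\theta^{(1)} \,\cdots\, \theta^{(r)}] \in \mathbb{R}^{p \times r}$, separate elementwise $\ell_1$ regularization solves
\[
  \min_{\Theta} \; \sum_{k=1}^{r} \big\| y^{(k)} - X^{(k)} \theta^{(k)} \big\|_2^2 \; + \; \lambda \sum_{k=1}^{r} \big\| \theta^{(k)} \big\|_1 ,
\]
while block $\ell_1/\ell_q$ regularization ($q > 1$) instead penalizes the rows of $\Theta$,
\[
  \min_{\Theta} \; \sum_{k=1}^{r} \big\| y^{(k)} - X^{(k)} \theta^{(k)} \big\|_2^2 \; + \; \lambda \sum_{j=1}^{p} \big\| \Theta_{j \cdot} \big\|_q ,
\]
where $\Theta_{j\cdot}$ collects the coefficients of feature $j$ across all tasks. The row-wise penalty couples the supports across tasks (favoring a shared set of relevant features), whereas the elementwise penalty treats the tasks independently; the question posed above is how to benefit from the first regime when overlap is present without paying its price when it is not.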