An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training
