Eliciting and Understanding Cross-Task Skills with Task-Level Mixture-of-Experts

Nov-21-2022–arXiv.org Artificial Intelligence

Recent works suggest that transformer models are capable of multi-tasking on diverse NLP tasks and adapting to new tasks efficiently. However, the potential of these multi-task models may be limited because they use the same set of parameters for all tasks. In contrast, humans tackle tasks more flexibly, making proper presumptions about which skills and knowledge are relevant and executing only the necessary computations. Inspired by this, we propose task-level mixture-of-experts models, which have a collection of transformer layers (i.e., experts) and a router component that chooses among these experts dynamically and flexibly. We find that these models improve the average performance gain (ARG) metric by 2.6% when adapting to unseen tasks in the few-shot setting and by 5.6% in the zero-shot generalization setting. Further, we show that the learned routing decisions partly rediscover human categorization of NLP tasks: certain experts are strongly associated with extractive tasks, some with classification tasks, and some with tasks requiring world knowledge.
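
The abstract describes the architecture only at a high level, so the following is a minimal sketch of the task-level routing idea under stated assumptions, not the authors' implementation. Each "expert" is a toy function standing in for a transformer layer, the router is a hypothetical per-layer linear scorer over a task embedding, and selection is a hard top-1 choice made once per task (not per token). The names TaskLevelRouter, make_expert and run_task_moe are illustrative inventions; the paper's actual router design and training procedure may differ.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]  # stand-in for one transformer layer


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TaskLevelRouter:
    """Hypothetical router: one linear scorer per layer maps a task embedding
    to a probability distribution over that layer's experts."""

    def __init__(self, num_layers: int, num_experts: int, task_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # weights[layer][expert] is a task_dim-length vector (random here; learned in practice)
        self.weights = [
            [[rng.gauss(0.0, 1.0) for _ in range(task_dim)] for _ in range(num_experts)]
            for _ in range(num_layers)
        ]

    def route_probs(self, task_emb: Sequence[float]) -> List[List[float]]:
        """Per-layer routing distributions for one task (useful for inspecting
        which experts a given task relies on)."""
        return [
            softmax([sum(w * t for w, t in zip(row, task_emb)) for row in layer])
            for layer in self.weights
        ]

    def route(self, task_emb: Sequence[float]) -> List[int]:
        """Hard top-1 expert per layer, decided once for the whole task."""
        return [max(range(len(p)), key=p.__getitem__) for p in self.route_probs(task_emb)]


def make_expert(scale: float, shift: float) -> Expert:
    """Toy expert (hypothetical): an elementwise affine map instead of a real transformer layer."""
    return lambda h: [scale * v + shift for v in h]


def run_task_moe(x: Vector, experts: List[List[Expert]], path: List[int]) -> Vector:
    """Run only the selected expert at each layer; unselected experts are never called."""
    h = list(x)
    for layer_idx, expert_idx in enumerate(path):
        h = experts[layer_idx][expert_idx](h)
    return h


# Usage: 3 layers x 4 experts, one fixed route per task.
router = TaskLevelRouter(num_layers=3, num_experts=4, task_dim=8)
experts = [[make_expert(1.0 + 0.1 * e, float(layer)) for e in range(4)] for layer in range(3)]
task_emb = [0.5] * 8  # hypothetical task embedding
path = router.route(task_emb)
print("expert per layer:", path)
print("output:", run_task_moe([1.0, 2.0], experts, path))
```

Because the route depends only on the task embedding, every input from the same task follows the same path through the layers, and experts off that path do no work. This mirrors the abstract's point about "executing only the necessary computations", and the per-layer distributions from route_probs are the kind of routing decisions one could inspect to see which experts cluster with which task types.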

computational linguistics, machine learning, natural language

Country:
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - Dominican Republic (0.04)
  - United States
    - California (0.14)
    - Oregon (0.04)
    - New York (0.04)
    - Washington > King County
      - Seattle (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - Michigan > Washtenaw County
      - Ann Arbor (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Germany > Berlin (0.04)
  - Czechia > Prague (0.04)
  - Iceland > Capital Region
    - Reykjavik (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Romania > Sud - Muntenia Development Region
    - Giurgiu County > Giurgiu (0.04)
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
  - United Kingdom > Scotland
    - City of Edinburgh > Edinburgh (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Middle East > Jordan (0.04)
  - South Korea > Seoul
    - Seoul (0.04)
  - Japan
    - Kyūshū & Okinawa > Kyūshū
      - Miyazaki Prefecture > Miyazaki (0.04)
    - Honshū > Kantō
      - Tokyo Metropolis Prefecture > Tokyo (0.14)
- Africa > Ethiopia
  - Addis Ababa > Addis Ababa (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Health & Medicine (1.00)
- Education (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Text Processing (1.00)
  - Cognitive Science (1.00)
  - Representation & Reasoning > Commonsense Reasoning (0.93)
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)
