Information Aggregation for Multi-Head Attention with Routing-by-Agreement
