How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking