On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics