I think one thing that might help is to look at it like this. A standard densely connected layer can represent a convolutional layer (the standard 'under the hood' implementation of convolution layers even converts the convolution into a dense matrix multiplication in a lot of cases so it can leverage fast matrix operations). In theory, if the dense layer can represent a convolutional layer, why is the convolutional layer used instead? You could just say that it's because there are fewer parameters, but it goes deeper than that. It 'makes an assumption' that things likely to be seen in the dataset should be translation equivariant: if the input shifts, the output should shift the same way, so a pattern gets detected the same way wherever it appears.
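To make that concrete, here's a rough sketch in plain Python (no libraries). It is a toy 1D case with circular (wrap-around) padding, which I'm assuming only so the equivariance check is exact at the edges; the function names and the example numbers are made up for illustration and are not from any particular framework.

```python
def conv1d_circular(x, kernel):
    """Circular 1D cross-correlation: y[i] = sum_k kernel[k] * x[(i + k) % n]."""
    n = len(x)
    return [sum(kernel[k] * x[(i + k) % n] for k in range(len(kernel)))
            for i in range(n)]


def conv_as_dense_matrix(kernel, n):
    """Build the n x n weight matrix W so that dense(W, x) matches the convolution."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k, w in enumerate(kernel):
            W[i][(i + k) % n] += w  # the same few weights repeat on every row
    return W


def dense(W, x):
    """Plain dense layer without bias or activation: a matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def shift(x, s):
    """Circularly shift x to the right by s positions."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


def close(a, b, tol=1e-9):
    """Element-wise comparison with a small tolerance."""
    return len(a) == len(b) and all(abs(p - q) < tol for p, q in zip(a, b))


x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]   # hypothetical input signal
kernel = [0.5, -1.0, 2.0]             # hypothetical 3-tap filter
W = conv_as_dense_matrix(kernel, len(x))

# 1) The dense layer reproduces the convolution exactly.
same_output = close(dense(W, x), conv1d_circular(x, kernel))

# 2) The convolution is translation equivariant: shift-then-conv == conv-then-shift.
conv_equivariant = close(conv1d_circular(shift(x, 2), kernel),
                         shift(conv1d_circular(x, kernel), 2))

# 3) An arbitrary dense layer has no such guarantee.
W_generic = [[0.0] * len(x) for _ in range(len(x))]
W_generic[0][0] = 1.0  # only looks at position 0
generic_equivariant = close(dense(W_generic, shift(x, 1)),
                            shift(dense(W_generic, x), 1))

dense_params = len(x) * len(x)  # free weights in an unconstrained dense layer
conv_params = len(kernel)       # free weights in the convolution

print(same_output, conv_equivariant, generic_equivariant)
print(dense_params, conv_params)
```

The dense matrix built from the kernel is mostly zeros with the same three weights repeated along each row. That tying of weights is the 'assumption' in question: the dense layer could learn this structure, but nothing forces it to, whereas the convolution has it built in.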
Oct-14-2020, 19:01:25 GMT