S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces