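The architecture of the network is surprisingly simple! It takes N points as input, an unordered set of 3D coordinates. It first applies learned transformations (small "T-Net" alignment networks) that map the points, and later their per-point features, into a canonical pose, which makes the model more robust to rotations and other rigid transformations of the input. The points are then passed through shared MLPs (multi-layer perceptrons), where the same weights are applied to every point independently. A max pooling layer then collapses all per-point features into a single global feature vector. Because max pooling is a symmetric function, the order of the points does not matter. For classification, this global feature is fed to another MLP to produce K outputs, one score per class.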
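To make the data flow concrete, here is a minimal, dependency-free sketch of the classification path: a shared per-point MLP, max pooling into a global feature, and a classifier head. It is not the original implementation. The class name `TinyPointNet`, the layer sizes (3 → 16 → 32 per point, 32 → 16 → K in the head) and the random initialization are hypothetical choices for illustration, and the T-Net alignment steps are left out entirely. The original PointNet uses much wider layers (up to 1024 features per point).

```python
import random


def relu(v):
    # Element-wise ReLU on a list of floats.
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    # Fully connected layer: weights is a list of rows (out_dim x in_dim).
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def init_layer(in_dim, out_dim, rng):
    # Hypothetical random initialization, for illustration only.
    weights = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def mlp(v, layers, final_activation=True):
    # Apply a stack of dense layers, with ReLU between them.
    for i, (w, b) in enumerate(layers):
        v = dense(v, w, b)
        if final_activation or i < len(layers) - 1:
            v = relu(v)
    return v


class TinyPointNet:
    def __init__(self, num_classes, seed=0):
        rng = random.Random(seed)
        # Shared per-point MLP (same weights for every point): 3 -> 16 -> 32.
        self.point_mlp = [init_layer(3, 16, rng), init_layer(16, 32, rng)]
        # Classifier head on the global feature: 32 -> 16 -> num_classes.
        self.head = [init_layer(32, 16, rng), init_layer(16, num_classes, rng)]

    def global_feature(self, points):
        # Run the shared MLP on each point independently.
        per_point = [mlp(p, self.point_mlp) for p in points]
        # Max pool over points, one feature dimension at a time.
        # max() ignores ordering, so the result is permutation invariant.
        return [max(column) for column in zip(*per_point)]

    def forward(self, points):
        # Returns K raw class scores (logits), no softmax applied.
        return mlp(self.global_feature(points), self.head, final_activation=False)


model = TinyPointNet(num_classes=4)
cloud = [(0.1, 0.2, 0.3), (-0.5, 0.4, 0.0), (0.9, -0.1, 0.2)]
scores = model.forward(cloud)
shuffled_scores = model.forward(list(reversed(cloud)))
print(scores == shuffled_scores)  # True: point order does not change the output
```

Reordering the input points yields exactly the same scores, because every point goes through identical weights and the only step that mixes points is the order-independent maximum.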
Nov-25-2021, 22:25:05 GMT