Removing Speaker Information from Speech Representation using Variable-Length Soft Pooling