White-Box Transformers via Sparse Rate Reduction
Yaodong Yu, Sam Buchanan

Neural Information Processing Systems 

Much of this success is owed to effective learning of the data distribution and then transforming the distribution to a parsimonious, i.e. structured and compact, representation [39, 50, 52, 62], which facilitates many downstream tasks (e.g., in vision,
