Neural networks trained with SGD learn distributions of increasing complexity