Convergence rates for pretraining and dropout: Guiding learning parameters using network structure