Parallel SGD: When does averaging help?