Alternating optimization of decision trees, with application to learning sparse oblique trees