Learning-enabled multi-modal motion prediction in urban environments