Because our task is detection rather than segmentation, it suffices to correctly classify enough voxels around each vertebra centroid to detect normal or fractured vertebrae in an image. We leverage this observation to construct 3D label images for our training database in a semi-automated fashion. First, radiologist S.R. created a text file with annotations for every vertebra present in the field of view, as described in section 2. Next, J.N. enriched these labels with 3D centroid coordinates by manually localizing every vertebra centroid in the image using MeVisLab. This step took less than two minutes per image on average in our dataset. Finally, we extended the method described by Glocker et al. to automatically generate 3D label images from these sparse annotations.
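
To make the last step concrete, the following is a minimal sketch of how sparse centroid annotations could be expanded into a dense 3D label image. It is not the extended Glocker et al. procedure itself, whose details are not given here; it simply paints a sphere of fixed physical radius around each centroid and assigns overlapping voxels to the nearest centroid. The function name `make_label_image`, the parameters `spacing` and `radius_mm`, and the label encoding (0 = background, 1 = normal, 2 = fractured) are all assumptions made for illustration.

```python
import numpy as np


def make_label_image(shape, centroids, labels, spacing=(1.0, 1.0, 1.0), radius_mm=10.0):
    """Paint a sphere around each centroid with that vertebra's class label.

    shape      -- (z, y, x) size of the image in voxels
    centroids  -- list of (z, y, x) centroid positions in voxel coordinates
    labels     -- list of integer class labels, one per centroid (assumed encoding)
    spacing    -- voxel size in mm along each axis (assumed parameter)
    radius_mm  -- sphere radius in mm (assumed parameter, not from the paper)
    """
    label_img = np.zeros(shape, dtype=np.uint8)  # 0 = background
    best_dist = np.full(shape, np.inf)           # distance to nearest centroid so far

    # Physical coordinates (mm) along each axis, kept sparse so they broadcast.
    grids = np.meshgrid(
        *[np.arange(n) * s for n, s in zip(shape, spacing)],
        indexing="ij",
        sparse=True,
    )

    for centroid, label in zip(centroids, labels):
        c_mm = [c * s for c, s in zip(centroid, spacing)]
        # Euclidean distance (mm) from every voxel to this centroid.
        dist = np.sqrt(sum((g - c) ** 2 for g, c in zip(grids, c_mm)))
        # Claim voxels inside the sphere that are closer to this centroid
        # than to any previously processed one.
        mask = (dist <= radius_mm) & (dist < best_dist)
        label_img[mask] = label
        best_dist[mask] = dist[mask]

    return label_img


# Hypothetical example: two vertebrae, one normal (1) and one fractured (2).
example = make_label_image(
    shape=(20, 20, 20),
    centroids=[(5, 10, 10), (14, 10, 10)],
    labels=[1, 2],
    radius_mm=5.0,
)
```

Resolving overlaps by nearest centroid keeps adjacent vertebrae from overwriting each other when the chosen radius exceeds half the inter-vertebral distance; any other shape or radius could be substituted without changing the overall idea.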