What limits performance of weakly supervised deep learning for chest CT classification?