Toward matrix multiplication for deep learning inference on the Xilinx Versal