Towards predicting Pedestrian Evacuation Time and Density from Floorplans using a Vision Transformer