FlashFormer: Whole-Model Kernels for Efficient Low-Batch Inference