Giga-scale Kernel Matrix-Vector Multiplication on GPU