Integration of Pre-trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding