SLM: Bridge the thin gap between speech and text foundation models