Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers