Stark: Social Long-Term Multi-Modal Conversation with Persona Commonsense Knowledge