Data-driven simulator of multi-animal behavior with unknown dynamics via offline and online reinforcement learning