Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning