Small LLMs Do Not Learn a Generalizable Theory of Mind via Reinforcement Learning