Mirage or Method? How Model-Task Alignment Induces Divergent RL Conclusions
