Bias Mitigation or Cultural Commonsense? Evaluating LLMs with a Japanese Dataset