Benchmarking community drug response prediction models: datasets, models, tools, and metrics for cross-dataset generalization analysis