A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based Matching Algorithms