Comparing Apples to Oranges: Learning Similarity Functions for Data Produced by Different Distributions