Interpretable Data Fusion for Distributed Learning: A Representative Approach via Gradient Matching