A sampling-based approach for efficient clustering in large datasets