Every Model Learned by Gradient Descent Is Approximately a Kernel Machine

Open in new window