r/MachineLearning - [D] Gradient Descent on (deterministic) Mean Absolute Error (L1 loss)
Gradient-based optimization of absolute errors is tricky, since the gradient is "never" zero: its magnitude does not shrink as you approach the minimum. In theory, adaptive methods should be able to damp the resulting oscillations so that the iterate converges to the minimum. However, I found that none of the 'standard' methods were able to do this "out of the box". Learning rate decay could alleviate the problem, but it needs manual tuning, which I would rather avoid. Does anyone know of a method that can do this?
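To make the issue concrete, here is a minimal sketch of the behaviour described above on a toy problem: one scalar parameter and a single target, so the minimizer is known exactly. Everything in it is an assumption for illustration (the function names, the starting point, the base step size of 0.3, and the `base_lr / sqrt(k)` schedule, which is a common textbook choice for subgradient methods and not something the post itself uses).

```python
def mae_subgradient(x, targets):
    """Subgradient of mean(|x - t|) with respect to the scalar x."""
    # (x > t) - (x < t) is sign(x - t), and evaluates to 0 when x == t.
    signs = [(x > t) - (x < t) for t in targets]
    return sum(signs) / len(targets)


def descend(targets, x0, base_lr, steps, decay=False):
    """Plain (sub)gradient descent on the MAE.

    With decay=True the step size is base_lr / sqrt(k) at step k
    (an assumed schedule, chosen only for illustration).
    """
    x = x0
    for k in range(1, steps + 1):
        lr = base_lr / k ** 0.5 if decay else base_lr
        x -= lr * mae_subgradient(x, targets)
    return x


targets = [1.0]  # toy data: a single target, so the minimizer is x = 1.0

x_const = descend(targets, x0=0.0, base_lr=0.3, steps=1000)
x_decay = descend(targets, x0=0.0, base_lr=0.3, steps=1000, decay=True)

print("constant step, distance to minimum:", abs(x_const - 1.0))
print("decayed step,  distance to minimum:", abs(x_decay - 1.0))
```

With the constant step, the iterate keeps jumping back and forth across the minimum by a full step, because the subgradient has magnitude 1 on both sides. With the decayed step, the distance to the minimum is bounded by the current step size, so it shrinks, but only as fast as the schedule allows, which is exactly the manual tuning the question hopes to avoid.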
Jan-21-2019, 14:28:35 GMT