Implicit differentiation
Minimax Generalized Cross-Entropy
Kartheek Bondugula, Santiago Mazuelas, Aritz Pérez, Anqi Liu
Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the mean absolute error (MAE) loss can offer robustness but is difficult to optimize. Interpolating between the CE and MAE losses, generalized cross-entropy (GCE) has recently been introduced to provide a trade-off between optimization difficulty and robustness. Existing formulations of GCE result in a non-convex optimization over classification margins that is prone to underfitting, leading to poor performance on complex datasets. In this paper, we propose a minimax formulation of generalized cross-entropy (MGCE) that results in a convex optimization over classification margins. Moreover, we show that MGCE can provide an upper bound on the classification error. The proposed bilevel convex optimization can be efficiently implemented using stochastic gradients computed via implicit differentiation. On benchmark datasets, we show that MGCE achieves strong accuracy, faster convergence, and better calibration, especially in the presence of label noise.
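As background for the CE–MAE interpolation described above, the sketch below implements the standard GCE loss, L_q(p, y) = (1 - p_y^q)/q, which recovers CE in the limit q → 0 and (up to a constant factor) MAE at q = 1. The function name gce_loss and the default q = 0.7 are illustrative choices; this is the pre-existing GCE loss the abstract builds on, not the paper's minimax (MGCE) formulation, which the abstract does not spell out.

```python
import torch
import torch.nn.functional as F

def gce_loss(logits: torch.Tensor, targets: torch.Tensor, q: float = 0.7) -> torch.Tensor:
    """Generalized cross-entropy: L_q(p, y) = (1 - p_y^q) / q.

    Recovers cross-entropy as q -> 0 and (up to a constant factor) MAE at q = 1.
    """
    probs = F.softmax(logits, dim=1)                         # class probabilities
    p_y = probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # probability of the true class
    return ((1.0 - p_y.clamp_min(1e-12).pow(q)) / q).mean()
```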
One-step differentiation of iterative algorithms
For iterative algorithms, implicit differentiation alleviates the cost of backpropagating through every iteration, but it requires a custom implementation of Jacobian evaluation. In this paper, we study one-step differentiation, also known as Jacobian-free backpropagation, a method as easy as automatic differentiation and as efficient as implicit differentiation for fast algorithms (e.g., superlinearly convergent methods).
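To make the mechanism concrete, below is a minimal sketch of one-step differentiation for a fixed-point iteration x_{k+1} = f(x_k, θ): the solver runs with gradient tracking disabled, and backpropagation traverses only a single, final application of f. The helper name solve_and_differentiate_once and the iteration count are illustrative assumptions, not an API from the paper.

```python
import torch

def solve_and_differentiate_once(f, x0, theta, iters=100):
    # Forward pass: iterate to (approximate) convergence with autograd off,
    # so no computation graph is stored for the inner loop.
    with torch.no_grad():
        x = x0
        for _ in range(iters):
            x = f(x, theta)
    # One-step differentiation (Jacobian-free backpropagation): re-apply a
    # single iteration with gradients enabled, so backpropagation w.r.t.
    # theta sees only this last step rather than the unrolled trajectory.
    return f(x.detach(), theta)

# Illustrative usage: fixed point of the contraction x -> 0.5 * x + theta.
theta = torch.tensor(1.0, requires_grad=True)
x_star = solve_and_differentiate_once(lambda x, t: 0.5 * x + t, torch.tensor(0.0), theta)
x_star.backward()  # gradient flows through the single re-applied iteration only
```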