test accuracy
Minimax Generalized Cross-Entropy
Bondugula, Kartheek; Mazuelas, Santiago; Pérez, Aritz; Liu, Anqi
Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the mean absolute error (MAE) loss can offer robustness but is difficult to optimize. Interpolating between the CE and MAE losses, generalized cross-entropy (GCE) has recently been introduced to provide a trade-off between optimization difficulty and robustness. Existing formulations of GCE result in a non-convex optimization over classification margins that is prone to underfitting, leading to poor performance on complex datasets. In this paper, we propose a minimax formulation of generalized cross-entropy (MGCE) that results in a convex optimization over classification margins. Moreover, we show that MGCEs can provide an upper bound on the classification error. The proposed bilevel convex optimization can be efficiently implemented using stochastic gradients computed via implicit differentiation. Using benchmark datasets, we show that MGCE achieves strong accuracy, faster convergence, and better calibration, especially in the presence of label noise.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
- North America > United States (0.04)
- Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)
- Africa > Middle East > Morocco > Tanger-Tetouan-Al Hoceima Region > Tangier (0.04)
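The interpolation between CE and MAE mentioned in the abstract above can be illustrated with the commonly used form of the GCE loss, (1 - p^q) / q, where p is the probability assigned to the true class. This is a minimal sketch of that standard (non-minimax) loss only, not of the MGCE formulation proposed in the paper; the function name `gce_loss` and the parameter name `q` are assumptions made for illustration.

```python
import math


def gce_loss(p_true: float, q: float) -> float:
    """Standard GCE loss for the probability assigned to the true class.

    Hypothetical helper for illustration; not the paper's MGCE method.
    q = 0 is treated as the limiting case, which is the CE loss -log(p).
    q = 1 gives 1 - p, which equals the MAE loss up to a factor of 2.
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must lie in (0, 1]")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if q == 0.0:
        # Limit of (1 - p**q) / q as q -> 0.
        return -math.log(p_true)
    return (1.0 - p_true ** q) / q


if __name__ == "__main__":
    # Sweep q from the CE end (0) to the MAE end (1) for a fixed p.
    for q in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(q, gce_loss(0.5, q))
```

Small values of q behave like CE, which penalizes low-confidence predictions heavily, while values near 1 behave like MAE, which bounds the penalty for any single example and is therefore less sensitive to mislabeled examples.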
Improving Simple Models with Confidence Profiles
In this paper, we propose a new method called ProfWeight for transferring information from a pre-trained deep neural network that has a high test accuracy to a simpler interpretable model or a very shallow network of low complexity and a priori low test accuracy. We are motivated by applications in interpretability and model deployment in severely memory-constrained environments (like sensors). Our method uses linear probes to generate confidence scores through flattened intermediate representations. Our transfer method involves a theoretically justified weighting of samples during the training of the simple model using confidence scores of these intermediate layers. The value of our method is first demonstrated on CIFAR-10, where our weighting method significantly improves (by 3-4%) networks with only a fraction of the number of ResNet blocks of a complex ResNet model. We further demonstrate operationally significant results on a real manufacturing problem, where we dramatically increase the test accuracy of a CART model (the domain standard) by roughly 13%.
- North America > Canada > Quebec > Montreal (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States (0.14)
- Asia (0.04)
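The sample weighting described in the ProfWeight abstract above can be sketched roughly as follows. The abstract states only that weights are derived from the confidence scores of linear probes on intermediate layers; averaging the true-label confidences across probes, as done here, is an assumption chosen for simplicity, and the function names `confidence_profile_weights` and `weighted_mean_loss` are hypothetical.

```python
def confidence_profile_weights(probe_confidences):
    """Turn per-sample confidence profiles into per-sample weights.

    probe_confidences[i][k] is the confidence that probe k assigns to the
    true label of sample i. Averaging over probes is an assumed, simple
    choice of summary; the source does not specify the exact rule.
    """
    weights = []
    for profile in probe_confidences:
        if not profile:
            raise ValueError("each sample needs at least one probe score")
        weights.append(sum(profile) / len(profile))
    return weights


def weighted_mean_loss(losses, weights):
    """Weighted average of per-sample losses for training the simple model."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * l for w, l in zip(weights, losses)) / total


if __name__ == "__main__":
    # Two samples, two probes each: the second sample is "easy" for the
    # complex model and so receives a larger weight.
    w = confidence_profile_weights([[0.2, 0.4], [0.9, 1.0]])
    print(w, weighted_mean_loss([1.0, 0.5], w))
```

Under this sketch, samples that the complex model classifies confidently at many layers contribute more to the simple model's training objective than samples it finds hard.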
- Information Technology > Security & Privacy (0.47)
- Government (0.47)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
- North America > United States > Virginia (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Singapore (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)