Supplementary Materials for the Paper "L2T-DLN: Learning to Teach with Dynamic Loss Network"

Apr-28-2026, 21:43:01 GMT–Neural Information Processing Systems

In this supplementary material, we provide the proofs for the convergence analysis in Section 1, the 1-vs-1 transformation employed in the classification and semantic segmentation tasks in Section 2, the coordinate-wise and preprocessing methods of the LSTM teacher in Section 3, the loss functions of YOLO-v3 in Section 4, additional image classification experiments in Section 5, and semantic segmentation inference results in Section 6.

Definition 1. A differentiable function $e(\cdot)$ is $L$-smooth with gradient Lipschitz constant $C$ (uniformly Lipschitz continuous) if $\|\nabla e(x) - \nabla e(y)\| \le C \|x - y\|, \ \forall x, y$. The function is called block-wise smooth with gradient Lipschitz constants $\{C_i\}$ if
$$\|\nabla_i e(x_{-i}, x_i) - \nabla_i e(x_{-i}, x_i')\| \le C_i \|x_i - x_i'\|, \quad \forall x, x' \tag{1}$$
or with gradient Lipschitz constants $\{\tilde{C}_i\}$ if
$$\|\nabla_i e(x_{-i}, x_i) - \nabla_i e(x_{-i}', x_i)\| \le \tilde{C}_i \|x_{-i} - x_{-i}'\|, \quad \forall x, x' \tag{2}$$
Further, let $G_{\max} := \max\{C_i, \tilde{C}_i, \forall i\} \le C$.

Definition 2. For a differentiable function $e(\cdot)$, if $\nabla e(x) = 0$, then $x$ is a first-order stationary solution (SS1). If $x$ is an SS1 and there exists $\epsilon > 0$ such that $e(x) \le e(y)$ for every $y$ in the $\epsilon$-neighborhood of $x$, then $x$ is a local minimum. A saddle point $x$ is an SS1 that is not a local minimum. If $\lambda_{\min}(\nabla^2 e(x)) < 0$, then $x$ is a strict (non-degenerate) saddle point.
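To make the stationary-point definitions concrete, the following is a minimal sketch that classifies a point using the gradient and the smallest Hessian eigenvalue. The toy function e(x) = x_1^2 - x_2^2, the helper names (`grad`, `hessian`, `lambda_min_2x2`, `classify`), and the tolerance `tol` are illustrative assumptions and do not come from the paper.

```python
import math

# Toy function (an assumption for illustration): e(x) = x[0]**2 - x[1]**2.

def grad(x):
    """Gradient of the toy function e at the point x."""
    return (2.0 * x[0], -2.0 * x[1])

def hessian(x):
    """Hessian of the toy function e; it is constant for this quadratic."""
    return ((2.0, 0.0), (0.0, -2.0))

def lambda_min_2x2(h):
    """Smallest eigenvalue of a symmetric 2x2 matrix, in closed form."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    mean = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mean - radius

def classify(x, tol=1e-9):
    """Classify x using the first- and second-order conditions."""
    g = grad(x)
    if math.hypot(g[0], g[1]) > tol:
        return "not stationary"      # gradient is nonzero, so x is not an SS1
    lam = lambda_min_2x2(hessian(x))
    if lam < -tol:
        return "strict saddle"       # lambda_min of the Hessian is negative
    if lam > tol:
        return "local minimum"       # positive definite Hessian at an SS1
    return "degenerate (second-order test inconclusive)"

print(classify((0.0, 0.0)))
print(classify((1.0, 0.0)))
```

For this toy function the origin is an SS1 whose Hessian has a negative eigenvalue, so it falls under the strict saddle case of the definition, while any other point has a nonzero gradient.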



