Mathematical & Statistical Methods
A Learning Algorithm
Algorithm 1: Learning algorithm for Dr.k-NN. Input: S
B.1 Proof of Theorem 1
The proof of Theorem 1 is based on the following two lemmas. Moreover, when there is a tie (i.e., the set
Proof of Lemma 2. Recall that the Wasserstein metric of order 1 is defined as W(P, P
For the sake of completeness, we extend our algorithm to the non-few-training-sample setting. The depth of the shaded area shows the level of sample entropy. The entropy of a sample is defined as follows. As a simple example, for a Bernoulli random variable (which can represent, e.g., the outcome of flipping a coin with bias
Now we use this entropy to define the "uncertainty" associated with each training point. Figure 6 reveals that the most informative samples usually lie in between categories.
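The Bernoulli example mentioned above can be illustrated with a minimal sketch. It computes the Shannon entropy of a discrete distribution and evaluates it for a coin with a given bias. The function names `entropy` and `bernoulli_entropy` and the choice of natural logarithms are assumptions made for illustration; the sample-level entropy used in the paper may be defined differently.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution.

    `probs` is a sequence of probabilities summing to 1. Terms with
    zero probability contribute nothing, by the convention 0*log(0) = 0.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log(p) for p in probs if p > 0.0)


def bernoulli_entropy(p):
    """Entropy of a Bernoulli variable with success probability `p`."""
    return entropy([p, 1.0 - p])


if __name__ == "__main__":
    # Entropy is zero for a deterministic coin and largest for a fair one.
    for bias in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(f"bias={bias:.1f}  entropy={bernoulli_entropy(bias):.4f}")
```

Under this reading, a training point whose estimated class probabilities are close to uniform has high entropy and is therefore "uncertain", which is consistent with the observation that informative samples lie between categories.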
A Damped Newton Method Achieves Global $O(1/k^2)$ and Local Quadratic Convergence Rate
Newton method of Polyak and Nesterov (2006) and of regularized Newton method of Mishchenko (2021) and Doikov and Nesterov (2021), b) we prove a local quadratic rate, which matches the best-known local rate of second-order methods, and c) our stepsize formula is simple, explicit, and does not require solving any subproblem.
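To illustrate what an explicit, subproblem-free stepsize looks like, here is a minimal one-dimensional sketch of a damped Newton iteration. It does not use the stepsize formula of this paper, which is not reproduced in the text above. Instead it swaps in the classical damped stepsize 1/(1 + λ) from the analysis of self-concordant functions, where λ is the Newton decrement. The function name `damped_newton_1d` and the test function f(x) = x - log(x) are assumptions chosen for illustration.

```python
import math


def damped_newton_1d(grad, hess, x0, tol=1e-10, max_iter=100):
    """Damped Newton iteration for a scalar, strictly convex function.

    Uses the classical stepsize alpha = 1 / (1 + lam), where
    lam = |f'(x)| / sqrt(f''(x)) is the Newton decrement. This is an
    illustrative stand-in, NOT the stepsize proposed in the paper.
    Returns the final iterate and the number of steps taken.
    """
    x = x0
    for k in range(max_iter):
        g, h = grad(x), hess(x)
        if h <= 0.0:
            raise ValueError("second derivative must be positive")
        lam = abs(g) / math.sqrt(h)   # Newton decrement
        if lam < tol:
            return x, k
        alpha = 1.0 / (1.0 + lam)     # explicit stepsize, no subproblem
        x = x - alpha * g / h         # damped Newton step
    return x, max_iter


# Hypothetical test problem: f(x) = x - log(x) on x > 0, minimized at x = 1.
def grad_f(x):
    return 1.0 - 1.0 / x


def hess_f(x):
    return 1.0 / (x * x)


if __name__ == "__main__":
    for start in (0.01, 3.0):
        x_final, steps = damped_newton_1d(grad_f, hess_f, start)
        print(f"start={start}: x={x_final:.12f} after {steps} steps")
```

For this test function, an undamped Newton step (stepsize 1) from x = 3 would move to x = 3 - 6 = -3, which lies outside the domain, whereas the damped step stays inside it. This is the kind of global safeguard that a damping rule is meant to provide.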
Supplementary material
Theorem A.1 (Deterministic scaling limit of stochastic processes). The reader interested in the proof is referred to the supplementary materials of [21, 31]. Although the theorem wasn't originally proven in the A.1 corresponds to 1/δt, where δt is defined in Theorem 2.1. Before proving this proposition, we begin with a small lemma: Lemma B.2. We are now in a position to show Theorem B.1: Proof.