Appendix
A.1 Proof of Propositions 1 and 3

To prove Proposition 1, we first need the following lemma.

Lemma 1 (alternative equivalent definition of the functional KL divergence [47]). Readers may refer to [47] for the proof of this lemma.

Proposition 1. Suppose c has full support on T. At q(f | D), both divergences achieve their minimum value, 0. Therefore, D

Proposition 3. Let n, X

A.2 Proof of Proposition 2

Proposition 2. Let p(f) and q(f) be two distributions for random functions. Let X_k ~ U(T), 1 ≤ k ≤ n. That is, c first samples a positive integer n from the distribution p(n), and then draws n samples from T independently and uniformly. We now discuss these two cases separately. The first inequality is due to the information processing inequality. For example, p(n) could be a geometric distribution with mean greater than 1 (equivalently, with a success probability strictly between 0 and 1). Since the geometric distribution has full support on Z_+. Similar to the first case, let p(n) be a geometric distribution with mean greater than 1 (success probability strictly between 0 and 1).
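As a concrete illustration of the measurement-set construction just described, here is a minimal sketch that first draws n from a geometric distribution and then draws n points uniformly from T; the choices T = [0, 1] and success probability 0.5 are our own illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_measurement_set(success_prob=0.5, t_low=0.0, t_high=1.0):
    """Draw one measurement set: n ~ Geometric(success_prob), then X_k ~ U(T).

    success_prob is strictly between 0 and 1, so n has mean 1/success_prob > 1
    and full support on the positive integers (illustrative values only).
    """
    n = rng.geometric(success_prob)            # n in {1, 2, ...}
    return rng.uniform(t_low, t_high, size=n)  # X_1, ..., X_n i.i.d. uniform on T

print(sample_measurement_set())
```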
Appendix for " Beyond the Signs: Nonparametric Tensor Completion via Sign Series "
The appendix consists of proofs (Section A), additional theoretical results (Section B), and numerical experiments (Section C). When g is strictly increasing, the mapping x ↦ g(x) is sign preserving. See Section B.2 for constructive examples. Based on the definition of the classification loss L(·, ·), the function Risk(·) relies only on the sign pattern of the tensor. The equality (2) is attained when z = sgn(θ − π) or θ = π.
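To make the sign-preservation claim concrete, the sketch below checks numerically that a strictly increasing g satisfies sgn(g(θ) − g(π)) = sgn(θ − π) entrywise; the particular choice g(x) = x^3 + x, the tensor dimensions, and the level π = 0.2 are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def g(x):
    # Any strictly increasing function works; this particular g is illustrative.
    return x**3 + x

theta = rng.uniform(-1, 1, size=(3, 4, 5))  # a toy signal tensor
pi = 0.2                                    # an arbitrary level

# A strictly increasing g preserves the entrywise sign pattern.
assert np.array_equal(np.sign(g(theta) - g(pi)), np.sign(theta - pi))
print("sign pattern preserved")
```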
Beyond the Signs: Nonparametric Tensor Completion via Sign Series
We consider the problem of tensor estimation from noisy observations with possibly missing entries. A nonparametric approach to tensor completion is developed based on a new model which we coin as sign representable tensors. The model represents the signal tensor of interest using a series of structured sign tensors. Unlike earlier methods, the sign series representation effectively addresses both low- and high-rank signals, while encompassing many existing tensor models, including CP models, Tucker models, single index models, and structured tensors with repeating entries, as special cases. We provably reduce the tensor estimation problem to a series of structured classification tasks, and we develop a learning reduction machinery to empower existing low-rank tensor algorithms for more challenging high-rank estimation. Excess risk bounds, estimation errors, and sample complexities are established. We demonstrate that our approach outperforms previous methods on two datasets, one on human brain connectivity networks and the other on topic data mining.
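As a toy illustration of the sign-series idea (our own simplification, not the authors' algorithm), the sketch below averages the sign tensors sgn(Θ − π_h) over a uniform grid of levels π_h and recovers a signal tensor with entries in [-1, 1] up to a discretization error of order 1/H; the grid, scaling, and tensor sizes are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# A toy signal tensor with entries in [-1, 1].
theta = np.tanh(rng.standard_normal((4, 5, 6)))

# Uniform (midpoint) grid of levels pi_h in (-1, 1); H controls the resolution.
H = 1000
levels = np.linspace(-1, 1, H, endpoint=False) + 1.0 / H

# Average the sign tensors over the levels: theta ~= (1/H) * sum_h sgn(theta - pi_h).
approx = np.mean(np.sign(theta[None] - levels[:, None, None, None]), axis=0)

print(np.max(np.abs(approx - theta)))  # small, on the order of 1/H
```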
Stateful ODE-Nets using Basis Function Expansions
The recently-introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems. In this work, we reconsider formulations of the weights as continuous-in-depth functions using linear combinations of basis functions, which enables us to leverage parameter transformations such as function projections. In turn, this view allows us to formulate a novel stateful ODE-Block that handles stateful layers. The benefits of this new ODE-Block are twofold: first, it enables incorporating meaningful continuous-in-depth batch normalization layers to achieve state-of-the-art performance; second, it enables compressing the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance and reducing both inference time and memory footprint. Performance is demonstrated by applying our stateful ODE-Block to (a) image classification tasks using convolutional units and (b) sentence-tagging tasks using transformer encoder units.
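To illustrate the basis-expansion view (our own minimal sketch, not the paper's implementation), the example below stores a scalar continuous-in-depth weight w(t) as coefficients in a Legendre basis and then "compresses" it by a least-squares projection onto a smaller basis, without any retraining; the scalar weight, the Legendre basis, and the degrees are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(3)

# Coefficients of a scalar weight function w(t), t in [-1, 1], in a degree-K
# Legendre basis (mildly decaying coefficients, chosen for illustration).
K = 16
coeffs = rng.standard_normal(K + 1) / (1.0 + np.arange(K + 1))

def w(t, c):
    return legendre.legval(t, c)  # w(t) = sum_k c_k P_k(t)

# "Compression": project onto a smaller basis (degree K_small) by least squares
# on a dense grid of depths, without retraining.
K_small = 6
t_grid = np.linspace(-1, 1, 512)
small_coeffs = legendre.legfit(t_grid, w(t_grid, coeffs), K_small)

# The compressed weight function approximates the original one across depth.
err = np.max(np.abs(w(t_grid, coeffs) - w(t_grid, small_coeffs)))
print(f"{len(coeffs)} -> {len(small_coeffs)} coefficients, max error {err:.3f}")
```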
We sincerely thank all the reviewers for their helpful comments
A: The actual accuracy is 75.8%.
A: In Fig. 3, the curves are not simple linear regressions, and their exact form is unknown to us.
A: We set the magnitude of RandAugment to 9 with a standard deviation of 0.5 in all networks. We found that "resolution and depth are
The performance of TinyNets is about 0.3-3.8%
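For reference, a RandAugment policy with magnitude 9 and a magnitude standard deviation of 0.5 can be expressed with timm's augmentation string convention as in the sketch below; this assumes a timm-style data pipeline and is our own illustration, not necessarily the pipeline used by the authors.

```python
# Minimal sketch; assumes the timm library, not necessarily the authors' exact setup.
from timm.data import create_transform

# 'rand-m9-mstd0.5' means RandAugment with magnitude 9 and a magnitude
# standard deviation of 0.5, matching the setting quoted above.
train_transform = create_transform(
    input_size=224,
    is_training=True,
    auto_augment="rand-m9-mstd0.5",
)
print(train_transform)
```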