4fc81f4cd2715d995018e0799262176b-Supplemental-Conference.pdf

Neural Information Processing Systems

Two other important techniques are mixed precision training [36] and in-place activated BatchNorm [53]. Mixed precision training trains with both 32-bit and 16-bit IEEE floating-point numbers, choosing the precision per layer according to its numerical sensitivity [36].
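As a minimal, self-contained sketch of mixed precision training using PyTorch's AMP utilities (the model, synthetic data, and hyperparameters below are illustrative stand-ins, not the cited paper's setup):

```python
import torch

# Illustrative model and optimizer; any network would do.
model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # loss scaling keeps fp16 gradients from underflowing

for step in range(10):
    # Synthetic batch, as a stand-in for a real data loader.
    inputs = torch.randn(32, 512, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    # autocast keeps numerically sensitive ops in fp32 and runs the rest in fp16.
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()  # backprop through the scaled loss
    scaler.step(optimizer)         # unscale gradients, then update weights
    scaler.update()                # adapt the scale factor for the next step
```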


Factor Graph Neural Net -- Supplementary File: A Proof of Propositions

Neural Information Processing Systems

First we provide Lemma 8, which will be used in the proofs of Propositions 2 and 4.

Lemma 8. Given $n$ non-negative feature vectors $f_i = [f_{i0}, f_{i1}, \dots, f_{im}]$, $i = 1, \dots, n$, there exist $n$ matrices $Q_i$ of shape $nm \times m$ and $n$ vectors $\hat{f}_i = Q_i f_i^T$, s.t.

Proposition 2. A factor graph $G = (V, C, E)$ with variable log potentials $\theta_i(x_i)$ and factor log potentials $\phi_c(x_c)$ can be converted to a factor graph $G'$ with the same variable potentials and the decomposed log-potentials $\phi_{ic}(x_i, z_c)$ using a one-layer FGNN.

Without loss of generality, we assume that $\log \phi_c(x_c) > 1$. Then for each $i$ the term $\theta_{ic}(x_i, z_c)$ in (9) has $k^{n+1}$ entries, and each entry is either a scaled entry of the vector $g_c$ or an arbitrary negative number less than $\max_{x_c} \theta_c(x_c)$. Thus, if we organize $\theta_{ic}(x_i, z_c)$ as a length-$k^{n+1}$ vector $f_{ic}$, we define a $k^{n+1} \times k^n$ matrix $Q_{ci}$ such that the entry of $Q_{ci}$ in the $l$-th row and $m$-th column is set to $1/|s(c)|$ if and only if the $l$-th entry of $f_{ic}$ equals the $m$-th entry of $g_c$ multiplied by $1/|s(c)|$; all other entries of $Q_{ci}$ are set to some negative number smaller than $\max_{x_c} \theta_c(x_c)$.
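A small numpy sketch of the decomposition this proof relies on, as I read the excerpt: the factor log-potential is shared equally among the $|s(c)|$ variables for assignments consistent with $z_c$ and set to a very negative value otherwise, so that a max over $z_c$ recovers $\phi_c$. The names and toy sizes are assumptions for illustration:

```python
import itertools
import numpy as np

k, n = 3, 2                      # states per variable, variables in the factor
rng = np.random.default_rng(0)
phi_c = rng.uniform(1.0, 2.0, size=(k,) * n)  # stands in for log-potentials > 1, as assumed
NEG = -1e9                                     # stand-in for "very negative"

assignments = list(itertools.product(range(k), repeat=n))  # the k**n values of z_c
theta = np.full((n, k, k ** n), NEG)           # theta[i, x_i, z_c]
for z, xc in enumerate(assignments):
    for i in range(n):
        # Share phi_c equally among the n = |s(c)| variables when x_i agrees with z_c.
        theta[i, xc[i], z] = phi_c[xc] / n

# Check: maximizing over z_c reconstructs the original factor potential.
for xc in assignments:
    recovered = max(sum(theta[i, xc[i], z] for i in range(n)) for z in range(k ** n))
    assert np.isclose(recovered, phi_c[xc])
```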


C qNEHVI under Different Computational Approaches. C.1 Derivation of the IEP Formulation of qNEHVI. From (4), the expected noisy joint hypervolume improvement is given by
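As a hedged sketch of the expected form, based on the standard qNEHVI definition (the notation $X_{\text{cand}}$, $\mathcal{P}_t$, and $\tilde{f}_t$ is assumed here, and the exact equation (4) may differ):

$$
\alpha_{\text{qNEHVI}}(X_{\text{cand}})
  = \mathbb{E}\big[\mathrm{HV}\big(\mathcal{P} \cup f(X_{\text{cand}})\big) - \mathrm{HV}(\mathcal{P})\big]
  \approx \frac{1}{N}\sum_{t=1}^{N}\Big[\mathrm{HV}\big(\mathcal{P}_t \cup \tilde{f}_t(X_{\text{cand}})\big) - \mathrm{HV}(\mathcal{P}_t)\Big],
$$

where $\mathcal{P}_t$ is the Pareto frontier over the function values at previously observed points under posterior sample $t$, and $\tilde{f}_t$ is the corresponding sample-path evaluation of the candidates.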

Neural Information Processing Systems

Bayesian Optimization specifically aims to increase sample efficiency for hard optimization problems, and consequently can help achieve better solutions without incurring large societal costs. In the 2-objective case, instead of padding the box decomposition, the Pareto frontier under each posterior sample can be padded by repeating a point on the Pareto frontier, so that the padded Pareto frontier under every posterior sample has exactly $\max_t |\mathcal{P}_t|$ points. Since the sequential NEHVI is equivalent to qNEHVI with $q = 1$, we prove Theorem 1 for the general $q > 1$ case. Recall from Section C.2 that, using the method of common random numbers to fix the base samples, the IEP and CBD formulations are equivalent. Note that the box decomposition of the non-dominated space $\{S_1, \dots, S_{K_t}\}$ and the number of rectangles in the box decomposition depend on $\zeta_t$.
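A small numpy sketch of the padding step described above (the function name and array shapes are assumptions for illustration; repeating an existing Pareto point leaves the dominated region, and hence the hypervolume, unchanged):

```python
import numpy as np

def pad_pareto_frontiers(frontiers):
    """frontiers: list of (|P_t|, 2) arrays, one Pareto frontier per posterior sample."""
    max_pts = max(len(p) for p in frontiers)  # max_t |P_t|
    padded = []
    for p in frontiers:
        # Repeat one existing point until every frontier has max_t |P_t| points.
        reps = np.repeat(p[-1:], max_pts - len(p), axis=0)
        padded.append(np.concatenate([p, reps], axis=0))
    return np.stack(padded)  # shape: (num_samples, max_t |P_t|, 2)

# Illustrative frontiers of different sizes under three posterior samples.
fronts = [np.array([[1.0, 3.0], [2.0, 2.0]]),
          np.array([[1.5, 2.5]]),
          np.array([[0.5, 3.5], [1.0, 3.0], [3.0, 0.5]])]
print(pad_pareto_frontiers(fronts).shape)  # (3, 3, 2)
```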


07bc722f08f096e6ea7ee99349ff0a86-Paper-Conference.pdf

Neural Information Processing Systems

In this paper, we study dataset distillation (DD) from a novel perspective and introduce a dataset factorization approach, termed HaBa, which is a plug-and-play strategy portable to any existing DD baseline.