Neural Generalized Mixed-Effects Models
Yuli Slavutsky, Sebastian Salazar, David M. Blei
Generalized linear mixed-effects models (GLMMs) are widely used to analyze grouped and hierarchical data. In a GLMM, each response is assumed to follow an exponential-family distribution where the natural parameter is given by a linear function of observed covariates and a latent group-specific random effect. Since exact marginalization over the random effects is typically intractable, model parameters are estimated by maximizing an approximate marginal likelihood. In this paper, we replace the linear function with neural networks. The result is a more flexible model, the neural generalized mixed-effects model (NGMM), which captures complex relationships between covariates and responses. To fit an NGMM to data, we introduce an efficient optimization procedure that maximizes the approximate marginal likelihood and is differentiable with respect to network parameters. We show that the approximation error of our objective decays at a Gaussian-tail rate in a user-chosen parameter. On synthetic data, NGMM improves over GLMMs when covariate-response relationships are nonlinear, and on real-world datasets it outperforms prior methods. Finally, we analyze a large dataset of student proficiency to demonstrate how NGMM can be extended to more complex latent-variable models.
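The marginalization step described above can be sketched numerically. The following is a minimal illustration, not the paper's implementation: it assumes a Bernoulli-logit model with a single scalar group-level random effect b ~ N(0, sigma^2), takes the network outputs f_theta(x_i) for one group as a precomputed array, and approximates the intractable integral with Gauss-Hermite quadrature (one concrete choice of approximate marginal likelihood; the function name and its arguments are hypothetical).

```python
import numpy as np

def ngmm_marginal_loglik(f_x, y, sigma, K=20):
    """Sketch: marginalize a scalar Gaussian random effect b ~ N(0, sigma^2)
    out of a Bernoulli-logit likelihood with K-node Gauss-Hermite quadrature.
    f_x holds the neural-network outputs for one group's observations."""
    t, w = np.polynomial.hermite.hermgauss(K)      # quadrature nodes, weights
    b = np.sqrt(2.0) * sigma * t                   # change of variables for N(0, sigma^2)
    eta = f_x[:, None] + b[None, :]                # n x K natural parameters
    p = 1.0 / (1.0 + np.exp(-eta))                 # logit link
    lik = np.where(y[:, None] == 1, p, 1.0 - p)    # per-observation likelihoods
    # Responses in a group share one b, so take the product over observations
    # at each node before integrating over b.
    group_lik = np.prod(lik, axis=0)               # length-K vector
    return np.log((w / np.sqrt(np.pi) * group_lik).sum())
```

As sigma shrinks, the quadrature collapses onto the plain fixed-effects likelihood, which gives a quick sanity check of the approximation.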
FactorGraphNeuralNet -- Supplementary File

A Proof of Propositions
First we provide Lemma 8, which will be used in the proofs of Propositions 2 and 4.

Lemma 8. Given $n$ non-negative feature vectors $f_i = [f_{i0}, f_{i1}, \ldots, f_{im}]$, where $i = 1, \ldots, n$, there exist $n$ matrices $Q_i$ of shape $nm \times m$ and $n$ vectors $\hat{f}_i = Q_i f_i^T$, s.t.

Proposition 2. A factor graph $G = (V, C, E)$ with variable log potentials $\theta_i(x_i)$ and factor log potentials $\phi_c(x_c)$ can be converted to a factor graph $G'$ with the same variable potentials and the decomposed log-potentials $\phi_{ic}(x_i, z_c)$ using a one-layer FGNN.

Without loss of generality, we assume that $\log \phi_c(x_c) > 1$. Then for each $i$ the term $\theta_{ic}(x_i, z_c)$ in (9) has $k^{n+1}$ entries, and each entry is either a scaled entry of the vector $g_c$ or an arbitrary negative number less than $\max_{x_c} \theta_c(x_c)$. Thus, if we organize $\theta_{ic}(x_i, z_c)$ as a length-$k^{n+1}$ vector $f_{ic}$, then we define a $k^{n+1} \times k^n$ matrix $Q_{ci}$, where if and only if the $l$-th entry of $f_{ic}$ is set to the $m$-th entry of $g_c$ multiplied by $1/|s(c)|$, the entry of $Q_{ci}$ in the $l$-th row, $m$-th column is set to $1/|s(c)|$; all other entries of $Q_{ci}$ are set to some negative number smaller than $\max_{x_c} \theta_c(x_c)$.
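The mechanism behind this construction can be illustrated with a toy numeric sketch (assumed values, not the formal proof): the entries of the long vector are either entries of $g_c$ scaled by $1/|s(c)|$ or large negative padding, so max-aggregating the positions assigned to each entry of $g_c$ and undoing the scaling recovers $g_c$, since the padding never attains the maximum.

```python
import numpy as np

s = 2                              # |s(c)|, an assumed scope size
g = np.array([2.0, 5.0, 3.0])      # target vector g_c (made-up values)

# Build f: scaled entries of g interleaved with "arbitrary negative" padding.
pad = -100.0
f = np.array([g[0] / s, pad, g[1] / s, pad, g[2] / s, pad])

# mapping[l] = m if f[l] carries entry m of g, else -1 for a padding slot.
mapping = np.array([0, -1, 1, -1, 2, -1])

# Max-aggregate each group and undo the 1/s scaling; the padding entries
# lie below every scaled entry, so they never win the max.
recovered = np.array([s * f[mapping == m].max() for m in range(len(g))])
```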
C qNEHVI under Different Computational Approaches

C.1 Derivation of the IEP Formulation of qNEHVI

From (4), the expected noisy joint hypervolume improvement is given by
Bayesian optimization specifically aims to increase sample efficiency for hard optimization problems, and consequently can help achieve better solutions without incurring large societal costs.

In the 2-objective case, instead of padding the box decomposition, the Pareto frontier under each posterior sample can be padded by repeating a point on the Pareto frontier, such that the padded Pareto frontier under every posterior sample has exactly $\max_t |P_t|$ points.

Since the sequential NEHVI is equivalent to qNEHVI with $q = 1$, we prove Theorem 1 for the general $q \geq 1$ case. Recall from Section C.2 that, using the method of common random numbers to fix the base samples, the IEP and CBD formulations are equivalent. Note that the box decomposition of the non-dominated space $\{S_1, \ldots, S_{K_t}\}$ and the number of rectangles in the box decomposition depend on $\zeta_t$.
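The 2-objective padding trick can be sketched as follows. This is a hypothetical example with made-up frontiers, and `hv2d` is an assumed helper rather than the paper's code: repeating an existing Pareto point adds only a degenerate, zero-area rectangle, so every padded frontier reaches the same point count while its hypervolume is unchanged, which is what enables batched computation across posterior samples.

```python
import numpy as np

def hv2d(P, ref):
    """2-D hypervolume (maximization) of a non-dominated set P w.r.t. a
    reference point ref (assumed helper for this sketch)."""
    P = P[np.argsort(-P[:, 0])]             # sort by first objective, descending
    hv, prev1 = 0.0, ref[1]
    for x0, x1 in P:
        hv += (x0 - ref[0]) * (x1 - prev1)  # slab contributed by each point
        prev1 = x1                          # duplicates contribute zero area
    return hv

ref = np.array([0.0, 0.0])
# Pareto frontiers under two posterior samples (made-up values).
frontiers = [np.array([[1.0, 4.0], [2.0, 3.0]]),
             np.array([[1.5, 3.5], [2.5, 2.0], [3.0, 1.0]])]

# Pad each frontier to max_t |P_t| points by repeating one of its points.
target = max(len(P) for P in frontiers)
padded = [np.vstack([P] + [P[-1:]] * (target - len(P))) for P in frontiers]

for P, Q in zip(frontiers, padded):
    assert len(Q) == target
    assert np.isclose(hv2d(P, ref), hv2d(Q, ref))  # hypervolume unchanged
```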