Leveraging Inter-Layer Dependency for Post-Training Quantization
Prior works on post-training quantization (PTQ) typically separate a neural network into sub-nets and quantize them sequentially. This process pays little attention to the dependency across the sub-nets and is hence suboptimal. In this paper, we propose a novel Network-Wise Quantization (NWQ) approach to fully leverage inter-layer dependency. NWQ faces a larger-scale combinatorial optimization problem over discrete variables than previous works, which raises two major challenges: over-fitting and discrete optimization. NWQ alleviates over-fitting via an Activation Regularization (AR) technique, which better controls the activation distribution. To optimize the discrete variables, NWQ introduces Annealing Softmax (ASoftmax) and Annealing Mixup (AMixup), which progressively transition quantized weights and activations, respectively, from continuous to discrete. Extensive experiments demonstrate that NWQ outperforms the previous state of the art by a large margin: 20.24% for the challenging configuration of MobileNetV2 with 2 bits on ImageNet, pushing extremely low-bit PTQ from feasibility to usability. In addition, NWQ achieves competitive results at only 10% of the computation cost of previous works.
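The annealing idea can be illustrated with a toy relaxation (a sketch under invented names, not the paper's implementation): each weight is a softmax-weighted combination of values on a quantization grid, and driving the temperature toward zero hardens the soft assignment into a discrete, one-hot choice.

```python
import numpy as np

def annealing_softmax(logits, grid, temperature):
    """Relaxed quantizer: a convex combination of grid values whose
    softmax weights harden to one-hot as temperature -> 0.
    (Names are illustrative, not the paper's notation.)"""
    z = logits / temperature
    z = z - z.max()                       # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p @ grid                       # convex combination of grid points

grid = np.array([-1.0, 0.0, 1.0])         # toy quantization grid
logits = np.array([0.1, 0.3, 2.0])        # assignment scores (assumed learnable)

for temperature in (1.0, 0.1, 0.01):      # annealing schedule
    w = annealing_softmax(logits, grid, temperature)
    print(f"T={temperature}: relaxed weight = {w:.4f}")
```

At high temperature the relaxed weight is a smooth blend of grid values, so gradients flow through all candidates; as the temperature is annealed down, the output snaps to a single grid point, matching the continuous-to-discrete transition the abstract describes.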
1. [ALL] As R3 appreciates, our paper is mainly theoretical in nature, and the focus has been to present a correct
Regarding "plots are noisy and don't really support well the claim that the algorithm recovers the true": note the sharp jump in Figure 2, which is expected based on Theorem 3. Similarly, Figure 3 shows that the Markov blanket can be recovered with a sufficient number of observational data. The problem is NP-hard [Chickering, 1996, Learning Bayesian Networks Is NP-Complete]. Rank 2 is used only for clarity. Reviewer 2 asked us to present a case where Assumption 4 is violated: assume that every variable can take 4 values.
Bouncy particle sampler with infinite exchanging parallel tempering
Saito, Yohei, Kimura, Shun, Takeda, Koujin
Bayesian inference is useful for obtaining a predictive distribution with a small generalization error. However, since posterior distributions can rarely be evaluated analytically, we employ variational Bayesian inference or sampling methods to approximate them. When drawing samples from a posterior distribution, Hamiltonian Monte Carlo (HMC) has been widely used for the continuous-variable part and Markov chain Monte Carlo (MCMC) for the discrete-variable part. Another sampling method, the bouncy particle sampler (BPS), has been proposed; it combines uniform linear motion with stochastic reflection to perform sampling, and it has been reported that its simulation parameters are easier to set than those of HMC. To accelerate convergence to a posterior distribution, we introduce parallel tempering (PT) into BPS and propose an algorithm in which the inverse-temperature exchange rate is taken to infinity. We performed numerical simulations and demonstrated its effectiveness for multimodal distributions.
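The BPS dynamics described above (uniform linear motion punctuated by reflections of the velocity off the gradient of the negative log-density, plus occasional velocity refreshments) can be sketched for a standard Gaussian target, where bounce times have a closed form. This is a minimal illustration with invented names, and it omits the paper's parallel-tempering extension.

```python
import numpy as np

rng = np.random.default_rng(0)

def bps_gaussian(n_events=20000, refresh_rate=1.0, dim=2):
    """Toy Bouncy Particle Sampler for a standard Gaussian target.

    For N(0, I) the gradient of the negative log-density is x itself,
    so the bounce rate max(0, x.v + t*|v|^2) along the line x + t*v
    can be inverted in closed form (no thinning needed).
    """
    x = np.zeros(dim)
    v = rng.standard_normal(dim)
    events = []
    for _ in range(n_events):
        a, b = x @ v, v @ v
        # Closed-form first-arrival time of the bounce event.
        t_bounce = (-a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * b * rng.exponential())) / b
        t_refresh = rng.exponential(1.0 / refresh_rate)
        t = min(t_bounce, t_refresh)
        x = x + t * v                        # uniform linear motion
        if t_refresh < t_bounce:
            v = rng.standard_normal(dim)     # velocity refreshment
        else:
            g = x                            # grad U(x) for U(x) = |x|^2 / 2
            v = v - 2.0 * (v @ g) / (g @ g) * g  # bounce: reflect v off g
        # Recording event locations is only a crude summary; exact BPS
        # estimates average along the continuous trajectory.
        events.append(x.copy())
    return np.array(events)

xs = bps_gaussian()
print("empirical mean:", xs.mean(axis=0))    # should be near the origin
```

The two competing exponential clocks (bounce vs. refreshment) are the standard piecewise-deterministic construction; the reflection preserves the speed of the particle, so only the direction of travel changes at a bounce.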
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)
NeurIPS 2019: Pseudo-Extended Markov chain Monte Carlo (paper ID: 2415). We would like to thank the reviewers for dedicating their time to review our paper and for the helpful feedback they have provided.
All of the reviewers' minor comments and corrections have been incorporated. Below, we address the reviewers' main questions. The paper focuses on HMC sampling; unfortunately, HMC cannot be applied in the discrete setting due to the discontinuity of the target. Regarding "How do you recommend setting π and g to best estimate β?": it is quite straightforward to implement pseudo-extended HMC within Stan. On the minor comment that line 58 should state that delta is an arbitrary differentiable function: this is a good point, and we have corrected it in the paper. The experiments in 4.1 and 4.2 use the RMSE error of the target variables, which is quite unusual.