Memory-Based Learning
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Germany > Brandenburg > Potsdam (0.04)
- (2 more...)
- Health & Medicine (0.93)
- Information Technology (0.76)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.40)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Case Based Reasoning (0.40)
On Memorization in Probabilistic Deep Generative Models
Gerrit J.J. van den Burg, Christopher K.I. Williams
Of course, spotting near duplicates of training observations is only possible because these models yield realistic samples. This section describes additional details of the data sets, model architectures, and experimental setup. CIFAR-10 contains color images from 10 different categories and does not require further preprocessing. For CIFAR-10 and CelebA we used random horizontal flips during training as data augmentation. Full details of the model architecture are given in Table 1.
- Information Technology > Artificial Intelligence > Natural Language > Generation (0.78)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.53)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.41)
Appendix of " Decoupling Knowledge from Memorization: Retrieval-augmented Prompt Learning "
T ( x) = [CLS]x It was [MASK]. PLM to extract the label-related words from the whole unlabeled training corpus. We report the hyper-parameters in Table 2. Most of the hyper-parameters are the default parameters Thus, we provide insight into the effect of β, k and λ on the final results. We think the model may require more reference when there is no data for training. We will leave the engineering optimization about retrieval speed in our future work.
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.41)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.32)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Michigan (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (10 more...)
- Media > Film (0.46)
- Leisure & Entertainment (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.64)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (16 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.67)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > France (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.42)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.98)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.62)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.42)
Impact of Layer Norm on Memorization and Generalization in Transformers
Layer Normalization (LayerNorm) is one of the fundamental components in transformers that stabilizes training and improves optimization. In recent times, Pre-LayerNorm transformers have become the preferred choice over Post-LayerNorm transformers due to their stable gradient flow. However, the impact of LayerNorm on learning and memorization across these architectures remains unclear. In this work, we investigate how LayerNorm influences memorization and learning for Pre- and Post-LayerNorm transformers. We identify that LayerNorm serves as a key factor for stable learning in Pre-LayerNorm transformers, while in Post-LayerNorm transformers, it impacts memorization. Our analysis reveals that eliminating LayerNorm parameters in Pre-LayerNorm models exacerbates memorization and destabilizes learning, while in Post-LayerNorm models, it effectively mitigates memorization by restoring genuine labels. We further precisely identify that early layers LayerNorm are the most critical over middle/later layers and their influence varies across Pre and Post LayerNorm models. We have validated it through 13 models across 6 Vision and Language datasets. These insights shed new light on the role of LayerNorm in shaping memorization and learning in transformers.
- North America > United States > North Carolina (0.40)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)