L1-norm
Reasons and Solutions for the Decline in Model Performance after Editing
Knowledge editing has received widespread attention as a low-cost way to update incorrect or outdated knowledge in large language models. However, recent research has found that edited models often exhibit varying degrees of performance degradation, and neither the reasons behind this phenomenon nor potential solutions have yet been established. To investigate why edited models degrade and to optimize editing methods, this work explores the underlying causes from both the data and the model perspective. Specifically, 1) from the data perspective, this paper first constructs a Multi-Question Dataset (MQD) to evaluate how different types of editing data affect model performance; experiments show that the performance of the edited model is mainly affected by the diversity of the editing targets and by sequence length.
Redistribution of Weights and Activations for AdderNet Quantization
Adder Neural Network (AdderNet) provides a new way to develop energy-efficient neural networks by replacing the expensive multiplications in convolution with cheaper additions (i.e., the L1-norm). To achieve higher hardware efficiency, it is necessary to further study the low-bit quantization of AdderNet. Because the commutative law that holds for multiplication does not hold for the L1-norm, the well-established quantization methods for convolutional networks cannot be applied to AdderNets. Thus, existing AdderNet quantization techniques propose to use a single shared scale to quantize both the weights and the activations simultaneously. Admittedly, such an approach preserves the commutative law during L1-norm quantization, but the accuracy drop after low-bit quantization cannot be ignored.
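To see why a shared scale matters: with one scale s, both tensors map to integers q_x ≈ x/s and q_w ≈ w/s, so the adder output factors as Σ|x − w| ≈ s·Σ|q_x − q_w| and can be computed entirely in the integer domain. The sketch below illustrates this, along with the cost of forcing one scale to cover both ranges; it is a minimal illustration under our own assumptions, not the authors' exact scheme.

```python
import numpy as np

def quantize_shared(x, w, bits=4):
    """Symmetric uniform quantization of activations and weights with ONE
    shared scale, in the spirit of existing AdderNet quantization schemes
    (a sketch, not any paper's exact algorithm)."""
    qmax = 2 ** (bits - 1) - 1
    # A single scale must cover the ranges of BOTH tensors, which is
    # exactly what hurts accuracy when their magnitudes differ a lot.
    scale = max(np.abs(x).max(), np.abs(w).max()) / qmax
    qx = np.clip(np.round(x / scale), -qmax, qmax)
    qw = np.clip(np.round(w / scale), -qmax, qmax)
    return qx, qw, scale

rng = np.random.default_rng(0)
x = rng.normal(0, 1.0, 64)    # activations
w = rng.normal(0, 0.1, 64)    # weights: a much narrower range

qx, qw, s = quantize_shared(x, w, bits=4)
full = np.abs(x - w).sum()        # full-precision L1 "adder" output
deq  = s * np.abs(qx - qw).sum()  # integer L1 output, rescaled once
print(full, deq)  # shared scale factors out: s * |qx - qw| ~ |x - w|
```

Because the weights here span a range roughly ten times narrower than the activations, most weight values collapse onto a handful of quantization levels, which is precisely the accuracy problem the abstract points to.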
Functional form of motion priors in human motion perception
Lu, Hongjing, Lin, Tungyou, Lee, Alan, Vese, Luminita, Yuille, Alan L.
It has been speculated that the human motion system combines noisy measurements with prior expectations in an optimal, or rational, manner. The basic goal of our work is to discover experimentally which prior distribution is used. More specifically, we seek to infer the functional form of the motion prior from the performance of human subjects on motion estimation tasks. We restricted ourselves to priors which combine three terms for motion slowness, first-order smoothness, and second-order smoothness. We focused on two functional forms for the prior distributions: L2-norm and L1-norm regularization, corresponding to the Gaussian and Laplace distributions, respectively.
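As a concrete reading of that setup, the negative log-prior is a weighted sum of slowness, first-order, and second-order smoothness terms, penalized with either the squared L2-norm (Gaussian prior) or the L1-norm (Laplace prior). The sketch below makes the correspondence explicit on a 1-D velocity profile; the weights and the test profile are illustrative choices of ours, not values fitted in the paper.

```python
import numpy as np

def neg_log_prior(v, norm="L2", lams=(1.0, 1.0, 1.0)):
    """Negative log of a motion prior combining slowness, first-order
    smoothness, and second-order smoothness on a 1-D velocity profile v.
    norm="L2" corresponds to a Gaussian prior, norm="L1" to a Laplace prior."""
    p = 2 if norm == "L2" else 1
    terms = (v, np.diff(v, n=1), np.diff(v, n=2))  # slowness, 1st, 2nd order
    return sum(lam * (np.abs(t) ** p).sum() for lam, t in zip(lams, terms))

v = np.array([0.0, 0.1, 0.2, 1.5, 0.2, 0.1])  # one sharp velocity jump
print(neg_log_prior(v, "L2"), neg_log_prior(v, "L1"))
# The Gaussian (L2) prior penalizes the single large jump quadratically,
# while the Laplace (L1) prior is more tolerant of sparse discontinuities.
```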
Robust Bhattacharyya bound linear discriminant analysis through adaptive algorithm
Li, Chun-Na, Shao, Yuan-Hai, Wang, Zhen, Deng, Nai-Yang
In this paper, we propose novel linear discriminant analysis criteria via Bhattacharyya error bound estimation, based on the L1-norm (L1BLDA) and the L2-norm (L2BLDA). Both L1BLDA and L2BLDA maximize the between-class scatter, measured by the weighted pairwise distances of class means, while minimizing the within-class scatter under the L1-norm and L2-norm, respectively. The proposed models avoid the small sample size (SSS) problem and have no rank limit of the kind encountered in LDA. It is worth mentioning that the L1-norm makes L1BLDA robust, and that L1BLDA is solved through an effective non-greedy alternating direction method of multipliers (ADMM), in which all the projection vectors are obtained at once. In addition, the weighting constants that balance the between-class and within-class terms of L1BLDA and L2BLDA are determined by the data set involved, which makes L1BLDA and L2BLDA adaptive. Experimental results on benchmark data sets as well as handwritten digit databases demonstrate the effectiveness of the proposed methods.
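A minimal sketch of the L2 variant of such a criterion is given below: weighted pairwise between-class scatter minus c times the within-class scatter, maximized by an eigendecomposition rather than the inverse-scatter ratio of classical LDA, which is what removes the SSS/rank issue. The simple prior-based pairwise weights and the constant c stand in for the paper's Bhattacharyya-bound-derived values, and the L1BLDA/ADMM solver is not reproduced.

```python
import numpy as np

def l2blda_sketch(X, y, c=1.0, dim=2):
    """Sketch of an L2BLDA-style criterion: maximize weighted pairwise
    between-class scatter minus c * within-class scatter."""
    classes = np.unique(y)
    d = X.shape[1]
    means = {k: X[y == k].mean(axis=0) for k in classes}
    priors = {k: np.mean(y == k) for k in classes}
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for i, ki in enumerate(classes):
        for kj in classes[i + 1:]:  # weighted pairwise distances of means
            diff = (means[ki] - means[kj])[:, None]
            Sb += priors[ki] * priors[kj] * (diff @ diff.T)
    for k in classes:               # within-class scatter
        Z = X[y == k] - means[k]
        Sw += priors[k] * (Z.T @ Z) / len(Z)
    # Maximizing w^T (Sb - c*Sw) w needs no matrix inversion, so there is
    # no small-sample-size problem and no rank limit on the projection.
    vals, vecs = np.linalg.eigh(Sb - c * Sw)
    return vecs[:, np.argsort(vals)[::-1][:dim]]  # d x dim projection matrix
```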
EMHMM Simulation Study
Chan, Antoni B., Hsiao, Janet H.
Eye Movement analysis with Hidden Markov Models (EMHMM) is a method for modeling eye fixation sequences using hidden Markov models (HMMs). In this report, we run a simulation study to investigate the estimation error for learning HMMs with variational Bayesian inference, with respect to the number of sequences and the sequence lengths. We also relate the estimation error measured by KL divergence and L1-norm to a corresponding distortion in the ground-truth HMM parameters.
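As a rough illustration of the error measures involved, the snippet below compares an estimated HMM transition matrix against the ground truth using both the L1-norm and a row-averaged KL divergence. The numbers are made up for illustration; the report's actual study covers all HMM parameters and variational Bayesian estimates.

```python
import numpy as np

def hmm_param_distortion(A_true, A_est, eps=1e-12):
    """Compare an estimated HMM transition matrix to the ground truth by
    L1-norm and by row-averaged KL divergence -- a simplified stand-in
    for the report's error measures over all HMM parameters."""
    l1 = np.abs(A_true - A_est).sum()
    kl = np.sum(A_true * (np.log(A_true + eps) - np.log(A_est + eps)),
                axis=1).mean()
    return l1, kl

A_true = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
A_est  = np.array([[0.85, 0.15],   # e.g. learned from few, short sequences
                   [0.30, 0.70]])
print(hmm_param_distortion(A_true, A_est))
# Both measures shrink as more (and longer) fixation sequences are used.
```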
A Trace Lasso Regularized L1-norm Graph Cut for Highly Correlated Noisy Hyperspectral Image
Mohanty, Ramanarayan, Happy, S L, Suthar, Nilesh, Routray, Aurobinda
This work proposes an adaptive trace lasso regularized L1-norm based graph cut method for dimensionality reduction of hyperspectral images, called `Trace Lasso-L1 Graph Cut' (TL-L1GC). The underlying idea of this method is to generate the optimal projection matrix by considering both the sparsity and the correlation of the data samples. The conventional L2-norm used in the objective function is sensitive to noise and outliers; therefore, in this work the L1-norm is utilized as a robust alternative. In addition, to further improve the results, we use a trace lasso penalty function with the L1GC method. It adaptively balances the L2-norm and the L1-norm simultaneously by considering the data correlation along with the sparsity. We obtain the optimal projection matrix by maximizing the ratio of between-class dispersion to within-class dispersion using the L1-norm with trace lasso as the penalty. Furthermore, an iterative procedure is proposed to solve the resulting optimization problem. The effectiveness of the proposed method is evaluated on two benchmark HSI datasets.
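The adaptive behavior of the trace lasso penalty Ω(w) = ||X diag(w)||_* (the nuclear norm) is easy to verify numerically: with uncorrelated, orthogonal columns it reduces to the L1-norm of w, and with perfectly correlated columns it reduces to the L2-norm. A small self-contained check (our own construction, not the TL-L1GC solver):

```python
import numpy as np

def trace_lasso(X, w):
    """Trace lasso penalty ||X diag(w)||_* (nuclear norm). It adaptively
    interpolates between the L1-norm of w (uncorrelated columns) and the
    L2-norm of w (perfectly correlated columns)."""
    Xn = X / np.linalg.norm(X, axis=0)         # unit-norm columns
    return np.linalg.norm(Xn @ np.diag(w), ord="nuc")

rng = np.random.default_rng(0)
w = rng.normal(size=4)
X_uncorr = np.linalg.qr(rng.normal(size=(10, 4)))[0]  # orthogonal columns
X_corr   = np.tile(rng.normal(size=(10, 1)), (1, 4))  # identical columns
print(trace_lasso(X_uncorr, w), np.abs(w).sum())      # ~ L1-norm of w
print(trace_lasso(X_corr, w), np.linalg.norm(w))      # ~ L2-norm of w
```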
Differences between L1 and L2 as Loss Function and Regularization
While practicing machine learning, you may have come upon a choice of the mysterious L1 vs L2. Usually the two decisions are: 1) L1-norm vs L2-norm as the loss function; and 2) L1-regularization vs L2-regularization. The L1-norm loss function is also known as least absolute deviations (LAD) or least absolute errors (LAE). The L2-norm loss function is also known as least squares error (LSE). Least absolute deviations finds applications in many areas due to its robustness compared to the least squares method: it is resistant to outliers in the data.
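A quick demo of that robustness claim: fit a line by least squares and by least absolute deviations (approximated here with iteratively reweighted least squares) and watch what a single gross outlier does to each fit. The data and the IRLS shortcut are illustrative choices of ours.

```python
import numpy as np

# Fit y ~ a*x + b with an L2 loss (least squares) and an L1 loss (least
# absolute deviations via IRLS), then compare how one outlier moves each fit.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.3, x.size)
y[-1] += 30.0                        # one gross outlier
A = np.column_stack([x, np.ones_like(x)])

# L2: closed-form least squares
beta_l2, *_ = np.linalg.lstsq(A, y, rcond=None)

# L1: iteratively reweighted least squares, weights w_i = 1/|residual_i|
beta_l1 = beta_l2.copy()
for _ in range(50):
    w = 1.0 / np.maximum(np.abs(y - A @ beta_l1), 1e-8)
    WA = A * w[:, None]              # rows of A scaled by the weights
    beta_l1 = np.linalg.solve(A.T @ WA, WA.T @ y)

print("L2 fit (slope, intercept):", beta_l2)  # dragged toward the outlier
print("L1 fit (slope, intercept):", beta_l1)  # stays near (2, 1)
```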