CRISPR: Eliminating Bias Neurons from an Instruction-following Language Model

Yang, Nakyeong, Kang, Taegwan, Jung, Kyomin

arXiv.org Artificial Intelligence

Large language models (LLMs) executing tasks through instruction-based prompts often face challenges stemming from distribution differences between user instructions and training instructions. This leads to distractions and biases, especially when dealing with inconsistent dynamic labels. In this paper, we introduce CRISPR, a novel bias mitigation method designed to alleviate instruction-label biases in LLMs. CRISPR utilizes attribution methods to identify bias neurons influencing biased outputs and employs pruning to eliminate them. Experimental results demonstrate the method's effectiveness in mitigating biases in instruction-based prompting, enhancing language model performance on social bias benchmarks without compromising pre-existing knowledge. CRISPR is highly practical and model-agnostic, offering flexibility in adapting to evolving social biases.
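The attribute-then-prune idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual procedure: the attribution scores, layer shape, and `top_k` threshold are all hypothetical placeholders.

```python
import numpy as np

def prune_bias_neurons(weights, attributions, top_k=3):
    """Zero out the output neurons with the highest bias attribution.

    weights:      (d_in, d_out) weight matrix of one layer (illustrative).
    attributions: (d_out,) per-neuron bias attribution scores; in the
                  paper these come from an attribution method applied
                  to biased outputs, here they are just given.
    """
    pruned = weights.copy()
    bias_neurons = np.argsort(attributions)[-top_k:]  # most-biased neurons
    pruned[:, bias_neurons] = 0.0                     # structured pruning
    return pruned, bias_neurons

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
scores = rng.random(16)
W_pruned, idx = prune_bias_neurons(W, scores, top_k=3)
```

Because pruning only zeroes a few columns, the remaining weights (and the knowledge they encode) are left untouched, which matches the abstract's claim of not compromising pre-existing knowledge.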


A Guide to Deep Learning and Neural Networks

#artificialintelligence

Every neural network consists of neurons, synapses, weights, biases, and functions. A neuron, or node, of a neural network is a computing unit that receives information, performs simple calculations with it, and passes it further. In a large neural network with many neurons and connections between them, neurons are organized in layers: an input layer receives information, one or more hidden layers process it, and an output layer produces the result.
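The neuron described above can be written in a few lines. The specific weights, bias, and sigmoid activation here are illustrative choices, not part of the original text.

```python
import math

def neuron(inputs, weights, bias):
    """One computing unit: a weighted sum of the inputs plus a bias,
    passed through an activation function (sigmoid here)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)

out = neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
# z = 0.5*1.0 - 0.25*2.0 + 0.1 = 0.1, so out = sigmoid(0.1) ≈ 0.525
```

Stacking many such units side by side gives a layer; feeding one layer's outputs into the next gives the input/hidden/output structure described above.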


Ultra-low power on-chip learning of speech commands with phase-change memories

Miriyala, Venkata Pavan Kumar, Ishii, Masatoshi

arXiv.org Artificial Intelligence

Embedding artificial intelligence at the edge (edge-AI) is an elegant solution to tackle the power and latency issues in the rapidly expanding Internet of Things. As edge devices typically spend most of their time in sleep mode and only wake up infrequently to collect and process sensor data, non-volatile in-memory computing (NVIMC) is a promising approach to design the next generation of edge-AI devices. Recently, we proposed an NVIMC-based neuromorphic accelerator using phase change memories (PCMs), which we call Raven. In this work, we demonstrate ultra-low-power on-chip training and inference of speech commands using Raven. We showed that Raven can be trained on-chip with power consumption as low as 30 µW, which is suitable for edge applications. Furthermore, we showed that at iso-accuracy, Raven needs 70.36x and 269.23x fewer computations than a deep neural network (DNN) during inference and training, respectively. Owing to such low power and computational requirements, Raven provides a promising pathway towards ultra-low-power training and inference at the edge.


Understanding Weight Normalized Deep Neural Networks with Rectified Linear Units

Xu, Yixi, Wang, Xiao

Neural Information Processing Systems

This paper presents a general framework for norm-based capacity control for $L_{p,q}$ weight normalized deep neural networks. We establish the upper bound on the Rademacher complexities of this family. With an $L_{p,q}$ normalization where $q\le p^*$ and $1/p+1/p^{*}=1$, we discuss properties of a width-independent capacity control, which only depends on the depth by a square root term. We further analyze the approximation properties of $L_{p,q}$ weight normalized deep neural networks. In particular, for an $L_{1,\infty}$ weight normalized network, the approximation error can be controlled by the $L_1$ norm of the output layer, and the corresponding generalization error only depends on the architecture by the square root of the depth.
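As a concrete reading of the $L_{p,q}$ norm used above, one common convention takes the $\ell_p$ norm of each neuron's incoming weight vector and then the $\ell_q$ norm across neurons. A minimal sketch under that assumption (the row-per-neuron orientation is itself an assumption, not stated in the abstract):

```python
import numpy as np

def lpq_norm(W, p, q):
    """L_{p,q} norm of a weight matrix: the l_p norm of each row
    (one neuron's incoming weights, by assumption), then the l_q
    norm of those per-neuron values."""
    per_neuron = np.sum(np.abs(W) ** p, axis=1) ** (1.0 / p)
    return np.sum(per_neuron ** q) ** (1.0 / q)

W = np.array([[3.0, 4.0],
              [0.0, 1.0]])
# per-neuron l_2 norms: [5.0, 1.0]; their l_1 norm is 6.0
val = lpq_norm(W, p=2, q=1)
```

Normalizing each layer so this quantity is bounded is what makes the capacity bounds in the abstract width-independent.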



Neural Network Bias: Bias Neuron, Overfitting and Underfitting

#artificialintelligence

Practically, when training a neural network model, you will attempt to gather a training set that is as large as possible and resembles the real population as closely as possible. You will then split that data into at least two groups: one used as the training set and the other as the validation set. High bias means the model is not fitting the training set well, so the training error will be large. Low bias means the model fits well, and the training error will be low.
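The split described above can be sketched as follows; the 80/20 ratio and the fixed shuffle seed are illustrative choices.

```python
import random

def train_val_split(dataset, val_fraction=0.2, seed=0):
    """Shuffle the data, then split it into a training set and a
    validation set (held out to detect overfitting)."""
    data = list(dataset)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    n_val = int(len(data) * val_fraction)
    return data[n_val:], data[:n_val]  # (train, validation)

train, val = train_val_split(range(100))
```

Comparing training error against validation error on these two sets is what reveals the bias/variance behavior: large training error signals high bias (underfitting), while low training error with much larger validation error signals overfitting.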



How to teach logic to your #NeuralNetworks ? -- Autonomous Agents -- #AI

#artificialintelligence

Logic gates are the fundamental building blocks of electronics. They implement operations such as addition, selection, and negation, and combine to form complex circuits. In this post, I shall walk you through a couple of basic logic gates to demonstrate how neural networks learn on their own, without us having to manually code the if-then-else logic. Let's get started with the "Hello World" of neural networks: the XOR gate. An XOR gate is an exclusive OR gate with two inputs, A and B, and one output.
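A minimal XOR network in the spirit of the post can be written with hand-set weights rather than learned ones (the weights and thresholds below are one valid choice, not the post's trained values): one hidden unit acts as an OR gate, another as an AND gate, and the output fires when OR is true but AND is not.

```python
def step(z):
    """Step activation: fire (1) when the weighted sum is positive."""
    return 1 if z > 0 else 0

def xor_net(a, b):
    """Two-layer perceptron computing XOR with hand-set weights."""
    h1 = step(1 * a + 1 * b - 0.5)   # hidden unit 1: OR gate
    h2 = step(1 * a + 1 * b - 1.5)   # hidden unit 2: AND gate
    return step(1 * h1 - 2 * h2 - 0.5)  # output: OR and not AND

table = [(a, b, xor_net(a, b)) for a in (0, 1) for b in (0, 1)]
# → [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
```

The hidden layer is essential here: XOR is not linearly separable, so no single neuron can compute it, which is exactly why it makes a good "Hello World" for multi-layer networks.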