 vector norm


Rethinking Deep Learning: Non-backpropagation and Non-optimization Machine Learning Approach Using Hebbian Neural Networks

arXiv.org Artificial Intelligence

Developing strong AI could provide a powerful tool for addressing social and scientific challenges. Neural networks (NNs), inspired by biological systems, have the potential to achieve this. However, weight optimization by error backpropagation is not observed in biological systems, which raises doubts about current NN approaches. In this context, Itoh (2024) solved the MNIST classification problem without using objective functions or backpropagation. However, that approach did not update weights, so it does not qualify as machine learning. In this study, I develop a machine learning method that mimics biological neural systems by implementing Hebbian learning in NNs, without backpropagation or optimization methods, to solve the MNIST classification problem, and I analyze its output. Development proceeded in three stages. In the first stage, I applied the Hebbian learning rule to the MNIST character recognition algorithm of Itoh (2024). This yielded lower accuracy than non-Hebbian NNs, highlighting the limitations of conventional training procedures for Hebbian learning. In the second stage, I examined the properties of individually trained NNs using norm-based cognition, showing that NNs trained on a specific label respond strongly to that label. In the third stage, I created an MNIST character recognition program that uses the magnitude of the vector norm as the classification criterion, achieving an accuracy of approximately 75%. This demonstrates that Hebbian learning NNs can recognize handwritten characters without objective functions, backpropagation, optimization processes, or large datasets. Based on these results, developing a mechanism that takes norm-based cognition as its fundamental unit and then increases complexity to achieve indirect similarity cognition should help mimic biological neural systems and contribute to realizing strong AI.
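The abstract's "norm-based cognition" can be illustrated with a small toy. One network is trained per label with a Hebbian rule, and an input is assigned to the label whose network produces the output vector with the largest L2 norm. This is not the author's code. The following are all assumptions made for illustration:

- the single-layer linear network;
- the update W[i][j] += lr * y[i] * x[j];
- the row normalization used to keep weights bounded;
- the 4-pixel toy data;
- the helper names `train_hebbian` and `classify`.

No loss function or backpropagation is involved.

```python
import math
import random


def matvec(W, x):
    """Return W @ x for a weight matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def l2_norm(v):
    return math.sqrt(sum(t * t for t in v))


def normalize_rows(W):
    """Rescale each row to unit L2 norm (assumed here to keep Hebbian growth bounded)."""
    out = []
    for row in W:
        n = l2_norm(row)
        out.append([w / n for w in row] if n > 0 else list(row))
    return out


def train_hebbian(samples, n_out=3, lr=0.1, epochs=50, seed=0):
    """Hypothetical helper: train one network on samples of ONE label.

    Hebbian update W[i][j] += lr * y[i] * x[j] with y = W @ x; no loss, no backprop.
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    W = normalize_rows([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)])
    for _ in range(epochs):
        for x in samples:
            y = matvec(W, x)
            W = [[w + lr * yi * xj for w, xj in zip(row, x)] for row, yi in zip(W, y)]
            W = normalize_rows(W)
    return W


def classify(networks, x):
    """Norm-based decision: the label whose network output has the largest L2 norm."""
    return max(networks, key=lambda label: l2_norm(matvec(networks[label], x)))


# Made-up toy data: two "labels" with distinct 4-pixel patterns.
data = {
    0: [[1.0, 1.0, 0.0, 0.0], [0.9, 1.0, 0.1, 0.0]],
    1: [[0.0, 0.0, 1.0, 1.0], [0.0, 0.1, 1.0, 0.9]],
}
networks = {label: train_hebbian(xs, seed=label) for label, xs in data.items()}
print(classify(networks, [1.0, 0.9, 0.0, 0.0]))
print(classify(networks, [0.0, 0.0, 0.9, 1.0]))
```

Each network's rows drift toward the dominant direction of its own label's inputs. As a result, an input of that label produces a large output norm, while inputs of other labels produce a small one.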


Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters

arXiv.org Artificial Intelligence

Scaling the context size of large language models (LLMs) enables them to perform various new tasks, e.g., book summarization. However, the memory cost of the Key and Value (KV) cache in attention significantly limits the practical applications of LLMs. Recent works have explored token pruning for KV cache reduction in LLMs, relying solely on attention scores as the token importance indicator. However, our investigation of value vector norms revealed a notably non-uniform pattern, which calls into question relying on attention scores alone. Inspired by this, we propose a new method, Value-Aware Token Pruning (VATP), which uses both attention scores and the $\ell_1$ norm of value vectors to evaluate token importance. Extensive experiments on LLaMA2-7B-chat and Vicuna-v1.5-7B across 16 LongBench tasks demonstrate VATP's superior performance.
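The idea of scoring cached tokens with both signals can be sketched in a few lines. This sketch makes several assumptions:

- the two signals are combined by a simple product, score = attention score × $\ell_1$ norm of the value vector;
- the attention scores are already accumulated per token;
- one budget applies to a single head.

The paper should be consulted for its exact scoring rule and where it is applied. The function names are hypothetical.

```python
def vatp_importance(attn_scores, value_vectors):
    """Assumed VATP-style score: attention score times the L1 norm of the value vector."""
    return [s * sum(abs(v) for v in vec) for s, vec in zip(attn_scores, value_vectors)]


def tokens_to_keep(attn_scores, value_vectors, budget):
    """Indices (in original order) of the `budget` highest-scoring tokens; others are pruned."""
    scores = vatp_importance(attn_scores, value_vectors)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:budget]
    return sorted(top)


attn = [0.5, 0.3, 0.2]
values = [[0.1, 0.1], [1.0, -1.0], [0.5, 0.5]]
print(tokens_to_keep(attn, values, budget=2))  # → [1, 2]
```

With attention scores alone, tokens 0 and 1 would be kept. Token 0's tiny value vector lowers its combined score, so token 2 is kept instead.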


Explicit Formulae to Interchangeably use Hyperplanes and Hyperballs using Inversive Geometry

arXiv.org Machine Learning

Many algorithms require discriminative boundaries, such as separating hyperplanes or hyperballs, or are specifically designed to work on spherical data. By applying inversive geometry, we show that the two discriminative boundaries can be used interchangeably, and that general Euclidean data can be transformed into spherical data whenever a change in point distances is acceptable. We provide explicit formulae to embed general Euclidean data into spherical data and to unembed it back. We further show a duality between hyperspherical caps (i.e., the volume created by a separating hyperplane on spherical data) and hyperballs, and provide explicit formulae to map between the two. We also provide equations to translate inner products and Euclidean distances between the two spaces, which avoids explicit embedding and unembedding. Finally, we provide a method to enforce projections of the general Euclidean space onto hemi-hyperspheres and propose an intrinsic-dimensionality-based method to obtain "all-purpose" parameters. To show the usefulness of the cap-ball duality, we discuss example applications in machine learning and vector similarity search.
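The paper's own formulae are not reproduced here. As a hedged illustration of the general idea, the classical inverse stereographic projection embeds a point of R^d onto the unit sphere in R^(d+1) and maps it back exactly. This projection is the restriction of a sphere inversion to a hyperplane. Like the embedding in the abstract, it changes point distances: it is conformal, not isometric. The names `embed` and `unembed` are hypothetical.

```python
import math


def embed(x):
    """Inverse stereographic projection: R^d -> unit sphere in R^(d+1)."""
    s = sum(t * t for t in x)  # squared Euclidean norm of x
    return [2 * t / (s + 1) for t in x] + [(s - 1) / (s + 1)]


def unembed(y):
    """Stereographic projection back to R^d (undefined at the north pole y[-1] == 1)."""
    denom = 1 - y[-1]
    return [t / denom for t in y[:-1]]


y = embed([1.0, 2.0])
print(y, math.sqrt(sum(t * t for t in y)))  # a point on the unit sphere
print(unembed(y))                           # recovers [1.0, 2.0] up to rounding
```

Points far from the origin land near the north pole, which plays the role of the point at infinity.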


Uniform Convergence of Deep Neural Networks with Lipschitz Continuous Activation Functions and Variable Widths

arXiv.org Artificial Intelligence

We consider deep neural networks with a Lipschitz continuous activation function and with weight matrices of variable widths. We establish a uniform convergence analysis framework that provides sufficient conditions on the weight matrices and bias vectors, together with the Lipschitz constant, to ensure uniform convergence of the deep neural networks to a meaningful function as the number of their layers tends to infinity. Within this framework, we present specific results on uniform convergence of deep neural networks with fixed widths, bounded widths, and unbounded widths. In particular, since convolutional neural networks are a special class of deep neural networks whose weight matrices have increasing widths, we put forward conditions on the mask sequence that lead to uniform convergence of the resulting convolutional neural networks. The Lipschitz continuity assumption on the activation functions allows our theory to include most commonly used activation functions in applications.


Vector Norms in Machine Learning

#artificialintelligence

The common vector norms used in machine learning all come from one general formula, known as the p-norm. For different values of the parameter p, we obtain a different norm function. Here p should be a real number greater than or equal to 1; below 1 the triangle inequality fails. The p-norm takes an n-dimensional vector x and raises the absolute value of each element to the p-th power. Then we sum the results and take the p-th root to get the p-norm of the vector, also known as its magnitude: ‖x‖_p = (|x₁|^p + |x₂|^p + … + |xₙ|^p)^(1/p).
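The formula translates directly into code. This is a minimal sketch with a hypothetical function name. The `math.inf` branch is the usual max norm, the limit of the p-norm as p grows.

```python
import math


def p_norm(x, p):
    """p-norm (sum_i |x_i|^p)^(1/p); p = math.inf gives the max norm max_i |x_i|."""
    if p == math.inf:
        return max(abs(t) for t in x)
    if p < 1:
        raise ValueError("p must be >= 1 for the p-norm to be a norm")
    # abs() matters: without it, odd p would let negative elements cancel positive ones
    return sum(abs(t) ** p for t in x) ** (1 / p)


v = [3.0, -4.0]
for p in (1, 2, 3, math.inf):
    print(p, p_norm(v, p))
```

For v = (3, -4), p = 1 gives 7.0 and p = 2 gives 5.0. As p increases, the norm approaches the largest absolute element, 4.0.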


Is L2-Norm = Euclidean Distance?

#artificialintelligence

One of the concepts that can be a little confusing is the difference between norms and distances in machine learning. When do you call it an L2 norm, and when Euclidean distance? Let's clarify this once and for all. Say we have a 2D vector A = (a₁, a₂). The distance of vector A from the origin is called the norm of vector A. This distance can be calculated using various methods, such as Euclidean distance or Manhattan distance. Using Euclidean distance, the norm of A in 2D is ‖A‖₂ = √(a₁² + a₂²). A vector norm computed with Euclidean distance is called the L2 norm; computing it with Manhattan distance, |a₁| + |a₂|, gives the L1 norm. More generally, the Euclidean distance between two vectors A and B is the L2 norm of their difference, ‖A − B‖₂. So the L2 norm of a vector is simply its Euclidean distance from the origin.
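The equivalence can be checked with Python's standard library. `math.dist` (Python 3.8+) computes the Euclidean distance between two points, and `math.hypot` computes the Euclidean length of a vector. The example vectors are made up.

```python
import math

A = (3.0, 4.0)
B = (1.0, 1.0)

# L2 norm of A computed by hand vs. Euclidean distance of A from the origin
l2_norm_A = math.sqrt(sum(a * a for a in A))
dist_from_origin = math.dist(A, (0.0, 0.0))

# Euclidean distance between A and B vs. L2 norm of the difference A - B
dist_AB = math.dist(A, B)
l2_norm_diff = math.hypot(*(a - b for a, b in zip(A, B)))

# L1 norm of A = Manhattan distance of A from the origin
l1_norm_A = sum(abs(a) for a in A)

print(l2_norm_A, dist_from_origin)
print(dist_AB, l2_norm_diff)
print(l1_norm_A)
```

Each printed pair agrees: "L2 norm" and "Euclidean distance" are the same computation viewed from two angles.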


Exploding Gradients in Neural Networks

#artificialintelligence

An error gradient is the direction and magnitude calculated during the training of a neural network. It is used to update the network weights in the right direction and by the right amount. In deep networks or recurrent neural networks, error gradients can accumulate during an update and result in very large gradients; these are exploding gradients. In the extreme, the values of the weights can become so large that they overflow and result in NaN values.
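A toy model shows how gradients accumulate. Take a deep chain of scalar layers y = w·x. By the chain rule, backpropagation multiplies the gradient by w once per layer, so with |w| > 1 it grows exponentially with depth. The scalar chain is an assumption for illustration: real networks multiply Jacobian matrices, and it is their norms that matter. The function name is hypothetical.

```python
import math


def backprop_through_chain(weight, depth, upstream_grad=1.0):
    """Gradient reaching the input of `depth` stacked scalar layers y = weight * x.

    Each backward step multiplies by d(weight * x)/dx = weight (chain rule).
    """
    grad = upstream_grad
    for _ in range(depth):
        grad *= weight
    return grad


print(backprop_through_chain(1.5, 10))   # already about 57.7
print(backprop_through_chain(1.5, 100))  # about 4e17
g = backprop_through_chain(10.0, 400)    # exceeds the float range and becomes inf
print(g, g - g)                          # inf nan
```

Once a value overflows to `inf`, ordinary arithmetic such as `inf - inf` yields `nan`. That is how exploding gradients end up as NaN weights.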


Error Bounds for Generalized Group Sparsity

arXiv.org Machine Learning

In high-dimensional statistical inference, sparsity regularizations have shown advantages in consistency and convergence rates for coefficient estimation. We consider a generalized version of Sparse-Group Lasso, which captures both element-wise sparsity and group-wise sparsity simultaneously. We state and prove one universal theorem that yields consistency and convergence-rate results for different forms of double sparsity regularization. The universality of the results lies in generalizing the convergence rates of single-regularization cases, such as LASSO and group LASSO, as well as of double-regularization cases, such as sparse-group LASSO. Our analysis identifies a generalized norm, the $\epsilon$-norm, which provides a dual formulation for our double sparsity regularization.
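The "double" sparsity can be made concrete with the standard sparse-group lasso penalty. It is an $\ell_1$ term that encourages element-wise zeros plus a sum of per-group $\ell_2$ norms that encourages whole groups to be zero. The √|g| group weighting is a common convention, assumed here. The paper's generalized version and its $\epsilon$-norm are not reproduced, and the function name is hypothetical.

```python
import math


def sparse_group_penalty(beta, groups, lam_l1, lam_group):
    """lam_l1 * ||beta||_1 + lam_group * sum_g sqrt(|g|) * ||beta_g||_2.

    `groups` is a list of index lists partitioning the coefficients.
    """
    l1 = sum(abs(b) for b in beta)
    group_term = sum(
        math.sqrt(len(g)) * math.sqrt(sum(beta[i] ** 2 for i in g)) for g in groups
    )
    return lam_l1 * l1 + lam_group * group_term


beta = [1.0, -2.0, 0.0, 0.0]
groups = [[0, 1], [2, 3]]
print(sparse_group_penalty(beta, groups, lam_l1=1.0, lam_group=1.0))
```

Here the second group is entirely zero and contributes nothing to either term. This is the pattern the group term rewards.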


H2O.ai Prague Meetup Number 4

#artificialintelligence

This meetup was recorded in Prague on September 19. Talk 1: Customized Loss Function in Gradient Boosting Machine, by Veronika Maurerova (Software Engineer at H2O.ai, https://twitter.com/MaureVer). Talk 3: General pipeline for Computer Vision problems, by Yauhen Babakhin. In this talk, we will consider the whole process of addressing Computer Vision problems, proceeding to the training process accompanied by some recent methods in Deep Learning, and finishing with some practical tips and tricks that could help increase the quality of the model.


Gentle Introduction to Vector Norms in Machine Learning - Machine Learning Mastery

@machinelearnbot

Calculating the length or magnitude of vectors is often required, either directly as a regularization method in machine learning or as part of broader vector or matrix operations. In this tutorial, you will discover the different ways to calculate vector lengths or magnitudes, called the vector norm.
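To show a norm used "directly as a regularization method", here is a minimal sketch. It adds a norm of the weight vector to a linear model's mean squared error: the squared L2 norm gives a ridge-style penalty, and the L1 norm gives a lasso-style one. The function names, the linear model, and the toy data are assumptions for illustration.

```python
def mse(w, X, y):
    """Mean squared error of the linear predictions X @ w."""
    preds = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def regularized_loss(w, X, y, lam, penalty="l2"):
    """MSE plus a vector-norm penalty on the weights.

    "l2": lam * ||w||_2^2 (ridge-style); "l1": lam * ||w||_1 (lasso-style).
    """
    if penalty == "l2":
        reg = sum(wi * wi for wi in w)
    elif penalty == "l1":
        reg = sum(abs(wi) for wi in w)
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return mse(w, X, y) + lam * reg


X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, -4.0]
w = [3.0, -4.0]  # fits the data exactly, so only the penalty remains
print(regularized_loss(w, X, y, lam=0.1, penalty="l2"))
print(regularized_loss(w, X, y, lam=0.1, penalty="l1"))
```

Larger weight vectors have larger norms and therefore higher loss. This pressure is what pulls regularized models toward smaller, and with L1 sparser, weights.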