Goto

Collaborating Authors

 cross-entropy loss function


GrokAlign: Geometric Characterisation and Acceleration of Grokking

arXiv.org Machine Learning

A key challenge for the machine learning community is to understand and accelerate the training dynamics of deep networks that lead to delayed generalisation and emergent robustness to input perturbations, also known as grokking. Prior work has associated phenomena like delayed generalisation with the transition of a deep network from a linear to a feature learning regime, and emergent robustness with changes to the network's functional geometry, in particular the arrangement of the so-called linear regions in deep networks employing continuous piecewise affine nonlinearities. Here, we explain how grokking is realised in the Jacobian of a deep network and demonstrate that aligning a network's Jacobians with the training data (in the sense of cosine similarity) ensures grokking under a low-rank Jacobian assumption. Our results provide a strong theoretical motivation for the use of Jacobian regularisation in optimizing deep networks -- a method we introduce as GrokAlign -- which we show empirically to induce grokking much sooner than more conventional regularizers like weight decay. Moreover, we introduce centroid alignment as a tractable and interpretable simplification of Jacobian alignment that effectively identifies and tracks the stages of deep network training dynamics. Accompanying webpage (https://thomaswalker1.github.io/blog/grokalign.html) and code (https://github.com/ThomasWalker1/grokalign).


Divergence of Empirical Neural Tangent Kernel in Classification Problems

arXiv.org Machine Learning

This paper demonstrates that in classification problems, fully connected neural networks (FCNs) and residual neural networks (ResNets) cannot be approximated by kernel logistic regression based on the Neural Tangent Kernel (NTK) under overtraining (i.e., when training time approaches infinity). Specifically, when using the cross-entropy loss, regardless of how large the network width is (as long as it is finite), the empirical NTK diverges from the NTK on the training samples as training time increases. To establish this result, we first demonstrate the strictly positive definiteness of the NTKs for multi-layer FCNs and ResNets. Then, we prove that during training, % with the cross-entropy loss, the neural network parameters diverge if the smallest eigenvalue of the empirical NTK matrix (Gram matrix) with respect to training samples is bounded below by a positive constant. This behavior contrasts sharply with the lazy training regime commonly observed in regression problems. Consequently, using a proof by contradiction, we show that the empirical NTK does not uniformly converge to the NTK across all times on the training samples as the network width increases. We validate our theoretical results through experiments on both synthetic data and the MNIST classification task. This finding implies that NTK theory is not applicable in this context, with significant theoretical implications for understanding neural networks in classification problems.


Graph Network Models To Detect Illicit Transactions In Block Chain

arXiv.org Artificial Intelligence

The use of cryptocurrencies has led to an increase in illicit activities such as money laundering, with traditional rule-based approaches becoming less effective in detecting and preventing such activities. In this paper, we propose a novel approach to tackling this problem by applying graph attention networks with residual network-like architecture (GAT-ResNet) to detect illicit transactions related to anti-money laundering/combating the financing of terrorism (AML/CFT) in blockchains. We train various models on the Elliptic Bitcoin Transaction dataset, implementing logistic regression, Random Forest, XGBoost, GCN, GAT, and our proposed GAT-ResNet model. Our results demonstrate that the GAT-ResNet model has a potential to outperform the existing graph network models in terms of accuracy, reliability and scalability. Our research sheds light on the potential of graph related machine learning models to improve efforts to combat financial crime and lays the foundation for further research in this area.


Cross-Entropy Loss Functions: Theoretical Analysis and Applications

arXiv.org Artificial Intelligence

Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss applied to the outputs of a neural network, when the softmax is used. But, what guarantees can we rely on when using cross-entropy as a surrogate loss? We present a theoretical analysis of a broad family of loss functions, comp-sum losses, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error and other cross-entropy-like loss functions. We give the first $H$-consistency bounds for these loss functions. These are non-asymptotic guarantees that upper bound the zero-one loss estimation error in terms of the estimation error of a surrogate loss, for the specific hypothesis set $H$ used. We further show that our bounds are tight. These bounds depend on quantities called minimizability gaps. To make them more explicit, we give a specific analysis of these gaps for comp-sum losses. We also introduce a new family of loss functions, smooth adversarial comp-sum losses, that are derived from their comp-sum counterparts by adding in a related smooth term. We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds. This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss. While our main purpose is a theoretical analysis, we also present an extensive empirical analysis comparing comp-sum losses. We further report the results of a series of experiments demonstrating that our adversarial robustness algorithms outperform the current state-of-the-art, while also achieving a superior non-adversarial accuracy.


Machine Learning with Python: Logistic Regression for Binary Classification - Pierian Training

#artificialintelligence

Logistic Regression is a statistical method used for binary classification problems, where the goal is to predict the probability of an event occurring or not. It is a popular algorithm in machine learning, particularly in the field of supervised learning. In this blog post, we will explore the fundamentals of logistic regression and how it can be used to solve binary classification problems. We will also provide Python code examples to help you understand and implement this powerful algorithm in your own projects. Whether you're new to machine learning or an experienced practitioner, this post will provide valuable insights into logistic regression and its applications. For example, a logistic regression model could be built using patient data such as age, gender, family history, and lifestyle factors to predict whether or not a patient is at high risk for developing heart disease.


Causality Detection using Multiple Annotation Decisions

arXiv.org Artificial Intelligence

The paper describes the work that has been submitted to the 5th workshop on Challenges and Applications of Automated Extraction of socio-political events from text (CASE 2022). The work is associated with Subtask 1 of Shared Task 3 that aims to detect causality in protest news corpus. The authors used different large language models with customized cross-entropy loss functions that exploit annotation information. The experiments showed that bert-based-uncased with refined cross-entropy outperformed the others, achieving a F1 score of 0.8501 on the Causal News Corpus dataset.


Logistic Regression

#artificialintelligence

What do you do when you see a "You have won a lottery" email? You prefer to report it as'spam' rather than ignoring it. The above image gives an overview of spam filtering. Plenty of emails arrive every day. Some of them go to the spam folder while the rest remain in the primary inbox. Also, emails get classified as primary, social, promotion, updates, forum.


A Quick Guide to Cross-Entropy Loss Function

#artificialintelligence

One of the most prominent tasks at which Machine Learning has been historically very good at is classifying items (e.g. Especially in recent years, advancements in the hardware capable of executing mathematical models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs, LSTMs) made it possible to perform a quantum leap (sometimes literally) in performance. However, defining a model is only half of the story. To find the best parameters to perform such a task, it is also necessary to define a cost or loss function that captures the essence of what we want to optimize, and execute some form of gradient descent to reach a suitable set of parameters. To begin with, we need to define a statistical framework that describes our problem.


NLP using Deep Learning Tutorials : Understand Loss Function

#artificialintelligence

This article is a part of a series that I'm writing, and where I will try to address the topic of using Deep Learning in NLP. First of all, I was writing an article for an example of text classification using a perceptron, but I was thinking that will be better to review some basics before, as activation and loss functions. Loss function also called the objective function, is one of the main bricks in supervised machine learning algorithm which is based on labeled data. A loss function guides the training algorithm to update parameters in the right way. In a much simple definition, a loss function takes a truth (y) and a prediction (ŷ) as input and gives a score of real value number. This value indicates how much the prediction is close to the truth.


Vulnerability Under Adversarial Machine Learning: Bias or Variance?

arXiv.org Machine Learning

Prior studies have unveiled the vulnerability of the deep neural networks in the context of adversarial machine learning, leading to great recent attention into this area. One interesting question that has yet to be fully explored is the bias-variance relationship of adversarial machine learning, which can potentially provide deeper insights into this behaviour. The notion of bias and variance is one of the main approaches to analyze and evaluate the generalization and reliability of a machine learning model. Although it has been extensively used in other machine learning models, it is not well explored in the field of deep learning and it is even less explored in the area of adversarial machine learning. In this study, we investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network and analyze how adversarial perturbations can affect the generalization of a network. We derive the bias-variance trade-off for both classification and regression applications based on two main loss functions: (i) mean squared error (MSE), and (ii) cross-entropy. Furthermore, we perform quantitative analysis with both simulated and real data to empirically evaluate consistency with the derived bias-variance tradeoffs. Our analysis sheds light on why the deep neural networks have poor performance under adversarial perturbation from a bias-variance point of view and how this type of perturbation would change the performance of a network. Moreover, given these new theoretical findings, we introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies (e.g., PGD) while providing a high success rate in fooling deep neural networks in lower perturbation magnitudes.