AITopics | cross-entropy loss function

Collaborating Authors

cross-entropy loss function

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GrokAlign: Geometric Characterisation and Acceleration of Grokking

Walker, Thomas, Humayun, Ahmed Imtiaz, Balestriero, Randall, Baraniuk, Richard

arXiv.org Machine LearningAug-1-2025

A key challenge for the machine learning community is to understand and accelerate the training dynamics of deep networks that lead to delayed generalisation and emergent robustness to input perturbations, also known as grokking. Prior work has associated phenomena like delayed generalisation with the transition of a deep network from a linear to a feature learning regime, and emergent robustness with changes to the network's functional geometry, in particular the arrangement of the so-called linear regions in deep networks employing continuous piecewise affine nonlinearities. Here, we explain how grokking is realised in the Jacobian of a deep network and demonstrate that aligning a network's Jacobians with the training data (in the sense of cosine similarity) ensures grokking under a low-rank Jacobian assumption. Our results provide a strong theoretical motivation for the use of Jacobian regularisation in optimizing deep networks -- a method we introduce as GrokAlign -- which we show empirically to induce grokking much sooner than more conventional regularizers like weight decay. Moreover, we introduce centroid alignment as a tractable and interpretable simplification of Jacobian alignment that effectively identifies and tracks the stages of deep network training dynamics. Accompanying webpage (https://thomaswalker1.github.io/blog/grokalign.html) and code (https://github.com/ThomasWalker1/grokalign).

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

2506.12284

Country: North America > Canada > Ontario > Toronto (0.28)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Divergence of Empirical Neural Tangent Kernel in Classification Problems

Yu, Zixiong, Tian, Songtao, Chen, Guhan

arXiv.org Machine LearningApr-15-2025

This paper demonstrates that in classification problems, fully connected neural networks (FCNs) and residual neural networks (ResNets) cannot be approximated by kernel logistic regression based on the Neural Tangent Kernel (NTK) under overtraining (i.e., when training time approaches infinity). Specifically, when using the cross-entropy loss, regardless of how large the network width is (as long as it is finite), the empirical NTK diverges from the NTK on the training samples as training time increases. To establish this result, we first demonstrate the strictly positive definiteness of the NTKs for multi-layer FCNs and ResNets. Then, we prove that during training, % with the cross-entropy loss, the neural network parameters diverge if the smallest eigenvalue of the empirical NTK matrix (Gram matrix) with respect to training samples is bounded below by a positive constant. This behavior contrasts sharply with the lazy training regime commonly observed in regression problems. Consequently, using a proof by contradiction, we show that the empirical NTK does not uniformly converge to the NTK across all times on the training samples as the network width increases. We validate our theoretical results through experiments on both synthetic data and the MNIST classification task. This finding implies that NTK theory is not applicable in this context, with significant theoretical implications for understanding neural networks in classification problems.

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Machine Learning

2504.1113

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Graph Network Models To Detect Illicit Transactions In Block Chain

Adloori, Hrushyang, Dasanapu, Vaishnavi, Mergu, Abhijith Chandra

arXiv.org Artificial IntelligenceSep-23-2024

The use of cryptocurrencies has led to an increase in illicit activities such as money laundering, with traditional rule-based approaches becoming less effective in detecting and preventing such activities. In this paper, we propose a novel approach to tackling this problem by applying graph attention networks with residual network-like architecture (GAT-ResNet) to detect illicit transactions related to anti-money laundering/combating the financing of terrorism (AML/CFT) in blockchains. We train various models on the Elliptic Bitcoin Transaction dataset, implementing logistic regression, Random Forest, XGBoost, GCN, GAT, and our proposed GAT-ResNet model. Our results demonstrate that the GAT-ResNet model has a potential to outperform the existing graph network models in terms of accuracy, reliability and scalability. Our research sheds light on the potential of graph related machine learning models to improve efforts to combat financial crime and lays the foundation for further research in this area.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.0715

Country:

North America > United States > Florida > Hillsborough County > University (0.04)
North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law Enforcement & Public Safety > Fraud (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > e-Commerce > Financial Technology (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.86)
(2 more...)

Add feedback

Cross-Entropy Loss Functions: Theoretical Analysis and Applications

Mao, Anqi, Mohri, Mehryar, Zhong, Yutao

arXiv.org Artificial IntelligenceJun-19-2023

Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss applied to the outputs of a neural network, when the softmax is used. But, what guarantees can we rely on when using cross-entropy as a surrogate loss? We present a theoretical analysis of a broad family of loss functions, comp-sum losses, that includes cross-entropy (or logistic loss), generalized cross-entropy, the mean absolute error and other cross-entropy-like loss functions. We give the first $H$-consistency bounds for these loss functions. These are non-asymptotic guarantees that upper bound the zero-one loss estimation error in terms of the estimation error of a surrogate loss, for the specific hypothesis set $H$ used. We further show that our bounds are tight. These bounds depend on quantities called minimizability gaps. To make them more explicit, we give a specific analysis of these gaps for comp-sum losses. We also introduce a new family of loss functions, smooth adversarial comp-sum losses, that are derived from their comp-sum counterparts by adding in a related smooth term. We show that these loss functions are beneficial in the adversarial setting by proving that they admit $H$-consistency bounds. This leads to new adversarial robustness algorithms that consist of minimizing a regularized smooth adversarial comp-sum loss. While our main purpose is a theoretical analysis, we also present an extensive empirical analysis comparing comp-sum losses. We further report the results of a series of experiments demonstrating that our adversarial robustness algorithms outperform the current state-of-the-art, while also achieving a superior non-adversarial accuracy.

artificial intelligence, comp, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2304.07288

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Machine Learning with Python: Logistic Regression for Binary Classification - Pierian Training

#artificialintelligenceApr-8-2023, 00:56:00 GMT

Logistic Regression is a statistical method used for binary classification problems, where the goal is to predict the probability of an event occurring or not. It is a popular algorithm in machine learning, particularly in the field of supervised learning. In this blog post, we will explore the fundamentals of logistic regression and how it can be used to solve binary classification problems. We will also provide Python code examples to help you understand and implement this powerful algorithm in your own projects. Whether you're new to machine learning or an experienced practitioner, this post will provide valuable insights into logistic regression and its applications. For example, a logistic regression model could be built using patient data such as age, gender, family history, and lifestyle factors to predict whether or not a patient is at high risk for developing heart disease.

binary classification problem, logistic regression, regression, (12 more...)

#artificialintelligence

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Causality Detection using Multiple Annotation Decisions

Nguyen, Quynh Anh, Mitra, Arka

arXiv.org Artificial IntelligenceDec-1-2022

The paper describes the work that has been submitted to the 5th workshop on Challenges and Applications of Automated Extraction of socio-political events from text (CASE 2022). The work is associated with Subtask 1 of Shared Task 3 that aims to detect causality in protest news corpus. The authors used different large language models with customized cross-entropy loss functions that exploit annotation information. The experiments showed that bert-based-uncased with refined cross-entropy outperformed the others, achieving a F1 score of 0.8501 on the Causal News Corpus dataset.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.14852

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China (0.05)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Logistic Regression

#artificialintelligenceJul-12-2021, 09:35:12 GMT

What do you do when you see a "You have won a lottery" email? You prefer to report it as'spam' rather than ignoring it. The above image gives an overview of spam filtering. Plenty of emails arrive every day. Some of them go to the spam folder while the rest remain in the primary inbox. Also, emails get classified as primary, social, promotion, updates, forum.

boundary, decision boundary, logistic regression, (16 more...)

#artificialintelligence

Genre:

Research Report > New Finding (0.51)
Research Report > Experimental Study (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.78)

Add feedback

A Quick Guide to Cross-Entropy Loss Function

#artificialintelligenceJun-8-2021, 02:40:50 GMT

One of the most prominent tasks at which Machine Learning has been historically very good at is classifying items (e.g. Especially in recent years, advancements in the hardware capable of executing mathematical models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs, LSTMs) made it possible to perform a quantum leap (sometimes literally) in performance. However, defining a model is only half of the story. To find the best parameters to perform such a task, it is also necessary to define a cost or loss function that captures the essence of what we want to optimize, and execute some form of gradient descent to reach a suitable set of parameters. To begin with, we need to define a statistical framework that describes our problem.

cross-entropy loss function, probability, vector, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

NLP using Deep Learning Tutorials : Understand Loss Function

#artificialintelligenceMar-3-2021, 21:39:32 GMT

This article is a part of a series that I'm writing, and where I will try to address the topic of using Deep Learning in NLP. First of all, I was writing an article for an example of text classification using a perceptron, but I was thinking that will be better to review some basics before, as activation and loss functions. Loss function also called the objective function, is one of the main bricks in supervised machine learning algorithm which is based on labeled data. A loss function guides the training algorithm to update parameters in the right way. In a much simple definition, a loss function takes a truth (y) and a prediction (ŷ) as input and gives a score of real value number. This value indicates how much the prediction is close to the truth.

loss function, prediction, vector, (13 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.79)

Add feedback

Vulnerability Under Adversarial Machine Learning: Bias or Variance?

Aboutalebi, Hossein, Shafiee, Mohammad Javad, Karg, Michelle, Scharfenberger, Christian, Wong, Alexander

arXiv.org Machine LearningJul-31-2020

Prior studies have unveiled the vulnerability of the deep neural networks in the context of adversarial machine learning, leading to great recent attention into this area. One interesting question that has yet to be fully explored is the bias-variance relationship of adversarial machine learning, which can potentially provide deeper insights into this behaviour. The notion of bias and variance is one of the main approaches to analyze and evaluate the generalization and reliability of a machine learning model. Although it has been extensively used in other machine learning models, it is not well explored in the field of deep learning and it is even less explored in the area of adversarial machine learning. In this study, we investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network and analyze how adversarial perturbations can affect the generalization of a network. We derive the bias-variance trade-off for both classification and regression applications based on two main loss functions: (i) mean squared error (MSE), and (ii) cross-entropy. Furthermore, we perform quantitative analysis with both simulated and real data to empirically evaluate consistency with the derived bias-variance tradeoffs. Our analysis sheds light on why the deep neural networks have poor performance under adversarial perturbation from a bias-variance point of view and how this type of perturbation would change the performance of a network. Moreover, given these new theoretical findings, we introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies (e.g., PGD) while providing a high success rate in fooling deep neural networks in lower perturbation magnitudes.

artificial intelligence, machine learning, perturbation, (18 more...)

arXiv.org Machine Learning

2008.00138

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Europe > Germany (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback