AITopics

Country: Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.94)

Neural Information Processing Systems, Dec-25-2025, 17:27:45 GMT

Understanding the Failure of Batch Normalization for Transformers in NLP

Batch Normalization (BN) is a core and prevalent technique in accelerating the training of deep neural networks and improving the generalization on Computer Vision (CV) tasks. However, it fails to defend its position in Natural Language Processing (NLP), which is dominated by Layer Normalization (LN). In this paper, we are trying to answer why BN usually performs worse than LN in NLP tasks with Transformer models. We find that the inconsistency between training and inference of BN is the leading cause that results in the failure of BN in NLP. We define Training Inference Discrepancy (TID) to quantitatively measure this inconsistency and reveal that TID can indicate BN's performance, supported by extensive experiments, including image classification, neural machine translation, language modeling, sequence labeling, and text classification tasks. We find that BN can obtain much better test performance than LN when TID keeps small through training. To suppress the explosion of TID, we propose Regularized BN (RBN) that adds a simple regularization term to narrow the gap between batch statistics and population statistics of BN. RBN improves the performance of BN consistently and outperforms or is on par with LN on 17 out of 20 settings, including ten datasets and two common variants of Transformer.
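The abstract does not state the formula for TID or for the RBN regularizer, so the following is only an illustrative sketch of the quantity it talks about: the gap between the statistics of one mini-batch (used in training) and the running, or population, statistics (used at inference). The function names `batch_stats`, `update_running`, and `stats_gap`, the momentum value, the biased variance, and the absolute-difference measure are all assumptions made for illustration, not the paper's definitions.

```python
from statistics import fmean, pvariance


def batch_stats(batch):
    """Return (mean, biased variance) of one feature over a mini-batch."""
    return fmean(batch), pvariance(batch)


def update_running(running, batch, momentum=0.1):
    """Exponential-moving-average update of the running (population) estimates.

    The momentum value 0.1 is an assumed default, not taken from the paper.
    """
    mean, var = batch_stats(batch)
    r_mean, r_var = running
    return (
        (1 - momentum) * r_mean + momentum * mean,
        (1 - momentum) * r_var + momentum * var,
    )


def stats_gap(running, batch):
    """Hypothetical gap measure: absolute difference between batch and running stats."""
    mean, var = batch_stats(batch)
    r_mean, r_var = running
    return abs(mean - r_mean), abs(var - r_var)
```

A small gap means the normalization applied at inference is close to the one the network saw in training; a regularizer in the spirit of RBN would presumably penalize some function of this gap during training.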

batch normalization, name change, transformer, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Neural Information Processing Systems, Aug-19-2025, 20:04:11 GMT

Understanding the Failure of Batch Normalization for Transformers in NLP

Jiaxi Wang (1), Ji Wu (1, 2), Lei Huang (3)
(1) Department of Electronic Engineering, Tsinghua University

Batch Normalization (BN) is a core and prevalent technique in accelerating the training of deep neural networks and improving the generalization on Computer Vision (CV) tasks.

artificial intelligence, machine learning, natural language, (18 more...)

Country:

Asia > China (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Jul-1-2025

Attention acts to suppress goal-based conflict under high competition

Claflin, Omar

It is known that when multiple stimuli are present, top-down attention selectively enhances the neural signal in the visual cortex for task-relevant stimuli, but this has been tested only under conditions of minimal competition of visual attention. Here we show during high competition, that is, two stimuli in a shared receptive field possessing opposing modulatory goals, top-down attention suppresses both task-relevant and irrelevant neural signals within 100 ms of stimuli onset. This non-selective engagement of top-down attentional resources serves to reduce the feedforward signal representing irrelevant stimuli. It is well established that attention modulates visual processing in extrastriate cortex (Tsotsos et al., 1995). Strong evidence for the mutual competition theory, acting at the level of the receptive fields in the extrastriate cortex, suggests that local neuronal activity representing simultaneous stimuli result in suppression of these representations at the level of the receptive field (Moran and Desimone, 1985; Reynolds et al., 1999; Kastner and Ungerleider, 2001).

amplitude, artificial intelligence, stimuli, (15 more...)

1610.09431

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report > Experimental Study (0.73)
Research Report > New Finding (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.66)

Technology: Information Technology > Artificial Intelligence (1.00)

Neural Information Processing Systems, Jan-19-2025, 07:16:24 GMT

Understanding the Failure of Batch Normalization for Transformers in NLP

Batch Normalization (BN) is a core and prevalent technique in accelerating the training of deep neural networks and improving the generalization on Computer Vision (CV) tasks. However, it fails to defend its position in Natural Language Processing (NLP), which is dominated by Layer Normalization (LN). In this paper, we are trying to answer why BN usually performs worse than LN in NLP tasks with Transformer models. We find that the inconsistency between training and inference of BN is the leading cause that results in the failure of BN in NLP. We define Training Inference Discrepancy (TID) to quantitatively measure this inconsistency and reveal that TID can indicate BN's performance, supported by extensive experiments, including image classification, neural machine translation, language modeling, sequence labeling, and text classification tasks.

batch normalization, nlp, transformer, (1 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)

Chiariello, Francesco, Fionda, Valeria, Ielo, Antonio, Ricca, Francesco

Direct Encoding of Declare Constraints in ASP

arXiv.org Artificial Intelligence, Dec-13-2024

Answer Set Programming (ASP), a well-known declarative logic programming paradigm, has recently found practical application in Process Mining. In particular, ASP has been used to model tasks involving declarative specifications of business processes. In this area, Declare stands out as the most widely adopted declarative process modeling language, offering a means to model processes through sets of constraints valid traces must satisfy, that can be expressed in Linear Temporal Logic over Finite Traces (LTLf). Existing ASP-based solutions encode Declare constraints by modeling the corresponding LTLf formula or its equivalent automaton which can be obtained using established techniques. In this paper, we introduce a novel encoding for Declare constraints that directly models their semantics as ASP rules, eliminating the need for intermediate representations. We assess the effectiveness of this novel approach on two Process Mining tasks by comparing it with alternative ASP encodings and a Python library for Declare. Under consideration in Theory and Practice of Logic Programming (TPLP).
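To make "constraints valid traces must satisfy" concrete, here is a minimal sketch of checking one standard Declare template, Response(a, b), over a finite trace: every occurrence of `a` must eventually be followed by an occurrence of `b`. It is written in Python purely as an illustration of the template's semantics; it is not the paper's ASP encoding, and the function name `satisfies_response` and the list-of-strings trace representation are assumptions.

```python
def satisfies_response(trace, a, b):
    """Check the Declare Response(a, b) template on a finite trace.

    `trace` is a sequence of activity names. Assumes a != b.
    """
    pending = False  # True while some `a` is still waiting for a later `b`
    for activity in trace:
        if activity == a:
            pending = True
        elif activity == b:
            pending = False
    # The trace is valid only if no `a` is left unanswered at the end.
    return not pending
```

For instance, the trace `["a", "c", "b"]` satisfies Response(a, b), while `["a", "b", "a"]` does not, because the final `a` is never answered.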

artificial intelligence, constraint, logic & formal reasoning, (16 more...)

2412.10152

Country:

North America > United States > Oregon > Benton County > Corvallis (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Italy > Campania (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)

Gautam, Akash Kumar, Lange, Lukas, Strötgen, Jannik

Discourse-Aware In-Context Learning for Temporal Expression Normalization

arXiv.org Artificial Intelligence, Apr-11-2024

Temporal expression (TE) normalization is a well-studied problem. However, the predominately used rule-based systems are highly restricted to specific settings, and upcoming machine learning approaches suffer from a lack of labeled data. In this work, we explore the feasibility of proprietary and open-source large language models (LLMs) for TE normalization using in-context learning to inject task, document, and example information into the model. We explore various sample selection strategies to retrieve the most relevant set of examples. By using a window-based prompt design approach, we can perform TE normalization across sentences, while leveraging the LLM knowledge without training the model. Our experiments show competitive results to models designed for this task. In particular, our method achieves large performance improvements for non-standard settings by dynamically including relevant examples during inference.

expression, normalization, time expression, (15 more...)

2404.07775

Country:

Europe > Germany > Saarland (0.05)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)

Genre: Research Report (0.50)

Industry:

Law (1.00)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Constantinou, Valentino, Ravanelli, Michela, Liu, Hamlin, Bortnik, Jacob

Deep Learning Driven Detection of Tsunami Related Internal Gravity Waves: a path towards open-ocean natural hazards detection

arXiv.org Artificial Intelligence, Aug-8-2023

Tsunamis can trigger internal gravity waves (IGWs) in the ionosphere, perturbing the Total Electron Content (TEC) - referred to as Traveling Ionospheric Disturbances (TIDs) that are detectable through the Global Navigation Satellite System (GNSS). The GNSS are constellations of satellites providing signals from Earth orbit - Europe's Galileo, the United States' Global Positioning System (GPS), Russia's Global'naya Navigatsionnaya Sputnikovaya Sistema (GLONASS) and China's BeiDou. The real-time detection of TIDs provides an approach for tsunami detection, enhancing early warning systems by providing open-ocean coverage in geographic areas not serviceable by buoy-based warning systems. Large volumes of the GNSS data is leveraged by deep learning, which effectively handles complex non-linear relationships across thousands of data streams. We describe a framework leveraging slant total electron content (sTEC) from the VARION (Variometric Approach for Real-Time Ionosphere Observation) algorithm by Gramian Angular Difference Fields (from Computer Vision) and Convolutional Neural Networks (CNNs) to detect TIDs in near-real-time. Historical data from the 2010 Maule, 2011 Tohoku and the 2012 Haida-Gwaii earthquakes and tsunamis are used in model training, and the later-occurring 2015 Illapel earthquake and tsunami in Chile for out-of-sample model validation. Using the experimental framework described in the paper, we achieved a 91.7% F1 score. Source code is available at: https://github.com/vc1492a/tidd. Our work represents a new frontier in detecting tsunami-driven IGWs in open-ocean, dramatically improving the potential for natural hazards detection for coastal communities.
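The Gramian Angular Difference Field mentioned in the abstract turns a 1-D time series into a 2-D image that a CNN can consume. A minimal sketch of the standard construction follows: rescale the series to [-1, 1], map each value to an angle with arccos, and fill entry (i, j) with sin(phi_i - phi_j). The function name `gadf` and the min-max rescaling are assumptions for illustration; the abstract does not say how the authors preprocess the sTEC series, and their repository should be consulted for the actual pipeline.

```python
import math


def gadf(series):
    """Gramian Angular Difference Field of a 1-D series, as a list of rows."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    # Min-max rescale to [-1, 1] so that arccos is defined for every value.
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in series]
    # Clamp against floating-point overshoot before taking arccos.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    # Entry (i, j) encodes the angular difference between time steps i and j.
    return [[math.sin(a - b) for b in phi] for a in phi]
```

The resulting matrix has a zero diagonal and is antisymmetric, so the temporal ordering of the series is preserved in the image.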

detection, earthquake, tsunami, (15 more...)

2308.04611

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.29)
North America > Canada > British Columbia > Haida Gwaii (0.26)
Europe > Russia (0.24)

Genre: Research Report (0.50)

Industry: Energy (0.69)

Technology:

Information Technology > Geographic Information Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Cauteruccio, Francesco, Terracina, Giorgio

Extended High Utility Pattern Mining: An Answer Set Programming Based Framework and Applications

arXiv.org Artificial Intelligence, Mar-23-2023

Detecting sets of relevant patterns from a given dataset is an important challenge in data mining. The relevance of a pattern, also called utility in the literature, is a subjective measure and can be actually assessed from very different points of view. Rule-based languages like Answer Set Programming (ASP) seem well suited for specifying user-provided criteria to assess pattern utility in a form of constraints; moreover, declarativity of ASP allows for a very easy switch between several criteria in order to analyze the dataset from different points of view. In this paper, we make steps toward extending the notion of High Utility Pattern Mining (HUPM); in particular we introduce a new framework that allows for new classes of utility criteria not considered in the previous literature. We also show how recent extensions of ASP with external functions can support a fast and effective encoding and testing of the new framework. To demonstrate the potential of the proposed framework, we exploit it as a building block for the definition of an innovative method for predicting ICU admission for COVID-19 patients. Finally, an extensive experimental activity demonstrates both from a quantitative and a qualitative point of view the effectiveness of the proposed approach.

logic & formal reasoning, machine learning, pattern recognition, (19 more...)

2303.13191

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > North Macedonia > Skopje Statistical Region > Skopje Municipality > Skopje (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.54)
Health & Medicine > Therapeutic Area > Immunology (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

arXiv.org Artificial Intelligence, Oct-11-2022

Understanding the Failure of Batch Normalization for Transformers in NLP

Wang, Jiaxi, Wu, Ji, Huang, Lei

Batch Normalization (BN) is a core and prevalent technique in accelerating the training of deep neural networks and improving the generalization on Computer Vision (CV) tasks. However, it fails to defend its position in Natural Language Processing (NLP), which is dominated by Layer Normalization (LN). In this paper, we are trying to answer why BN usually performs worse than LN in NLP tasks with Transformer models. We find that the inconsistency between training and inference of BN is the leading cause that results in the failure of BN in NLP. We define Training Inference Discrepancy (TID) to quantitatively measure this inconsistency and reveal that TID can indicate BN's performance, supported by extensive experiments, including image classification, neural machine translation, language modeling, sequence labeling, and text classification tasks. We find that BN can obtain much better test performance than LN when TID keeps small through training. To suppress the explosion of TID, we propose Regularized BN (RBN) that adds a simple regularization term to narrow the gap between batch statistics and population statistics of BN.

machine learning, natural language, tid, (20 more...)