albert model
Identification of Potentially Misclassified Crash Narratives using Machine Learning (ML) and Deep Learning (DL)
Bhagat, Sudesh, Shihab, Ibne Farabi, Wood, Jonathan
This research investigates the efficacy of machine learning (ML) and deep learning (DL) methods in detecting misclassified intersection-related crashes in police-reported narratives. Using 2019 crash data from the Iowa Department of Transportation, we implemented and compared a comprehensive set of models, including Support Vector Machine (SVM), XGBoost, BERT Sentence Embeddings, BERT Word Embeddings, and Albert Model. Model performance was systematically validated against expert reviews of potentially misclassified narratives, providing a rigorous assessment of classification accuracy. Results demonstrated that while traditional ML methods exhibited superior overall performance compared to some DL approaches, the Albert Model achieved the highest agreement with expert classifications (73% with Expert 1) and original tabular data (58%). Statistical analysis revealed that the Albert Model maintained performance levels similar to inter-expert consistency rates, significantly outperforming other approaches, particularly on ambiguous narratives. This work addresses a critical gap in transportation safety research through multi-modal integration analysis, which achieved a 54.2% reduction in error rates by combining narrative text with structured crash data. We conclude that hybrid approaches combining automated classification with targeted expert review offer a practical methodology for improving crash data quality, with substantial implications for transportation safety management and policy development.
- North America > United States > Minnesota (0.04)
- North America > United States > Iowa > Story County > Ames (0.04)
- Europe > Switzerland (0.04)
- (9 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Overview (0.93)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Health & Medicine (1.00)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Exploring Internal Numeracy in Language Models: A Case Study on ALBERT
Wennberg, Ulme, Henter, Gustav Eje
It has been found that Transformer-based language models have the ability to perform basic quantitative reasoning. In this paper, we propose a method for studying how these models internally represent numerical data, and use our proposal to analyze the ALBERT family of language models. Specifically, we extract the learned embeddings these models use to represent tokens that correspond to numbers and ordinals, and subject these embeddings to Principal Component Analysis (PCA). PCA results reveal that ALBERT models of different sizes, trained and initialized separately, consistently learn to use the axes of greatest variation to represent the approximate ordering of various numerical concepts. Numerals and their textual counterparts are represented in separate clusters, but increase along the same direction in 2D space. Our findings illustrate that language models, trained purely to model text, can intuit basic mathematical concepts, opening avenues for NLP applications that intersect with quantitative reasoning.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.36)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)
mALBERT: Is a Compact Multilingual BERT Model Still Worth It?
Servan, Christophe, Ghannay, Sahar, Rosset, Sophie
Within the current trend of Pretained Language Models (PLM), emerge more and more criticisms about the ethical and ecological impact of such models. In this article, considering these critical remarks, we propose to focus on smaller models, such as compact models like ALBERT, which are more ecologically virtuous than these PLM. However, PLMs enable huge breakthroughs in Natural Language Processing tasks, such as Spoken and Natural Language Understanding, classification, Question-Answering tasks. PLMs also have the advantage of being multilingual, and, as far as we know, a multilingual version of compact ALBERT models does not exist. Considering these facts, we propose the free release of the first version of a multilingual compact ALBERT model, pre-trained using Wikipedia data, which complies with the ethical aspect of such a language model. We also evaluate the model against classical multilingual PLMs in classical NLP tasks. Finally, this paper proposes a rare study on the subword tokenization impact on language performances.
- Europe > France (0.14)
- North America > United States > Florida > Pinellas County > St. Petersburg (0.05)
- Europe > Czechia > Prague (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
An exploratory study on automatic identification of assumptions in the development of deep learning frameworks
Yang, Chen, Liang, Peng, Ma, Zinan
Stakeholders constantly make assumptions in the development of deep learning (DL) frameworks. These assumptions are related to various types of software artifacts (e.g., requirements, design decisions, and technical debt) and can turn out to be invalid, leading to system failures. Existing approaches and tools for assumption management usually depend on manual identification of assumptions. However, assumptions are scattered in various sources (e.g., code comments, commits, pull requests, and issues) of DL framework development, and manually identifying assumptions has high costs (e.g., time and resources). To overcome the issues of manually identifying assumptions in DL framework development, we constructed a new and largest dataset (i.e., AssuEval) of assumptions collected from the TensorFlow and Keras repositories on GitHub; explored the performance of seven traditional machine learning models (e.g., Support Vector Machine, Classification and Regression Trees), a popular DL model (i.e., ALBERT), and a large language model (i.e., ChatGPT) of identifying assumptions on the AssuEval dataset. The experiment results show that: ALBERT achieves the best performance (f1-score: 0.9584) of identifying assumptions on the AssuEval dataset, which is much better than the other models (the 2nd best f1-score is 0.6211, achieved by ChatGPT). Though ChatGPT is the most popular large language model, we do not recommend using it to identify assumptions in DL framework development because of its low performance on the task. Fine-tuning ChatGPT specifically for assumption identification could improve the performance. This study provides researchers with the largest dataset of assumptions for further research (e.g., assumption classification, evaluation, and reasoning) and helps practitioners better understand assumptions and how to manage them in their projects.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Asia > China > Hubei Province > Wuhan (0.04)
- (22 more...)
- Research Report > New Finding (0.89)
- Research Report > Experimental Study (0.88)
- Transportation > Air (0.93)
- Aerospace & Defense (0.68)
Ensemble ALBERT on SQuAD 2.0
Li, Shilun, Li, Renee, Peng, Veronica
Machine question answering is an essential yet challenging task in natural language processing. Recently, Pre-trained Contextual Embeddings (PCE) models like Bidirectional Encoder Representations from Transformers (BERT) and A Lite BERT (ALBERT) have attracted lots of attention due to their great performance in a wide range of NLP tasks. In our Paper, we utilized the fine-tuned ALBERT models and implemented combinations of additional layers (e.g. attention layer, RNN layer) on top of them to improve model performance on Stanford Question Answering Dataset (SQuAD 2.0). We implemented four different models with different layers on top of ALBERT-base model, and two other models based on ALBERT-xlarge and ALBERT-xxlarge. We compared their performance to our baseline model ALBERT-base-v2 + ALBERT-SQuAD-out with details. Our best-performing individual model is ALBERT-xxlarge + ALBERT-SQuAD-out, which achieved an F1 score of 88.435 on the dev set. Furthermore, we have implemented three different ensemble algorithms to boost overall performance. By passing in several best-performing models' results into our weighted voting ensemble algorithm, our final result ranks first on the Stanford CS224N Test PCE SQuAD Leaderboard with F1 = 90.123.
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- Africa > Middle East > Egypt (0.05)
- South America > Colombia (0.04)
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
A Semi-supervised Multi-task Learning Approach to Classify Customer Contact Intents
Dong, Li, Spencer, Matthew C., Biagi, Amir
In the area of customer support, understanding customers' intents is a crucial step. Machine learning plays a vital role in this type of intent classification. In reality, it is typical to collect confirmation from customer support representatives (CSRs) regarding the intent prediction, though it can unnecessarily incur prohibitive cost to ask CSRs to assign existing or new intents to the mis-classified cases. Apart from the confirmed cases with and without intent labels, there can be a number of cases with no human curation. This data composition (Positives + Unlabeled + multiclass Negatives) creates unique challenges for model development. In response to that, we propose a semi-supervised multi-task learning paradigm. In this manuscript, we share our experience in building text-based intent classification models for a customer support service on an E-commerce website. We improve the performance significantly by evolving the model from multiclass classification to semi-supervised multi-task learning by leveraging the negative cases, domain- and task-adaptively pretrained ALBERT on customer contact texts, and a number of un-curated data with no labels. In the evaluation, the final model boosts the average AUC ROC by almost 20 points compared to the baseline finetuned multiclass classification ALBERT model.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hong Kong (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (16 more...)
The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
Wennberg, Ulme, Henter, Gustav Eje
Mechanisms for encoding positional information are central for transformer-based language models. In this paper, we analyze the position embeddings of existing language models, finding strong evidence of translation invariance, both for the embeddings themselves and for their effect on self-attention. The degree of translation invariance increases during training and correlates positively with model performance. Our findings lead us to propose translation-invariant self-attention (TISA), which accounts for the relative position between tokens in an interpretable fashion without needing conventional position embeddings. Our proposal has several theoretical advantages over existing position-representation approaches. Experiments show that it improves on regular ALBERT on GLUE tasks, while only adding orders of magnitude less positional parameters.
ReCAM@IITK at SemEval-2021 Task 4: BERT and ALBERT based Ensemble for Abstract Word Prediction
Mittal, Abhishek, Modi, Ashutosh
This paper describes our system for Task 4 of SemEval-2021: Reading Comprehension of Abstract Meaning (ReCAM). We participated in all subtasks where the main goal was to predict an abstract word missing from a statement. We fine-tuned the pre-trained masked language models namely BERT and ALBERT and used an Ensemble of these as our submitted system on Subtask 1 (ReCAM-Imperceptibility) and Subtask 2 (ReCAM-Nonspecificity). For Subtask 3 (ReCAM-Intersection), we submitted the ALBERT model as it gives the best results. We tried multiple approaches and found that Masked Language Modeling(MLM) based approach works the best.
- Oceania > Australia (0.04)
- Europe > United Kingdom > Scotland (0.04)
- Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
AI4Bharat/indic-bert
Indic bert is a multilingual ALBERT model that exclusively covers 12 major Indian languages. It is pre-trained on our novel corpus of around 9 billion tokens and evaluated on a set of diverse tasks. Indic-bert has around 10x fewer parameters than other popular publicly available multilingual models while it also achieves a performance on-par or better than these models. We also introduce IGLUE - a set of standard evaluation tasks that can be used to measure the NLU performance of monolingual and multilingual models on Indian languages. Along with IGLUE, we also compile a list of additional evaluation tasks.
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Lan, Zhenzhong, Chen, Mingda, Goodman, Sebastian, Gimpel, Kevin, Sharma, Piyush, Soricut, Radu
A BSTRACT Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT (Devlin et al., 2019). Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT -large. The code and the pretrained models are available at https://github.com/ Many nontrivial NLP tasks, including those that have limited training data, have greatly benefited from these pre-trained models. One of the most compelling signs of these breakthroughs is the evolution of machine performance on a reading comprehension task designed for middle and highschool English exams in China, the RACE test (Lai et al., 2017): the paper that originally describes the task and formulates the modeling challenge reports then state-of-the-art machine accuracy at 44. 1%; the latest published result reports their model performance at 83. 2% (Liu et al., 2019); the work we present here pushes it even higher to 89 .4%, a stunning 45 .3% Evidence from these improvements reveals that a large network is of crucial importance for achieving state-of-the-art performance (Devlin et al., 2019; Radford et al., 2019). It has become common practice to pre-train large models and distill them down to smaller ones (Sun et al., 2019; Turc et al., 2019) for real applications.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- (12 more...)