AITopics | distilbert

Collaborating Authors

distilbert

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

7a6a74cbe87bc60030a4bd041dd47b78-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-19-2026, 03:17:53 GMT

baseline, bert, monolingual data, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Shaving WeightswithOccam'sRazor: BayesianSparsificationforNeuralNetworks usingtheMarginalLikelihood

Neural Information Processing SystemsFeb-10-2026, 06:43:17 GMT

Whilemuchwork has focused on different weight pruning criteria, the overallsparsifiabilityofthe network, i.e., its capacity to be pruned without quality loss, has often been overlooked.

machine learning, natural language, pruning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)

Add feedback

753fec797a22f71baf7106833734fdf3-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-9-2026, 20:47:36 GMT

correlation, entropy, memorization, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)

Add feedback

Using Text-Based Life Trajectories from Swedish Register Data to Predict Residential Mobility with Pretrained Transformers

Stark, Philipp, Sopasakis, Alexandros, Hall, Ola, Grillitsch, Markus

arXiv.org Artificial IntelligenceDec-10-2025

We transform large-scale Swedish register data into textual life trajectories to address two long-standing challenges in data analysis: high cardinality of categorical variables and inconsistencies in coding schemes over time. Leveraging this uniquely comprehensive population register, we convert register data from 6.9 million individuals (2001-2013) into semantically rich texts and predict individuals' residential mobility in later years (2013-2017). These life trajectories combine demographic information with annual changes in residence, work, education, income, and family circumstances, allowing us to assess how effectively such sequences support longitudinal prediction. We compare multiple NLP architectures (including LSTM, DistilBERT, BERT, and Qwen) and find that sequential and transformer-based models capture temporal and semantic structure more effectively than baseline models. The results show that textualized register data preserves meaningful information about individual pathways and supports complex, scalable modeling. Because few countries maintain longitudinal microdata with comparable coverage and precision, this dataset enables analyses and methodological tests that would be difficult or impossible elsewhere, offering a rigorous testbed for developing and evaluating new sequence-modeling approaches. Overall, our findings demonstrate that combining semantically rich register data with modern language models can substantially advance longitudinal analysis in social sciences.

large language model, machine learning, trajectory, (18 more...)

arXiv.org Artificial Intelligence

2512.07865

Country: Europe > Sweden (0.31)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Education (0.93)
Banking & Finance > Economy (0.69)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)

Add feedback

Bangla Hate Speech Classification with Fine-tuned Transformer Models

Jafari, Yalda Keivan, Dey, Krishno

arXiv.org Artificial IntelligenceDec-3-2025

Hate speech recognition in low-resource languages remains a difficult problem due to insufficient datasets, orthographic heterogeneity, and linguistic variety. Bangla is spoken by more than 230 million people of Bangladesh and India (West Bengal). Despite the growing need for automated moderation on social media platforms, Bangla is significantly under-represented in computational resources. In this work, we study Subtask 1A and Subtask 1B of the BLP 2025 Shared Task on hate speech detection. We reproduce the official baselines (e.g., Majority, Random, Support Vector Machine) and also produce and consider Logistic Regression, Random Forest, and Decision Tree as baseline methods. We also utilized transformer-based models such as DistilBERT, BanglaBERT, m-BERT, and XLM-RoBERTa for hate speech classification. All the transformer-based models outperformed baseline methods for the subtasks, except for DistilBERT. Among the transformer-based models, BanglaBERT produces the best performance for both subtasks. Despite being smaller in size, BanglaBERT outperforms both m-BERT and XLM-RoBERTa, which suggests language-specific pre-training is very important. Our results highlight the potential and need for pre-trained language models for the low-resource Bangla language.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2512.02845

Country: Asia > India > West Bengal (0.24)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Classification of worldwide news articles by perceived quality, 2018-2024

McElroy, Connor, de Oliveira, Thiago E. A., Brogly, Chris

arXiv.org Artificial IntelligenceNov-21-2025

This study explored whether supervised machine learning and deep learning models can effectively distinguish perceived lower-quality news articles from perceived higher-quality news articles. 3 machine learning classifiers and 3 deep learning models were assessed using a newly created dataset of 1,412,272 English news articles from the Common Crawl over 2018-2024. Expert consensus ratings on 579 source websites were split at the median, creating perceived low and high-quality classes of about 706,000 articles each, with 194 linguistic features per website-level labelled article. Traditional machine learning classifiers such as the Random Forest demonstrated capable performance (0.7355 accuracy, 0.8131 ROC AUC). For deep learning, ModernBERT-large (256 context length) achieved the best performance (0.8744 accuracy; 0.9593 ROC-AUC; 0.8739 F1), followed by DistilBERT-base (512 context length) at 0.8685 accuracy and 0.9554 ROC-AUC. DistilBERT-base (256 context length) reached 0.8478 accuracy and 0.9407 ROC-AUC, while ModernBERT-base (256 context length) attained 0.8569 accuracy and 0.9470 ROC-AUC. These results suggest that the perceived quality of worldwide news articles can be effectively differentiated by traditional CPU-based machine learning classifiers and deep learning classifiers.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.16416

Country: North America > Canada (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Media > News (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Open Banking Foundational Model: Learning Language Representations from Few Financial Transactions

Polleti, Gustavo, Santana, Marlesson, Fontes, Eduardo

arXiv.org Artificial IntelligenceNov-18-2025

We introduced a multimodal foundational model for financial transactions that integrates both structured attributes and unstructured textual descriptions into a unified representation. By adapting masked language modeling to transaction sequences, we demonstrated that our approach not only outperforms classical feature engineering and discrete event sequence methods but is also particularly effective in data-scarce Open Banking scenarios. To our knowledge, this is the first large-scale study across thousands of financial institutions in North America, providing evidence that multimodal representations can generalize across geographies and institutions. These results highlight the potential of self-supervised models to advance financial applications ranging from fraud prevention and credit risk to customer insights

data mining, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2511.12154

Country: North America > United States > New York (0.15)

Genre: Research Report (0.83)

Industry:

Banking & Finance > Credit (0.49)
Law Enforcement & Public Safety > Fraud (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness

P, Sajad U

arXiv.org Artificial IntelligenceNov-18-2025

Phishing and related cyber threats are becoming more varied and technologically advanced. Among these, email-based phishing remains the most dominant and persistent threat. These attacks exploit human vulnerabilities to disseminate malware or gain unauthorized access to sensitive information. Deep learning (DL) models, particularly transformer-based models, have significantly enhanced phishing mitigation through their contextual understanding of language. However, some recent threats, specifically Artificial Intelligence (AI)-generated phishing attacks, are reducing the overall system resilience of phishing detectors. In response, adversarial training has shown promise against AI-generated phishing threats. This study presents a hybrid approach that uses DistilBERT, a smaller, faster, and lighter version of the BERT transformer model for email classification. Robustness against text-based adversarial perturbations is reinforced using Fast Gradient Method (FGM) adversarial training. Furthermore, the framework integrates the LIME Explainable AI (XAI) technique to enhance the transparency of the DistilBERT architecture. The framework also uses the Flan-T5-small language model from Hugging Face to generate plain-language security narrative explanations for end-users. This combined approach ensures precise phishing classification while providing easily understandable justifications for the model's decisions.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.12085

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

The Lossy Horizon: Error-Bounded Predictive Coding for Lossy Text Compression (Episode I)

Aghanya, Nnamdi, Li, Jun, Wang, Kewei

arXiv.org Artificial IntelligenceOct-28-2025

Large Language Models (LLMs) can achieve near-optimal lossless compression by acting as powerful probability models. We investigate their use in the lossy domain, where reconstruction fidelity is traded for higher compression ratios. This paper introduces Error-Bounded Predictive Coding (EPC), a lossy text codec that leverages a Masked Language Model (MLM) as a decompressor. Instead of storing a subset of original tokens, EPC allows the model to predict masked content and stores minimal, rank-based corrections only when the model's top prediction is incorrect. This creates a residual channel that offers continuous rate-distortion control. We compare EPC to a simpler Predictive Masking (PM) baseline and a transform-based Vector Quantisation with a Residual Patch (VQ+RE) approach. Through an evaluation that includes precise bit accounting and rate-distortion analysis, we demonstrate that EPC consistently dominates PM, offering superior fidelity at a significantly lower bit rate by more efficiently utilising the model's intrinsic knowledge.

error-bounded predictive coding, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.22207

Genre: Research Report (0.42)

Industry: Law > Litigation (0.62)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)

Add feedback

Filters

Collaborating Authors

distilbert

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

7a6a74cbe87bc60030a4bd041dd47b78-AuthorFeedback.pdf

Differ

Shaving WeightswithOccam'sRazor: BayesianSparsificationforNeuralNetworks usingtheMarginalLikelihood

753fec797a22f71baf7106833734fdf3-Supplemental-Conference.pdf

Using Text-Based Life Trajectories from Swedish Register Data to Predict Residential Mobility with Pretrained Transformers

Bangla Hate Speech Classification with Fine-tuned Transformer Models

Classification of worldwide news articles by perceived quality, 2018-2024

Open Banking Foundational Model: Learning Language Representations from Few Financial Transactions

Explainable Transformer-Based Email Phishing Classification with Adversarial Robustness

The Lossy Horizon: Error-Bounded Predictive Coding for Lossy Text Compression (Episode I)