AITopics | dropout regularization

Collaborating Authors

dropout regularization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

dfce06801e1a85d6d06f1fdd4475dacd-Paper.pdf

Neural Information Processing SystemsFeb-11-2026, 13:17:02 GMT

international conference, network structure, proc, (11 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.68)

Add feedback

SegFormer Fine-Tuning with Dropout: Advancing Hair Artifact Removal in Skin Lesion Analysis

Saad, Asif Mohammed, Mahi, Umme Niraj

arXiv.org Artificial IntelligenceSep-3-2025

Hair artifacts in dermoscopic images present significant challenges for accurate skin lesion analysis, potentially obscuring critical diagnostic features in dermatological assessments. This work introduces a fine-tuned SegFormer model augmented with dropout regularization to achieve precise hair mask segmentation. The proposed SegformerWithDropout architecture leverages the MiT-B2 encoder, pretrained on ImageNet, with an in-channel count of 3 and 2 output classes, incorporating a dropout probability of 0.3 in the segmentation head to prevent overfitting. Training is conducted on a specialized dataset of 500 dermoscopic skin lesion images with fine-grained hair mask annotations, employing 10-fold cross-validation, AdamW optimization with a learning rate of 0.001, and cross-entropy loss. Early stopping is applied based on validation loss, with a patience of 3 epochs and a maximum of 20 epochs per fold. Performance is evaluated using a comprehensive suite of metrics, including Intersection over Union (IoU), Dice coefficient, Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS). Experimental results from the cross-validation demonstrate robust performance, with average Dice coefficients reaching approximately 0.96 and IoU values of 0.93, alongside favorable PSNR (around 34 dB), SSIM (0.97), and low LPIPS (0.06), highlighting the model's effectiveness in accurate hair artifact segmentation and its potential to enhance preprocessing for downstream skin cancer detection tasks.

artificial intelligence, machine learning, segmentation, (14 more...)

arXiv.org Artificial Intelligence

2509.02156

Country: Asia > Nepal (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Dermatology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Skin Cancer (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.70)

Add feedback

From Measurement to Mitigation: Exploring the Transferability of Debiasing Approaches to Gender Bias in Maltese Language Models

Galea, Melanie, Borg, Claudia

arXiv.org Artificial IntelligenceJul-8-2025

The advancement of Large Language Models (LLMs) has transformed Natural Language Processing (NLP), enabling performance across diverse tasks with little task-specific training. However, LLMs remain susceptible to social biases, particularly reflecting harmful stereotypes from training data, which can disproportionately affect marginalised communities. We measure gender bias in Maltese LMs, arguing that such bias is harmful as it reinforces societal stereotypes and fails to account for gender diversity, which is especially problematic in gendered, low-resource languages. While bias evaluation and mitigation efforts have progressed for English-centric models, research on low-resourced and morphologically rich languages remains limited. This research investigates the transferability of debiasing methods to Maltese language models, focusing on BERTu and mBERTu, BERT-based monolingual and multilingual models respectively. Bias measurement and mitigation techniques from English are adapted to Maltese, using benchmarks such as CrowS-Pairs and SEAT, alongside debiasing methods Counterfactual Data Augmentation, Dropout Regularization, Auto-Debias, and GuiDebias. We also contribute to future work in the study of gender bias in Maltese by creating evaluation datasets. Our findings highlight the challenges of applying existing bias mitigation methods to linguistically complex languages, underscoring the need for more inclusive approaches in the development of multilingual NLP.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2507.03142

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(3 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Joint Inference for Neural Network Depth and Dropout Regularization Kishan K C1 Rui Li1 Mahdi Gilany Rochester Institute of Technology 2

Neural Information Processing SystemsMay-30-2025, 01:39:56 GMT

Dropout regularization methods prune a neural network's pre-determined backbone structure to avoid overfitting. However, a deep model still tends to be poorly calibrated with high confidence on incorrect predictions. We propose a unified Bayesian model selection method to jointly infer the most plausible network depth warranted by data, and perform dropout regularization simultaneously. In particular, to infer network depth we define a beta process over the number of hidden layers which allows it to go to infinity. Layer-wise activation probabilities induced by the beta process modulate neuron activation via binary vectors of a conjugate Bernoulli process. Experiments across domains show that by adapting network depth and dropout regularization to data, our method achieves superior performance comparing to state-of-the-art methods with well-calibrated uncertainty estimates. In continual learning, our method enables neural networks to dynamically evolve their depths to accommodate incrementally available data beyond their initial structures, and alleviate catastrophic forgetting.

artificial intelligence, international conference, machine learning, (14 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

MID-L: Matrix-Interpolated Dropout Layer with Layer-wise Neuron Selection

Shaeri, Pouya, Middel, Ariane

arXiv.org Artificial IntelligenceMay-19-2025

Modern neural networks often activate all neurons for every input, leading to unnecessary computation and inefficiency. We introduce Matrix-Interpolated Dropout Layer (MID-L), a novel module that dynamically selects and activates only the most informative neurons by interpolating between two transformation paths via a learned, input-dependent gating vector. Unlike conventional dropout or static sparsity methods, MID-L employs a differentiable Top-k masking strategy, enabling per-input adaptive computation while maintaining end-to-end differentiability. MID-L is model-agnostic and integrates seamlessly into existing architectures. Extensive experiments on six benchmarks, including MNIST, CIFAR-10, CIFAR-100, SVHN, UCI Adult, and IMDB, show that MID-L achieves up to average 55\% reduction in active neurons, 1.7$\times$ FLOPs savings, and maintains or exceeds baseline accuracy. We further validate the informativeness and selectivity of the learned neurons via Sliced Mutual Information (SMI) and observe improved robustness under overfitting and noisy data conditions. Additionally, MID-L demonstrates favorable inference latency and memory usage profiles, making it suitable for both research exploration and deployment on compute-constrained systems. These results position MID-L as a general-purpose, plug-and-play dynamic computation layer, bridging the gap between dropout regularization and efficient inference.

machine learning, mid-l, natural language, (16 more...)

arXiv.org Artificial Intelligence

2505.11416

Country:

North America > United States > Arizona (0.04)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Unified Enhancement of the Generalization and Robustness of Language Models via Bi-Stage Optimization

Sun, Yudao, Yin, Juan, Zhao, Juan, Zhang, Fan, Liu, Yongheng, Chen, Hongji

arXiv.org Artificial IntelligenceMar-19-2025

Neural network language models (LMs) are confronted with significant challenges in generalization and robustness. Currently, many studies focus on improving either generalization or robustness in isolation, without methods addressing both aspects simultaneously, which presents a significant challenge in developing LMs that are both robust and generalized. In this paper, we propose a bi-stage optimization framework to uniformly enhance both the generalization and robustness of LMs, termed UEGR. Specifically, during the forward propagation stage, we enrich the output probability distributions of adversarial samples by adaptive dropout to generate diverse sub models, and incorporate JS divergence and adversarial losses of these output distributions to reinforce output stability. During backward propagation stage, we compute parameter saliency scores and selectively update only the most critical parameters to minimize unnecessary deviations and consolidate the model's resilience. Theoretical analysis shows that our framework includes gradient regularization to limit the model's sensitivity to input perturbations and selective parameter updates to flatten the loss landscape, thus improving both generalization and robustness. The experimental results show that our method significantly improves the generalization and robustness of LMs compared to other existing methods across 13 publicly available language datasets, achieving state-of-the-art (SOTA) performance.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.1655

Country:

Asia > Singapore > Central Region > Singapore (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Enhanced Convolutional Neural Networks for Improved Image Classification

Yang, Xiaoran, Yu, Shuhan, Xu, Wenxi

arXiv.org Artificial IntelligenceFeb-1-2025

Image classification is a fundamental task in computer vision with diverse applications, ranging from autonomous systems to medical imaging. The CIFAR-10 dataset is a widely used benchmark to evaluate the performance of classification models on small-scale, multi-class datasets. Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art results; however, they often suffer from overfitting and suboptimal feature representation when applied to challenging datasets like CIFAR-10. In this paper, we propose an enhanced CNN architecture that integrates deeper convolutional blocks, batch normalization, and dropout regularization to achieve superior performance. The proposed model achieves a test accuracy of 84.95%, outperforming baseline CNN architectures. Through detailed ablation studies, we demonstrate the effectiveness of the enhancements and analyze the hierarchical feature representations. This work highlights the potential of refined CNN architectures for tackling small-scale image classification problems effectively.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2502.00663

Country:

Asia > China > Anhui Province > Hefei (0.05)
North America > United States > North Carolina > Wake County > Raleigh (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > China > Hainan Province (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Diagnostic Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Making Sense Of Distributed Representations With Activation Spectroscopy

Reing, Kyle, Steeg, Greg Ver, Galstyan, Aram

arXiv.org Artificial IntelligenceJan-26-2025

In the study of neural network interpretability, there is growing evidence to suggest that relevant features are encoded across many neurons in a distributed fashion. Making sense of these distributed representations without knowledge of the network's encoding strategy is a combinatorial task that is not guaranteed to be tractable. This work explores one feasible path to both detecting and tracing the joint influence of neurons in a distributed representation. We term this approach Activation Spectroscopy (ActSpec), owing to its analysis of the pseudo-Boolean Fourier spectrum defined over the activation patterns of a network layer. The sub-network defined between a given layer and an output logit is cast as a special class of pseudo-Boolean function. The contributions of each subset of neurons in the specified layer can be quantified through the function's Fourier coefficients. We propose a combinatorial optimization procedure to search for Fourier coefficients that are simultaneously high-valued, and non-redundant. This procedure can be viewed as an extension of the Goldreich-Levin algorithm which incorporates additional problem-specific constraints. The resulting coefficients specify a collection of subsets, which are used to test the degree to which a representation is distributed. We verify our approach in a number of synthetic settings and compare against existing interpretability benchmarks. We conclude with a number of experimental evaluations on an MNIST classifier, and a transformer-based network for sentiment analysis.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.15435

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Riverside County > Riverside (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families

Schwienhorst, Benedikt Lütke, Kock, Lucas, Nott, David J., Klein, Nadja

arXiv.org Artificial IntelligenceMay-11-2023

Even though dropout is a popular regularization technique, its theoretical properties are not fully understood. In this paper we study dropout regularization in extended generalized linear models based on double exponential families, for which the dispersion parameter can vary with the features. A theoretical analysis shows that dropout regularization prefers rare but important features in both the mean and dispersion, generalizing an earlier result for conventional generalized linear models. Training is performed using stochastic gradient descent with adaptive learning rate. To illustrate, we apply dropout to adaptive smoothing with B-splines, where both the mean and dispersion parameters are modelled flexibly. The important B-spline basis functions can be thought of as rare features, and we confirm in experiments that dropout is an effective form of regularization for mean and dispersion parameters that improves on a penalized maximum likelihood approach with an explicit smoothness penalty.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2305.06625

Country:

North America > United States > New York (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
(2 more...)

Genre: Research Report (0.82)

Add feedback

Using Dropout Regularization in PyTorch Models - MachineLearningMastery.com Using Dropout Regularization in PyTorch Models - MachineLearningMastery.com

#artificialintelligenceFeb-20-2023, 02:00:40 GMT

Dropout is a simple and powerful regularization technique for neural networks and deep learning models. In this post, you will discover the Dropout regularization technique and how to apply it to your models in PyTorch models. Dropout is a regularization technique for neural network models proposed around 2012 to 2014. It is a layer in the neural network. During training of a neural network model, it will take the output from its previous layer, randomly select some of the neurons and zero them out before passing to the next layer, effectively ignored them. This means that their contribution to the activation of downstream neurons is temporally removed on the forward pass, and any weight updates are not applied to the neuron on the backward pass.

dropout, dropout regularization, machinelearningmastery, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback