Towards Adaptive Deep Learning: Model Elasticity via Prune-and-Grow CNN Architectures
Mangal, Pooja, Kalra, Sudaksh, Sapra, Dolly
Deploying deep convolutional neural networks (CNNs) on resource-constrained devices presents significant challenges due to their high computational demands and rigid, static architectures. To overcome these limitations, this thesis explores methods for enabling CNNs to dynamically adjust their computational complexity based on available hardware resources. We introduce adaptive CNN architectures capable of scaling their capacity at runtime, thus efficiently balancing performance and resource utilization. To achieve this adaptability, we propose a structured pruning and dynamic reconstruction approach that creates nested subnetworks within a single CNN model. This approach allows the network to dynamically switch between compact and full-sized configurations without retraining, making it suitable for deployment across varying hardware platforms. Experiments conducted across multiple CNN architectures, including VGG-16, AlexNet, ResNet-20, and ResNet-56, on the CIFAR-10 and Imagenette datasets demonstrate that adaptive models effectively maintain or even enhance performance under varying computational constraints. Our results highlight that embedding adaptability directly into CNN architectures significantly improves their robustness and flexibility, paving the way for efficient real-world deployment in diverse computational environments.
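To make the nested-subnetwork idea concrete, the sketch below shows one way a convolutional layer can expose compact configurations as prefixes of its full filter set, so the network can be switched between widths at runtime. This is only an illustration of the general mechanism, not the thesis's implementation; the `SlimmableConv2d` class, the `width_fraction` attribute, and the `set_width` helper are hypothetical names, and PyTorch is assumed.

```python
import torch.nn as nn

class SlimmableConv2d(nn.Conv2d):
    """Convolution whose compact configurations are nested in the full model:
    only a leading fraction of the output filters is used at inference time."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__(in_channels, out_channels, kernel_size, **kwargs)
        self.width_fraction = 1.0  # 1.0 = full model; e.g. 0.5 = compact subnetwork

    def forward(self, x):
        out_ch = max(1, int(self.out_channels * self.width_fraction))
        in_ch = x.shape[1]                              # previous layer may also be slimmed
        weight = self.weight[:out_ch, :in_ch]           # keep only the leading filters
        bias = self.bias[:out_ch] if self.bias is not None else None
        return self._conv_forward(x, weight, bias)

def set_width(model, fraction):
    """Switch every slimmable layer to the requested width at runtime."""
    for module in model.modules():
        if isinstance(module, SlimmableConv2d):
            module.width_fraction = fraction
```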
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective
de la Rosa, Javier, Mikhailov, Vladislav, Zhang, Lemei, Wetjen, Freddy, Samuel, David, Liu, Peng, Braaten, Rolv-Arild, Mæhlum, Petter, Birkenes, Magnus Breder, Kutuzov, Andrey, Enstad, Tita, Brygfjeld, Svein Arne, Gulla, Jon Atle, Oepen, Stephan, Velldal, Erik, Østgulen, Wilfred, Øvrelid, Liljia, Myhre, Aslak Sira
The use of copyrighted materials in training generative language models raises critical legal and ethical questions. This paper presents a framework for and the results of empirically assessing the impact of copyrighted materials on the performance of large language models (LLMs) for Norwegian. We found that both books and newspapers contribute positively when the models are evaluated on a diverse set of Norwegian benchmarks, while fiction works possibly lead to decreased performance. Our experiments could inform the creation of a compensation scheme for authors whose works contribute to AI development.
Reviews: GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
The sizes of embedding matrices in NLP tasks have long posed difficult computational problems, either from the inefficiency of operating (softmaxing) over them, or often from the sheer difficulty in storing them. In this paper the authors take on the latter problem, introducing a method of using multiple low-rank approximations to reduce the size of these matrices. They rely on frequency binning -- the same observation underlying the hierarchical softmax solution to the former problem -- to group words, prioritizing the most frequent words to receive higher-rank approximations. This by itself leads to significant compression rates with little loss in accuracy, and when further combined with quantization, yields a large reduction in memory. Importantly, quantization appears to play nicely with their methodology, and the combination seems to provide much smaller models overall while performing at least as well as naive quantization on large datasets.
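The block-wise idea can be sketched briefly: sort the vocabulary by frequency, split the embedding rows into blocks, and give the most frequent block the highest-rank truncated SVD. The NumPy sketch below is an assumption of how such a scheme could look, not the authors' code; `blockwise_lowrank` and its default ranks are illustrative.

```python
import numpy as np

def blockwise_lowrank(embedding, word_freq, num_blocks=4, ranks=(64, 32, 16, 8)):
    """Group embedding rows by descending word frequency and approximate each
    block with its own truncated SVD; frequent words get higher ranks."""
    order = np.argsort(-word_freq)                      # most frequent words first
    blocks = np.array_split(order, num_blocks)
    factors = []
    for rows, rank in zip(blocks, ranks):
        block = embedding[rows]                         # (block_size, dim)
        u, s, vt = np.linalg.svd(block, full_matrices=False)
        r = min(rank, len(s))
        factors.append((rows, u[:, :r] * s[:r], vt[:r]))  # A: (block_size, r), B: (r, dim)
    return factors

def reconstruct(factors, shape):
    """Rebuild the approximate embedding matrix from the per-block factors."""
    out = np.zeros(shape)
    for rows, a, b in factors:
        out[rows] = a @ b
    return out
```

Storage per block drops from roughly rows*dim floats to (rows + dim)*r, which is where the compression comes from before quantization is applied on top.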
Multi-BERT: Leveraging Adapters and Prompt Tuning for Low-Resource Multi-Domain Adaptation
Azad, Parham Abed, Beigy, Hamid
The rapid expansion of texts' volume and diversity presents formidable challenges in multi-domain settings. These challenges are also visible in the Persian named entity recognition (NER) setting. Traditional approaches, either employing a unified model for multiple domains or individual models for each domain, frequently pose significant limitations. Single models often struggle to capture the nuances of diverse domains, while utilizing multiple large models can lead to resource constraints, rendering the training of a model for each domain virtually impractical. Therefore, this paper introduces a novel approach composed of one core model with multiple sets of domain-specific parameters. We utilize techniques such as prompt tuning and adapters, combined with the incorporation of additional layers, to add parameters that we can train for the specific domains. This enables the model to perform comparably to individual models for each domain. Experimental results on different formal and informal datasets show that by employing these added parameters, the proposed model significantly surpasses existing practical models in performance. Remarkably, the proposed model requires only one instance for training and storage, yet achieves outstanding results across all domains, even surpassing the state of the art in some. Moreover, we analyze each adaptation strategy, delineating its strengths, weaknesses, and optimal hyper-parameters for the Persian NER setting. Finally, we introduce a document-based domain detection pipeline tailored for scenarios with unknown text domains, enhancing the adaptability and practicality of this work in real-world applications.
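A minimal sketch of the "one core model, many domain-specific parameter sets" idea is given below, using bottleneck adapters as the added parameters. The `Adapter` and `MultiDomainAdapters` classes are hypothetical names assuming PyTorch; the paper's actual configuration (prompt tuning, extra layers, hyper-parameters) is not reproduced here.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: a small residual MLP inserted after a frozen
    sub-layer, holding the trainable domain-specific parameters."""
    def __init__(self, hidden_size, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden):
        return hidden + self.up(torch.relu(self.down(hidden)))

class MultiDomainAdapters(nn.Module):
    """One adapter per domain; only the adapter matching the (detected)
    domain is applied, so a single core model serves every domain."""
    def __init__(self, hidden_size, domains):
        super().__init__()
        self.adapters = nn.ModuleDict({d: Adapter(hidden_size) for d in domains})

    def forward(self, hidden, domain):
        return self.adapters[domain](hidden)
```

Only the adapters are updated during training, so supporting a new domain costs a few small matrices rather than a full model copy.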
Entangling Machine Learning with Quantum Tensor Networks
van der Poel, Constantijn, Zhao, Dan
This paper examines the use of tensor networks, which can efficiently represent high-dimensional quantum states, in language modeling. It is a distillation and continuation of the work done in (van der Poel, 2023). To do so, we abstract the problem down to modeling Motzkin spin chains, which exhibit long-range correlations reminiscent of those found in language. The Matrix Product State (MPS), also known as the tensor train, has a bond dimension which scales as the length of the sequence it models. To combat this, we use the factored core MPS, whose bond dimension scales sub-linearly. We find that the tensor models reach near-perfect classification ability and maintain a stable level of performance as the number of valid training examples is decreased.
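As a rough illustration of how an MPS scores a sequence, the sketch below contracts a chain of token-indexed matrices between a boundary vector and a class readout. It uses a fixed bond dimension rather than the factored core construction, and the `MPSClassifier` class is a hypothetical name; NumPy is assumed.

```python
import numpy as np

class MPSClassifier:
    """Toy Matrix Product State (tensor train) sequence classifier: each token
    selects a (bond x bond) matrix, and the sequence is scored by contracting
    the resulting chain of matrices along the bond dimension."""

    def __init__(self, vocab_size, seq_len, bond_dim=8, num_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        # One core per position, shape (vocab, bond, bond), near-identity init.
        self.cores = [np.eye(bond_dim)[None]
                      + 0.01 * rng.standard_normal((vocab_size, bond_dim, bond_dim))
                      for _ in range(seq_len)]
        self.left = np.ones(bond_dim) / np.sqrt(bond_dim)   # left boundary vector
        self.readout = 0.01 * rng.standard_normal((bond_dim, num_classes))

    def scores(self, tokens):
        v = self.left
        for core, t in zip(self.cores, tokens):
            v = v @ core[t]          # contract the bond index, position by position
        return v @ self.readout      # class logits for the whole sequence
```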
Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks
Hauzenberger, Lukas, Masoudian, Shahed, Kumar, Deepak, Schedl, Markus, Rekabsaz, Navid
Societal biases are reflected in large pre-trained language models and their fine-tuned versions on downstream tasks. Common in-processing bias mitigation approaches, such as adversarial training and mutual information removal, introduce additional optimization criteria and update the model to reach a new debiased state. However, in practice, end users and practitioners might prefer to switch back to the original model, or apply debiasing only to a specific subset of protected attributes. To enable this, we propose a novel modular bias mitigation approach consisting of stand-alone, highly sparse debiasing subnetworks, where each debiasing module can be integrated into the core model on demand at inference time. Our approach draws from the concept of diff pruning, and proposes a novel training regime adaptable to various representation disentanglement optimizations. We conduct experiments on three classification tasks with gender, race, and age as protected attributes. The results show that our modular approach, while maintaining task performance, improves (or at least remains on par with) the effectiveness of bias mitigation in comparison with baseline fine-tuning. Particularly on a two-attribute dataset, our approach with separately learned debiasing subnetworks shows effective utilization of either or both of the subnetworks for selective bias mitigation.
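The on-demand mechanism can be pictured with a short sketch: each debiasing module stores a sparse parameter delta (the outcome of diff pruning) and can be added to, or subtracted from, the frozen core model at inference time. The `DebiasModule` class below is a hypothetical illustration assuming PyTorch, not the authors' code, which also has to learn the sparse deltas under a disentanglement objective.

```python
import torch

class DebiasModule:
    """Stand-alone sparse parameter delta for one protected attribute.
    Applying it adds the delta to the core model; removing it restores
    the original weights exactly."""

    def __init__(self, sparse_deltas):
        # sparse_deltas: {parameter_name: delta tensor}, mostly zeros after diff pruning.
        self.deltas = sparse_deltas
        self.active = False

    def apply(self, model):
        if self.active:
            return
        params = dict(model.named_parameters())
        with torch.no_grad():
            for name, delta in self.deltas.items():
                params[name].add_(delta)
        self.active = True

    def remove(self, model):
        if not self.active:
            return
        params = dict(model.named_parameters())
        with torch.no_grad():
            for name, delta in self.deltas.items():
                params[name].sub_(delta)
        self.active = False
```

Because the deltas are additive, modules for different protected attributes can be applied alone or together, matching the "either or both subnetworks" usage the abstract describes.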
Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately
The presence of spurious features interferes with the goal of obtaining robust models that perform well across many groups within the population. A natural remedy is to remove spurious features from the model. However, in this work we show that removal of spurious features can decrease accuracy due to the inductive biases of overparameterized models. We completely characterize how the removal of spurious features affects accuracy across different groups (more generally, test distributions) in noiseless overparameterized linear regression. In addition, we show that removal of a spurious feature can decrease accuracy even in balanced datasets -- where each target co-occurs equally with each spurious feature -- and that it can inadvertently make the model more susceptible to other spurious features. Finally, we show that robust self-training can remove spurious features without affecting the overall accuracy. Experiments on the Toxic-Comment-Detection and CelebA datasets show that our results hold in non-linear models.
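The noiseless overparameterized setting can be sketched numerically: fit the minimum-l2-norm interpolator with and without a block of group-correlated spurious features and compare per-group test error. The toy data-generating process below is an assumption for illustration only and does not reproduce the paper's constructions or results.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_core, d_spur = 40, 80, 30                      # n < d_core: overparameterized regime
w_true = rng.standard_normal(d_core) / np.sqrt(d_core)

def sample(n):
    g = rng.integers(0, 2, size=n)                  # group membership
    x_core = rng.standard_normal((n, d_core))
    # Spurious features track group membership rather than the core signal.
    x_spur = (2.0 * g[:, None] - 1.0) + 0.1 * rng.standard_normal((n, d_spur))
    y = x_core @ w_true                             # noiseless targets
    return g, x_core, x_spur, y

def min_norm(X, y):
    # Minimum-l2-norm interpolator: the implicit bias of gradient descent
    # started from zero in overparameterized linear regression.
    return np.linalg.pinv(X) @ y

g, Xc, Xs, y = sample(n)
w_full = min_norm(np.hstack([Xc, Xs]), y)           # keep the spurious block
w_drop = min_norm(Xc, y)                            # remove the spurious block

gt, Xct, Xst, yt = sample(2000)
for name, pred in [("with spurious", np.hstack([Xct, Xst]) @ w_full),
                   ("without spurious", Xct @ w_drop)]:
    for grp in (0, 1):
        mse = float(np.mean((pred[gt == grp] - yt[gt == grp]) ** 2))
        print(f"{name:>17s}  group {grp}: test MSE = {mse:.3f}")
```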