Towards Adaptive Deep Learning: Model Elasticity via Prune-and-Grow CNN Architectures
Mangal, Pooja, Kalra, Sudaksh, Sapra, Dolly
Deploying deep convolutional neural networks (CNNs) on resource-constrained devices presents significant challenges due to their high computational demands and rigid, static architectures. To overcome these limitations, this thesis explores methods for enabling CNNs to dynamically adjust their computational complexity based on available hardware resources. We introduce adaptive CNN architectures capable of scaling their capacity at runtime, thus efficiently balancing performance and resource utilization. To achieve this adaptability, we propose a structured pruning and dynamic reconstruction approach that creates nested subnetworks within a single CNN model. This approach allows the network to dynamically switch between compact and full-sized configurations without retraining, making it suitable for deployment across varying hardware platforms. Experiments conducted across multiple CNN architectures, including VGG-16, AlexNet, ResNet-20, and ResNet-56, on the CIFAR-10 and Imagenette datasets demonstrate that adaptive models effectively maintain or even enhance performance under varying computational constraints. Our results highlight that embedding adaptability directly into CNN architectures significantly improves their robustness and flexibility, paving the way for efficient real-world deployment in diverse computational environments.
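To make the nested-subnetwork idea concrete, the sketch below shows one way a convolutional layer can expose compact configurations as prefixes of its full filter set, so the network can be switched between widths at runtime. This is only an illustration of the general mechanism, not the thesis's implementation; the `SlimmableConv2d` class, the `width_fraction` attribute, and the `set_width` helper are hypothetical names, and PyTorch is assumed.

```python
import torch.nn as nn

class SlimmableConv2d(nn.Conv2d):
    """Convolution whose compact configurations are nested in the full model:
    only a leading fraction of the output filters is used at inference time."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__(in_channels, out_channels, kernel_size, **kwargs)
        self.width_fraction = 1.0  # 1.0 = full model; e.g. 0.5 = compact subnetwork

    def forward(self, x):
        out_ch = max(1, int(self.out_channels * self.width_fraction))
        in_ch = x.shape[1]                              # previous layer may also be slimmed
        weight = self.weight[:out_ch, :in_ch]           # keep only the leading filters
        bias = self.bias[:out_ch] if self.bias is not None else None
        return self._conv_forward(x, weight, bias)

def set_width(model, fraction):
    """Switch every slimmable layer to the requested width at runtime."""
    for module in model.modules():
        if isinstance(module, SlimmableConv2d):
            module.width_fraction = fraction
```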
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective
de la Rosa, Javier, Mikhailov, Vladislav, Zhang, Lemei, Wetjen, Freddy, Samuel, David, Liu, Peng, Braaten, Rolv-Arild, Mæhlum, Petter, Birkenes, Magnus Breder, Kutuzov, Andrey, Enstad, Tita, Brygfjeld, Svein Arne, Gulla, Jon Atle, Oepen, Stephan, Velldal, Erik, Østgulen, Wilfred, Øvrelid, Liljia, Myhre, Aslak Sira
The use of copyrighted materials in training generative language models raises critical legal and ethical questions. This paper presents a framework for and the results of empirically assessing the impact of copyrighted materials on the performance of large language models (LLMs) for Norwegian. We found that both books and newspapers contribute positively when the models are evaluated on a diverse set of Norwegian benchmarks, while fiction works possibly lead to decreased performance. Our experiments could inform the creation of a compensation scheme for authors whose works contribute to AI development.
Reviews: GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
The sizes of embedding matrices in NLP tasks have long posed difficult computational problems, either from the inefficiency of operating (softmaxing) over them, or often from the sheer difficulty in storing them. In this paper the authors take on the latter problem, introducing a method of using multiple low-rank approximations to reduce the size of these matrices. They rely on frequency binning -- the same observation underlying the hierarchical softmax solution to the former problem -- to group words, prioritizing the most frequent words to receive higher-rank approximations. This by itself leads to significant compression rates with little loss in accuracy, and when further combined with quantization, yields a large reduction in memory. Importantly, quantization appears to play nicely with their methodology, and the combination seems to provide much smaller models overall while performing at least as well as naive quantization on large datasets.
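The block-wise idea can be sketched briefly: sort the vocabulary by frequency, split the embedding rows into blocks, and give the most frequent block the highest-rank truncated SVD. The NumPy sketch below is an assumption of how such a scheme could look, not the authors' code; `blockwise_lowrank` and its default ranks are illustrative.

```python
import numpy as np

def blockwise_lowrank(embedding, word_freq, num_blocks=4, ranks=(64, 32, 16, 8)):
    """Group embedding rows by descending word frequency and approximate each
    block with its own truncated SVD; frequent words get higher ranks."""
    order = np.argsort(-word_freq)                      # most frequent words first
    blocks = np.array_split(order, num_blocks)
    factors = []
    for rows, rank in zip(blocks, ranks):
        block = embedding[rows]                         # (block_size, dim)
        u, s, vt = np.linalg.svd(block, full_matrices=False)
        r = min(rank, len(s))
        factors.append((rows, u[:, :r] * s[:r], vt[:r]))  # A: (block_size, r), B: (r, dim)
    return factors

def reconstruct(factors, shape):
    """Rebuild the approximate embedding matrix from the per-block factors."""
    out = np.zeros(shape)
    for rows, a, b in factors:
        out[rows] = a @ b
    return out
```

Storage per block drops from roughly rows*dim floats to (rows + dim)*r, which is where the compression comes from before quantization is applied on top.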
Multi-BERT: Leveraging Adapters and Prompt Tuning for Low-Resource Multi-Domain Adaptation
Azad, Parham Abed, Beigy, Hamid
The rapid expansion of texts' volume and diversity presents formidable challenges in multi-domain settings. These challenges are also visible in the Persian named entity recognition (NER) setting. Traditional approaches, either employing a unified model for multiple domains or individual models for each domain, frequently pose significant limitations. Single models often struggle to capture the nuances of diverse domains, while utilizing multiple large models can lead to resource constraints, rendering the training of a model for each domain virtually impractical. Therefore, this paper introduces a novel approach composed of one core model with multiple sets of domain-specific parameters. We utilize techniques such as prompt tuning and adapters, combined with the incorporation of additional layers, to add parameters that we can train for the specific domains. This enables the model to perform comparably to individual models for each domain. Experimental results on different formal and informal datasets show that by employing these added parameters, the proposed model significantly surpasses existing practical models in performance. Remarkably, the proposed model requires only one instance for training and storage, yet achieves outstanding results across all domains, even surpassing the state of the art in some. Moreover, we analyze each adaptation strategy, delineating its strengths, weaknesses, and optimal hyper-parameters for the Persian NER setting. Finally, we introduce a document-based domain detection pipeline tailored for scenarios with unknown text domains, enhancing the adaptability and practicality of this work in real-world applications.
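A minimal sketch of the "one core model, many domain-specific parameter sets" idea is given below, using bottleneck adapters as the added parameters. The `Adapter` and `MultiDomainAdapters` classes are hypothetical names assuming PyTorch; the paper's actual configuration (prompt tuning, extra layers, hyper-parameters) is not reproduced here.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: a small residual MLP inserted after a frozen
    sub-layer, holding the trainable domain-specific parameters."""
    def __init__(self, hidden_size, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, hidden):
        return hidden + self.up(torch.relu(self.down(hidden)))

class MultiDomainAdapters(nn.Module):
    """One adapter per domain; only the adapter matching the (detected)
    domain is applied, so a single core model serves every domain."""
    def __init__(self, hidden_size, domains):
        super().__init__()
        self.adapters = nn.ModuleDict({d: Adapter(hidden_size) for d in domains})

    def forward(self, hidden, domain):
        return self.adapters[domain](hidden)
```

Only the adapters are updated during training, so supporting a new domain costs a few small matrices rather than a full model copy.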
Entangling Machine Learning with Quantum Tensor Networks
van der Poel, Constantijn, Zhao, Dan
This paper examines the use of tensor networks, which can efficiently represent high-dimensional quantum states, in language modeling. It is a distillation and continuation of the work done in (van der Poel, 2023). To do so, we abstract the problem down to modeling Motzkin spin chains, which exhibit long-range correlations reminiscent of those found in language. The Matrix Product State (MPS), also known as the tensor train, has a bond dimension which scales as the length of the sequence it models. To combat this, we use the factored core MPS, whose bond dimension scales sub-linearly. We find that the tensor models reach near-perfect classification ability and maintain a stable level of performance as the number of valid training examples is decreased.
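As a rough illustration of how an MPS scores a sequence, the sketch below contracts a chain of token-indexed matrices between a boundary vector and a class readout. It uses a fixed bond dimension rather than the factored core construction, and the `MPSClassifier` class is a hypothetical name; NumPy is assumed.

```python
import numpy as np

class MPSClassifier:
    """Toy Matrix Product State (tensor train) sequence classifier: each token
    selects a (bond x bond) matrix, and the sequence is scored by contracting
    the resulting chain of matrices along the bond dimension."""

    def __init__(self, vocab_size, seq_len, bond_dim=8, num_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        # One core per position, shape (vocab, bond, bond), near-identity init.
        self.cores = [np.eye(bond_dim)[None]
                      + 0.01 * rng.standard_normal((vocab_size, bond_dim, bond_dim))
                      for _ in range(seq_len)]
        self.left = np.ones(bond_dim) / np.sqrt(bond_dim)   # left boundary vector
        self.readout = 0.01 * rng.standard_normal((bond_dim, num_classes))

    def scores(self, tokens):
        v = self.left
        for core, t in zip(self.cores, tokens):
            v = v @ core[t]          # contract the bond index, position by position
        return v @ self.readout      # class logits for the whole sequence
```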
Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks
Hauzenberger, Lukas, Masoudian, Shahed, Kumar, Deepak, Schedl, Markus, Rekabsaz, Navid
Societal biases are reflected in large pre-trained language models and their fine-tuned versions on downstream tasks. Common in-processing bias mitigation approaches, such as adversarial training and mutual information removal, introduce additional optimization criteria and update the model to reach a new debiased state. However, in practice, end users and practitioners might prefer to switch back to the original model, or apply debiasing only to a specific subset of protected attributes. To enable this, we propose a novel modular bias mitigation approach consisting of stand-alone, highly sparse debiasing subnetworks, where each debiasing module can be integrated into the core model on demand at inference time. Our approach draws from the concept of diff pruning, and proposes a novel training regime adaptable to various representation disentanglement optimizations. We conduct experiments on three classification tasks with gender, race, and age as protected attributes. The results show that our modular approach, while maintaining task performance, improves (or at least remains on par with) the effectiveness of bias mitigation in comparison with baseline fine-tuning. Particularly on a two-attribute dataset, our approach with separately learned debiasing subnetworks shows effective utilization of either or both of the subnetworks for selective bias mitigation.
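The on-demand mechanism can be pictured with a short sketch: each debiasing module stores a sparse parameter delta (the outcome of diff pruning) and can be added to, or subtracted from, the frozen core model at inference time. The `DebiasModule` class below is a hypothetical illustration assuming PyTorch, not the authors' code, which also has to learn the sparse deltas under a disentanglement objective.

```python
import torch

class DebiasModule:
    """Stand-alone sparse parameter delta for one protected attribute.
    Applying it adds the delta to the core model; removing it restores
    the original weights exactly."""

    def __init__(self, sparse_deltas):
        # sparse_deltas: {parameter_name: delta tensor}, mostly zeros after diff pruning.
        self.deltas = sparse_deltas
        self.active = False

    def apply(self, model):
        if self.active:
            return
        params = dict(model.named_parameters())
        with torch.no_grad():
            for name, delta in self.deltas.items():
                params[name].add_(delta)
        self.active = True

    def remove(self, model):
        if not self.active:
            return
        params = dict(model.named_parameters())
        with torch.no_grad():
            for name, delta in self.deltas.items():
                params[name].sub_(delta)
        self.active = False
```

Because the deltas are additive, modules for different protected attributes can be applied alone or together, matching the "either or both subnetworks" usage the abstract describes.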
Removing Spurious Features can Hurt Accuracy and Affect Groups Disproportionately
The presence of spurious features interferes with the goal of obtaining robust models that perform well across many groups within the population. A natural remedy is to remove spurious features from the model. However, in this work we show that removal of spurious features can decrease accuracy due to the inductive biases of overparameterized models. We completely characterize how the removal of spurious features affects accuracy across different groups (more generally, test distributions) in noiseless overparameterized linear regression. In addition, we show that removal of a spurious feature can decrease accuracy even in balanced datasets -- where each target co-occurs equally with each spurious feature -- and that it can inadvertently make the model more susceptible to other spurious features. Finally, we show that robust self-training can remove spurious features without affecting the overall accuracy. Experiments on the Toxic-Comment-Detection and CelebA datasets show that our results hold in non-linear models.
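The noiseless overparameterized setting can be sketched numerically: fit the minimum-l2-norm interpolator with and without a block of group-correlated spurious features and compare per-group test error. The toy data-generating process below is an assumption for illustration only and does not reproduce the paper's constructions or results.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_core, d_spur = 40, 80, 30                      # n < d_core: overparameterized regime
w_true = rng.standard_normal(d_core) / np.sqrt(d_core)

def sample(n):
    g = rng.integers(0, 2, size=n)                  # group membership
    x_core = rng.standard_normal((n, d_core))
    # Spurious features track group membership rather than the core signal.
    x_spur = (2.0 * g[:, None] - 1.0) + 0.1 * rng.standard_normal((n, d_spur))
    y = x_core @ w_true                             # noiseless targets
    return g, x_core, x_spur, y

def min_norm(X, y):
    # Minimum-l2-norm interpolator: the implicit bias of gradient descent
    # started from zero in overparameterized linear regression.
    return np.linalg.pinv(X) @ y

g, Xc, Xs, y = sample(n)
w_full = min_norm(np.hstack([Xc, Xs]), y)           # keep the spurious block
w_drop = min_norm(Xc, y)                            # remove the spurious block

gt, Xct, Xst, yt = sample(2000)
for name, pred in [("with spurious", np.hstack([Xct, Xst]) @ w_full),
                   ("without spurious", Xct @ w_drop)]:
    for grp in (0, 1):
        mse = float(np.mean((pred[gt == grp] - yt[gt == grp]) ** 2))
        print(f"{name:>17s}  group {grp}: test MSE = {mse:.3f}")
```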