
Collaborating Authors: Diera, Andor


Continual Learning for Encoder-only Language Models via a Discrete Key-Value Bottleneck

arXiv.org Artificial Intelligence

Continual learning remains challenging across various natural language understanding tasks. When models are updated with new training data, they risk catastrophic forgetting of prior knowledge. In the present work, we introduce a discrete key-value bottleneck for encoder-only language models, allowing for efficient continual learning by requiring only localized updates. Inspired by the success of the discrete key-value bottleneck in vision, we address new, NLP-specific challenges. We experiment with different bottleneck architectures to find the variants best suited to language, and present a generic, task-independent discrete key initialization technique for NLP. We evaluate the discrete key-value bottleneck in four continual learning NLP scenarios and demonstrate that it alleviates catastrophic forgetting. We also show that it offers competitive performance compared to other popular continual learning methods, at lower computational cost.
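
The core mechanism can be illustrated with a minimal PyTorch sketch, assuming a bottleneck that splits the encoder representation into several heads, snaps each head to its nearest frozen key, and returns the corresponding learnable values; the class name, codebook sizes, and Euclidean nearest-key lookup are illustrative assumptions rather than the paper's exact design.

import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    """Quantize encoder outputs against frozen keys; return learnable values."""

    def __init__(self, dim: int, num_codebooks: int = 8, codebook_size: int = 256):
        super().__init__()
        assert dim % num_codebooks == 0
        self.num_codebooks = num_codebooks
        self.head_dim = dim // num_codebooks
        # Keys are initialized once (e.g., from task-independent data) and frozen.
        self.keys = nn.Parameter(
            torch.randn(num_codebooks, codebook_size, self.head_dim),
            requires_grad=False,
        )
        # Values are the only parameters updated during continual learning,
        # so each new task touches only a localized subset of them.
        self.values = nn.Parameter(
            torch.randn(num_codebooks, codebook_size, self.head_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim) -> per-codebook heads: (batch, C, head_dim)
        b = x.shape[0]
        heads = x.view(b, self.num_codebooks, self.head_dim)
        # Nearest-key lookup per codebook via Euclidean distance.
        dists = torch.cdist(heads.transpose(0, 1), self.keys)  # (C, batch, K)
        idx = dists.argmin(dim=-1)                              # (C, batch)
        # Fetch the matching learnable values and reassemble the representation.
        gathered = torch.stack(
            [self.values[c][idx[c]] for c in range(self.num_codebooks)], dim=1
        )
        return gathered.reshape(b, -1)

A task head (e.g., a linear classifier) would then consume the bottleneck output, with the underlying encoder kept frozen.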


Isotropy Matters: Soft-ZCA Whitening of Embeddings for Semantic Code Search

arXiv.org Artificial Intelligence

Our study investigates the impact of isotropy on semantic code search performance and explores post-processing techniques to mitigate low isotropy in embedding spaces. We analyze various code language models, examining the isotropy of their embedding spaces and its influence on search effectiveness. We propose a modified ZCA whitening technique to control the isotropy level of embeddings. Our results demonstrate that Soft-ZCA whitening improves the performance of pre-trained code language models and can complement contrastive fine-tuning.
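
A minimal NumPy sketch of ZCA whitening with a softening term follows; the epsilon-regularized eigenvalue spectrum used here is an assumption about how the isotropy level is controlled, and the function name and example dimensions are illustrative.

import numpy as np

def soft_zca_whiten(embeddings: np.ndarray, eps: float = 1e-2) -> np.ndarray:
    """Whiten row-wise embeddings; larger eps preserves more of the original geometry."""
    x = embeddings - embeddings.mean(axis=0, keepdims=True)  # center the embeddings
    cov = np.cov(x, rowvar=False)                            # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)                    # symmetric eigendecomposition
    # Soft inverse square root: eps damps the flattening of the spectrum,
    # which is what controls the resulting isotropy level.
    inv_sqrt = np.diag(1.0 / np.sqrt(eigvals + eps))
    zca = eigvecs @ inv_sqrt @ eigvecs.T                      # ZCA whitening matrix
    return x @ zca

# Example: whiten 1,000 synthetic 768-dimensional embeddings with a skewed spectrum.
emb = np.random.randn(1000, 768) * np.linspace(0.1, 5.0, 768)
whitened = soft_zca_whiten(emb, eps=1e-2)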


GenCodeSearchNet: A Benchmark Test Suite for Evaluating Generalization in Programming Language Understanding

arXiv.org Artificial Intelligence

Language models can serve as a valuable tool for software developers to increase productivity. Large generative models can be used for code generation and code completion, while smaller encoder-only models are capable of performing code search with natural language queries. These capabilities are heavily influenced by the quality and diversity of the available training data. Source code datasets used for training usually focus on the most popular languages, and testing is mostly conducted on the same distributions, often overlooking low-resource programming languages. Motivated by the NLP generalization taxonomy proposed by Hupkes et al., we propose a new benchmark dataset called GenCodeSearchNet (GeCS), which builds upon existing natural language code search datasets to systematically evaluate the generalization capabilities of language models in programming language understanding. As part of the full dataset, we introduce a new, manually curated subset, StatCodeSearch, that focuses on R, a popular but so far underrepresented programming language often used by researchers outside the field of computer science. For evaluation and comparison, we collect several baseline results using fine-tuned BERT-style models and GPT-style large language models in a zero-shot setting.
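
For context, the code search task behind these baselines can be sketched as ranking code snippets by cosine similarity to a query embedding and scoring with Mean Reciprocal Rank; the random embeddings below are stand-ins for the output of a fine-tuned BERT-style encoder, and the function name is illustrative.

import numpy as np

def mean_reciprocal_rank(query_emb: np.ndarray, code_emb: np.ndarray) -> float:
    """Assumes query_emb[i] matches code_emb[i]; all other rows act as distractors."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    sims = q @ c.T                                         # cosine similarity matrix
    ranks = (sims >= np.diag(sims)[:, None]).sum(axis=1)   # rank of each true snippet
    return float(np.mean(1.0 / ranks))

queries = np.random.randn(100, 768)   # stand-in query embeddings
snippets = np.random.randn(100, 768)  # stand-in code embeddings
print(f"MRR: {mean_reciprocal_rank(queries, snippets):.3f}")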


Memorization of Named Entities in Fine-tuned BERT Models

arXiv.org Artificial Intelligence

Privacy-preserving deep learning is an emerging field in machine learning that aims to mitigate the privacy risks of using deep neural networks. One such risk is training data extraction from language models that have been trained on datasets containing personal and privacy-sensitive information. In our study, we investigate the extent of named entity memorization in fine-tuned BERT models. We use single-label text classification as a representative downstream task and employ three different fine-tuning setups in our experiments, including one with Differential Privacy (DP). We create a large number of text samples from the fine-tuned BERT models using a custom sequential sampling strategy with two prompting strategies. We search these samples for named entities and check whether they are also present in the fine-tuning datasets. We experiment with two benchmark datasets in the domains of emails and blogs. We show that applying DP has a detrimental effect on the text generation capabilities of BERT. Furthermore, we show that a fine-tuned BERT does not generate more named entities specific to the fine-tuning dataset than a BERT model that has only been pre-trained. This suggests that BERT is unlikely to emit personal or privacy-sensitive named entities. Overall, our results help clarify to what extent BERT-based services are prone to training data extraction attacks.
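
The entity-overlap check at the heart of this methodology can be sketched as follows, with spaCy's small English model standing in for the NER component and toy strings standing in for the generated samples and fine-tuning data; the sampling of text from BERT itself is out of scope for this sketch.

import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this spaCy model is installed

def extract_entities(texts: list[str]) -> set[str]:
    """Collect the lowercased surface forms of all named entities in the texts."""
    entities = set()
    for doc in nlp.pipe(texts):
        entities.update(ent.text.lower() for ent in doc.ents)
    return entities

generated_samples = ["Alice Smith wrote to Bob about the Berlin meeting."]  # toy stand-in
fine_tuning_texts = ["Bob replied from Berlin the next morning."]           # toy stand-in

leaked = extract_entities(generated_samples) & extract_entities(fine_tuning_texts)
print("Entities shared with the fine-tuning data:", leaked)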


Are We Really Making Much Progress in Text Classification? A Comparative Review

arXiv.org Artificial Intelligence

This study reviews and compares methods for single-label and multi-label text classification, categorized into bag-of-words, sequence-based, graph-based, and hierarchy-based methods. The comparison aggregates results from the literature over five single-label and seven multi-label datasets and complements them with new experiments. The findings reveal that all recently proposed graph-based and hierarchy-based methods fail to outperform pre-trained language models and sometimes perform worse than standard machine learning methods such as a multilayer perceptron on a bag-of-words representation. To assess the true scientific progress in text classification, future work should thoroughly test against strong bag-of-words baselines and state-of-the-art pre-trained language models.
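
The kind of bag-of-words baseline the review argues new methods should be tested against can be sketched with scikit-learn; the toy texts, labels, and hyperparameters are placeholders.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

texts = ["cheap flights to Rome", "new GPU benchmark results", "parliament passes budget"]
labels = ["travel", "tech", "politics"]

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),            # bag-of-words / n-gram features
    MLPClassifier(hidden_layer_sizes=(256,), max_iter=200),   # multilayer perceptron on top
)
baseline.fit(texts, labels)
print(baseline.predict(["budget debate in parliament"]))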