Niculae, Vlad
Keep your distance: learning dispersed embeddings on $\mathbb{S}_d$
Tokarchuk, Evgeniia, Bakker, Hua Chang, Niculae, Vlad
Learning well-separated features in high-dimensional spaces, such as text or image embeddings, is crucial for many machine learning applications. Achieving such separation can be effectively accomplished through the dispersion of embeddings, where unrelated vectors are pushed apart as much as possible. By constraining features to be on a hypersphere, we can connect dispersion to well-studied problems in mathematics and physics, where optimal solutions are known for limited low-dimensional cases. However, in representation learning we typically deal with a large number of features in high-dimensional space, and moreover, dispersion is usually traded off with some other task-oriented training objective, making existing theoretical and numerical solutions inapplicable. Therefore, it is common to rely on gradient-based methods to encourage dispersion, usually by minimizing some function of the pairwise distances. In this work, we first give an overview of existing methods from disconnected literature, making new connections and highlighting similarities. Next, we introduce some new angles. We propose to reinterpret pairwise dispersion using a maximum mean discrepancy (MMD) motivation. We then propose an online variant of the celebrated Lloyd's algorithm, of K-Means fame, as an effective alternative regularizer for dispersion on generic domains. Finally, we derive a novel dispersion method that directly exploits properties of the hypersphere. Our experiments show the importance of dispersion in image classification and natural language processing tasks, and how algorithms exhibit different trade-offs in different regimes.
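One widely used member of the pairwise-dispersion family surveyed here is the "uniformity" loss of Wang and Isola (2020), a log-mean Gaussian potential over pairwise distances on the unit sphere. The sketch below only illustrates that existing family, not the new regularizers proposed in the paper; the function and parameter names are ours.

    import torch

    def uniformity_loss(x: torch.Tensor, t: float = 2.0) -> torch.Tensor:
        """Pairwise dispersion on the hypersphere: log of the mean Gaussian
        potential over all pairs of normalized embeddings. Lower values
        correspond to more dispersed embeddings."""
        x = torch.nn.functional.normalize(x, dim=-1)  # constrain features to the unit sphere
        sq_dists = torch.pdist(x, p=2).pow(2)         # squared pairwise Euclidean distances
        return sq_dists.mul(-t).exp().mean().log()

    # In practice dispersion is traded off against a task objective:
    # loss = task_loss + lam * uniformity_loss(embeddings)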
Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval
Santos, Saul, Niculae, Vlad, McNamee, Daniel, Martins, André F. T.
Associative memory models, such as Hopfield networks and their modern variants, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers. In this work, we introduce a unified framework, Hopfield-Fenchel-Young networks, which generalizes these models to a broader family of energy functions. Our energies are formulated as the difference between two Fenchel-Young losses: one, parameterized by a generalized entropy, defines the Hopfield scoring mechanism, while the other applies a post-transformation to the Hopfield output. By utilizing Tsallis and norm entropies, we derive end-to-end differentiable update rules that enable sparse transformations, uncovering new connections between loss margins, sparsity, and exact retrieval of single memory patterns. We further extend this framework to structured Hopfield networks using the SparseMAP transformation, allowing the retrieval of pattern associations rather than a single pattern. Our framework unifies and extends traditional and modern Hopfield networks and provides an energy minimization perspective for widely used post-transformations like $\ell_2$-normalization and layer normalization, all through suitable choices of Fenchel-Young losses and by using convex analysis as a building block. Finally, we validate our Hopfield-Fenchel-Young networks on diverse memory recall tasks, including free and sequential recall. Experiments on simulated data, image retrieval, multiple instance learning, and text rationalization demonstrate the effectiveness of our approach.
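As a concrete illustration of the update rules induced by this family, the sketch below pairs the standard Hopfield retrieval step with sparsemax in place of softmax. This is only one instance of the Fenchel-Young transformations covered by the framework (which also includes Tsallis and norm entropies and SparseMAP), and the code is ours, not the paper's.

    import torch

    def sparsemax(z: torch.Tensor) -> torch.Tensor:
        """Euclidean projection of a 1-D score vector onto the probability
        simplex (Martins & Astudillo, 2016); can return exactly sparse weights."""
        z_sorted, _ = torch.sort(z, descending=True)
        cssv = z_sorted.cumsum(-1) - 1.0
        k = torch.arange(1, z.numel() + 1, dtype=z.dtype)
        k_star = (z_sorted * k > cssv).sum()
        tau = cssv[k_star - 1] / k_star
        return torch.clamp(z - tau, min=0.0)

    def hopfield_update(q: torch.Tensor, X: torch.Tensor, beta: float = 1.0, transform=sparsemax):
        """One retrieval step: score the N stored patterns (rows of X), map the
        scores to a (possibly sparse) distribution, and read out the convex
        combination of patterns. With softmax this is the modern Hopfield update."""
        p = transform(beta * (X @ q))  # weights over the N memories
        return X.T @ p                 # retrieved pattern in R^d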
We Augmented Whisper With kNN and You Won't Believe What Came Next
Nachesa, Maya K., Niculae, Vlad
Speech recognition performance varies by language, domain, and speaker characteristics such as accent, and fine-tuning a model on any of these categories may lead to catastrophic forgetting. $k$ nearest neighbor search ($k$NN), first proposed for neural sequence decoders for natural language generation (NLG) and machine translation (MT), is a non-parametric method that instead adapts by building an external datastore that is searched at inference time, without training the underlying model. We show that Whisper, a transformer end-to-end speech model, benefits from $k$NN. We investigate the differences between the speech and text setups. We discuss implications for speaker adaptation, and analyze improvements by gender, accent, and age.
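A minimal sketch of the $k$NN idea (in the style of kNN-LM/kNN-MT) applied to a decoder: store (hidden state, next token) pairs in a datastore, and at inference interpolate the model's next-token distribution with one built from the nearest neighbours. The brute-force search, names, and hyperparameters below are ours for illustration; real datastores use approximate search (e.g., FAISS).

    import torch

    def knn_interpolate(p_model, hidden, keys, values, vocab_size, k=8, temp=10.0, lam=0.25):
        """Mix the model's next-token distribution with a kNN distribution.
        `keys`: stored decoder states (N, d); `values`: their next tokens (N,), long dtype."""
        dists = torch.cdist(hidden[None, :], keys).squeeze(0)  # distance to every datastore key
        nn_dists, nn_idx = dists.topk(k, largest=False)        # k nearest neighbours
        w = torch.softmax(-nn_dists / temp, dim=-1)            # closer neighbours get more weight
        p_knn = torch.zeros(vocab_size).index_add_(0, values[nn_idx], w)
        return lam * p_knn + (1.0 - lam) * p_model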
Analyzing Context Utilization of LLMs in Document-Level Translation
Mohammed, Wafaa, Niculae, Vlad
Large language models (LLMs) are increasingly strong contenders in machine translation. We study document-level translation, where some words cannot be translated without context from outside the sentence. We investigate the ability of prominent LLMs to utilize context by analyzing models' robustness to perturbed and randomized document context. We find that LLMs' improved document-translation performance is not always reflected in pronoun translation performance. We highlight the need for context-aware finetuning of LLMs, with a focus on the relevant parts of the context, to improve their reliability for document-level translation.
ARM: Efficient Guided Decoding with Autoregressive Reward Models
Troshin, Sergey, Niculae, Vlad, Fokkens, Antske
Language models trained on large amounts of data require careful tuning to be safely deployed in the real world. We revisit the guided decoding paradigm, where the goal is to augment the logits of the base language model using the scores from a task-specific reward model. We propose a simple but efficient parameterization of the autoregressive reward model enabling fast and effective guided decoding. On detoxification and sentiment control tasks, we show that our efficient parameterization performs on par with RAD, a strong but less efficient guided decoding approach. Generative large language models (LLMs) have gained considerable popularity in recent years and show impressive results in zero-shot and few-shot scenarios on numerous downstream tasks (Touvron et al., 2023; OpenAI, 2024; Jiang et al., 2023). These large-scale models are pretrained on large amounts of data and are known to inherit and memorize the underlying biases (Sheng et al., 2019).
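Schematically, guided decoding combines the base model's next-token logits with per-token scores from a reward model; an autoregressive reward model can produce all of these scores in a single forward pass. The sketch below shows only the generic combination step, not the paper's exact ARM or RAD parameterization; the names and the optional top-k restriction are ours.

    import torch

    def guided_logits(lm_logits, reward_scores, beta=1.0, top_k=None):
        """Add scaled reward scores (one per vocabulary item) to the base LM's
        next-token logits; decoding then samples or argmaxes the combination."""
        combined = lm_logits + beta * reward_scores
        if top_k is not None:  # optionally score only the LM's top-k candidates
            restricted = torch.full_like(combined, float("-inf"))
            idx = lm_logits.topk(top_k).indices
            restricted[idx] = combined[idx]
            combined = restricted
        return combined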
Sparse and Structured Hopfield Networks
Santos, Saul, Niculae, Vlad, McNamee, Daniel, Martins, Andre F. T.
Modern Hopfield networks have enjoyed recent interest due to their connection to attention in transformers. Our paper provides a unified framework for sparse Hopfield networks by establishing a link with Fenchel-Young losses. The result is a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations. We reveal a connection between loss margins, sparsity, and exact memory retrieval. We further extend this framework to structured Hopfield networks via the SparseMAP transformation, which can retrieve pattern associations instead of a single pattern. Experiments on multiple instance learning and text rationalization demonstrate the usefulness of our approach.
On Measuring Context Utilization in Document-Level MT Systems
Mohammed, Wafaa, Niculae, Vlad
Document-level translation models are usually evaluated using general metrics such as BLEU, which are not informative about the benefits of context. Current work on context-aware evaluation, such as contrastive methods, only measures translation accuracy on words that need context for disambiguation. Such measures cannot reveal whether the translation model uses the correct supporting context. We propose to complement accuracy-based evaluation with measures of context utilization. We find that perturbation-based analysis (comparing models' performance when provided with correct versus random context) is an effective measure of overall context utilization. For a finer-grained, phenomenon-specific evaluation, we propose to measure how much the supporting context contributes to handling context-dependent discourse phenomena. We show that automatically annotated supporting context gives similar conclusions to human-annotated context and can be used as an alternative in cases where human annotations are not available. Finally, we highlight the importance of using discourse-rich datasets when assessing context utilization.
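The perturbation-based analysis can be summarized as a quality gap between translating with the true document context and with a random one. The sketch below is a schematic of that protocol with hypothetical `translate` and `score` interfaces, not the paper's evaluation code.

    import random

    def context_utilization_gap(translate, docs, score):
        """docs: list of (sentence, true_context) pairs. Returns the drop in
        corpus-level score when the true context is replaced by a random one;
        a large gap indicates the model actually uses document context."""
        contexts = [ctx for _, ctx in docs]
        with_context, with_random = [], []
        for sentence, ctx in docs:
            with_context.append(translate(sentence, ctx))
            other = random.choice([c for c in contexts if c is not ctx])
            with_random.append(translate(sentence, other))
        return score(with_context) - score(with_random)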
Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens
Stap, David, Niculae, Vlad, Monz, Christof
We argue that translation quality alone is not a sufficient metric for measuring knowledge transfer in multilingual neural machine translation. To support this claim, we introduce Representational Transfer Potential (RTP), which measures representational similarities between languages. We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality, indicating that transfer does occur. Furthermore, we investigate data and language characteristics that are relevant for transfer, and find that multi-parallel overlap is an important yet under-explored feature. Based on this, we develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages by taking advantage of multi-parallel data. We show that our method yields increased translation quality for low- and mid-resource languages across multiple data and model setups.
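For illustration only: one simple way to compare two languages' encoder representations of multi-parallel sentences is the cosine similarity of mean-pooled encoder states, which could also serve as an auxiliary similarity term during training. The paper's RTP metric and auxiliary loss are defined differently in detail; this sketch only conveys the representational-lens idea, and all names are ours.

    import torch

    def mean_pooled_similarity(enc_states_a, enc_states_b):
        """enc_states_*: lists of encoder state matrices (length x dim), one per
        sentence, for the same multi-parallel sentences in two source languages.
        Returns the average cosine similarity of the mean-pooled representations."""
        a = torch.stack([h.mean(dim=0) for h in enc_states_a])
        b = torch.stack([h.mean(dim=0) for h in enc_states_b])
        return torch.nn.functional.cosine_similarity(a, b, dim=-1).mean()

    # e.g., an auxiliary term encouraging cross-lingual invariance:
    # loss = nmt_loss + lam * (1.0 - mean_pooled_similarity(states_l1, states_l2))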
The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation
Tokarchuk, Evgeniia, Niculae, Vlad
Continuous-output neural machine translation (CoNMT) replaces the discrete next-word prediction problem with an embedding prediction. The semantic structure of the target embedding space (i.e., closeness of related words) is intuitively believed to be crucial. We challenge this assumption and show that completely random output embeddings can outperform laboriously pretrained ones, especially on larger datasets. Further investigation shows this surprising effect is strongest for rare words, due to the geometry of their embeddings. We shed further light on this finding by designing a mixed strategy that combines random and pre-trained embeddings for different tokens.
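A minimal sketch of the continuous-output setup with random targets: every vocabulary item gets a fixed random unit-norm embedding (nearly orthogonal in high dimensions), training minimizes a cosine loss to the target embedding, and decoding picks the nearest embedding. The loss and names below are illustrative assumptions, not necessarily the paper's exact objective.

    import torch

    def random_target_embeddings(vocab_size, dim, seed=0):
        """Fixed, non-trained random target embeddings on the unit sphere."""
        g = torch.Generator().manual_seed(seed)
        return torch.nn.functional.normalize(torch.randn(vocab_size, dim, generator=g), dim=-1)

    def conmt_step(pred_vec, target_ids, emb):
        """Cosine loss to the target embeddings and nearest-neighbour decoding.
        pred_vec: (batch, dim) predicted vectors; target_ids: (batch,) token ids."""
        loss = 1.0 - torch.nn.functional.cosine_similarity(pred_vec, emb[target_ids], dim=-1).mean()
        decoded = (torch.nn.functional.normalize(pred_vec, dim=-1) @ emb.T).argmax(dim=-1)
        return loss, decoded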
Joint Dropout: Improving Generalizability in Low-Resource Neural Machine Translation through Phrase Pair Variables
Araabi, Ali, Niculae, Vlad, Monz, Christof
Although Neural Machine Translation (NMT) has made remarkable advances (Vaswani et al., 2017), it still requires large amounts of data to induce correct generalizations that characterize human intelligence (Lake et al., 2017). However, the vast amounts of data needed to make robust, reliable, and fair predictions are not available for low-resource NMT (Koehn and Knowles, 2017). The generalizability of NMT has been extensively studied in prior research, revealing the volatile behaviour of translation outputs when even a single token in the source sentence is modified (Belinkov and Bisk, 2018; Fadaee and Monz, 2020; Li et al., 2021). For instance, in the sentence "smallpox killed billions of people on this planet" from our IWSLT test set, when replacing the noun "smallpox" with another acute disease like "tuberculosis", the model should ideally generate a correct translation by modifying only the relevant part while keeping the rest of the sentence unchanged. However, in many instances, such a small perturbation adversely affects the translation of the entire sentence, highlighting the limited generalization and robustness of existing NMT models (Fadaee and Monz, 2020). Compositionality is regarded as the most prominent form of generalization, embodying the ability of human intelligence to generalize to new data, tasks, and domains (Schmidhuber, 1990; Lake and Baroni, 2018), while other types mostly focus on practical considerations across domains, tasks, and languages, model robustness, and structural generalization (Hupkes et al., 2022). Research in compositional generalization has two main aspects: evaluating current models' compositional abilities and improving them.