
Collaborating Authors: Erker, Justus-Jonas


GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval

arXiv.org Artificial Intelligence

Decomposition-based multi-hop retrieval methods rely on many autoregressive steps to break down complex queries, which breaks end-to-end differentiability and is computationally expensive. Decomposition-free methods tackle this, but current approaches struggle with longer multi-hop problems and with generalization to out-of-distribution data. To address these challenges, we introduce GRITHopper-7B, a novel multi-hop dense retrieval model that achieves state-of-the-art performance on both in-distribution and out-of-distribution benchmarks. GRITHopper combines generative and representational instruction tuning by integrating causal language modeling with dense retrieval training. Through controlled studies, we find that incorporating additional context after the retrieval process, referred to as post-retrieval language modeling, enhances dense retrieval performance. By including elements such as final answers during training, the model learns to better contextualize and retrieve relevant information. GRITHopper-7B offers a robust, scalable, and generalizable solution for multi-hop dense retrieval, and we release it to the community for future research and applications requiring multi-hop reasoning and retrieval capabilities.
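To make the decomposition-free retrieval loop concrete, the toy sketch below iterates a single dense encoder over a small corpus: at each hop it re-encodes the original question together with everything retrieved so far, rather than generating sub-queries. It is purely illustrative; the embed() stand-in, the corpus, and the two-hop budget are assumptions for the example and do not reflect the released GRITHopper-7B interface.

```python
# Minimal, purely illustrative sketch of a decomposition-free multi-hop retrieval loop.
# The embed() stand-in and the toy corpus are assumptions for this example; the real
# GRITHopper-7B uses a trained 7B GRIT encoder, not this hashing trick.
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy deterministic 'dense encoder': hashes text into a unit vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower is located in Paris.",
    "Gustave Eiffel's company designed the Eiffel Tower.",
]
corpus_vecs = [embed(p) for p in corpus]

query = "Which company designed the tower located in the capital of France?"
state, retrieved, remaining = query, [], list(range(len(corpus)))

for hop in range(2):  # fixed two-hop budget for the sketch
    q_vec = embed(state)
    best = max(remaining, key=lambda i: float(corpus_vecs[i] @ q_vec))
    retrieved.append(corpus[best])
    remaining.remove(best)
    # No autoregressive sub-query generation: the next hop simply re-encodes the
    # original question concatenated with everything retrieved so far.
    state = query + " " + " ".join(retrieved)

print(retrieved)
```

Because every hop is just another encoder forward pass over the growing retrieval state, the whole chain stays dense and end-to-end differentiable, which is the property the abstract contrasts with decomposition-based pipelines.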


Triple-Encoders: Representations That Fire Together, Wire Together

arXiv.org Artificial Intelligence

Curved Contrastive Learning, a representation learning method that encodes relative distances between utterances into the embedding space via a bi-encoder, has recently shown promising results for dialog modeling at far superior efficiency. While high efficiency is achieved through independently encoding utterances, this ignores the importance of contextualization. To overcome this issue, this study introduces triple-encoders, which efficiently compute distributed utterance mixtures from these independently encoded utterances through a novel Hebbian-inspired co-occurrence learning objective in a self-organizing manner, without using any weights, i.e., merely through local interactions. Empirically, we find that triple-encoders lead to a substantial improvement over bi-encoders, and even to better zero-shot generalization than single-vector representation models without requiring re-encoding.

Figure 1: Comparison of our Triple Encoder to Henderson et al. (2020) and Erker et al. (2023). Similar to CCL, we only need to encode and compute similarity scores of the latest utterance. At the same time, we achieve contextualization through pairwise mean-pooling with previously encoded utterances, combining the advantages of both previous works. Our analysis shows that the co-occurrence training pushes representations that occur (fire) together closer together, leading to stronger …
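The following toy sketch illustrates the triple-encoder scoring idea described above: context utterances are encoded once and independently, contextualization comes only from pairwise mean-pooling of those cached embeddings, and only the new candidate utterance has to be encoded at inference time. The embed() function and the example dialogue are stand-ins assumed for illustration, not the released model's API.

```python
# Illustrative sketch of triple-encoder-style scoring with toy vectors:
# utterances are encoded independently; contextualization comes solely from
# pairwise mean-pooling of the pre-computed context embeddings.
from itertools import combinations
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a sentence encoder; returns a unit vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

context = ["Hi, how are you?", "Great, I just got back from a trip.", "Where did you go?"]
candidates = ["I went to Japan for two weeks.", "The stock market fell today."]

ctx_vecs = [embed(u) for u in context]                 # encoded once, cached
# Distributed utterance mixtures: mean-pool every pair of context embeddings.
mixtures = [(a + b) / 2 for a, b in combinations(ctx_vecs, 2)]
mixtures = [m / np.linalg.norm(m) for m in mixtures]

for cand in candidates:
    c_vec = embed(cand)                                # only the new utterance is encoded
    score = float(np.mean([m @ c_vec for m in mixtures]))
    print(f"{score:+.3f}  {cand}")
```

The mixtures are built without any additional trained weights, only local pairwise interactions between cached embeddings, which is why no re-encoding of the dialogue history is needed when a new turn arrives.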


Imagination is All You Need! Curved Contrastive Learning for Abstract Sequence Modeling Utilized on Long Short-Term Dialogue Planning

arXiv.org Artificial Intelligence

Inspired by the curvature of space-time (Einstein, 1921), we introduce Curved Contrastive Learning (CCL), a novel representation learning technique for learning the relative turn distance between utterance pairs in multi-turn dialogues. The resulting bi-encoder models can guide transformers as a response ranking model towards a goal in a zero-shot fashion by projecting the goal utterance and the corresponding reply candidates into a latent space. Here, the cosine similarity indicates the distance/reachability of a candidate utterance toward the corresponding goal. Furthermore, we explore how these forward-entailing language representations can be utilized for assessing the likelihood of sequences via their entailment strength, i.e., through the cosine similarity of their individual members (encoded separately), as an emergent property in the curved space. These non-local properties allow us to imagine the likelihood of future patterns in dialogues, specifically by ordering/identifying future goal utterances that are multiple turns away, given a dialogue context. As part of our analysis, we investigate characteristics that make conversations (un)plannable and find strong evidence of planning capability over multiple turns (in 61.56% over 3 turns) in conversations from the DailyDialog (Li et al., 2017) dataset. Finally, we show how we achieve higher efficiency in sequence modeling tasks compared to previous work thanks to our relativistic approach, where only the last utterance needs to be encoded and computed during inference.
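As a rough illustration of the zero-shot goal-directed ranking described above, the sketch below encodes a goal utterance and several reply candidates separately and reads the cosine similarity as the reachability of the goal from each candidate. The toy embed() encoder and the example utterances are assumptions made for the sketch; a trained CCL bi-encoder would take their place.

```python
# Illustrative sketch of CCL-style zero-shot goal-directed response ranking:
# goal and candidates are encoded separately, and cosine similarity is read as
# "how reachable is the goal from this reply" in the learned latent space.
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a CCL bi-encoder; returns a unit vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

goal = "Let's book the flight to Rome tonight."
candidates = [
    "Have you thought about where we should travel this summer?",
    "My cat knocked over the plant again.",
    "I found some cheap flights to Italy.",
]

goal_vec = embed(goal)
# Rank candidates by cosine similarity to the goal (vectors are unit-normalized).
for cand in sorted(candidates, key=lambda c: float(embed(c) @ goal_vec), reverse=True):
    print(f"{float(embed(cand) @ goal_vec):+.3f}  {cand}")
```

Note that only the new utterances need to be encoded at inference time; the goal embedding can be computed once and reused across turns, which is the efficiency argument made in the abstract.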