AITopics | Baumgärtner, Tim

Collaborating Authors

Baumgärtner, Tim

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PeerQA: A Scientific Question Answering Dataset from Peer Reviews

Baumgärtner, Tim, Briscoe, Ted, Gurevych, Iryna

arXiv.org Artificial IntelligenceFeb-19-2025

We present PeerQA, a real-world, scientific, document-level Question Answering (QA) dataset. PeerQA questions have been sourced from peer reviews, which contain questions that reviewers raised while thoroughly examining the scientific article. Answers have been annotated by the original authors of each paper. The dataset contains 579 QA pairs from 208 academic articles, with a majority from ML and NLP, as well as a subset of other scientific communities like Geoscience and Public Health. PeerQA supports three critical tasks for developing practical QA systems: Evidence retrieval, unanswerable question classification, and answer generation. We provide a detailed analysis of the collected dataset and conduct experiments establishing baseline systems for all three tasks. Our experiments and analyses reveal the need for decontextualization in document-level retrieval, where we find that even simple decontextualization approaches consistently improve retrieval performance across architectures. On answer generation, PeerQA serves as a challenging benchmark for long-context modeling, as the papers have an average size of 12k tokens. Our code and data is available at https://github.com/UKPLab/peerqa.

large language model, machine learning, question answering, (19 more...)

arXiv.org Artificial Intelligence

2502.13668

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > Experimental Study (0.92)

Industry:

Health & Medicine (0.87)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Scalable and Domain-General Abstractive Proposition Segmentation

Hosseini, Mohammad Javad, Gao, Yang, Baumgärtner, Tim, Fabrikant, Alex, Amplayo, Reinald Kim

arXiv.org Artificial IntelligenceJun-28-2024

Segmenting text into fine-grained units of meaning is important to a wide range of NLP applications. The default approach of segmenting text into sentences is often insufficient, especially since sentences are usually complex enough to include multiple units of meaning that merit separate treatment in the downstream task. We focus on the task of abstractive proposition segmentation: transforming text into simple, self-contained, well-formed sentences. Several recent works have demonstrated the utility of proposition segmentation with few-shot prompted LLMs for downstream tasks such as retrieval-augmented grounding and fact verification. However, this approach does not scale to large amounts of text and may not always extract all the facts from the input text. In this paper, we first introduce evaluation metrics for the task to measure several dimensions of quality. We then propose a scalable, yet accurate, proposition segmentation model. We model proposition segmentation as a supervised task by training LLMs on existing annotated datasets and show that training yields significantly improved results. We further show that by using the fine-tuned LLMs as teachers for annotating large amounts of multi-domain synthetic distillation data, we can train smaller student models with results similar to the teacher LLMs. We then demonstrate that our technique leads to effective domain generalization, by annotating data in two domains outside the original training data and evaluating on them. Finally, as a key contribution of the paper, we share an easy-to-use API for NLP practitioners to use.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2406.19803

Country:

Europe > United Kingdom > England (0.14)
North America > United States > Massachusetts (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment (0.46)
Education (0.36)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data

Baumgärtner, Tim, Gao, Yang, Alon, Dana, Metzler, Donald

arXiv.org Artificial IntelligenceApr-8-2024

Reinforcement Learning from Human Feedback (RLHF) is a popular method for aligning Language Models (LM) with human values and preferences. RLHF requires a large number of preference pairs as training data, which are often used in both the Supervised Fine-Tuning and Reward Model training, and therefore publicly available datasets are commonly used. In this work, we study to what extent a malicious actor can manipulate the LMs generations by poisoning the preferences, i.e., injecting poisonous preference pairs into these datasets and the RLHF training process. We propose strategies to build poisonous preference pairs and test their performance by poisoning two widely used preference datasets. Our results show that preference poisoning is highly effective: by injecting a small amount of poisonous data (1-5% of the original dataset), we can effectively manipulate the LM to generate a target entity in a target sentiment (positive or negative). The findings from our experiments also shed light on strategies to defend against the preference poisoning attack.

artificial intelligence, machine learning, sentiment, (18 more...)

arXiv.org Artificial Intelligence

2404.0553

Country:

Europe (1.00)
Asia (0.67)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.86)

Industry:

Law Enforcement & Public Safety (1.00)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
(9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

UKP-SQuARE v3: A Platform for Multi-Agent QA Research

Puerto, Haritz, Baumgärtner, Tim, Sachdeva, Rachneet, Fang, Haishuo, Zhang, Hao, Tariverdian, Sewin, Wang, Kexin, Gurevych, Iryna

arXiv.org Artificial IntelligenceMay-17-2023

The continuous development of Question Answering (QA) datasets has drawn the research community's attention toward multi-domain models. A popular approach is to use multi-dataset models, which are models trained on multiple datasets to learn their regularities and prevent overfitting to a single dataset. However, with the proliferation of QA models in online repositories such as GitHub or Hugging Face, an alternative is becoming viable. Recent works have demonstrated that combining expert agents can yield large performance gains over multi-dataset models. To ease research in multi-agent models, we extend UKP-SQuARE, an online platform for QA research, to support three families of multi-agent systems: i) agent selection, ii) early-fusion of agents, and iii) late-fusion of agents. We conduct experiments to evaluate their inference speed and discuss the performance vs. speed trade-off compared to multi-dataset models. UKP-SQuARE is open-source and publicly available at http://square.ukp-lab.de.

artificial intelligence, computational linguistic, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2303.1812

Country:

Europe (1.00)
Asia > Middle East (0.68)
North America > United States > California (0.46)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.50)

Industry:

Government (0.47)
Education (0.47)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

On the Realization of Compositionality in Neural Networks

Baan, Joris, Leible, Jana, Nikolaus, Mitja, Rau, David, Ulmer, Dennis, Baumgärtner, Tim, Hupkes, Dieuwke, Bruni, Elia

arXiv.org Artificial IntelligenceJun-6-2019

We present a detailed comparison of two types of sequence to sequence models trained to conduct a compositional task. The models are architecturally identical at inference time, but differ in the way that they are trained: our baseline model is trained with a task-success signal only, while the other model receives additional supervision on its attention mechanism (Attentive Guidance), which has shown to be an effective method for encouraging more compositional solutions (Hupkes et al.,2019). We first confirm that the models with attentive guidance indeed infer more compositional solutions than the baseline, by training them on the lookup table task presented by Li\v{s}ka et al. (2019). We then do an in-depth analysis of the structural differences between the two model types, focusing in particular on the organisation of the parameter space and the hidden layer activations and find noticeable differences in both these aspects. Guided networks focus more on the components of the input rather than the sequence as a whole and develop small functional groups of neurons with specific purposes that use their gates more selectively. Results from parameter heat maps, component swapping and graph analysis also indicate that guided networks exhibit a more modular structure with a small number of specialized, strongly connected neurons.

deep learning, neural network, neuron, (19 more...)

arXiv.org Artificial Intelligence

1906.01634

Country:

Europe (0.28)
North America > United States > Hawaii (0.14)
Asia > Middle East > Qatar (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The PhotoBook Dataset: Building Common Ground through Visually-Grounded Dialogue

Haber, Janosch, Baumgärtner, Tim, Takmaz, Ece, Gelderloos, Lieke, Bruni, Elia, Fernández, Raquel

arXiv.org Artificial IntelligenceJun-4-2019

This paper introduces the PhotoBook dataset, a large-scale collection of visually-grounded, task-oriented dialogues in English designed to investigate shared dialogue history accumulating during conversation. Taking inspiration from seminal work on dialogue analysis, we propose a data-collection task formulated as a collaborative game prompting two online participants to refer to images utilising both their visual context as well as previously established referring expressions. We provide a detailed description of the task setup and a thorough analysis of the 2,500 dialogues collected. To further illustrate the novel features of the dataset, we propose a baseline model for reference resolution which uses a simple method to take into account shared information accumulated in a reference chain. Our results show that this information is particularly important to resolve later descriptions and underline the need to develop more sophisticated models of common ground in dialogue interaction.

deep learning, neural network, participant, (22 more...)

arXiv.org Artificial Intelligence

1906.0153

Country: Europe (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Sports (0.68)
Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback