Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering
Avi Caciularu, Matthew E. Peters, Jacob Goldberger, Ido Dagan, Arman Cohan
arXiv.org Artificial Intelligence
Integrating multi-document pre-training objectives into language models has yielded remarkable improvements on multi-document downstream tasks. In this work, we propose extending this idea by pre-training a generic multi-document model with a novel cross-document question answering pre-training objective. To that end, given a set (or cluster) of topically-related documents, we systematically generate semantically-oriented questions from a salient sentence in one document and challenge the model, during pre-training, to answer these questions while "peeking" into other topically-related documents. In a similar manner, the model is also challenged to recover the sentence from which the question was generated, again while leveraging cross-document information. This novel multi-document QA formulation directs the model to better recover cross-text informational relations, and introduces a natural augmentation that artificially increases the pre-training data. Further, unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation (e.g., QA) and long text generation (e.g., summarization). Following this scheme, we pre-train our model, termed QAmden, and evaluate its performance across several multi-document tasks, including multi-document QA, summarization, and query-focused summarization. The model yields improvements of up to 7% and significantly outperforms zero-shot GPT-3.5 and GPT-4.
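To make the objective concrete, the following is a minimal sketch of how one pre-training example might be assembled from a document cluster. The abstract does not specify the input format, so the function name `build_instance`, the `<mask>` and `<doc-sep>` tokens, and the layout of source and target are all hypothetical illustrations, not the authors' implementation. The question and answer are assumed to have been generated beforehand from the salient sentence.

```python
from typing import List, Tuple

MASK = "<mask>"        # hypothetical placeholder for the hidden sentence
DOC_SEP = "<doc-sep>"  # hypothetical separator between segments


def build_instance(
    cluster: List[List[str]],
    doc_idx: int,
    sent_idx: int,
    question: str,
    answer: str,
) -> Tuple[str, str]:
    """Return (source, target) for one illustrative pre-training example.

    `cluster` is a list of documents, each given as a list of sentences.
    """
    salient = cluster[doc_idx][sent_idx]
    docs = []
    for i, sentences in enumerate(cluster):
        # Hide the salient sentence only in the document it came from,
        # so the model has to "peek" into the other documents.
        kept = [
            MASK if (i == doc_idx and j == sent_idx) else s
            for j, s in enumerate(sentences)
        ]
        docs.append(" ".join(kept))
    source = f"{question} {DOC_SEP} " + f" {DOC_SEP} ".join(docs)
    # The model is asked to answer the question and recover the sentence.
    target = f"{answer} {DOC_SEP} {salient}"
    return source, target


cluster = [
    ["The storm hit the coast on Monday.", "Thousands lost power."],
    ["Officials said the storm made landfall Monday morning."],
]
source, target = build_instance(
    cluster, 0, 0, "When did the storm hit the coast?", "on Monday"
)
```

In this toy cluster the masked sentence can only be answered or reconstructed from the second document, which is the cross-document dependency the objective is meant to encourage.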
May-24-2023
- Country:
  - Oceania > Australia (0.04)
  - South America > Chile
  - North America
    - Dominican Republic (0.04)
    - United States
      - Michigan (0.04)
      - Washington > King County
        - Seattle (0.14)
      - Illinois > Cook County
        - Chicago (0.05)
      - Connecticut > New Haven County
        - New Haven (0.04)
    - Canada > Newfoundland and Labrador
      - Labrador (0.04)
  - Europe
    - United Kingdom > Wales (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
    - Portugal > Lisbon
      - Lisbon (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - France
      - Occitanie > Hérault
        - Montpellier (0.04)
      - Bourgogne-Franche-Comté > Côte-d'Or
        - Dijon (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
    - China > Hong Kong (0.04)
    - Middle East
      - Israel (0.04)
      - Jordan (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.04)
- Genre:
  - Research Report (1.00)
- Technology: