Zealand
HistoryBankQA: Multilingual Temporal Question Answering on Historical Events
Mandal, Biswadip, Khandelwal, Anant, Gupta, Manish
Temporal reasoning about historical events is a critical skill for NLP tasks like event extraction, historical entity linking, temporal question answering, timeline summarization, temporal event clustering and temporal natural language inference. Yet efforts on benchmarking temporal reasoning capabilities of large language models (LLMs) are rather limited. Existing temporal reasoning datasets are limited in scale, lack multilingual coverage and focus more on contemporary events. To address these limitations, we present HistoryBank, a multilingual database of 10M+ historical events extracted from Wikipedia timeline pages and article infoboxes. Our database provides unprecedented coverage in both historical depth and linguistic breadth with 10 languages. Additionally, we construct a comprehensive question answering benchmark for temporal reasoning across all languages. This benchmark covers a diverse set of 6 temporal QA reasoning tasks, and we evaluate a suite of popular language models (LLaMA-3-8B, Mistral-7B, Gemma-2-9b, Qwen3-8B, GPT4o) to assess their performance on these tasks. As expected, GPT4o performs best across all answer types and languages; Gemma-2 outperforms the other small language models. Our work aims to provide a comprehensive resource for advancing multilingual and temporally-aware natural language understanding of historical events. To facilitate further research, we will make our code and datasets publicly available upon acceptance of this paper.
- Leisure & Entertainment > Sports (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Media (0.68)
- Law (0.67)
Edge.org
The conversation is on hold. The Edge community has hit the road... or they're staying home. Preparing for the academic year to begin, wrapping up projects and starting new ones, celebrating with family and friends or contemplating in solitude. After a hiatus, Edge is pleased to revive Summer Postcards: Edgies reporting in from wherever they are and on whatever they're doing, as the dog days wind out and the season comes to a close. As the world slowly returns to a "new normal" with enduring COVID restrictions in the midst of renewed vaccine freedoms, this year's collection is a testament to change (temporary and lasting), a consideration of loss (will travel ever be like it was?), and a celebration of questions (that still need answering). The hammock may be away until next year, but the memories remain.
I spent the summer writing and revising the final section of a longish novel I started in 2019. It seems now as though I've been from 1946 to 2021 on my hands and knees. Various lockdowns have been a liberation from obligations and the luggage carousel, and I've never known such sweet and total focus for months on end. We have the luxury of living in the country--no shortage of big skies and moody walks. All our few breaks were in the UK--Scotland, the Lake District, the West Country. Even in our remote part of the Lakes, I had to keep on writing--as in photo. The best novel I read this summer was Sandro Veronesi's The Hummingbird. Best non-fiction was Peter Godfrey-Smith's Metazoa: Animal Life and the Birth of the Mind. I gave time also to some wonderful novellas--perfect fictional form for you too-busy scientists.
IAN MCEWAN is a novelist whose works have earned him worldwide critical acclaim. He is the recipient of the Man Booker Prize for Amsterdam (1998), the National Book Critics' Circle Fiction Award, and the Los Angeles Times Prize for Fiction for Atonement (2003). His most recent novel is Machines Like Me.
In 2019, Časlav Brukner and I were walking on a beach on Lamma Island, near Hong Kong, marvelling together at the astonishing strangeness of quantum phenomena. This summer, the conversation with Časlav has continued on another island, and quite an island: Lesbos, the northern Greek island near the Turkish coast. Lesbos is the place where lyrical poetry was born. Here lived Sappho and Alcaeus.
- Europe > United Kingdom > Scotland (0.24)
- North America > United States > California > Los Angeles County > Los Angeles (0.24)
- Europe > Netherlands > North Holland > Amsterdam (0.24)
- (44 more...)
- Media (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
- Education > Educational Setting (0.93)
- (3 more...)
Ranking vs. Classifying: Measuring Knowledge Base Completion Quality
Speranskaya, Marina, Schmitt, Martin, Roth, Benjamin
Knowledge base completion (KBC) methods aim at inferring missing facts from the information present in a knowledge base (KB) by estimating the likelihood of candidate facts. In the prevailing evaluation paradigm, models do not actually decide whether a new fact should be accepted or not but are solely judged on the position of true facts in a likelihood ranking with other candidates. We argue that consideration of binary predictions is essential to reflect the actual KBC quality, and propose a novel evaluation paradigm, designed to provide more transparent model selection criteria for a realistic scenario. We construct the data set FB14k-QAQ where instead of single facts, we use KB queries, i.e., facts where one entity is replaced with a variable, and construct corresponding sets of entities that are correct answers. We randomly remove some of these correct answers from the data set, simulating the realistic scenario of real-world entities missing from a KB. This way, we can explicitly measure a model's ability to handle queries that have more correct answers in the real world than in the KB, including the special case of queries without any valid answer. The latter especially contrasts with the ranking setting. We evaluate a number of state-of-the-art KB embedding models on our new benchmark. The differences in relative performance between ranking-based and classification-based evaluation that we observe in our experiments confirm our hypothesis that good performance on the ranking task does not necessarily translate to good performance on the actual completion task. Our results motivate future work on KB embedding models with better prediction separability and, as a first step in that direction, we propose a simple variant of TransE that encourages thresholding and achieves a significant improvement in classification F1 score relative to the original TransE.
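The gap between the two evaluation styles can be illustrated with a toy example. This is a minimal sketch, not the paper's metric definitions: the function names, the scores, the query labels and the single global threshold of 0.5 are all assumptions made up for illustration.

```python
def reciprocal_rank(scores, gold):
    """Reciprocal rank of the best-ranked gold entity (0.0 if none is found)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    for position, entity in enumerate(ranked, start=1):
        if entity in gold:
            return 1.0 / position
    return 0.0


def threshold_f1(scores, gold, threshold):
    """F1 of the binary decision 'accept every entity scoring >= threshold'."""
    predicted = {entity for entity, score in scores.items() if score >= threshold}
    if not predicted and not gold:
        return 1.0  # correctly answered "no valid answer"
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical model scores per query, paired with the set of correct answers.
queries = {
    "q1": ({"a": 0.9, "b": 0.8, "c": 0.2}, {"a", "b"}),
    "q2": ({"a": 0.4, "b": 0.3, "c": 0.1}, {"a"}),
    "q3": ({"a": 0.7, "b": 0.6, "c": 0.5}, set()),  # query without any valid answer
}

THRESHOLD = 0.5  # assumed global acceptance threshold
for name, (scores, gold) in queries.items():
    rr = reciprocal_rank(scores, gold)
    f1 = threshold_f1(scores, gold, THRESHOLD)
    print(f"{name}: reciprocal rank = {rr:.2f}, F1 at threshold = {f1:.2f}")
```

In this toy data, q2 ranks its only correct answer first yet every score falls below the threshold, so the binary decision misses it. For q3, which has no valid answer, every candidate is accepted, an error that a ranking of true facts has no way to register.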
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York (0.05)
- Oceania > New Zealand (0.04)
- (11 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Knowledge Management > Knowledge Engineering (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.92)
Deep learning for Chemometric and non-translational data
Larsen, Jacob Søgaard, Clemmensen, Line
We propose a novel method to train deep convolutional neural networks which learn from multiple data sets of varying input sizes through weight sharing. This is an advantage in chemometrics where individual measurements represent exact chemical compounds and thus signals cannot be translated or resized without disturbing their interpretation. Our approach shows superior performance compared to transfer learning when a medium-sized and a small data set are trained together. We also observe a small improvement compared to individual training when two medium-sized data sets are trained together, in particular through a reduction in the variance.
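As a rough illustration of why convolutional weights can be shared across inputs of different sizes, the sketch below applies one kernel to two signals of different lengths without resizing either. It is not the authors' architecture: the kernel values, the signals and the function name are hypothetical, and the abstract does not say how later layers handle the differing output lengths.

```python
def conv1d_valid(signal, kernel):
    """Slide one kernel over a 1-D signal with no padding (cross-correlation)."""
    width = len(kernel)
    return [
        sum(signal[start + offset] * kernel[offset] for offset in range(width))
        for start in range(len(signal) - width + 1)
    ]


# Hypothetical weights, shared by both data sets.
shared_kernel = [0.25, 0.5, 0.25]

# Two made-up measurements with different numbers of channels.
spectrum_a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
spectrum_b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0]

features_a = conv1d_valid(spectrum_a, shared_kernel)
features_b = conv1d_valid(spectrum_b, shared_kernel)

# The same weights produce one feature per valid window, so the output
# length follows the input length and no channel is moved or interpolated.
print(len(features_a), len(features_b))
```

Because the kernel has a fixed width, a gradient step computed on either data set would update the same three weights, which is the sense of weight sharing assumed here.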
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Switzerland (0.04)
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
- (4 more...)