Goto

Collaborating Authors

 conflict





WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

Neural Information Processing Systems

Large language models (LLMs) need knowledge updates to meet the ever-growing world facts and correct the hallucinated responses, facilitating the methods of lifelong model editing. Where the updated knowledge resides in memories is a fundamental question for model editing. In this paper, we find that editing either long-term memory (direct model parameters) or working memory (non-parametric knowledge of neural network activations/representations by retrieval) will result in an impossible triangle---reliability, generalization, and locality can not be realized together in the lifelong editing settings. For long-term memory, directly editing the parameters will cause conflicts with irrelevant pretrained knowledge or previous edits (poor reliability and locality). For working memory, retrieval-based activations can hardly make the model understand the edits and generalize (poor generalization). Therefore, we propose WISE to bridge the gap between memories.


The drones being used in Sudan: 1,000 attacks since April 2023

Al Jazeera

During Sudan's civil war, which erupted in April 2023, both sides have increasingly relied on drones, and civilians have borne the brunt of the carnage. The conflict between the Sudanese armed forces (SAF) and the Rapid Support Forces (RSF) paramilitary group is an example of war transformed by commercially available, easily concealable unmanned aerial vehicles (UAVs), or drones. Modular, well-adapted to sanctions evasions and devastatingly effective, drones have killed scores of civilians, crippled infrastructure and plunged Sudanese cities into darkness. In this visual investigation, Al Jazeera examines the history of drone warfare in Sudan, the types of drones used by the warring sides, how they are sourced, where the attacks have occurred and the human toll. The RSF traces its origins to what at the time was a government-linked militia known as the Janjaweed.


Watch: Fishing on a frozen river for respite from the war in Ukraine

BBC News

Kyiv is many miles from the front line, but Ukraine's war with Russia is never far away - with Moscow's missile and drone attacks directed at the city almost every day. On the frozen surface of the mighty River Dnipro, the BBC speaks to men who spend hours fishing to take their minds off the almost four-year-old conflict, which has left homes with no heating after Russian strikes on power stations. Drilling holes in the ice of the river in the heart of the city, these ice-fisherman - many of them veterans with friends and family at the front - hope to catch small fish, and a little respite. Authorities deliberately triggered the avalanche on Mount Elbrus to release a build up of snow. The limited deployment involves Germany, France, Sweden, Norway, Finland, the Netherlands and the UK.


Grok's deepfake crisis, explained

TIME - Tech

Welcome back to In the Loop, new twice-weekly newsletter about AI. If you're reading this in your browser, why not subscribe to have the next one delivered straight to your inbox? In the past few weeks, many tech leaders have made bold predictions about what AI will achieve in 2026, from mastering the field of biology to surpassing human intelligence outright . But in 2026's first week, the most visible use of AI has been X users employing Grok to digitally disrobe women. Elon Musk's platform X has been flooded with nonconsensual AI-created images, requested by users, of unclothed or scantily-clad women, men and children, sometimes in sexual positions.


An A-Z list of 2025's biggest stories

Al Jazeera

Scroll back through the last year, and the same words come up again and again. The top-trending terms of 2025, from artificial intelligence to Zohran Mamdani, shaped headlines across politics, conflict, technology and climate. As the year comes to a close, AJ Labs has compiled an A to Z list of names, places and issues that generated sustained interest throughout 2025, according to a loose analysis of our own most-viewed story tags and those that appeared in Google's most searched. Taken together, these terms are a patchwork of issues that are also likely to spill into 2026, from ongoing conflicts to a changing technosocial landscape not seen since the dawn of the internet. This is 2025 from A to Z, by the words that made the year.


WikiContradict: A Benchmark for Evaluating LLMs on Real-World Knowledge Conflicts from Wikipedia

Neural Information Processing Systems

Retrieval-augmented generation (RAG) has emerged as a promising solution to mitigate the limitations of large language models (LLMs), such as hallucinations and outdated information. However, it remains unclear how LLMs handle knowledge conflicts arising from different augmented retrieved passages, especially when these passages originate from the same source and have equal trustworthiness. In this work, we conduct a comprehensive evaluation of LLM-generated answers to questions that have varying answers based on contradictory passages from Wikipedia, a dataset widely regarded as a high-quality pre-training resource for most LLMs. Specifically, we introduce WikiContradict, a benchmark consisting of 253 high-quality, human-annotated instances designed to assess the performance of LLMs in providing a complete perspective on conflicts from the retrieved documents, rather than choosing one answer over another, when augmented with retrieved passages containing real-world knowledge conflicts. We benchmark a diverse range of both closed and open-source LLMs under different QA scenarios, including RAG with a single passage, and RAG with 2 contradictory passages. Through rigorous human evaluations on a subset of WikiContradict instances involving 5 LLMs and over 3,500 judgements, we shed light on the behaviour and limitations of these models. For instance, when provided with two passages containing contradictory facts, all models struggle to generate answers that accurately reflect the conflicting nature of the context, especially for implicit conflicts requiring reasoning. Since human evaluation is costly, wealso introduce an automated model that estimates LLM performance using a strong open-source language model, achieving an F-score of 0.8. Using this automated metric, we evaluate more than 1,500 answers from seven LLMs across all WikiContradict instances.


\texttt{ConflictBank} : A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLMs

Neural Information Processing Systems

Large language models (LLMs) have achievedimpressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. While a few research explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge, a comprehensive assessment of knowledge conflict in LLMs is still missing.