MAGIC: A Multi-Hop and Graph-Based Benchmark for Inter-Context Conflicts in Retrieval-Augmented Generation
Lee, Jungyeon, Lee, Kangmin, Kim, Taeuk
Knowledge conflict often arises in retrieval-augmented generation (RAG) systems, where retrieved documents may be inconsistent with one another or contradict the model's parametric knowledge. Existing benchmarks for investigating the phenomenon have notable limitations, including a narrow focus on the question answering setup, heavy reliance on entity substitution techniques, and a restricted range of conflict types. To address these issues, we propose a knowledge graph (KG)-based framework that generates varied and subtle conflicts between two similar yet distinct contexts, while ensuring interpretability through the explicit relational structure of KGs. Experimental results on our benchmark, MAGIC, provide intriguing insights into the inner workings of LLMs regarding knowledge conflict: both open-source and proprietary models struggle with conflict detection -- especially when multi-hop reasoning is required -- and often fail to pinpoint the exact source of contradictions. Finally, we present in-depth analyses that serve as a foundation for improving LLMs in integrating diverse, sometimes even conflicting, information.
Policy design for two-sided platforms with participation dynamics: Interview with Haruka Kiyohara
In their paper Policy Design for Two-sided Platforms with Participation Dynamics, which was presented at ICML 2025, Haruka Kiyohara and co-authors investigated the participation dynamics in two-sided markets. In this interview, Haruka tells us more about such two-sided platforms, the main contributions of the work, and the experiments carried out to test the method. What is the topic of the research in your paper and why is it an interesting area for study? Our paper studied the long-term impacts of decision-making algorithms on two-sided platforms like e-commerce or music streaming applications. In two-sided platforms, multiple stakeholders, such as viewers and content creators, are involved.
A API Details
API calls for each position identified in a piece of text.
Question Answering: We use the Atlas model of Izacard et al. (2022) finetuned on Natural Questions.
Calculator: Our calculator is based on a simple Python script and only supports the operators " It does not return any result for syntactically invalid equations.
"=", "equals", "equal to", "total of", "average of" followed by a number, or (iii) contain at least three English text before generating API calls.
Below, we list the prompts used to sample API calls for each tool considered.
Your task is to add calls to a Question Answering API to a piece of text.
Input: Joe Biden was born in Scranton, Pennsylvania.
Output: Joe Biden was born in [QA("Where was Joe Biden born?")] Scranton, [QA("In
Output: Coca-Cola, or [QA("What other name is Coca-Cola known by?")] Coke, is
Your task is to add calls to a Calculator API to a piece of text.
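The calculator tool described above (a simple Python script that returns nothing for syntactically invalid equations) could look roughly like the following. This is a minimal sketch, not the authors' script: the supported operator list is not legible in the text, so the four basic arithmetic operators are an assumption, and the names `calculate` and `_evaluate` are hypothetical.

```python
import ast
import operator

# ASSUMPTION: the source does not legibly list the supported operators;
# the four basic arithmetic operators are used here for illustration.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node):
    """Recursively evaluate a restricted arithmetic expression tree."""
    # Plain numbers only (booleans and strings are rejected).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    # Binary operations limited to the assumed operator set.
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _evaluate(node.left)
        right = _evaluate(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    # Allow a leading minus sign, e.g. "-3 + 5".
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("unsupported expression")


def calculate(expression):
    """Return the numeric result, or None for invalid or unsupported input."""
    try:
        tree = ast.parse(expression, mode="eval")
        return _evaluate(tree.body)
    except (SyntaxError, ValueError, ZeroDivisionError):
        # Mirror the described behaviour: no result for invalid equations.
        return None
```

Parsing with `ast` instead of calling `eval` keeps the sketch restricted to arithmetic, so text such as function calls or names is rejected and yields `None` rather than being executed.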
China honing abilities for a possible future attack, Taiwan warns
A China Coast Guard vessel is seen on a giant screen showing news footage about the coast guard's law enforcement patrols in waters around Taiwan, outside a shopping mall in Beijing on April 1. | REUTERS
TAIPEI - China is increasing military activities near Taiwan and honing its ability to stage a surprise attack, as well as seeking to undermine trust in the government with hybrid online warfare tactics, the island's defense ministry said on Thursday. Democratically-governed Taiwan, which China views as its own territory, has faced increased military pressure from Beijing over the past five years, including at least seven rounds of major war games around the island since 2022. China has been using artificial intelligence tools to weaken Taiwan's cybersecurity and to scan for weak points in critical infrastructure, the defense ministry said in a report released every two years. Beijing is also using hybrid warfare to weaken people's trust in the government and support for defense spending, and stepping up grey zone harassment, it added, referring to non-combat operations such as coast guard patrols designed to pressure Taiwan. Through both conventional and unconventional military actions, it aims to test its capabilities for attacking Taiwan and confronting foreign forces, the ministry said.
Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
Yue Yu
Large language models (LLMs) have been recently leveraged as training data generators for various natural language processing (NLP) tasks. While previous research has explored different approaches to training models using generated data, they generally rely on simple class-conditional prompts, which may limit the diversity of the generated data and inherit systematic biases of LLM. Thus, we investigate training data generation with diversely attributed prompts (e.g.,
Energy firms snap up weather services for trading edge in Japan
Power traders are fueling a boom in weather data, which helps them to anticipate sudden price swings. Weather forecasters are finding a lucrative niche in Japan's power-trading boom, selling hyper-specialized data to firms seeking an edge in one of the world's most volatile electricity markets. Weathernews is among a handful of companies cashing in on demand for meteorological data. The Tokyo-listed company's shares have surged 50% in the last year as investors bet on its expanded use of artificial intelligence, among other factors. The firm says it's supplying -- or is in talks to provide -- data to several dozen power traders, about a third of which are based outside Japan.