Media
XLQA: A Benchmark for Locale-Aware Multilingual Open-Domain Question Answering
Roh, Keon-Woo, Ju, Yeong-Joon, Lee, Seong-Whan
Large Language Models (LLMs) have shown significant progress in open-domain question answering (ODQA), yet most evaluations focus on English and assume locale-invariant answers across languages. This assumption neglects the cultural and regional variations that affect how questions are understood and answered, leading to biased evaluation in multilingual benchmarks. To address these limitations, we introduce XLQA, a novel benchmark explicitly designed for locale-sensitive multilingual ODQA. XLQA contains 3,000 English seed questions expanded to eight languages, with careful filtering for semantic consistency and human-verified annotations distinguishing locale-invariant and locale-sensitive cases. Our evaluation of five state-of-the-art multilingual LLMs reveals notable failures on locale-sensitive questions, exposing gaps between English and other languages due to a lack of locale-grounding knowledge. We provide a systematic framework and scalable methodology for assessing multilingual QA under diverse cultural contexts, offering a critical resource to advance the real-world applicability of multilingual ODQA systems. Our findings suggest that disparities in training data distribution contribute to differences in both linguistic competence and locale-awareness across models.
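To make the locale-invariant versus locale-sensitive distinction concrete, here is a minimal Python sketch of split-wise scoring. It is an illustrative assumption, not the XLQA evaluation code: the `Item` record, the `accuracy_by_split` helper, and exact-match scoring after case and whitespace normalization are all hypothetical. The idea it shows is that a locale-sensitive question carries a different gold answer set per language, so a model that answers every language with the English-locale answer can score well on the invariant split and poorly on the sensitive one.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout; the abstract does not describe XLQA's schema.
@dataclass
class Item:
    qid: str
    lang: str
    locale_sensitive: bool
    gold: set[str]  # acceptable answers for this language/locale
    prediction: str


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace before exact-match comparison.
    return " ".join(text.lower().split())


def accuracy_by_split(items: list[Item]) -> dict[tuple[str, str], float]:
    """Exact-match accuracy per (language, split), split in {'invariant', 'sensitive'}."""
    hits: dict[tuple[str, str], int] = defaultdict(int)
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for it in items:
        key = (it.lang, "sensitive" if it.locale_sensitive else "invariant")
        totals[key] += 1
        if normalize(it.prediction) in {normalize(g) for g in it.gold}:
            hits[key] += 1
    return {k: hits[k] / totals[k] for k in totals}
```

For an illustrative question such as "What number do you call for an ambulance?", the gold answer would be "911" for a US English locale but "112" for German, so a German-language prediction of "911" counts as wrong under this scheme.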
Counterspeech for Mitigating the Influence of Media Bias: Comparing Human and LLM-Generated Responses
Lin, Luyang, Feng, Zijin, Wang, Lingzhi, Wong, Kam-Fai
Biased news contributes to societal polarization and is often reinforced by hostile reader comments, constituting a vital yet often overlooked aspect of news dissemination. Our study reveals that offensive comments support biased content, amplifying bias and causing harm to targeted groups or individuals. Counterspeech is an effective approach to counter such harmful speech without violating freedom of speech, helping to limit the spread of bias. To the best of our knowledge, this is the first study to explore counterspeech generation in the context of news articles. We introduce a manually annotated dataset linking media bias, offensive comments, and counterspeech. We conduct a detailed analysis showing that over 70% of offensive comments support biased articles, amplifying bias and thus highlighting the importance of counterspeech generation. Comparing counterspeech generated by humans and large language models, we find that model-generated responses are more polite but lack novelty and diversity. Finally, we improve generated counterspeech through few-shot learning and integration of news background information, enhancing both diversity and relevance.
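The abstract does not say how diversity was measured. One common lexical proxy is distinct-n, the ratio of unique n-grams to total n-grams across a set of responses. The Python sketch below implements it with naive lowercase whitespace tokenization (an assumption) and is offered only as an illustration, not as the paper's metric.

```python
def distinct_n(texts: list[str], n: int = 2) -> float:
    """Unique n-grams divided by total n-grams across all texts (0.0 if there are none)."""
    total = 0
    unique: set[tuple[str, ...]] = set()
    for text in texts:
        tokens = text.lower().split()  # naive tokenization, for illustration only
        # Texts shorter than n tokens contribute no n-grams.
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0
```

Higher values mean fewer repeated phrasings; computing the score separately over human-written and model-generated counterspeech gives one simple way to compare their diversity.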
Persuasiveness and Bias in LLM: Investigating the Impact of Persuasiveness and Reinforcement of Bias in Language Models
Warning: This research studies AI persuasion and bias amplification that could be misused; all experiments are for safety evaluation. Large Language Models (LLMs) now generate convincing, human-like text and are widely used in content creation, decision support, and user interactions. Yet the same systems can spread information or misinformation at scale and reflect social biases that arise from data, architecture, or training choices. This work examines how persuasion and bias interact in LLMs, focusing on how imperfect or skewed outputs affect persuasive impact. Specifically, we test whether persona-based models can persuade with fact-based claims while also, unintentionally, promoting misinformation or biased narratives. We introduce a convincer-skeptic framework: LLMs adopt personas to simulate realistic attitudes. Skeptic models serve as human proxies; we compare their beliefs before and after exposure to arguments from convincer models. Persuasion is quantified with Jensen-Shannon divergence over belief distributions. We then ask how much persuaded entities go on to reinforce and amplify biased beliefs across race, gender, and religion. Strong persuaders are further probed for bias using sycophantic adversarial prompts and judged with additional models. Our findings show both promise and risk. LLMs can shape narratives, adapt tone, and mirror audience values across domains such as psychology, marketing, and legal assistance. But the same capacity can be weaponized to automate misinformation or craft messages that exploit cognitive biases, reinforcing stereotypes and widening inequities. The core danger lies in misuse more than in occasional model mistakes. By measuring persuasive power and bias reinforcement, we argue for guardrails and policies that penalize deceptive use and support alignment, value-sensitive design, and trustworthy deployment.
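The persuasion measure named above can be illustrated with a short sketch. The Jensen-Shannon divergence between a skeptic's belief distribution before (P) and after (Q) a convincer's argument is JSD(P, Q) = ½ KL(P ‖ M) + ½ KL(Q ‖ M), where M = ½(P + Q). The Python below assumes beliefs are discrete probability vectors over the same ordered options (for example, a hypothetical disagree/neutral/agree scale) and uses base-2 logarithms so the score falls in [0, 1]; how the paper elicits and bins beliefs is not stated in the abstract.

```python
import math


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits (bounded by 1)."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same options")
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i >= p_i / 2, so m_i > 0 whenever p_i > 0 and no division by zero occurs.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical skeptic beliefs over (disagree, neutral, agree).
before = [0.7, 0.2, 0.1]
after = [0.2, 0.3, 0.5]
print(round(js_divergence(before, after), 3))
```

A score of 0 means the argument left the belief distribution unchanged, and larger values indicate a larger shift. JSD is symmetric and measures the size of the shift, not its direction, so a separate check is needed to tell whether the skeptic moved toward or away from the convincer's position.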
KG-o1: Enhancing Multi-hop Question Answering in Large Language Models via Knowledge Graph Integration
Wang, Nan, Fan, Yongqi, zhu, yansha, Wang, ZongYu, Cao, Xuezhi, He, Xinyan, Jiang, Haiyun, Ruan, Tong, Liu, Jingping
Large Language Models (LLMs) face challenges in knowledge-intensive reasoning tasks such as classic multi-hop question answering, which requires reasoning across multiple facts. This difficulty arises because the chains of thought (CoTs) that LLMs generate for such tasks often deviate from real or a priori reasoning paths. In contrast, knowledge graphs (KGs) explicitly represent the logical connections between facts through entities and relationships. This contrast between free-form CoTs and explicit KG paths reflects a significant gap. Meanwhile, large reasoning models (LRMs), such as o1, have demonstrated that long-step reasoning significantly enhances the performance of LLMs. Building on these insights, we propose KG-o1, a four-stage approach that integrates KGs to enhance the multi-hop reasoning abilities of LLMs. We first filter out initial entities and generate complex subgraphs. Second, we construct logical paths for the subgraphs and then use knowledge graphs to build a dataset with a complex and extended brainstorming process, which trains LLMs to imitate long-term reasoning. Finally, we employ rejection sampling to generate a self-improving corpus for direct preference optimization (DPO), further refining the LLMs' reasoning abilities. We conducted experiments on two simple and two complex datasets. The results show that KG-o1 models exhibit superior performance across all tasks compared to existing LRMs.
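The final stage, rejection sampling feeding DPO, can be sketched in Python under stated assumptions. The abstract does not say how sampled reasoning chains are judged, so this sketch assumes a chain is accepted when its final answer matches the gold answer. `sample_fn` is a hypothetical stand-in for the model being trained, and pairing accepted with rejected chains via `zip` is just one simple choice. The loss is the standard DPO objective, −log σ(β[(log π_θ(y_w|x) − log π_ref(y_w|x)) − (log π_θ(y_l|x) − log π_ref(y_l|x))]), computed here for a single pair from precomputed sequence log-probabilities.

```python
import math
from typing import Callable


def build_preference_pairs(
    question: str,
    gold: str,
    sample_fn: Callable[[str], dict],  # hypothetical: returns {"text": ..., "answer": ...}
    k: int = 8,
) -> list[tuple[dict, dict]]:
    """Draw k chains; accept those with the gold answer, reject the rest, and pair them."""
    samples = [sample_fn(question) for _ in range(k)]
    chosen = [s for s in samples if s["answer"] == gold]
    rejected = [s for s in samples if s["answer"] != gold]
    # zip stops at the shorter list, so a question with no correct
    # (or no incorrect) sample yields no pairs.
    return list(zip(chosen, rejected))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,
) -> float:
    """-log sigmoid(beta * margin) for one preference pair, computed stably."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(m) = log(1 + exp(-m)); branch to avoid overflow for large |m|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

With a toy `sample_fn` that returns the answers "A", "B", "A", "C" for gold answer "A", the function yields two (chosen, rejected) pairs. At zero margin the loss equals ln 2, and it falls as the policy raises the chosen chain's likelihood relative to the reference model.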
Exploration of Plan-Guided Summarization for Narrative Texts: the Case of Small Language Models
Grenander, Matt, Varia, Siddharth, Czarnowska, Paula, Vyas, Yogarshi, Halder, Kishaloy, Min, Bonan
Plan-guided summarization attempts to reduce hallucinations in small language models (SLMs) by grounding generated summaries in the source text, typically by targeting fine-grained details such as dates or named entities. In this work, we investigate whether plan-based approaches in SLMs improve summarization in long-document narrative tasks. The length and complexity of narrative texts often make them difficult to summarize faithfully. We analyze existing plan-guided solutions targeting fine-grained details and also propose our own higher-level, narrative-based plan formulation. Our results show that neither approach significantly improves on a baseline without planning in either summary quality or faithfulness. Human evaluation reveals that while plan-guided summaries are often well grounded in their plans, the plans themselves are just as likely as the summaries to contain hallucinations. As a result, the plan-guided summaries are just as unfaithful as those from models without planning. Our work serves as a cautionary tale for plan-guided approaches to summarization, especially in long, complex domains such as narrative texts. Code available at https://github.com/amazon-science/plan-guided-summarization
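As a rough illustration of what it means for a plan to contain hallucinations, the Python sketch below checks a fine-grained plan (a list of entities or dates, the kind of plan the abstract says existing approaches target) against the source text and reports which items never appear in it. This is a crude lexical proxy introduced here as an assumption: the paper relied on human evaluation, and exact substring matching both misses paraphrased or inferred errors and flags legitimate paraphrases. The `plan_support` helper and the example text are hypothetical.

```python
def plan_support(plan: list[str], source: str) -> tuple[float, list[str]]:
    """Return the share of plan items found verbatim (case-insensitive) in the source,
    plus the items that were not found. An empty plan is treated as fully supported."""
    if not plan:
        return 1.0, []
    haystack = source.lower()
    unsupported = [item for item in plan if item.lower() not in haystack]
    return (len(plan) - len(unsupported)) / len(plan), unsupported


# Hypothetical source passage and entity plan.
source = "On 3 May, Anna left Lisbon for Porto with her brother Tomas."
plan = ["Anna", "Porto", "Madrid"]
print(plan_support(plan, source))
```

Here "Madrid" is reported as unsupported, so a summary faithfully following this plan would inherit the plan's hallucination, which is the failure mode the abstract describes.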
DAVID MARCUS: With Trump in power, 'South Park' seeks to get its edge back
Turning Point USA founder Charlie Kirk spoke with Fox News Digital about his thoughts on "South Park" parodying him in an upcoming episode, calling it a "badge of honor." "South Park," Comedy Central's gold-standard animated sitcom, has launched its 27th season on America's television screens and, with President Trump back in the White House, politics is back on the menu for creators Trey Parker and Matt Stone. Much like our national media ecosystem, Trump and his presidency are the driving force behind almost every plot line in the first three episodes this year. Much of it is quite funny, but one does wonder: Where were all these hilarious hijinks regarding Joe Biden's "Weekend At Bernie's" presidency? The overarching premise of the season thus far is that, with the election of Trump, wokeness is finally dead.
The High Femme Dystopia of Star Amerasu
If the recent embrace of seemingly--and only seemingly--autonomous machines is any indication, something much less chic than the future premised in "The Matrix" awaits us. During the 1999 film's sequence of down-the-rabbit-hole scenes, Morpheus (Laurence Fishburne) flips the channel on the late-nineties metropolis as Neo (Keanu Reeves) knows it, revealing it to be a "computer-generated dream world" that pacifies a dozing human race whose bioelectricity is extracted by machines, for machines, circa 2197. The "world as it exists today" is instead a dark and decaying place--the "desert of the real," as Morpheus coolly puts it. It is also, he explains, the aftermath of early twenty-first-century optimism, a time when, he says, "we marvelled at our own magnificence as we gave birth to A.I." Still, dystopia as envisioned by the movie's directors, the Wachowskis (and their collaborators, on that film, particularly in production and costume design), looks pretty rad, in cinematic terms. The glint and thrum of Y2K aesthetics--as contrasted with the droning conservatism of the white-collar office--read as anticipatory rather than melancholic, looking toward a future liberated from systems of old.
NASA spacecraft collects dust older than the sun from an asteroid more than 200 million miles away
Dust collected by a NASA spacecraft from an asteroid more than 200 million miles away from Earth contains material that is older than the sun. Scientists have analysed samples from the asteroid Bennu, which resembles the Death Star space station in the Star Wars films, and found that it is 'chemically primitive'. It contained 'presolar grains', which are stardust that formed around dying stars billions of years ago. According to a team of researchers, the samples provide a glimpse into the outer Solar System during the birth of the sun and are more pristine than any meteorite on Earth. NASA's OSIRIS-REx spacecraft briefly touched the surface of Bennu with a robotic arm and collected 120g of material, which was placed in a capsule that returned to Earth in 2023.
Fans loved her new album. The thing was, she hadn't released one
Kaufman made a playlist of all the tracks he could find and gave it a derogatory name. "It's more fun to laugh about it than to feel bad about it," he says. "But it is disconcerting that this can happen." And it was strange to him, as a musician and producer who generally goes "under the radar", to be targeted. "Why not go for someone big?" he asks.
YouTube's Sneaky AI 'Experiment'
Something strange has been happening on YouTube over the past few weeks. After being uploaded, some videos have been subtly augmented, their appearance changing without their creators doing anything. Viewers have noticed "extra punchy shadows," "weirdly sharp edges," and a smoothed-out look to footage that makes it look "like plastic." Many people have come to the same conclusion: YouTube is using AI to tweak videos on its platform, without creators' knowledge. A multimedia artist going by the name Mr. Bravo, whose YouTube videos feature "an authentic 80s aesthetic" achieved by running his videos through a VCR, wrote on Reddit that his videos look "completely different to what was originally uploaded."