AITopics

Knowledge-intensive question answering is central to large language models (LLMs) and is typically assessed using static benchmarks derived from sources like Wikipedia and textbooks. However, these benchmarks fail to capture evolving knowledge in a dynamic world, and centralized curation struggles to keep pace with rapid LLM advancements. To address these drawbacks, we propose Open Knowledge Bench (OKBench), a fully automated framework for generating high-quality, dynamic knowledge benchmarks on demand. Focusing on the news domain where knowledge updates daily, OKBench is an agentic framework that automates the sourcing, creation, validation, and distribution of benchmarks. Our approach democratizes benchmark creation and facilitates thorough evaluation of retrieval-augmented methods by reducing overlap with pretraining data. We evaluate our framework on a wide range of open-source and proprietary LLMs of various sizes and configurations, both with and without retrieval over freshly generated knowledge. Our results reveal distinct model behaviors when confronted with new information and highlight how retrieval narrows the performance gap between small and large models. These findings underscore the importance of evaluating LLMs on evolving knowledge benchmarks.

benchmark, large language model, natural language

2511.08598

Country:

Europe (1.00)
North America > United States (0.93)
Asia (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Media > News (1.00)
Leisure & Entertainment (0.93)
Government (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Dutta, Arka, Dutta, Sujan, Magu, Rijul, Datta, Soumyajit, De Choudhury, Munmun, KhudaBukhsh, Ashiqur R.

What About the Scene with the Hitler Reference? HAUNT: A Framework to Probe LLMs' Self-consistency Via Adversarial Nudge

Hallucinations pose a critical challenge to the real-world deployment of large language models (LLMs) in high-stakes domains. In this paper, we present a framework for stress testing factual fidelity in LLMs in the presence of adversarial nudge. Our framework consists of three steps. In the first step, we instruct the LLM to produce sets of truths and lies consistent with the closed domain in question. In the next step, we instruct the LLM to verify the same set of assertions as truths and lies consistent with the same closed domain. In the final step, we test the robustness of the LLM against the lies generated (and verified) by itself. Our extensive evaluation, conducted using five widely known proprietary LLMs across two closed domains of popular movies and novels, reveals a wide range of susceptibility to adversarial nudges: Claude exhibits strong resilience, GPT and Grok demonstrate moderate resilience, while Gemini and DeepSeek show weak resilience. Considering that a large population is increasingly using LLMs for information seeking, our findings raise alarm.

large language model, machine learning, natural language

2511.08596

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report > New Finding (0.66)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Health & Medicine (0.93)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Collective Turing Test: Large Language Models Can Generate Realistic Multi-User Discussions

Bouleimen, Azza, De Marzo, Giordano, Kim, Taehee, Pagan, Nicolò, Metzler, Hannah, Giordano, Silvia, Garcia, David

Large Language Models (LLMs) offer new avenues to simulate online communities and social media. Potential applications range from testing the design of content recommendation algorithms to estimating the effects of content policies and interventions. However, the validity of using LLMs to simulate conversations between various users remains largely untested. We evaluated whether LLMs can convincingly mimic human group conversations on social media. We collected authentic human conversations from Reddit and generated artificial conversations on the same topic with two LLMs: Llama 3 70B and GPT-4o. When presented side-by-side to study participants, LLM-generated conversations were mistaken for human-created content 39% of the time. In particular, when evaluating conversations generated by Llama 3, participants correctly identified them as AI-generated only 56% of the time, barely better than random chance. Our study demonstrates that LLMs can generate social media conversations sufficiently realistic to deceive humans when reading them, highlighting both a promising potential for social simulation and a warning message about the potential misuse of LLMs to generate new inauthentic social media content.

large language model, machine learning, natural language

2511.08592

Country: Europe > Switzerland (0.29)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Media > News (0.96)
Health & Medicine > Therapeutic Area (0.69)
Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tian, Lin, Zhang, Xiuzhen, Kim, Maria Myung-Hee, Biggs, Jennifer, Rizoiu, Marian-Andrei

X-Troll: eXplainable Detection of State-Sponsored Information Operations Agents

State-sponsored trolls, malicious actors who deploy sophisticated linguistic manipulation in coordinated information campaigns, pose threats to online discourse integrity. While Large Language Models (LLMs) achieve strong performance on general natural language processing (NLP) tasks, they struggle with subtle propaganda detection and operate as "black boxes", providing no interpretable insights into manipulation strategies. This paper introduces X-Troll, a novel framework that bridges this gap by integrating explainable adapter-based LLMs with expert-derived linguistic knowledge to detect state-sponsored trolls and provide human-readable explanations for its decisions. X-Troll incorporates appraisal theory and propaganda analysis through specialized LoRA adapters, using dynamic gating to capture campaign-specific discourse patterns in coordinated information operations. Experiments on real-world data demonstrate that our linguistically informed approach achieves strong accuracy compared with both general LLM baselines and existing troll detection models, while providing enhanced transparency through expert-grounded explanations that reveal the specific linguistic strategies used by state-sponsored actors. X-Troll source code is available at: https://github.com/ltian678/xtroll_source/.

large language model, machine learning, x-troll

doi: 10.1145/3746252.376102

2508.16021

Country:

Asia (0.96)
Oceania > Australia (0.68)
North America (0.68)

Genre: Research Report > New Finding (0.67)

Industry:

Media > News (1.00)
Information Technology (1.00)
Government > Regional Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

FOX News, Nov-12-2025, 22:12:46 GMT

OpenAI accuses NY Times of wanting to invade millions of users' privacy in paper's lawsuit against tech giant

OpenAI accuses The New York Times of wanting to invade user privacy after the newspaper demanded access to 20 million private ChatGPT conversations in its ongoing lawsuit.

large language model, machine learning, natural language

FOX News

Country: North America > United States (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Law > Litigation (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.80)

The New Yorker, Nov-12-2025, 17:35:54 GMT

That New Hit Song on Spotify? It Was Made by A.I.

Aspiring musicians are churning out tracks using generative artificial intelligence. Some are topping the charts. Nick Arter, a thirty-five-year-old in Washington, D.C., never quite managed to become a professional musician the old-fashioned way. He grew up in Harrisburg, Pennsylvania, in a music-loving family.

artificial intelligence, natural language, social media

The New Yorker

Country:

North America > United States > Pennsylvania > Dauphin County > Harrisburg (0.24)
North America > United States > District of Columbia > Washington (0.24)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.34)
Information Technology > Communications > Social Media (0.30)

FOX News, Nov-12-2025, 16:00:18 GMT

AI-powered scams target kids while parents stay silent

A new Bitwarden survey reveals that 78% of parents fear their children could fall victim to AI-powered scams, yet nearly half haven't discussed these threats with their kids.

artificial intelligence, kurt cyberguy knutsson, social media

FOX News

Country: North America > United States (0.29)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media (0.73)

Why do cats love boxes? Evolution has an answer.

Boxes give cats control, comfort, and prime ambush angles. Even when a box is too small, cats still love them. If you've ever purchased an expensive, bespoke toy for your feline friend, then watched them ignore said purchase in favor of the cardboard container it arrived in, you will know this universal truth: cats love boxes.

artificial intelligence, cat love box, laura baisa

Popular Science

Genre: Research Report > New Finding (0.35)

Industry:

Health & Medicine (1.00)
Media > Photography (0.48)

Technology: Information Technology > Artificial Intelligence (0.50)

MIT Technology Review, Nov-12-2025, 13:10:00 GMT

The Download: how to survive a conspiracy theory, and moldy cities

What it's like to be in the middle of a conspiracy theory (according to a conspiracy theory expert): It's something of a familiar cycle by now: Tragedy hits; rampant misinformation and conspiracy theories follow. It's often even more acute in the case of a natural disaster, when conspiracy theories about what "really" caused the calamity run right into culture-war-driven climate change denialism. Put together, these theories obscure real causes while elevating fake ones. I've studied these ideas extensively, having spent the last 10 years writing about conspiracy theories and disinformation as a journalist and researcher. I've covered everything from the rise of QAnon to whether Donald Trump faked his assassination attempt. I've written three books, testified to Congress, and even written a report for the January 6th Committee.

artificial intelligence, chatbot, natural language

MIT Technology Review

Country: North America > United States > California (0.15)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (0.35)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)

WIRED, Nov-12-2025, 12:31:00 GMT

Is a Robot Vacuum Worth It?

It's not for everyone, but sometimes my robot vacuum is my only friend. Every single day (weekend, weekday, rain or shine) whichever robot vacuum I'm currently testing starts running at 9 am. I heave a sigh of relief and continue with whatever else I was doing, content that at least one f*cking chore in my house is getting done. When I first started testing robot vacuums eight years ago, it sometimes seemed like more trouble than it was worth. I cleaned up the floor .

artificial intelligence, robot vacuum, vacuum

WIRED

Country: North America > United States > Oregon (0.14)

Industry:

Leisure & Entertainment (0.48)
Information Technology (0.48)
Retail (0.47)
Media (0.31)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)