12 books you need to read in 2026

BBC News

Whenever I fantasise about a couple of hours of uninterrupted relaxation during the chilly winter months, my mind immediately conjures up images of curling up on the sofa with a deliciously good book. And when summer eventually comes around, just swap the location to a sun lounger in the back garden (or somewhere more exotic). So with 2026 nearly upon us, join me for an eclectic taste of a few literary delights worth feasting upon over the next 12 months. First up is the final instalment of Alice Oseman's hit graphic novel series Heartstopper, which has followed the lives of Nick and Charlie, two teenage boys who fall for each other at school. Along with their friends, we've followed all the ups and downs of their relationship as they navigated family drama, homophobia and mental health issues, alongside the joy of first love.


The best new science fiction books of October 2025

New Scientist

Science fiction legend Ursula K. Le Guin is honoured with a new collection out this month, and sci-fi fans can also look forward to fiction from astronaut Chris Hadfield and award-winning authors Ken Liu and Mary Robinette Kowal. Like many of you, no doubt, I count Ursula K. Le Guin among my favourite sci-fi writers. So I am really excited about a collection out this month that brings together the maps she would draw when starting a story, and also celebrates her brilliant and wise writing. Not least because we've just read one of her classic novels with the New Scientist Book Club: do come and join us and share your thoughts with fellow fans! The sci-fi out this month looks forward as well as back, though. Ken Liu brings us a thriller set in the near future, and I'm keen to read Megha Majumdar's tale of a flooded Kolkata and a desperate mother.


What Do Americans Actually Want to Read? One Author Crunched the Numbers--and Wrote It.

Slate

This enterprise proved so amusing that the pair, in collaboration with composer Dave Soldier, repeated the experiment with popular music, releasing the "most wanted" and "least wanted" songs together on a CD with a cover photo of all three men wearing white lab coats and pointing at a calculator. Sadly, the pair stopped short of what I view as the greatest challenge: producing novels that reflect what Americans like and dislike in fiction. Now, at last, with People's Choice Literature, by the writer/artist/composer Tom Comitta, a new "scientist" has taken up the task. People's Choice Literature offers its readers two novels for the price of one. The first is a thriller whose heroine tries to prevent her boss, a new age–y tech mogul, from launching a quantum computing network that will bring about a total surveillance state.


Movie Facts and Fibs (MF²): A Benchmark for Long Movie Understanding

Zaranis, Emmanouil, Farinhas, António, Santos, Saul, Canaverde, Beatriz, Ramos, Miguel Moura, Surikuchi, Aditya K, Viveiros, André, Liao, Baohao, Bueno-Benito, Elena, Sivakumaran, Nithin, Vasylenko, Pavlo, Yu, Shoubin, Sannigrahi, Sonal, Mohammed, Wafaa, Peters, Ben, Villegas, Danae Sánchez, Stengel-Eskin, Elias, Attanasio, Giuseppe, Yoon, Jaehong, Frank, Stella, Suglia, Alessandro, Zerva, Chrysoula, Elliott, Desmond, Dimiccoli, Mariella, Bansal, Mohit, Lanz, Oswald, Bernardi, Raffaella, Fernández, Raquel, Pezzelle, Sandro, Niculae, Vlad, Martins, André F. T.

arXiv.org Artificial Intelligence

Despite recent progress in vision-language models (VLMs), holistic understanding of long-form video content remains a significant challenge, partly due to limitations in current benchmarks. Many focus on peripheral, "needle-in-a-haystack" details, encouraging context-insensitive retrieval over deep comprehension. Others rely on large-scale, semi-automatically generated questions (often produced by language models themselves) that are easier for models to answer but fail to reflect genuine understanding. In this paper, we introduce MF², a new benchmark for evaluating whether models can comprehend, consolidate, and recall key narrative information from full-length movies (50-170 minutes long). MF² includes over 50 full-length, open-licensed movies, each paired with manually constructed sets of claim pairs -- one true (fact) and one plausible but false (fib), totalling over 850 pairs. These claims target core narrative elements such as character motivations and emotions, causal chains, and event order, and refer to memorable moments that humans can recall without rewatching the movie. Instead of multiple-choice formats, we adopt a binary claim evaluation protocol: for each pair, models must correctly identify both the true and false claims. This reduces biases like answer ordering and enables a more precise assessment of reasoning. Our experiments demonstrate that both open-weight and closed state-of-the-art models fall well short of human performance, underscoring the relative ease of the task for humans and their superior ability to retain and reason over critical narrative information -- an ability current VLMs lack.
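The pair-wise protocol described in the abstract is stricter than per-claim accuracy: a model earns credit for a pair only when it labels both the fact and the fib correctly. A minimal sketch of that scoring rule (the `judge` callable stands in for any VLM and the demo data are invented for illustration, not drawn from MF²):

```python
# Pair-level scoring for a fact/fib benchmark: a pair counts as
# correct only if BOTH claims in it are judged correctly.

def score_pairs(pairs, judge):
    """pairs: list of (fact, fib) claim strings.
    judge: callable mapping a claim to True ('this claim is true') or False."""
    correct = 0
    for fact, fib in pairs:
        # Credit only when the fact is accepted AND the fib is rejected.
        if judge(fact) is True and judge(fib) is False:
            correct += 1
    return correct / len(pairs)

# Toy demo: a keyword-based "judge" stands in for a real model.
demo_pairs = [
    ("The hero leaves town because his brother dies.",
     "The hero leaves town because he wins the lottery."),
    ("The letter is burned before anyone reads it.",
     "The letter is mailed to the detective."),
]
toy_judge = lambda claim: "lottery" not in claim and "mailed" not in claim
print(score_pairs(demo_pairs, toy_judge))  # 1.0 for this toy judge
```

Because a model that accepts every claim scores zero under this rule, pair scoring penalises the "always agree" bias that plagues single-claim true/false formats.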


Black Mirror is now a delightful escape from reality

Engadget

The latest season of Black Mirror feels almost therapeutic as we peer over the cliff of civilizational collapse. Everything is awful, but at least we don't have to worry about renting out access to our brains to skeevy startups, or dealing with the consequences of a PC game's super-intelligent AI. While Black Mirror felt like a horrifying harbinger of an over-teched future when it debuted in 2011, now it's practically an escape from the fresh hell of real-world headlines. That's not to say that the show has lost any of the acerbic bite of creator Charlie Brooker. But now Brooker and his writers -- Ms. Marvel showrunner Bisha K. Ali, William Bridges, Ella Road and Bekka Bowling -- more deftly wield their talent for cultural analysis. Not all of the new episodes revolve around nefarious new tech; sometimes the tools themselves are genuinely helpful -- it's humans who are often the real problem.


Netflix's Most Expensive Movie Ever Is Here, and It's a Monumental Disaster

Slate

When he got his first glimpse of a movie studio, Orson Welles excitedly proclaimed it "the biggest electric train set any boy ever had." But with a reported budget of more than $300 million, Joe and Anthony Russo's The Electric State makes Welles' train set look like a busted caboose. The most expensive movie in Netflix's history, it's also among the costliest of all time, joining a list that includes the brothers' own Avengers: Infinity War and Avengers: Endgame. If the Russos are the most profligate creators in history--their Amazon series Citadel is also one of the most expensive TV shows ever made--they're among the most successful too. And yet for all the money they're making, and all that they're allowed to spend, they don't seem to be enjoying themselves very much.


Why A.I. Isn't Going to Make Art

The New Yorker

In 1953, Roald Dahl published "The Great Automatic Grammatizator," a short story about an electrical engineer who secretly desires to be a writer. One day, after completing construction of the world's fastest calculating machine, the engineer realizes that "English grammar is governed by rules that are almost mathematical in their strictness." He constructs a fiction-writing machine that can produce a five-thousand-word short story in thirty seconds; a novel takes fifteen minutes and requires the operator to manipulate handles and foot pedals, as if he were driving a car or playing an organ, to regulate the levels of humor and pathos. The resulting novels are so popular that, within a year, half the fiction published in English is a product of the engineer's invention. Is there anything about art that makes us think it can't be created by pushing a button, as in Dahl's imagination?


One Thousand and One Pairs: A "novel" challenge for long-context language models

Karpinska, Marzena, Thai, Katherine, Lo, Kyle, Goyal, Tanya, Iyyer, Mohit

arXiv.org Artificial Intelligence

Synthetic long-context LLM benchmarks (e.g., "needle-in-the-haystack") test only surface-level retrieval capabilities, but how well can long-context LLMs retrieve, synthesize, and reason over information across book-length inputs? We address this question by creating NoCha, a dataset of 1,001 minimally different pairs of true and false claims about 67 recently-published English fictional books, written by human readers of those books. In contrast to existing long-context benchmarks, our annotators confirm that the largest share of pairs in NoCha requires global reasoning over the entire book to verify. Our experiments show that while human readers easily perform this task, it is enormously challenging for all ten long-context LLMs that we evaluate: no open-weight model performs above random chance (despite their strong performance on synthetic benchmarks), while GPT-4o achieves the highest accuracy at 55.8%. Further analysis reveals that (1) on average, models perform much better on pairs that require only sentence-level retrieval vs. global reasoning; (2) model-generated explanations for their decisions are often inaccurate even for correctly labeled claims; and (3) models perform substantially worse on speculative fiction books that contain extensive world-building. The methodology proposed in NoCha allows for the evolution of the benchmark dataset and the easy analysis of future models.
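The "random chance" floor mentioned in the abstract is worth pinning down: under pair scoring, a baseline that guesses each claim's label independently gets both claims of a pair right with probability 0.5 × 0.5 = 0.25, not 0.5. A quick Monte Carlo sanity check of that arithmetic (a sketch of the scoring logic as I read it, not the authors' evaluation code):

```python
# Under pair scoring, a coin-flip baseline must independently
# (a) accept the true claim and (b) reject the false claim,
# so its expected pair accuracy is 0.5 * 0.5 = 0.25.

import random

def random_pair_accuracy(trials=100_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        guessed_fact_true = rng.random() < 0.5   # coin flip on the true claim
        guessed_fib_false = rng.random() < 0.5   # coin flip on the false claim
        if guessed_fact_true and guessed_fib_false:
            hits += 1
    return hits / trials

print(random_pair_accuracy())  # converges to ~0.25
```

Against that 25% floor, GPT-4o's reported 55.8% is well above chance yet still far from the near-perfect pair accuracy the paper attributes to human readers.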


Character is Destiny: Can Large Language Models Simulate Persona-Driven Decisions in Role-Playing?

Xu, Rui, Wang, Xintao, Chen, Jiangjie, Yuan, Siyu, Yuan, Xinfeng, Liang, Jiaqing, Chen, Zulong, Dong, Xiaoqing, Xiao, Yanghua

arXiv.org Artificial Intelligence

Can Large Language Models substitute humans in making important decisions? Recent research has unveiled the potential of LLMs to role-play assigned personas, mimicking their knowledge and linguistic habits. However, imitative decision-making requires a more nuanced understanding of personas. In this paper, we benchmark the ability of LLMs in persona-driven decision-making. Specifically, we investigate whether LLMs can predict characters' decisions provided with the preceding stories in high-quality novels. Leveraging character analyses written by literary experts, we construct a dataset LIFECHOICE comprising 1,401 character decision points from 395 books. Then, we conduct comprehensive experiments on LIFECHOICE, with various LLMs and methods for LLM role-playing. The results demonstrate that state-of-the-art LLMs exhibit promising capabilities in this task, yet there is substantial room for improvement. Hence, we further propose the CHARMAP method, which achieves a 6.01% increase in accuracy via persona-based memory retrieval. We will make our datasets and code publicly available.
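The abstract describes CHARMAP only as "persona-based memory retrieval," so the following is a hedged sketch of the general idea rather than the authors' method: before asking an LLM to predict a character's decision, rank the story's passages by similarity to the character's persona description and feed the top matches into the prompt. The bag-of-words cosine similarity, the `retrieve_memories` helper, and the demo texts are all illustrative assumptions:

```python
# A minimal persona-based retrieval sketch (assumed, not the CHARMAP
# implementation): score each story passage against the persona
# description with bag-of-words cosine similarity, keep the top-k.

import re
from collections import Counter
from math import sqrt

def bow(text):
    """Lowercased bag-of-words vector, punctuation stripped."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_memories(persona, passages, k=2):
    q = bow(persona)
    ranked = sorted(passages, key=lambda p: cosine(q, bow(p)), reverse=True)
    return ranked[:k]

persona = "proud, honour-bound soldier who never abandons his comrades"
passages = [
    "He repaired the windmill alone through the night.",
    "He refused to retreat, standing with his wounded comrades.",
    "The market was busy that morning.",
]
top = retrieve_memories(persona, passages, k=1)
print(top[0])  # the comrades passage ranks first
```

A real system would swap the bag-of-words scorer for learned embeddings, but the pipeline shape (persona query, ranked passages, top-k into the prompt) is the part the abstract's description implies.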


Percival Everett Can't Say What His Novels Mean

The New Yorker

In a narrow, windowless room at the University of Southern California, a group of graduate students is workshopping a short story. Its author is silent as her classmates deliver gentle feedback. Some suggest minor improvements of pacing, setting, and tone. One student would appreciate a more robust description of the protagonist's emotions, but enjoys the sparseness, too. "I like this version," another adds.