What if Readers Like A.I.-Generated Fiction?

The New Yorker

Finally, he gave the summaries to his fine-tuned model, and he asked it to compose passages "in the style of Vauhini Vara." Going into all this, I was self-assured, even smug. I'd always felt that my style was original and, more important, that my books were totally distinct from one another. I figured that, even if the A.I. model could imitate my past books, it couldn't predict the style of the novel in progress. So, when Chakrabarty sent me the A.I.-generated imitations, I was genuinely confused.


ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis

Kalahroodi, Mohammad Javad Ranjbar, Faili, Heshaam, Shakery, Azadeh

arXiv.org Artificial Intelligence

Existing Persian speech datasets are typically smaller than their English counterparts, which creates a key limitation for developing Persian speech technologies. We address this gap by introducing ParsVoice, the largest Persian speech corpus designed specifically for text-to-speech (TTS) applications. We created an automated pipeline that transforms raw audiobook content into TTS-ready data, incorporating components such as a BERT-based sentence completion detector, a binary-search boundary optimization method for precise audio-text alignment, and audio-text quality assessment frameworks tailored to Persian. The pipeline processes 2,000 audiobooks, yielding 3,526 hours of clean speech, which was further filtered into a 1,804-hour high-quality subset suitable for TTS, featuring more than 470 speakers. To validate the dataset, we fine-tuned XTTS for Persian, achieving a naturalness Mean Opinion Score (MOS) of 3.6/5 and a Speaker Similarity Mean Opinion Score (SMOS) of 4.0/5, demonstrating ParsVoice's effectiveness for training multi-speaker TTS systems. ParsVoice is the largest high-quality Persian speech dataset, offering speaker diversity and audio quality comparable to major English corpora. To accelerate the development of Persian speech technologies, the complete dataset is publicly available at: https://huggingface.co/datasets/MohammadJRanjbar/ParsVoice.
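The abstract mentions a binary-search boundary optimization for audio-text alignment but does not detail it; the general idea can be sketched as follows. Here `is_complete` is a hypothetical, monotone predicate (e.g., whether an ASR transcript of the audio prefix fully covers the target sentence), not the authors' actual implementation:

```python
def refine_boundary(lo, hi, is_complete):
    """Binary-search the earliest audio position (in samples) at which
    the target sentence is fully contained in the audio prefix.

    Assumes is_complete(lo) is False and is_complete(hi) is True,
    and that the predicate flips exactly once between lo and hi.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_complete(mid):
            hi = mid  # sentence already complete by mid: tighten upper bound
        else:
            lo = mid  # sentence still cut off at mid: move lower bound up
    return hi

# Toy stand-in: pretend the sentence ends at sample 13_440 of a 1 s clip.
cut = refine_boundary(0, 16_000, lambda t: t >= 13_440)
```

Each iteration halves the search interval, so a one-second clip at 16 kHz needs only about 14 predicate evaluations (ASR calls) instead of a linear scan over candidate cut points.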


Helen Oyeyemi's Novel of Cognitive Dissonance

The New Yorker

Few fantasies are harder to wipe away than the romance of a clean slate. Every January, when we're twitchy with regret and self-loathing, advertisers blare, "New Year, new you," urging us to jettison our failures and start fresh. In fiction, self-reinvention is a perennial theme, often shadowed by the suspicion that it can't be done. Lately, novelists have put a political spin on the idea, counterposing hopeful acts of individual self-fashioning to the immovable weight of circumstance. Halle Butler's "The New Me" (2019), a millennial office satire, finds its temp heroine, Millie, trying to life-hack her way out of loneliness and professional drift--buy a plant, whiten her teeth, make friends, think positive.


What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles

Zhou, Mengtao, Wu, Sifan, Zhang, Huan, Sima, Qi, Liu, Bang

arXiv.org Artificial Intelligence

We investigate the capacity of Large Language Models (LLMs) for imaginative reasoning--the proactive construction, testing, and revision of hypotheses in information-sparse environments. Existing benchmarks, often static or focused on social deduction, fail to capture the dynamic, exploratory nature of this reasoning process. To address this gap, we introduce a comprehensive research framework based on the classic "Turtle Soup" game, integrating a benchmark, an agent, and an evaluation protocol. We present TurtleSoup-Bench, the first large-scale, bilingual, interactive benchmark for imaginative reasoning, comprising 800 turtle soup puzzles sourced from both the Internet and expert authors. We also propose Mosaic-Agent, a novel agent designed to assess LLMs' performance in this setting. To evaluate reasoning quality, we develop a multi-dimensional protocol measuring logical consistency, detail completion, and conclusion alignment. Experiments with leading LLMs reveal clear capability limits, common failure patterns, and a significant performance gap compared to humans. Our work offers new insights into LLMs' imaginative reasoning and establishes a foundation for future research on exploratory agent behavior.


Which one Performs Better? Wav2Vec or Whisper? Applying both in Badini Kurdish Speech to Text (BKSTT)

Adnan, Renas, Hassani, Hossein

arXiv.org Artificial Intelligence

Speech-to-text (STT) systems have a wide range of applications. They are available in many languages, albeit at different quality levels. Although Kurdish is considered a less-resourced language from a processing perspective, STT is available for some Kurdish dialects, for instance, Sorani (Central Kurdish). However, this does not extend to other Kurdish dialects, Badini and Hawrami, for example. This research attempts to address that gap. Badini has approximately two million speakers, and STT systems can help their community use mobile and computer-based technologies while giving the dialect more global visibility. We aim to create a language model based on Badini speech and evaluate its performance. To cover a conversational register with a proper level of grammatical accuracy and readily available transcriptions, we chose Badini children's stories, eight books comprising 78 stories, as the textual input. Six narrators narrated the books, resulting in approximately 17 hours of recording. We cleaned, segmented, and tokenized the input. The preprocessing produced nearly 15 hours of speech, comprising 19,193 segments and 25,221 words. We used Wav2Vec2-Large-XLSR-53 and Whisper-small to develop the language models. The experiments indicate that transcription based on the Wav2Vec2-Large-XLSR-53 model provides significantly more accurate and readable output than the Whisper-small model, with 90.38% versus 65.45% readability and 82.67% versus 53.17% accuracy, respectively.
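The abstract reports accuracy percentages for the two STT models; the standard metric behind such figures is word error rate (WER), with accuracy roughly 1 − WER. A minimal sketch of the usual edit-distance computation (not the authors' evaluation code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of reference words."""
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(h) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match/substitution
    return dp[len(r)][len(h)] / len(r)
```

For example, `wer("a b c d", "a x c")` counts one substitution and one deletion over four reference words, giving 0.5; libraries such as `jiwer` implement the same metric for production use.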


'AI doesn't know what an orgasm sounds like': audiobook actors grapple with the rise of robot narrators

The Guardian

When we think about what makes an audiobook memorable, it's always the most human moments: a catch in the throat when tears are near, or words spoken through a real smile. A Melbourne actor and audiobook narrator, Annabelle Tudor, says it's the instinct we have as storytellers that makes narration such a primal, and precious, skill. "The voice betrays how we're feeling really easily," she says. But as an art form it may be under threat. In May the Amazon-owned audiobook provider Audible announced it would allow authors and publishers to choose from more than 100 voices created by artificial intelligence to narrate audiobooks in English, Spanish, French and Italian, with AI translation of audiobooks expected to be available later in the year – news that was met with criticism and curiosity across the publishing industry.


Classifying Unreliable Narrators with Large Language Models

Brei, Anneliese, Henry, Katharine, Sharma, Abhisheik, Srivastava, Shashank, Chaturvedi, Snigdha

arXiv.org Artificial Intelligence

Often when we interact with a first-person account of events, we consider whether or not the narrator, the primary speaker of the text, is reliable. In this paper, we propose using computational methods to identify unreliable narrators, i.e. those who unintentionally misrepresent information. Borrowing literary theory from narratology to define different types of unreliable narrators based on a variety of textual phenomena, we present TUNa, a human-annotated dataset of narratives from multiple domains, including blog posts, subreddit posts, hotel reviews, and works of literature. We define classification tasks for intra-narrational, inter-narrational, and inter-textual unreliabilities and analyze the performance of popular open-weight and proprietary LLMs for each. We propose learning from literature to perform unreliable narrator classification on real-world text data. To this end, we experiment with few-shot, fine-tuning, and curriculum learning settings. Our results show that this task is very challenging, and there is potential for using LLMs to identify unreliable narrators. We release our expert-annotated dataset and code and invite future research in this area.


MHTS: Multi-Hop Tree Structure Framework for Generating Difficulty-Controllable QA Datasets for RAG Evaluation

Lee, Jeongsoo, Kwon, Daeyong, Jin, Kyohoon, Jeong, Junnyeong, Sim, Minwoo, Kim, Minwoo

arXiv.org Artificial Intelligence

Existing RAG benchmarks often overlook query difficulty, leading to inflated performance on simpler questions and unreliable evaluations. A robust benchmark dataset must satisfy three key criteria: quality, diversity, and difficulty, the last of which captures the complexity of reasoning based on hops and the distribution of supporting evidence. In this paper, we propose MHTS (Multi-Hop Tree Structure), a novel dataset synthesis framework that systematically controls multi-hop reasoning complexity by leveraging a multi-hop tree structure to generate logically connected, multi-chunk queries. Our fine-grained difficulty estimation formula exhibits a strong correlation with the overall performance metrics of a RAG system, validating its effectiveness in assessing both retrieval and answer generation capabilities. By ensuring high-quality, diverse, and difficulty-controlled queries, our approach enhances RAG evaluation and benchmarking.


MultiActor-Audiobook: Zero-Shot Audiobook Generation with Faces and Voices of Multiple Speakers

Park, Kyeongman, Joo, Seongho, Jung, Kyomin

arXiv.org Artificial Intelligence

We introduce MultiActor-Audiobook, a zero-shot approach for generating audiobooks that automatically produces consistent, expressive, and speaker-appropriate prosody, including intonation and emotion. Previous audiobook systems have several limitations: they require users to manually configure the speaker's prosody, read each sentence in a monotone compared with voice actors, or rely on costly training. MultiActor-Audiobook addresses these issues through two novel processes: (1) MSP (**Multimodal Speaker Persona Generation**) and (2) LSI (**LLM-based Script Instruction Generation**). With these two processes, MultiActor-Audiobook can generate more emotionally expressive audiobooks with consistent speaker prosody and no additional training. We compare our system with commercial products through human and MLLM evaluations, achieving competitive results. Furthermore, we demonstrate the effectiveness of MSP and LSI through ablation studies.


Our favourite science fiction books of all time (the ones we forgot)

New Scientist

Is your favourite sci-fi novel included here, or have we forgotten it? Almost exactly a year ago, I asked our team of expert science writers here at New Scientist to name their favourite science fiction novels. Personal tastes meant we ended up with a wonderfully eclectic list, ranging from classics by the likes of Margaret Atwood and Octavia Butler to titles I'd not previously read (Jon Bois's 17776 was a particularly wild suggestion, from our US editor Chelsea Whyte – but it's well worth your time). We New Scientist staffers tend to be sci-fi nerds, and we realised we hadn't quite got all the greats yet. So here, for your reading pleasure, is our second take on our favourite sci-fi novels of all time, otherwise known as the ones we forgot. Again, we're not claiming this is a definitive list. It's just our top sci-fi reads, in no particular order, and we hope you'll discover some new favourites of your own in this line-up. We asked New Scientist staff to pick their favourite science fiction books. Here are the results, ranging from 19th-century classics to modern-day offerings, and from Octavia E. Butler to Iain M. Banks. And if we still haven't got them all, then come and tell us about it on Facebook.