 austen


Generation, Evaluation, and Explanation of Novelists' Styles with Single-Token Prompts

Rezaei, Mosab, Moghadam, Mina Rajaei, Shaikh, Abdul Rahman, Alhoori, Hamed, Freedman, Reva

arXiv.org Artificial Intelligence

Abstract--Recent advances in large language models have created new opportunities for stylometry, the study of writing styles and authorship. Two challenges, however, remain central: training generative models when no paired data exist, and evaluating stylistic text without relying only on human judgment. In this work, we present a framework for both generating and evaluating sentences in the style of 19th-century novelists. Large language models are fine-tuned with minimal, single-token prompts to produce text in the voices of authors such as Dickens, Austen, Twain, Alcott, and Melville. To assess these generative models, we employ a transformer-based detector trained on authentic sentences, using it both as a classifier and as a tool for stylistic explanation. We complement this with syntactic comparisons and explainable AI methods, including attention-based and gradient-based analyses, to identify the linguistic cues that drive stylistic imitation. Our findings show that the generated text reflects the authors' distinctive patterns and that AI-based evaluation offers a reliable alternative to human assessment. All artifacts of this work are published online. The ability to recognize and reproduce an author's writing style has long fascinated both literary scholars and computer scientists. Stylometry, the quantitative study of writing style, rests on the idea that every author leaves behind unconscious patterns in vocabulary, syntax, and rhythm [2, 3]. These patterns have been analyzed for centuries in questions of disputed authorship, the study of literary traditions, and more recently in applications such as security and forensics [4].


Dating apps, booze and clubbing - Jane Austen's Emma comes into the 21st Century

BBC News

And your pushy best friend is trying to sort out your love life. It's Jane Austen's Emma, but not as you know it. For the uninitiated, the 1815 novel follows the charmed life of our protagonist in Regency England as she busies herself interfering in her friends' relationships (or matchmaking, depending on your point of view). In Ava Pickett's fresh adaptation, being staged at London's Rose Theatre, Emma Woodhouse still has all the trademark traits of our beloved original heroine - she's clever, quick-witted, meddling, haughty and occasionally cruel. But instead of navigating society balls and dowries, Pickett's modern Emma is poking her nose into her friends' online dating profiles, having returned home after failing her exams at Oxford University.


Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark

Yang, Funing, Anderson, Carolyn Jane

arXiv.org Artificial Intelligence

Several systems have been developed to extract information about characters to aid computational analysis of English literature. We propose character similarity grouping as a holistic evaluation task for these pipelines. We present AustenAlike, a benchmark suite of character similarities in Jane Austen's novels. Our benchmark draws on three notions of character similarity: a structurally defined notion of similarity; a socially defined notion of similarity; and an expert defined set extracted from literary criticism. We use AustenAlike to evaluate character features extracted using two pipelines, BookNLP and FanfictionNLP. We build character representations from four kinds of features and compare them to the three AustenAlike benchmarks and to GPT-4 similarity rankings. We find that though computational representations capture some broad similarities based on shared social and narrative roles, the expert pairings in our third benchmark are challenging for all systems, highlighting the subtler aspects of similarity noted by human readers.


Oh, Not This Again: "AI Will Rise Up and Destroy Mankind"

#artificialintelligence

We analyze the expected behavior of an advanced artificial agent with a learned goal planning in an unknown environment. Given a few assumptions, we argue that it will encounter a fundamental ambiguity in the data about its goal. For example, if we provide a large reward to indicate that something about the world is satisfactory to us, it may hypothesize that what satisfied us was the sending of the reward itself; no observation can refute that. Then we argue that this ambiguity will lead it to intervene in whatever protocol we set up to provide data for the agent about its goal. We discuss an analogous failure mode of approximate solutions to assistance games. Finally, we briefly review some recent approaches that may avoid this problem.


Pride and Prejudice and Z-scores

#artificialintelligence

You might think literary criticism is no place for statistical analysis, but given digital versions of the text you can, for example, use sentiment analysis to infer the dramatic arc of an Oscar Wilde novel. Now you can apply similar techniques to the works of Jane Austen thanks to Julia Silge's R package janeaustenr (available on CRAN). The package includes the full text of the 6 Austen novels, including Pride and Prejudice and Sense and Sensibility. With the novels' text in hand, Julia then applied Bing sentiment analysis (as implemented in R's syuzhet package), shown here with annotations marking the major dramatic turns in the book: There's quite a lot of noise in that chart, so Julia took the elegant step of using a low-pass Fourier transform to smooth the sentiment for all six novels, which allows for a comparison of the dramatic arcs: This is super interesting to me. Emma and Northanger Abbey have the most similar plot trajectories, with their tales of immature women who come to understand their own folly and grow up a bit.
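To make the smoothing step concrete, here is a minimal Python sketch of a low-pass Fourier filter applied to a sequence of sentiment scores. It is an illustration only, not the code from Julia's analysis or from syuzhet: the function name `low_pass_smooth`, the `keep` parameter, and the sample scores are all assumptions made for this example.

```python
import cmath
import math


def low_pass_smooth(values, keep=3):
    """Smooth a sequence by keeping only its `keep` lowest-frequency
    Fourier components (plus the mean) and discarding the rest.

    Hypothetical helper for illustration; not part of any package.
    """
    n = len(values)
    if n == 0:
        return []
    # Forward discrete Fourier transform (naive O(n^2), fine for short series).
    coeffs = [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]
    # Indices k and n - k carry the same frequency, so they are kept or
    # dropped together; this keeps the inverse transform real-valued.
    filtered = [c if min(k, n - k) <= keep else 0 for k, c in enumerate(coeffs)]
    # Inverse transform back to the original domain.
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n)
             for k, c in enumerate(filtered)) / n).real
        for t in range(n)
    ]


if __name__ == "__main__":
    # Made-up per-chunk sentiment scores, purely for demonstration.
    scores = [1, -2, 3, 0, -1, 4, 2, -3, 1, 2, -1, 0]
    for raw, smooth in zip(scores, low_pass_smooth(scores, keep=2)):
        print(f"{raw:>3}  {smooth:6.2f}")
```

A smaller `keep` gives a smoother curve that shows only the broad arc, while a larger one retains more of the chunk-to-chunk noise.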


A Note on Local Ultrametricity in Text

Murtagh, Fionn

arXiv.org Artificial Intelligence

Structures that are inherent to data of any type can be of importance, and hierarchical structure is a prime example. In this work we take text corpora and assess the extent of hierarchical structure among words constituting the texts. By comprehensively taking context into account we seek to study hierarchical structures in the domain semantics. The data studied in Rammal et al. (1986) and Murtagh (2004) is point pattern data: observational features with their measurements on many coordinate dimensions. Data may be instead presented as time-varying signals and in a similar way, related to the findings of Rammal et al. (1986) and Murtagh (2004), we have investigated ultrametric-related properties of time series or 1D signals in Murtagh (2005a).
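As a rough illustration of what measuring the extent of hierarchical structure can mean, the sketch below counts the share of point triplets that satisfy the ultrametric (strong triangle) inequality, d(x, z) <= max(d(x, y), d(y, z)). This holds exactly when the two largest sides of a triangle are equal. It is a simplified stand-in, not the coefficient used in the cited works; the function name `ultrametric_fraction` and the tolerance `tol` are assumptions.

```python
from itertools import combinations


def ultrametric_fraction(dist, tol=1e-9):
    """Return the fraction of triplets in a symmetric distance matrix
    whose two largest pairwise distances are equal (within `tol`).

    Hypothetical helper for illustration only.
    """
    total = 0
    ok = 0
    for i, j, k in combinations(range(len(dist)), 3):
        sides = sorted([dist[i][j], dist[i][k], dist[j][k]])
        total += 1
        # Ultrametric triangles are isosceles with a small base.
        if sides[2] - sides[1] <= tol:
            ok += 1
    # With fewer than three points there is nothing to violate.
    return ok / total if total else 1.0


if __name__ == "__main__":
    # Distances read off a small tree: fully ultrametric.
    tree = [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
    # Three points on a line at 0, 1 and 3: not ultrametric.
    line = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
    print(ultrametric_fraction(tree), ultrametric_fraction(line))
```

A value near 1 suggests the distances are close to those induced by a tree, while a value near 0 suggests little hierarchical structure.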