Welcome to the Slopverse

The Atlantic - Technology

Bill Lowery, a sales executive, is confused when a workmate asks where he should take a date out for dinosaur. "You're planning to take this girl out for dinosaur?" "That's right," the colleague responds, totally nonchalant. Lowery presses him, agitated: "Wait a minute. What is this, some sort of new-wave expression or something--saying dinosaur instead of dinner?" "He's so pale and awfully congested--and he didn't touch his dinosaur when I took it in to him."


Pun Unintended: LLMs and the Illusion of Humor Understanding

Zangari, Alessandro, Marcuzzo, Matteo, Albarelli, Andrea, Pilehvar, Mohammad Taher, Camacho-Collados, Jose

arXiv.org Artificial Intelligence

Puns are a form of humorous wordplay that exploits polysemy and phonetic similarity. While LLMs have shown promise in detecting puns, we show in this paper that their understanding often remains shallow, lacking the nuanced grasp typical of human interpretation. By systematically analyzing and reformulating existing pun benchmarks, we demonstrate how subtle changes in puns are sufficient to mislead LLMs. Our contributions include comprehensive and nuanced pun detection benchmarks, human evaluation across recent LLMs, and an analysis of the robustness challenges these models face in processing puns.


Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings

Taylor, Russell, Herbert, Benjamin, Sana, Michael

arXiv.org Artificial Intelligence

Translating wordplay across languages presents unique challenges that have long confounded both professional human translators and machine translation systems. This research proposes a novel approach for translating puns from English to French by combining state-of-the-art large language models with specialized techniques for wordplay generation. Our methodology employs a three-stage approach. First, we establish a baseline using multiple frontier large language models with feedback based on a new contrastive learning dataset. Second, we implement a guided chain-of-thought pipeline with combined phonetic-semantic embeddings. Third, we add a multi-agent generator-discriminator framework for evaluating and regenerating puns with feedback. Moving beyond the limitations of literal translation, our methodology's primary objective is to capture the linguistic creativity and humor of the source text wordplay, rather than simply duplicating its vocabulary. Our best runs earned first and second place in the CLEF JOKER 2025 Task 2 competition, where they were evaluated manually by expert native French speakers. This research addresses a gap between translation studies and computational linguistics by implementing linguistically informed techniques for wordplay translation, advancing our understanding of how language models can handle the complex interplay between semantic ambiguity, phonetic similarity, and the implicit cultural and linguistic awareness needed for successful humor.
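The kind of signal a phonetic-semantic pipeline scores can be illustrated abstractly. The sketch below is not the authors' implementation: the ARPAbet-style pronunciation strings, the placeholder semantic score, and the `alpha` mixing weight are all hypothetical, but they show how near-homophony (the raw material of many puns) can be quantified and blended with meaning-level similarity.

```python
from difflib import SequenceMatcher

def phonetic_similarity(pron_a: str, pron_b: str) -> float:
    """Fraction of matching phonemes between two pronunciations,
    written as space-separated ARPAbet-style symbol strings."""
    return SequenceMatcher(None, pron_a.split(), pron_b.split()).ratio()

def combined_score(phon_sim: float, sem_sim: float, alpha: float = 0.5) -> float:
    """Blend phonetic and semantic similarity into one pun-candidate
    score. The linear blend and alpha weight are illustrative only."""
    return alpha * phon_sim + (1 - alpha) * sem_sim

# "tuna" (T UW N AH) vs. "tune" (T UW N): three of four phonemes match,
# so the pair is a plausible pun candidate on phonetics alone.
phon = phonetic_similarity("T UW N AH", "T UW N")
score = combined_score(phon, sem_sim=0.2, alpha=0.7)  # sem_sim is a stand-in
```

In a real system the semantic term would come from sentence or word embeddings; the point of the sketch is only that both channels feed one ranking score.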


A Reasoning-Based Approach to Cryptic Crossword Clue Solving

Andrews, Martin, Witteveen, Sam

arXiv.org Artificial Intelligence

Cryptic crossword clues are challenging language tasks for which new test sets are released daily by major newspapers on a global basis. Each cryptic clue contains both the definition of the answer to be placed in the crossword grid (in common with regular crosswords), and 'wordplay' that proves that the answer is correct (i.e. a human solver can be confident that an answer is correct without needing crossing words as confirmation). This work describes an LLM-based reasoning system built from open-licensed components that solves cryptic clues by (i) hypothesising answers; (ii) proposing wordplay explanations; and (iii) using a verifier system that operates on codified reasoning steps. Overall, this system establishes a new state-of-the-art performance on the challenging Cryptonite dataset of clues from The Times and The Telegraph newspapers in the UK. Because each proved solution is expressed in Python, interpretable wordplay reasoning for proven answers is available for inspection.
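The verifier idea, that an answer is accepted only when its wordplay mechanically checks out, can be illustrated with the simplest clue type. This is a hypothetical sketch, not the paper's codified proof format: it checks one anagram-type wordplay step and combines it with a definition check.

```python
def is_anagram(fodder: str, answer: str) -> bool:
    """Anagram wordplay step: the clue's fodder letters must
    rearrange exactly into the candidate answer."""
    return sorted(fodder.replace(" ", "").upper()) == sorted(answer.upper())

def verified(definition_ok: bool, wordplay_ok: bool) -> bool:
    """A solution is 'proved' only if both the definition matches the
    answer AND the wordplay derivation works -- so a solver can be
    confident without crossing letters."""
    return definition_ok and wordplay_ok

# Hypothetical clue: "Made furious, upset ENRAGED (7)" -> ANGERED
# Definition: "Made furious"; wordplay: anagram ("upset") of ENRAGED.
ok = verified(definition_ok=True, wordplay_ok=is_anagram("enraged", "angered"))
```

Real cryptic verification must also cover charades, containers, deletions, homophones, and so on; each would be a further checkable step in the same spirit.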


KoWit-24: A Richly Annotated Dataset of Wordplay in News Headlines

Baranov, Alexander, Palatkina, Anna, Makovka, Yulia, Braslavski, Pavel

arXiv.org Artificial Intelligence

We present KoWit-24, a dataset with fine-grained annotation of wordplay in 2,700 Russian news headlines. KoWit-24 annotations include the presence of wordplay, its type, wordplay anchors, and the words/phrases the wordplay refers to. Unlike most existing humor collections, which consist of canned jokes, KoWit-24 provides wordplay contexts: each headline is accompanied by the news lead and summary. The most common type of wordplay in the dataset is the transformation of collocations, idioms, and named entities -- a mechanism that has been underrepresented in previous humor datasets. Our experiments with five LLMs show that there is ample room for improvement in wordplay detection and interpretation tasks. The dataset and evaluation scripts are available at https://github.com/Humor-Research/KoWit-24


Proving that Cryptic Crossword Clue Answers are Correct

Andrews, Martin, Witteveen, Sam

arXiv.org Artificial Intelligence

Cryptic crossword clues are challenging cognitive tasks, for which new test sets are released on a daily basis by multiple international newspapers. Each cryptic clue contains both the definition of the answer to be placed in the crossword grid (in common with regular crosswords), and `wordplay' that proves that the answer is correct (i.e. a human solver can be confident that an answer is correct without needing crossing words to confirm it). Using an existing cryptic wordplay proving framework (operating on Python proofs created by an LLM), we show that it is possible to distinguish between correct answers and almost-correct ones based upon whether the wordplay `works'.


Are LLMs Good Cryptic Crossword Solvers?

Sadallah, Abdelrahman "Boda", Kotova, Daria, Kochmar, Ekaterina

arXiv.org Artificial Intelligence

Cryptic crosswords are puzzles that rely not only on general knowledge but also on the solver's ability to manipulate language on different levels and deal with various types of wordplay. Previous research suggests that solving such puzzles is a challenge even for modern NLP models. However, the abilities of large language models (LLMs) have not yet been tested on this task. In this paper, we establish the benchmark results for three popular LLMs -- LLaMA2, Mistral, and ChatGPT -- showing that their performance on this task is still far from that of humans.


ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models

Jentzsch, Sophie, Kersting, Kristian

arXiv.org Artificial Intelligence

Humor is a central aspect of human communication that artificial agents have so far failed to master. Large language models (LLMs) are increasingly able to capture implicit and contextual information, and OpenAI's ChatGPT in particular has recently gained immense public attention. The GPT3-based model almost seems to communicate on a human level and can even tell jokes. But is ChatGPT really funny? We put ChatGPT's sense of humor to the test. In a series of exploratory experiments around jokes, i.e., generation, explanation, and detection, we seek to understand ChatGPT's capability to grasp and reproduce human humor. Since the model itself is not accessible, we applied prompt-based experiments. Our empirical evidence indicates that jokes are not hard-coded, but mostly not newly generated by the model either: over 90% of 1,008 generated jokes were the same 25 jokes. The system accurately explains valid jokes but also comes up with fictional explanations for invalid jokes. Joke-typical characteristics can mislead ChatGPT in the classification of jokes. ChatGPT has not solved computational humor yet, but it may be a big leap toward "funny" machines.


GPT-4 is surprisingly good at explaining jokes

#artificialintelligence

Explaining a joke, as E.B. White once wrote, is like dissecting a frog: "the thing dies in the process and the innards are discouraging to any but the purely scientific mind." GPT-4, however -- the large language model released on March 14 by OpenAI -- is surprisingly good at generating detailed explanations of why a joke is funny. And like its predecessor, ChatGPT, the AI can also generate jokes, though its go-to one-liners are simple and seem to have been scraped from the internet's corniest, punniest corners (Why don't scientists trust atoms? Because they make up everything!). GPT-4 seems better at explaining humor than its predecessor.


Witscript 2: A System for Generating Improvised Jokes Without Wordplay

Toplyn, Joe

arXiv.org Artificial Intelligence

A previous paper presented Witscript, a system for generating conversational jokes that rely on wordplay. This paper extends that work by presenting Witscript 2, which uses a large language model to generate conversational jokes that rely on common sense instead of wordplay. Like Witscript, Witscript 2 is based on joke-writing algorithms created by an expert comedy writer. Human evaluators judged Witscript 2's responses to input sentences to be jokes 46% of the time, compared to 70% of the time for human-written responses. This is evidence that Witscript 2 represents another step toward giving a chatbot a humanlike sense of humor.