Extending Token Computation for LLM Reasoning

Liao, Bingli, Vargas, Danilo Vasconcellos

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are pivotal in advancing natural language processing but often struggle with complex reasoning tasks due to inefficient attention distributions. In this paper, we explore the effect of increased computed tokens on LLM performance and introduce a novel method for extending computed tokens in the Chain-of-Thought (CoT) process, utilizing attention mechanism optimization. By fine-tuning an LLM on a domain-specific, highly structured dataset, we analyze attention patterns across layers, identifying inefficiencies caused by non-semantic tokens with outlier high attention scores. To address this, we propose an algorithm that emulates early layer attention patterns across downstream layers to re-balance skewed attention distributions and enhance knowledge abstraction. Our findings demonstrate that our approach not only facilitates a deeper understanding of the internal dynamics of LLMs but also significantly improves their reasoning capabilities, particularly in non-STEM domains. Our study lays the groundwork for further innovations in LLM design, aiming to create more powerful, versatile, and responsible models capable of tackling a broad range of real-world applications.
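The abstract's core idea of emulating early-layer attention patterns in downstream layers can be sketched minimally. The paper's exact emulation rule is not given in the abstract, so the blending coefficient `alpha`, the choice of reference layer, and the function name below are all assumptions for illustration:

```python
import numpy as np

def rebalance_attention(layer_attn, early_layer=0, alpha=0.5):
    """Blend each downstream layer's attention map toward an
    early-layer pattern, then renormalize rows so each remains a
    probability distribution. A hypothetical sketch, not the
    authors' algorithm.

    layer_attn: list of (seq, seq) row-stochastic attention matrices,
                one per layer.
    """
    ref = layer_attn[early_layer]
    out = [layer_attn[0]]  # reference/early layers are left untouched
    for attn in layer_attn[1:]:
        # Mix the skewed downstream distribution with the flatter
        # early-layer pattern to damp outlier attention scores.
        mixed = (1 - alpha) * attn + alpha * ref
        mixed /= mixed.sum(axis=-1, keepdims=True)
        out.append(mixed)
    return out
```

In this sketch, a larger `alpha` pulls downstream layers harder toward the early-layer distribution, which is one plausible way to "re-balance skewed attention" as the abstract describes.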


ByGPT5: End-to-End Style-conditioned Poetry Generation with Token-free Language Models

Belouadi, Jonas, Eger, Steffen

arXiv.org Artificial Intelligence

State-of-the-art poetry generation systems are often complex. They either consist of task-specific model pipelines, incorporate prior knowledge in the form of manually created constraints, or both. In contrast, end-to-end models would not suffer from the overhead of having to model prior knowledge and could learn the nuances of poetry from data alone, reducing the degree of human supervision required. In this work, we investigate end-to-end poetry generation conditioned on styles such as rhyme, meter, and alliteration. We identify and address lack of training data and mismatching tokenization algorithms as possible limitations of past attempts. In particular, we successfully pre-train ByGPT5, a new token-free decoder-only language model, and fine-tune it on a large custom corpus of English and German quatrains annotated with our styles. We show that ByGPT5 outperforms other models such as mT5, ByT5, GPT-2 and ChatGPT, while also being more parameter efficient and performing favorably compared to humans. In addition, we analyze its runtime performance and demonstrate that it is not prone to memorization. We make our code, models, and datasets publicly available.
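"Token-free" here means the model consumes raw UTF-8 bytes instead of subword tokens, which sidesteps the tokenization mismatches the abstract mentions (rhyme and meter depend on character-level structure that subword vocabularies can split arbitrarily). A minimal sketch of byte-level encoding, with function names chosen for illustration:

```python
def byte_tokenize(text):
    # Byte-level models operate on raw UTF-8 bytes, so the
    # "vocabulary" is just the 256 possible byte values
    # (plus any special tokens, omitted in this sketch).
    return list(text.encode("utf-8"))

def byte_detokenize(ids):
    # Decoding is the exact inverse: reassemble the bytes.
    return bytes(ids).decode("utf-8")
```

Note that non-ASCII characters (common in the German quatrains the paper uses) span multiple bytes, so byte sequences are longer than subword sequences; the trade-off is that no learned vocabulary is needed.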


Learning Patterns of Assonance for Authorship Attribution of Historical Texts

Ivanov, Lubomir (Iona College)

AAAI Conferences

This paper deals with extracting and learning patterns of assonance as a stylistic feature for author attribution of historical texts. We describe an assonance extraction algorithm, and consider results from an extensive set of machine learning experiments, based on a historical corpus of 18th century American and British texts. The results are compared with those obtained from the use of other prosodic and traditional stylistic features.
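The paper's extraction algorithm is not reproduced in the abstract; as a rough illustration of the feature type, the sketch below scores repeated vowel groups in a line of text. This is a crude orthographic proxy (real assonance detection would use phonetic transcription), and the function name and threshold are assumptions:

```python
import re
from collections import Counter

def assonance_score(line, min_repeat=2):
    # Crude proxy: count repeated orthographic vowel clusters.
    # True assonance is about repeated vowel *sounds*, which
    # requires phonetic analysis beyond this sketch.
    vowels = re.findall(r"[aeiou]+", line.lower())
    counts = Counter(vowels)
    return {v: c for v, c in counts.items() if c >= min_repeat}
```

Per-line scores like these could then be aggregated over a document into a feature vector for the machine learning experiments the abstract describes.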


Learning AI if You Suck at Math -- P7 -- The Magic of Natural Language Processing

#artificialintelligence

After discovering the amazing power of convolutional neural networks for image recognition in part five of this series, I decided to dive head first into Natural Language Processing, or NLP. This hotbed of machine learning research teaches computers to understand how people talk. When you ask Siri or the Google Assistant a question, it's NLP that drives the conversation. Of course, as an author of novels and articles, working with language seemed like the obvious next step for me. I may suck at math, but words are my domain! So I set out to uncover what insights NLP could give me about my own area of mastery. I had so many questions. Had NLP uncovered the hidden keys to writing heart-wrenching poems? Could AIs turn phrases better than the Bard? Luckily, I had just the right project in mind to test the limits of NLP. I was in the midst of naming the second book in my epic sci-fi saga, The Jasmine Wars, but I'd struggled to find the perfect title. What if I could feed a neural net the greatest titles of all time and have it deliver a title for the ages? This isn't my first foray into computer-assisted title generation. There are a number of random title generators out on the interwebs that I've tried from time to time. They're the type of toy you play with for a few minutes and then move on.