Extending Token Computation for LLM Reasoning
Liao, Bingli; Vargas, Danilo Vasconcellos
Large Language Models (LLMs) are pivotal in advancing natural language processing but often struggle with complex reasoning tasks due to inefficient attention distributions. In this paper, we explore the effect of increased computed tokens on LLM performance and introduce a novel method for extending computed tokens in the Chain-of-Thought (CoT) process, utilizing attention mechanism optimization. By fine-tuning an LLM on a domain-specific, highly structured dataset, we analyze attention patterns across layers, identifying inefficiencies caused by non-semantic tokens with outlier high attention scores. To address this, we propose an algorithm that emulates early layer attention patterns across downstream layers to re-balance skewed attention distributions and enhance knowledge abstraction. Our findings demonstrate that our approach not only facilitates a deeper understanding of the internal dynamics of LLMs but also significantly improves their reasoning capabilities, particularly in non-STEM domains. Our study lays the groundwork for further innovations in LLM design, aiming to create more powerful, versatile, and responsible models capable of tackling a broad range of real-world applications.
- Research Report > New Finding (0.68)
- Research Report > Promising Solution (0.48)
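The re-balancing algorithm this abstract describes can be illustrated with a toy sketch: blend each downstream layer's attention map with an early-layer pattern, softening the outlier spikes on non-semantic tokens. The function name, the blending weight `alpha`, and the choice of the first layer as the reference pattern are assumptions for illustration, not the authors' actual method.

```python
import numpy as np

def rebalance_attention(attn_layers, alpha=0.5):
    """Blend each downstream layer's row-stochastic attention matrix with
    the early-layer pattern (hypothetical sketch of the re-balancing idea).

    attn_layers: list of (seq_len, seq_len) arrays whose rows sum to 1.
    alpha: how strongly to pull downstream layers toward the early pattern.
    """
    early = attn_layers[0]
    rebalanced = [early]
    for attn in attn_layers[1:]:
        mixed = (1 - alpha) * attn + alpha * early
        # Re-normalize rows so each remains a valid attention distribution.
        mixed = mixed / mixed.sum(axis=-1, keepdims=True)
        rebalanced.append(mixed)
    return rebalanced
```

With a skewed downstream layer (e.g. 0.9 of the mass on one token), blending with a near-uniform early layer lowers the peak score while keeping each row a valid distribution.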
ByGPT5: End-to-End Style-conditioned Poetry Generation with Token-free Language Models
Belouadi, Jonas; Eger, Steffen
State-of-the-art poetry generation systems are often complex. They either consist of task-specific model pipelines, incorporate prior knowledge in the form of manually created constraints, or both. In contrast, end-to-end models would not suffer from the overhead of having to model prior knowledge and could learn the nuances of poetry from data alone, reducing the degree of human supervision required. In this work, we investigate end-to-end poetry generation conditioned on styles such as rhyme, meter, and alliteration. We identify and address a lack of training data and mismatching tokenization algorithms as possible limitations of past attempts. In particular, we successfully pre-train ByGPT5, a new token-free decoder-only language model, and fine-tune it on a large custom corpus of English and German quatrains annotated with our styles. We show that ByGPT5 outperforms other models such as mT5, ByT5, GPT-2 and ChatGPT, while also being more parameter-efficient and performing favorably compared to humans. In addition, we analyze its runtime performance and demonstrate that it is not prone to memorization. We make our code, models, and datasets publicly available.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- (15 more...)
Learning Patterns of Assonance for Authorship Attribution of Historical Texts
Ivanov, Lubomir (Iona College)
This paper deals with extracting and learning patterns of assonance as a stylistic feature for authorship attribution of historical texts. We describe an assonance extraction algorithm, and consider results from an extensive set of machine learning experiments, based on a historical corpus of 18th century American and British texts. The results are compared with those obtained from the use of other prosodic and traditional stylistic features.
- Europe > Portugal > Évora > Évora (0.05)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
- North America > United States > New York > Westchester County > New Rochelle (0.04)
- (7 more...)
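The assonance extraction this abstract mentions can be sketched in miniature: count repeated vowel patterns within a sliding window of words. Using a word's vowel letters as a stand-in for its vowel sounds is a crude proxy; the paper's actual algorithm is phonetically informed, and the window size and function names here are assumptions.

```python
import re
from collections import Counter

VOWELS = "aeiou"

def vowel_skeleton(word):
    """Crude proxy for a word's vowel sounds: its vowel letters, in order."""
    return "".join(ch for ch in word.lower() if ch in VOWELS)

def assonance_counts(text, window=5):
    """Count vowel skeletons that recur within `window` words of each other,
    a rough stand-in for assonance density (hypothetical sketch)."""
    words = re.findall(r"[a-zA-Z]+", text)
    skeletons = [vowel_skeleton(w) for w in words]
    counts = Counter()
    for i, s in enumerate(skeletons):
        if not s:
            continue
        # Look ahead within the window for a matching vowel pattern.
        for j in range(i + 1, min(i + window, len(skeletons))):
            if skeletons[j] == s:
                counts[s] += 1
    return counts
```

For example, on "the rain in spain stays mainly" the pattern "ai" recurs across "rain", "spain", and "mainly"; such per-pattern counts could then feed a stylometric classifier as features.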
Learning AI if You Suck at Math -- P7 -- The Magic of Natural Language Processing
After discovering the amazing power of convolutional neural networks for image recognition in part five of this series, I decided to dive head first into Natural Language Processing or NLP. This hotbed of machine learning research teaches computers to understand how people talk. When you ask Siri or the Google Assistant a question, it's NLP that drives the conversation. Of course, as an author of novels and articles, working with language seemed like the obvious next step for me. I may suck at math but words are my domain! So I set out to uncover what insights NLP could give me about my own area of mastery. I had so many questions. Had NLP uncovered the hidden keys to writing heart-wrenching poems? Could AIs turn phrases better than the Bard? Luckily, I had just the right project in mind to test the limits of NLP. I was in the midst of naming the second book in my epic sci-fi saga The Jasmine Wars but I'd struggled to find the perfect title. What if I could feed a neural net with the greatest titles of all time and have it deliver a title for the ages? This isn't my first foray into computer-assisted title generation. There are a number of random title generators out on the interwebs that I've tried from time to time. They're the type of toy you play with for a few minutes and then move on.
- Leisure & Entertainment (0.93)
- Education (0.88)
- Media (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)