open-access journal
Values That Are Explicitly Present in Fairy Tales: Comparing Samples from German, Italian and Portuguese Traditions
Diaz-Faes, Alba Morollon, Murteira, Carla Sofia Ribeiro, Ruskov, Martin
Looking at how social values are represented in fairy tales can give insights about the variations in communication of values across cultures. We study how values are communicated in fairy tales from Portugal, Italy and Germany using a technique called word embedding with a compass to quantify vocabulary differences and commonalities. We study how these three national traditions differ in their explicit references to values. To do this, we specify a list of value-charged tokens, consider their word stems and analyse the distance between these in a bespoke pre-trained Word2Vec model. We triangulate and critically discuss the validity of the resulting hypotheses emerging from this quantitative model. Our claim is that this is a reusable and reproducible method for the study of the values explicitly referenced in historical corpora. Finally, our preliminary findings hint at a shared cultural understanding and the expression of values such as Benevolence, Conformity, and Universalism across the studied cultures, suggesting the potential existence of a pan-European cultural memory.
- Europe > Germany (0.25)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- (13 more...)
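The fairy-tale abstract above analyses the distance between value-charged tokens in a Word2Vec model. A minimal sketch of such a comparison follows, assuming cosine distance as the measure (the abstract does not name one). The toy vectors and the names `cosine_similarity`, `cosine_distance` and `toy_vectors` are hypothetical illustrations, not the authors' data or code.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Assumed distance measure: 1 minus cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Hypothetical 3-dimensional embeddings; a trained model would supply
# vectors with many more dimensions for each word stem.
toy_vectors = {
    "kind": [0.9, 0.1, 0.0],
    "generous": [0.8, 0.2, 0.1],
    "obey": [0.0, 0.9, 0.4],
}

near = cosine_distance(toy_vectors["kind"], toy_vectors["generous"])
far = cosine_distance(toy_vectors["kind"], toy_vectors["obey"])
print(f"kind-generous: {near:.3f}, kind-obey: {far:.3f}")
```

In this toy setup a smaller distance would be read as the two stems being used in more similar contexts.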
Predicting Sustainable Development Goals Using Course Descriptions -- from LLMs to Conventional Foundation Models
Kharlashkin, Lev, Macias, Melany, Huovinen, Leo, Hämäläinen, Mika
We use an LLM named PaLM 2 to generate training data from noisy, human-authored course descriptions. We use this data to train several smaller language models to predict SDGs for university courses. This work contributes to better university-level adaptation of SDGs. The best-performing model in our experiments was BART, with an F1-score of 0.786.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Dominican Republic (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- (3 more...)
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
Incorporating Crowdsourced Annotator Distributions into Ensemble Modeling to Improve Classification Trustworthiness for Ancient Greek Papyri
West, Graham, Swindall, Matthew I., Keener, Ben, Player, Timothy, Williams, Alex C., Brusuelas, James H., Wallin, John F.
Performing classification on noisy, crowdsourced image datasets can prove challenging even for the best neural networks. Two issues that complicate the problem on such datasets are class imbalance and ground-truth uncertainty in labeling. The AL-ALL and AL-PUB datasets, consisting of tightly cropped, individual characters from images of ancient Greek papyri, are strongly affected by both issues. The application of ensemble modeling to such datasets can help identify images where the ground truth is questionable and quantify the trustworthiness of those samples. As such, we apply stacked generalization consisting of nearly identical ResNets with different loss functions: one utilizing sparse cross-entropy (CXE) and the other Kullback-Leibler Divergence (KLD). Both networks use labels drawn from a crowdsourced consensus. This consensus is derived from a Normalized Distribution of Annotations (NDA) based on all annotations for a given character in the dataset. For the second network, the KLD is calculated with respect to the NDA. For our ensemble model, we apply a k-nearest neighbors model to the outputs of the CXE and KLD networks. Individually, the ResNet models have approximately 93% accuracy, while the ensemble model achieves an accuracy above 95%, increasing the classification trustworthiness. We also perform an analysis of the Shannon entropy of the various models' output distributions to measure classification uncertainty. Our results suggest that entropy is useful for predicting model misclassifications.
- Africa > Middle East > Egypt (0.14)
- North America > United States > Tennessee > Knox County > Knoxville (0.04)
- North America > United States > Kentucky (0.04)
- (3 more...)
- Information Technology > Communications > Social Media > Crowdsourcing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
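The papyri abstract above relies on three quantities: a normalized distribution of annotations, the Kullback-Leibler divergence with respect to it, and the Shannon entropy of a model's output. A minimal sketch of these formulas follows, assuming the NDA is a simple count normalization and that the divergence is taken as D_KL(NDA || prediction). Both are assumptions, as are all function names; this is not the authors' implementation.

```python
import math


def normalized_distribution(counts):
    """Turn per-class annotation counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one annotation is required")
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in nats; terms with p_i == 0 contribute nothing.

    q is floored at eps so a zero prediction does not divide by zero.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def shannon_entropy(p):
    """Entropy in bits; higher values indicate a less certain prediction."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


# Hypothetical example: 10 annotators, 3 candidate characters.
nda = normalized_distribution([8, 2, 0])
prediction = [0.7, 0.2, 0.1]  # hypothetical softmax output of one network
print(kl_divergence(nda, prediction), shannon_entropy(prediction))
```

A prediction whose entropy is high relative to the others would, under the abstract's finding, be a candidate for closer inspection.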
The Challenges of HTR Model Training: Feedback from the Project Donner le goût de l'archive à l'ère numérique
Couture, Beatrice, Verret, Farah, Gohier, Maxime, Deslandres, Dominique
The arrival of handwriting recognition technologies offers new possibilities for research in heritage studies. However, it is now necessary to reflect on the experiences and the practices developed by research teams. Our use of the Transkribus platform since 2018 has led us to search for the most significant ways to improve the performance of our handwritten text recognition (HTR) models which are made to transcribe French handwriting dating from the 17th century. This article therefore reports on the impacts of creating transcribing protocols, using the language model at full scale and determining the best way to use base models in order to help increase the performance of HTR models. Combining all of these elements can indeed increase the performance of a single model by more than 20% (reaching a Character Error Rate below 5%). This article also discusses some challenges regarding the collaborative nature of HTR platforms such as Transkribus and the way researchers can share their data generated in the process of creating or training handwritten text recognition models.
- North America > Canada > Quebec > Montreal (0.07)
- Europe > France (0.05)
- Europe > Austria > Vienna (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
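The HTR abstract above reports results as a Character Error Rate (CER). A minimal sketch of the common definition follows: the Levenshtein edit distance between the model output and the reference transcription, divided by the reference length. Whether the platform used in the study computes CER in exactly this way is an assumption, and the function names are hypothetical.

```python
def levenshtein(reference, hypothesis):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong character in a four-character line.
print(character_error_rate("abcd", "abxd"))
```

Under this definition, a CER below 5% corresponds to fewer than one character edit per twenty reference characters.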
The Impact of Incumbent/Opposition Status and Ideological Similitude on Emotions in Political Manifestos
The study involved the analysis of emotion-associated language in the UK Conservative and Labour party general election manifestos between 2000 and 2019. While previous research has shown a general correlation between ideological positioning and overlap of public policies, there are still conflicting results in matters of sentiment in such manifestos. Using new data, we present how valence level can be swayed by party status within government, with incumbent parties presenting a higher frequency of positive emotion-associated words, while negative emotion-associated words are more prevalent in opposition parties. We also demonstrate that parties with ideological similitude use positive language prominently, further adding to the literature on the relationship between sentiments and party status.
- Europe > United Kingdom (1.00)
- North America > United States (0.14)
- Europe > Germany (0.14)
- (3 more...)
- Government > Voting & Elections (1.00)
- Government > Regional Government > Europe Government > United Kingdom Government (0.96)
The Effects of Political Martyrdom on Election Results: The Assassination of Abe
In developed nations, assassinations are rare, and thus the impact of such acts on the electoral and political landscape is understudied. In this paper, we focus on Twitter data to examine the effects of the assassination of Japan's former Prime Minister Abe on the Japanese House of Councillors elections in 2022. We utilize sentiment analysis and emotion detection together with topic modeling on over 2 million tweets and compare them against tweets during previous election cycles. Our findings indicate that Twitter sentiments were negatively impacted by the event in the short term and that social media attention span has shortened. We also discuss how "necropolitics" affected the outcome of the elections in favor of the deceased's party: Abe's death seems to have had an effect on the election outcome, though the findings warrant further investigation for conclusive results. Keywords: Japanese House of Councillors Elections; Abe assassination; sentiment analysis ...
- Asia > Middle East > Palestine (0.28)
- Europe > Germany (0.28)
- Asia > China (0.14)
- (14 more...)
- Information Technology > Services (1.00)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Government > Regional Government > Asia Government > Japan Government (1.00)
Style Classification of Rabbinic Literature for Detection of Lost Midrash Tanhuma Material
Tannor, Shlomo, Dershowitz, Nachum, Lavee, Moshe
Midrash collections are complex rabbinic works that consist of text in multiple languages, which evolved through long processes of unstable oral and written transmission. Determining the origin of a given passage in such a compilation is not always straightforward and is often a matter of dispute among scholars, yet it is essential for scholars' understanding of the passage and its relationship to other texts in the rabbinic corpus. To help solve this problem, we propose a system for classification of rabbinic literature based on its style, leveraging recent advances in natural language processing for Hebrew texts. Additionally, we demonstrate how this method can be applied to uncover lost material from a specific midrash genre, Tanḥuma-Yelammedenu, that has been preserved in later anthologies. I INTRODUCTION Midrash, an integral genre within Jewish literature, encompasses a range of interpretative and narrative texts that seek to explore and expound upon the meanings of biblical scriptures. These texts incorporate a rich mix of legal, ethical, and philosophical discussions, allegories, parables, and homilies, offering deeper insights into the religious passages they explore.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.05)
- Asia > Middle East > Israel > Haifa District > Haifa (0.05)
- (7 more...)
A data science and machine learning approach to continuous analysis of Shakespeare's plays
Swisher, Charles, Shamir, Lior
The availability of quantitative text analysis methods has provided new ways of analyzing literature in a manner that was not available in the pre-information era. Here we apply comprehensive machine learning analysis to the work of William Shakespeare. The analysis shows clear changes in the style of writing over time, with the most significant changes in the sentence length, frequency of adjectives and adverbs, and the sentiments expressed in the text. Applying machine learning to make a stylometric prediction of the year of the play shows a Pearson correlation of 0.71 between the actual and predicted year, indicating that Shakespeare's writing style as reflected by the quantitative measurements changed over time. Additionally, it shows that the stylometrics of some of the plays are more similar to plays written either before or after the year they were written. For instance, Romeo and Juliet is dated 1596, but is more similar in stylometrics to plays written by Shakespeare after 1600. The source code for the analysis is available for free download. INTRODUCTION As Shakespeare is one of the most influential authors in history, the analysis of his stylometrics has been a topic of substantial interest.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Kansas (0.04)
- (3 more...)
- Research Report > Experimental Study (0.70)
- Research Report > New Finding (0.48)
- Media (0.46)
- Leisure & Entertainment (0.46)
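The Shakespeare abstract above summarizes prediction quality as a Pearson correlation between the actual and predicted year of each play. A minimal sketch of that statistic follows. The function name and the sample years are hypothetical and are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical actual and predicted composition years for five plays.
actual_years = [1592, 1596, 1600, 1605, 1611]
predicted_years = [1594, 1601, 1599, 1604, 1608]
print(round(pearson_r(actual_years, predicted_years), 3))
```

A value near 1 means the predicted years rise with the actual years, which is how the abstract reads its reported 0.71 as evidence of stylistic change over time.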
Affect as a proxy for literary mood
We propose to use affect as a proxy for mood in literary texts. In this study, we explore the differences in computationally detecting tone versus detecting mood. Methodologically we utilize affective word embeddings to look at the affective distribution in different text segments. We also present a simple yet efficient and effective method of enhancing emotion lexicons to take both semantic shift and the domain of the text into account producing real-world congruent results closely matching both contemporary and modern qualitative analyses. I INTRODUCTION In this study, we explore how the literary concept of mood can be studied and detected with computational methods.
- Europe > Finland > Southwest Finland > Turku (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- (13 more...)
From exemplar to copy: the scribal appropriation of a Hadewijch manuscript computationally explored
Haverals, Wouter, Kestemont, Mike
This study is devoted to two of the oldest known manuscripts in which the oeuvre of the medieval mystical author Hadewijch has been preserved: Brussels, KBR, 2879-2880 (ms. On the basis of codicological and contextual arguments, it is assumed that the scribe who produced B used A as an exemplar. While the similarities in both layout and content between the two manuscripts are striking, the present article seeks to identify the differences. After all, regardless of the intention to produce a copy that closely follows the exemplar, subtle linguistic variation is apparent. Divergences relate to spelling conventions, but also to the way in which words are abbreviated (and the extent to which abbreviations occur). The present study investigates the spelling profiles of the scribes who produced mss. In the first part of this study, we will present both manuscripts in more detail, after which we will consider prior research carried out on scribal profiling. The current study both builds and expands on Kestemont (2015). Next, we outline the methodology used to analyse and measure the degree of scribal appropriation that took place when ms. B was copied from the exemplar ms. A. After this, we will discuss the results obtained, focusing on the scribal variation that can be found both at the level of individual words and n-grams. To this end, we use machine learning to identify the most distinctive features that separate manuscript A from B. Finally, we look at possible diachronic trends in the appropriation by B's scribe of his exemplar. We argue that scribal takeovers in the exemplar impact the practice of the copying scribe, while transitions to a different content matter cause little to no effect.
INTRODUCTION Among the Royal Library of Belgium's (KBR) extraordinarily rich collection are two fourteenth-century manuscripts that are of great importance to the field of medieval Dutch literature in general, and that of mysticism in the Low Countries in particular: KBR 2879-80 and KBR 2877-78. Both manuscripts contain the complete oeuvre - consisting of letters, visions, songs, and poems - of the mystical writer Hadewijch. Since unambiguous biographical data are lacking, the historical figure of Hadewijch is largely shrouded in mystery. Through her work, however, one can get a modest glimpse of who she was and when she lived. Researchers who undertook this quest situate Hadewijch in the religious women's movement (mulieres religiosae) of the thirteenth century [Mommaers, 2003; Fraeters & Willaert, 2009, p. 13-19; Fraeters, 2013; Willaert, 2013].
- Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.04)
- Oceania > Palau (0.04)
- North America > United States > New York (0.04)
- (3 more...)