Bosch, Antal van den
Improving the Inclusivity of Dutch Speech Recognition by Fine-tuning Whisper on the JASMIN-CGN Corpus
Shekoufandeh, Golshid, Boersma, Paul, Bosch, Antal van den
We test and study the variation in speech recognition of fine-tuned versions of the Whisper model on child, elderly and non-native Dutch speech from the JASMIN-CGN corpus. Our primary goal is to evaluate how speakers' age and linguistic background influence Whisper's performance. Whisper achieves varying Word Error Rates (WER) when fine-tuned on subpopulations of specific ages and linguistic backgrounds. Fine-tuned performance is remarkably better than zero-shot performance, achieving a relative reduction in WER of 81% for native children, 72% for non-native children, 67% for non-native adults, and 65% for native elderly people. Our findings underscore the importance of training speech recognition models like Whisper on underrepresented subpopulations such as children, the elderly, and non-native speakers.
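As a reading aid for the figures quoted in this abstract, the following is a minimal sketch of how a word error rate and a relative WER reduction are conventionally computed. It assumes the standard word-level edit-distance definition of WER; the abstract does not describe the authors' evaluation code, and the function names and example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(zero_shot_wer: float, fine_tuned_wer: float) -> float:
    """Fraction of the zero-shot WER that fine-tuning removes."""
    if zero_shot_wer <= 0:
        raise ValueError("zero-shot WER must be positive")
    return (zero_shot_wer - fine_tuned_wer) / zero_shot_wer


if __name__ == "__main__":
    # Hypothetical sentences and WER values, not taken from JASMIN-CGN.
    print(word_error_rate("de kat zit op de mat", "de kat zat op mat"))
    print(relative_wer_reduction(zero_shot_wer=0.40, fine_tuned_wer=0.10))
```

Under this definition, a relative reduction of 81% means the fine-tuned WER is 19% of the zero-shot WER, whatever the absolute values are.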
The Impact of Featuring Comments in Online Discussions
Waterschoot, Cedric, Hemel, Ernst van den, Bosch, Antal van den
A widespread moderation strategy by online news platforms is to feature what the platform deems high-quality comments, usually called editor picks or featured comments. In this paper, we compare online discussions of news articles in which certain comments are featured with discussions in which no comments are featured. We measure the impact of featuring comments on the discussion by estimating and comparing the quality of discussions from the perspective of the user base and the platform itself. Our analysis shows that the impact on discussion quality is limited. However, we do observe an increase in discussion activity after the first comments are featured by moderators, suggesting that the moderation strategy might be used to increase user engagement and to postpone the natural decline in user activity over time.
Hybrid moderation in the newsroom: Recommending featured posts to content moderators
Waterschoot, Cedric, Bosch, Antal van den
Online news outlets are grappling with the moderation of user-generated content within their comment sections. We present a recommender system based on ranking class probabilities to support and empower the moderator in choosing featured posts, a time-consuming task. By combining user and textual content features we obtain an optimal classification F1-score of 0.44 on the test set. Furthermore, we observe an optimum mean NDCG@5 of 0.87 on a large set of validation articles. As an expert evaluation, content moderators assessed the output of a random selection of articles by choosing comments to feature based on the recommendations, which resulted in an NDCG score of 0.83. We conclude that, first, adding text features yields the best score and, second, while choosing featured content remains somewhat subjective, content moderators found suitable comments in all but one of the evaluated recommendations. We end the paper by analyzing our best-performing model, a step towards transparency and explainability in hybrid content moderation.
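To make the ranking metric in this abstract concrete, the sketch below ranks comments by a classifier's predicted probability of being featured and scores the ranking with NDCG@k. It assumes binary relevance (1 for a featured comment, 0 otherwise) and the common linear-gain, log2-discount form of DCG; the abstract does not specify the exact variant used, and all names and numbers here are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain, log2 discount)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant item exists, so the score is defined as 0 here
    return dcg_at_k(ranked_relevances, k) / ideal


def rank_by_probability(probabilities, labels):
    """Order the labels by descending predicted probability of the positive class."""
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    return [labels[i] for i in order]


if __name__ == "__main__":
    # Hypothetical classifier outputs for six comments under one article.
    probs = [0.10, 0.85, 0.40, 0.70, 0.05, 0.30]
    featured = [0, 1, 0, 0, 1, 0]
    ranking = rank_by_probability(probs, featured)
    print(ranking, ndcg_at_k(ranking, k=5))
```

A mean NDCG@5 would then be the average of this score over all evaluated articles.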
A Kids' Open Mind Common Sense
Bosch, Antal van den (Tilburg University) | Nauts, Pim (Tilburg University) | Eckhardt, Nienke (Tilburg University)
We propose a collaborative approach to the issue of resource creation for commonsense computing by developing a collaboratory application aimed at children. Human validation is enabled through a game-with-a-purpose (GWAP) interface, gathering reliability judgements of assertions that can be used to aid the process of resource validation. Our experiments confirm that children aged 10 to 12 can be valuable and reliable partners in building commonsense databases, due to their stage of mental development and their eagerness to play GWAPs. Results show that children adapt their word choice in the assertions they provide to the difficulty level of the stimulus words, and that the judgements gathered through in-game validation can help to validate about 30% of the gathered statements automatically.
Learning Statistically Neutral Tasks without Expert Guidance
Weijters, Ton, Bosch, Antal van den, Postma, Eric O.
He intended to build a model mimicking the behavior of the autistic savant without the need either to develop arithmetical skills or to encode explicit knowledge about regularities in the structure of dates. A standard multilayer network trained with backpropagation [6] was not able to solve the date-calculation task. Although the network was able to learn the examples used for training, it did not manage to generalize to novel date-day combinations.