Text is a basic material, a primary data layer, in many areas of the humanities and social sciences. If we want to advance the agenda that the fields of digital humanities and computational social sciences are setting, it is vital to bring together the technical areas that deal with automated text processing and scholars in the humanities and social sciences. To foster new areas of research, it is necessary not only to understand what already exists in terms of proven technologies and infrastructures such as CLARIN, but also to understand how the developers of text analytics can work with researchers in the humanities and social sciences so that each side better understands the challenges in the other's field. What are the research questions of the researchers working on these texts?
Because of its importance, the latter is protected not only by national health legislation but also by Article 8 of the European Convention on Human Rights, which guarantees the right to privacy. As already mentioned, medicine is a profession that requires a certain level of secrecy regarding confidential information, and according to the Court's previous decisions, this secrecy is even more important in cases that involve psychiatric records. The involvement of robots in medical treatment, on the one hand, and easy access to the information they gather during treatment, on the other, call into question the effectiveness of the provisions of Article 8 of the European Convention on Human Rights. Current legislation in countries around the world pays little attention to this particular area, even though modern robotic approaches have already been introduced and are widely accepted.
Even though empirical research on computer-mediated communication (CMC) has a tradition of almost two decades, there are still very few annotated CMC/social media corpora that are available to the scientific community and the public. One crucial issue is the unclear legal situation w.r.t. Using the example of a legal expert opinion sought for the integration of an existing German chat corpus into CLARIN-D, the talk will highlight this issue under German law and describe how it has been handled in the project. Creating standards and adapting NLP tools for this new type of language resource is a digital humanities topic par excellence, since (1) it focuses on born-digital data and (2) it requires combined expertise from the humanities and the computational sciences.
The text analysis part of the AMiCA project (http://www.amicaproject.be), a cooperation between the University of Antwerp and the University of Ghent, developed methods and software to help moderators detect occurrences of unwanted or dangerous situations in their social networks. More specifically, the project developed prototype systems for the detection of cyberbullying, suicide announcements, and sexually transgressive behavior. In this talk I will focus on the text analysis methods that were used for normalization of social media text, for profiling users, and for detecting dangerous content. I will describe the architectures and results of the three resulting applications.
With the increasing volume and impact of communication on social media, social media analysis has become one of the most active topics in natural language processing research, as reflected in the growing number of dedicated workshops and conferences, funded projects, and newly established research centers. As a result, a number of social media resources containing chats, online commentaries, reviews, blogs, emails, forums, etc., as well as audio and video recordings, have accumulated in the repositories of CLARIN centers. Moreover, because of their distinct communicative characteristics, these resources pose new technical challenges for standard natural language processing tools, as well as new legal and ethical challenges for their dissemination. CLARIN has also addressed these challenges, which makes its infrastructure an important means of attracting new users to the CLARIN community.
The aims of the CLARIN-PLUS workshop "Creation and Use of Social Media Resources" are: to demonstrate the potential of social media resources and natural language processing tools to researchers from diverse backgrounds who are interested in empirical research on language and social practices in computer-mediated communication; to promote interdisciplinary cooperation; and to initiate a discussion on the various approaches to collecting and processing social media data.