AITopics | clarin

CLASSLA-Express: a Train of CLARIN.SI Workshops on Language Resources and Tools with Easily Expanding Route

Ljubešić, Nikola, Kuzman, Taja, Petrović, Ivana Filipović, Parizoska, Jelena, Osenova, Petya

arXiv.org Artificial Intelligence, Dec-2-2024

This paper introduces the CLASSLA-Express workshop series as an innovative approach to disseminating linguistic resources and infrastructure provided by the CLASSLA Knowledge Centre for South Slavic languages and the Slovenian CLARIN.SI infrastructure. The workshop series employs two key strategies: (1) conducting workshops directly in countries with interested audiences, and (2) designing the series for easy expansion to new venues. The first iteration of the CLASSLA-Express workshop series encompasses 6 workshops in 5 countries. Its goal is to share knowledge on the use of corpus querying tools, as well as the recently released CLASSLA-web corpora, the largest general corpora for South Slavic languages. In the paper, we present the design of the workshop series, its current scope, and the straightforward extensions of the series to new venues that are already in sight.

clarin, workshop, workshop series

arXiv:2412.01386

Country:

Europe > Croatia > Zagreb County > Zagreb (0.06)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.06)
Europe > Croatia > Primorje-Gorski Kotar County > Rijeka (0.06)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (0.47)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation

Ljubešić, Nikola, Kuzman, Taja

arXiv.org Artificial Intelligence, Mar-26-2024

This paper presents a collection of highly comparable web corpora of Slovenian, Croatian, Bosnian, Montenegrin, Serbian, Macedonian, and Bulgarian, thereby covering the whole spectrum of official languages in the South Slavic language space. The collection comprises a total of 13 billion tokens of text from 26 million documents. The comparability of the corpora is ensured by a comparable crawling setup and the use of identical crawling and post-processing technology. All the corpora were linguistically annotated with the state-of-the-art CLASSLA-Stanza linguistic processing pipeline, and enriched with document-level genre information via the Transformer-based multilingual X-GENRE classifier, which further enhances comparability at the level of linguistic annotation and metadata enrichment. The genre-focused analysis of the resulting corpora shows a rather consistent distribution of genres throughout the seven corpora, with variations in the most prominent genre categories being well explained by the economic strength of each language community. A comparison of the distribution of genre categories across the corpora indicates that web corpora from less developed countries primarily consist of news articles.

clarin, classla-web, corpora

arXiv:2403.12721

Country:

Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.05)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
Europe > Southeast Europe (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Law (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Doing text analytics for Digital Humanities and Social Sciences with CLARIN (LDK tutorial), Galway 2017

VideoLectures.NET, Jul-27-2017, 18:05:33 GMT

Text is a basic material, a primary data layer, in many areas of the humanities and social sciences. If we want to move forward with the agenda that the fields of digital humanities and computational social sciences are projecting, it is vital to bring together the technical areas that deal with automated text processing and scholars in the humanities and social sciences. Much progress has been made in the last two decades in text analytics, a field that draws on recent advances in computational linguistics, information retrieval, and machine learning. By now we know what to expect from basic tools, such as named entity recognition. To foster new areas of research, it is necessary to understand not only what is out there in terms of proven technologies and infrastructures such as CLARIN, but also how the developers of text analytics can work with researchers in the humanities and social sciences to better understand the challenges in each other's fields.

artificial intelligence, natural language, text processing

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.91)
