Öhman, Joey
Text Annotation Handbook: A Practical Guide for Machine Learning Projects
Stollenwerk, Felix, Öhman, Joey, Petrelli, Danila, Wallerö, Emma, Olsson, Fredrik, Bengtsson, Camilla, Horndahl, Andreas, Gandler, Gabriela Zarzar
This handbook is a hands-on guide on how to approach text annotation tasks. It provides a gentle introduction to the topic, an overview of theoretical concepts as well as practical advice. The topics covered are mostly technical, but business, ethical and regulatory issues are also touched upon. The focus lies on readability and conciseness rather than completeness and scientific rigor. Experience with annotation and knowledge of machine learning are useful but not required. The document may serve as a primer or reference book for a wide range of professions such as team leaders, project managers, IT architects, software developers and machine learning engineers.
Annotated Job Ads with Named Entity Recognition
Stollenwerk, Felix, Fastlund, Niklas, Nyqvist, Anna, Öhman, Joey
We have trained a named entity recognition (NER) model that screens Swedish job ads for different kinds of useful information (e.g. skills required from a job seeker). It was obtained by fine-tuning KB-BERT. The biggest challenge we faced was the creation of a labelled dataset, which required manual annotation. This paper gives an overview of the methods we employed to make the annotation process more efficient and to ensure high-quality data. We also report on the performance of the resulting model.
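Manual annotation for NER typically produces entity spans that must be converted into per-token tags before fine-tuning. The following is a minimal sketch of that step using the common BIO scheme; the entity type "SKILL" and the example sentence are illustrative assumptions, not taken from the paper's annotation guidelines.

```python
def bio_encode(tokens, spans):
    """Convert (start, end, label) token-index spans to BIO tags.

    spans: list of (start, end, label) with end exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags

# Hypothetical job-ad snippet with two annotated skill mentions.
tokens = ["Experience", "with", "Python", "and", "SQL", "required"]
spans = [(2, 3, "SKILL"), (4, 5, "SKILL")]
print(bio_encode(tokens, spans))
# → ['O', 'O', 'B-SKILL', 'O', 'B-SKILL', 'O']
```

The resulting tag sequence is what a token-classification head on a model such as KB-BERT would be trained to predict.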
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
Ekgren, Ariel, Gyllensten, Amaru Cuba, Stollenwerk, Felix, Öhman, Joey, Isbister, Tim, Gogoulou, Evangelia, Carlsson, Fredrik, Heiman, Alice, Casademont, Judit, Sahlgren, Magnus
There is a growing interest in building and applying Large Language Models (LLMs) for languages other than English. This interest has been fuelled partly by the unprecedented popularity of ChatGPT. We have faced all of these challenges in our work on developing the first native LLM for the Nordic (or, more accurately, North Germanic) languages. The LLM, which we call GPT-SW3, is a continuation of our previous Swedish-only model (Ekgren et al., 2022), and is a collection of …
The Nordic Pile: A 1.2TB Nordic Dataset for Language Modeling
Öhman, Joey, Verlinden, Severine, Ekgren, Ariel, Gyllensten, Amaru Cuba, Isbister, Tim, Gogoulou, Evangelia, Carlsson, Fredrik, Sahlgren, Magnus
Pre-training Large Language Models (LLMs) requires massive amounts of text data, and the performance of the LLMs typically correlates with the scale and quality of the datasets. This means that it may be challenging to build LLMs for smaller languages such as the Nordic ones, where the availability of text corpora is limited. In order to facilitate the development of LLMs in the Nordic languages, we curate a high-quality dataset consisting of 1.2TB of text in all of the major North Germanic languages (Danish, Icelandic, Norwegian, and Swedish), as well as some high-quality English data. This paper details our considerations and processes for collecting, cleaning, and filtering the dataset.
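Cleaning and filtering a corpus of this scale usually comes down to cheap per-document heuristics applied in bulk. The sketch below shows the general shape of such a filter; the thresholds and the example documents are invented for illustration and are not the actual criteria used for The Nordic Pile.

```python
# Illustrative document-quality filter of the kind used when curating
# large text corpora. Thresholds here are assumptions for the example.

def keep_document(text, min_words=10, min_alpha_ratio=0.7):
    """Return True if the document passes simple quality heuristics."""
    words = text.split()
    if len(words) < min_words:
        return False                      # reject very short documents
    alpha = sum(ch.isalpha() for ch in text)
    visible = sum(not ch.isspace() for ch in text)
    # Reject documents dominated by digits, markup, or symbols.
    return visible > 0 and alpha / visible >= min_alpha_ratio

docs = [
    "kort text",  # too short: fewer than min_words tokens
    "Dette er en lang nok tekst som består av vanlige ord og setninger på norsk.",
]
print([keep_document(d) for d in docs])
# → [False, True]
```

In practice such filters are combined with deduplication and language identification and run in a streaming fashion over the raw collection.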