Hierarchical Classification of Transversal Skills in Job Ads Based on Sentence Embeddings

Florin Leon, Marius Gavrilescu, Sabina-Adriana Floria, Alina-Adriana Minea

arXiv.org Artificial Intelligence 

Text classification, a fundamental task in natural language processing (NLP) and machine learning (ML), has evolved considerably in recent years. With the exponential increase in textual data generated across various domains, the need for effective text classification methods has become increasingly pressing. Text classification is the task of assigning predefined labels or categories to textual documents based on their content. It is important across many industries and applications, including sentiment analysis, spam detection, content recommendation, and news classification. The ability to automatically organize and categorize large volumes of text can streamline information retrieval, enhance decision-making, and enable efficient data management. Traditional text classification methods rely on well-established techniques such as term frequency-inverse document frequency (TF-IDF) representations combined with classical ML algorithms. TF-IDF measures the importance of each term within a document relative to a corpus of documents, providing a numerical representation of textual data.
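To make the TF-IDF weighting concrete, the following is a minimal sketch in plain Python. It assumes one common variant of the formula (term count divided by document length, multiplied by the unsmoothed natural logarithm of the inverse document frequency) and whitespace tokenization; the text above does not specify a particular variant. The function name `tf_idf` and the three short example documents are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dictionary per document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / document frequency).
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, for illustration only.
docs = [
    "teamwork and communication skills required",
    "strong communication skills and leadership",
    "python programming and teamwork",
]
vectors = tf_idf(docs)
```

In this sketch, a term that occurs in every document (here "and") receives a weight of zero, while a term confined to a single document (here "python") receives the highest weight within that document. This is the sense in which TF-IDF measures the importance of a term relative to the corpus. Library implementations typically add smoothing and normalization, so their values will differ from this variant.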