Computational Job Market Analysis with Natural Language Processing
arXiv.org Artificial Intelligence
[Abridged Abstract] Recent technological advances underscore labor market dynamics, yielding significant consequences for employment prospects and increasing job vacancy data across platforms and languages. Aggregating such data holds potential for valuable insights into labor market demands and the emergence of new skills, and for facilitating job matching for various stakeholders. However, despite prevalent insights in the private sector, transparent language technology systems and data for this domain are lacking. This thesis investigates Natural Language Processing (NLP) technology for extracting relevant information from job descriptions, identifying challenges including scarcity of training data, lack of standardized annotation guidelines, and a shortage of effective methods for extracting information from job ads. We frame the problem, obtain annotated data, and introduce extraction methodologies. Our contributions include job description datasets, a de-identification dataset, and a novel active learning algorithm for efficient model training. We propose skill extraction using weak supervision, a taxonomy-aware pre-training methodology adapting multilingual language models to the job market domain, and a retrieval-augmented model leveraging multiple skill extraction datasets to enhance overall performance. Finally, we ground extracted information within a designated taxonomy.
Apr-29-2024
- Country:
- South America > Chile
- Oceania > Australia
- North America
- Dominican Republic (0.04)
- United States
- Washington > King County
- Seattle (0.14)
- Texas
- Travis County > Austin (0.04)
- Harris County > Houston (0.04)
- New York
- New York County > New York City (0.13)
- Richmond County > New York City (0.04)
- Queens County > New York City (0.04)
- Kings County > New York City (0.04)
- Bronx County > New York City (0.04)
- New Mexico > Santa Fe County
- Santa Fe (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- California
- San Francisco County > San Francisco (0.27)
- San Diego County > San Diego (0.04)
- Los Angeles County > Long Beach (0.04)
- Canada
- Ontario > Toronto (0.04)
- Quebec > Montreal (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- Europe
- Russia (0.04)
- Czechia > Prague (0.04)
- Bulgaria
- Varna Province > Varna (0.04)
- Sofia City Province > Sofia (0.04)
- Iceland > Capital Region
- Reykjavik (0.04)
- Italy > Tuscany
- Florence (0.04)
- Germany
- Berlin (0.04)
- Bavaria > Upper Bavaria
- Munich (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Spain
- Valencian Community > Valencia Province
- Valencia (0.04)
- Catalonia > Barcelona Province
- Barcelona (0.04)
- Andalusia > Seville Province
- Seville (0.04)
- Denmark > Capital Region
- Copenhagen (0.14)
- United Kingdom > England
- Tyne and Wear > Newcastle (0.04)
- Middle East
- Sweden > Östergötland County
- Linköping (0.04)
- Croatia > Dubrovnik-Neretva County
- Dubrovnik (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- Singapore (0.14)
- China > Hong Kong (0.04)
- Russia (0.04)
- Thailand > Phuket
- Phuket (0.04)
- Myanmar > Tanintharyi Region
- Dawei (0.04)
- Middle East
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.14)
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Qatar > Ad-Dawhah
- Doha (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.27)
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Genre:
- Overview (1.00)
- Instructional Material (0.92)
- Research Report
- New Finding (1.00)
- Experimental Study (0.92)
- Industry:
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Banking & Finance > Economy (0.68)
- Government > Regional Government (0.67)
- Health & Medicine > Health Care Technology
- Medical Record (0.67)
- Education > Curriculum
- Subject-Specific Education (0.45)
- Technology:
- Information Technology > Artificial Intelligence
- Natural Language
- Text Processing (1.00)
- Large Language Model (1.00)
- Information Retrieval (0.93)
- Chatbot (0.67)
- Machine Translation (0.67)
- Machine Learning
- Statistical Learning (1.00)
- Performance Analysis > Accuracy (1.00)
- Neural Networks > Deep Learning (1.00)
- Inductive Learning (0.92)
- Supervised Learning (0.67)