Automatic Language Identification in Texts: A Survey
Jauhiainen, Tommi, Lui, Marco, Zampieri, Marcos, Baldwin, Timothy, Lindén, Krister
Journal of Artificial Intelligence Research
Language identification ("LI") is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research and an extensive survey of the features and methods used in the LI literature. We describe the features and methods using a unified notation to make the relationships between methods clearer. We discuss evaluation methods, applications of LI, and off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI.
Aug-25-2019
- Country:
- South America
- Peru > Lima Department
- Lima Province > Lima (0.04)
- Brazil > Pernambuco
- Recife (0.04)
- Oceania
- New Zealand > North Island
- Waikato (0.04)
- Auckland Region > Auckland (0.04)
- Australia
- New South Wales > Sydney (0.04)
- Victoria > Melbourne (0.04)
- Queensland > Brisbane (0.04)
- Australian Capital Territory > Canberra (0.04)
- North America
- United States
- Hawaii (0.04)
- New Mexico (0.04)
- Ohio (0.04)
- District of Columbia > Washington (0.04)
- Texas > Travis County
- Austin (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Nevada > Clark County
- Las Vegas (0.04)
- Colorado > Denver County
- Denver (0.04)
- New York
- New York County > New York City (0.13)
- Richmond County > New York City (0.04)
- Queens County > New York City (0.04)
- Kings County > New York City (0.04)
- Bronx County > New York City (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Illinois > Cook County
- Chicago (0.04)
- Indiana > Monroe County
- Bloomington (0.04)
- Wisconsin > Dane County
- Madison (0.04)
- Florida > Palm Beach County
- Boca Raton (0.04)
- California
- San Francisco County > San Francisco (0.14)
- Los Angeles County > Los Angeles (0.14)
- Santa Cruz County > Santa Cruz (0.13)
- San Diego County > San Diego (0.04)
- Santa Clara County > San Jose (0.04)
- Orange County > Newport Beach (0.04)
- Monterey County > Monterey (0.04)
- New Jersey > Hudson County
- Hoboken (0.04)
- Mexico > Mexico City
- Mexico City (0.04)
- Canada
- Europe
- Austria > Vienna (0.13)
- Bulgaria
- Sofia City Province > Sofia (0.04)
- Varna Province > Varna (0.04)
- Iceland > Capital Region
- Reykjavik (0.04)
- Slovenia > Central Slovenia
- Municipality of Ljubljana > Ljubljana (0.04)
- Hungary > Budapest
- Budapest (0.04)
- Germany
- Saarland (0.04)
- Berlin (0.04)
- Bremen > Bremen (0.04)
- North Rhine-Westphalia > Düsseldorf Region
- Düsseldorf (0.04)
- Bavaria > Upper Bavaria
- Munich (0.04)
- Baden-Württemberg
- Stuttgart Region > Stuttgart (0.04)
- Tübingen Region > Tübingen (0.04)
- Karlsruhe Region > Karlsruhe (0.04)
- Spain
- Community of Madrid > Madrid (0.04)
- Valencian Community > Valencia Province
- Valencia (0.04)
- Catalonia > Girona Province
- Girona (0.04)
- Netherlands
- North Holland > Amsterdam (0.04)
- South Holland > The Hague (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.13)
- Greater Manchester > Salford (0.04)
- Greater London > London (0.04)
- Buckinghamshire > Milton Keynes (0.04)
- Bristol (0.04)
- Greece > Attica
- Athens (0.04)
- Romania > Vest Development Region
- Timiș County > Timișoara (0.04)
- France
- Île-de-France > Paris
- Paris (0.04)
- Provence-Alpes-Côte d'Azur > Bouches-du-Rhône
- Marseille (0.04)
- Pays de la Loire > Loire-Atlantique
- Nantes (0.04)
- Occitanie
- Hérault > Montpellier (0.04)
- Haute-Garonne > Toulouse (0.04)
- Italy
- Middle East
- Republic of Türkiye > Istanbul Province
- Istanbul (0.04)
- Malta > Port Region
- Southern Harbour District > Valletta (0.04)
- Cyprus > Nicosia
- Nicosia (0.04)
- Belgium
- Wallonia > Walloon Brabant
- Louvain-la-Neuve (0.04)
- Flanders > Antwerp Province
- Antwerp (0.04)
- Finland > Uusimaa
- Helsinki (0.04)
- Croatia
- Zagreb County > Zagreb (0.04)
- Dubrovnik-Neretva County > Dubrovnik (0.04)
- Sweden
- Västra Götaland > Gothenburg (0.04)
- Uppsala County > Uppsala (0.04)
- Östergötland County > Linköping (0.04)
- Stockholm > Stockholm (0.04)
- Czechia
- Prague (0.04)
- Karlovy Vary Region > Karlovy Vary (0.04)
- Ireland > Leinster
- County Dublin > Dublin (0.04)
- Switzerland > Geneva
- Geneva (0.04)
- Portugal
- Asia
- Singapore (0.04)
- Macao (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- India
- Gujarat > Gandhinagar (0.04)
- Karnataka > Bengaluru (0.04)
- Maharashtra > Mumbai (0.04)
- Tamil Nadu (0.04)
- Kerala (0.04)
- Goa (0.04)
- West Bengal
- NCT
- Thailand
- Phuket > Phuket (0.04)
- Chiang Mai > Chiang Mai (0.04)
- South Korea > Seoul
- Seoul (0.04)
- Vietnam > Quảng Bình Province
- Đồng Hới (0.04)
- China
- Beijing > Beijing (0.04)
- Hong Kong (0.04)
- Zhejiang Province > Hangzhou (0.04)
- Middle East
- Jordan (0.04)
- Republic of Türkiye
- Istanbul Province > Istanbul (0.04)
- İzmir Province > İzmir (0.04)
- Ankara Province > Ankara (0.04)
- Qatar > Ad-Dawhah
- Doha (0.04)
- Israel > Tel Aviv District
- Tel Aviv (0.04)
- Indonesia
- Japan
- Kyūshū & Okinawa > Kyūshū
- Miyazaki Prefecture > Miyazaki (0.04)
- Honshū
- Kansai
- Osaka Prefecture > Osaka (0.04)
- Hyogo Prefecture > Kobe (0.04)
- Chūbu > Nagano Prefecture
- Nagano (0.04)
- Hokkaidō > Hokkaidō Prefecture
- Sapporo (0.04)
- Philippines > Visayas
- Central Visayas > Province of Cebu > City of Cebu (0.04)
- Malaysia
- Penang (0.04)
- Kuala Lumpur > Kuala Lumpur (0.04)
- Africa
- South Africa
- KwaZulu-Natal > Pietermaritzburg (0.04)
- Free State > Bloemfontein (0.04)
- Gauteng
- Pretoria (0.04)
- Johannesburg (0.04)
- Middle East
- Morocco > Rabat-Salé-Kénitra Region
- Rabat (0.04)
- Egypt > Cairo Governorate
- Cairo (0.04)
- Genre:
- Overview (1.00)
- Research Report > New Finding (0.92)
- Industry:
- Information Technology > Services (1.00)
- Education (1.00)
- Technology:
- Information Technology
- Communications > Social Media (1.00)
- Artificial Intelligence
- Representation & Reasoning > Uncertainty (1.00)
- Natural Language
- Text Processing (1.00)
- Machine Translation (1.00)
- Information Retrieval (1.00)
- Grammars & Parsing (1.00)
- Machine Learning
- Performance Analysis > Accuracy (1.00)
- Ensemble Learning (1.00)
- Neural Networks > Deep Learning (0.93)
- Statistical Learning
- Support Vector Machines (0.67)
- Clustering (0.67)
- Learning Graphical Models
- Directed Networks > Bayesian Learning (1.00)
- Undirected Networks > Markov Models (0.67)