An Open Dataset and Model for Language Identification