Twitter Cortex Proposes LMSOC for Socially Sensitive Pretraining
A phrase like "It's cold today" suggests a very different temperature depending on whether it is uttered in Nairobi or Montreal, and words like "troll" and "tweet" referred to entirely different things just a generation ago. Although contemporary large-scale pretrained language models are very effective at learning linguistic representations, they are far less equipped to capture speaker- and author-related temporal, geographical, social, and other contextual factors.

In the new paper LMSOC: An Approach for Socially Sensitive Pretraining, a Twitter Cortex research team proposes LMSOC, a simple but effective approach for learning representations in large-scale language models that are both linguistically contextualized and socially sensitive.

An implicit assumption in most pretrained language models (PLMs) is that language is independent of extra-linguistic context such as speaker/author identity and social setting. Despite the impressive achievements of PLMs, this remains a critical weakness: sociolinguistic research provides strong evidence that social context significantly shapes how language is produced and understood.
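The article does not spell out LMSOC's architecture, but the core idea of conditioning a language model on extra-linguistic context can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch/Hugging Face example, assuming (as one plausible design) that a precomputed social-context vector, e.g., derived from a post's time or location, is projected into the encoder's hidden space and prepended as an extra "token"; names such as `SocialContextLM` and `social_dim` are illustrative, not the authors' implementation. In actual pretraining this conditioning would be combined with a masked-language-modeling objective.

```python
# Hypothetical sketch of socially sensitive conditioning in the spirit
# of LMSOC; not the paper's actual code.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class SocialContextLM(nn.Module):
    """Wraps a pretrained encoder and prepends a projected social-context
    embedding (e.g., representing time or location) as an extra leading
    token, so the model can condition on extra-linguistic context."""

    def __init__(self, social_dim: int = 64):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        hidden = self.encoder.config.hidden_size
        # Project the externally computed social embedding into the
        # encoder's hidden space.
        self.social_proj = nn.Linear(social_dim, hidden)

    def forward(self, input_ids, attention_mask, social_embedding):
        # Raw word embeddings only; the encoder adds positional and
        # segment embeddings when given `inputs_embeds`.
        tok_embeds = self.encoder.embeddings.word_embeddings(input_ids)
        # One projected social-context vector per example, shaped as a
        # single extra token: (batch, 1, hidden).
        ctx = self.social_proj(social_embedding).unsqueeze(1)
        inputs_embeds = torch.cat([ctx, tok_embeds], dim=1)
        # Extend the attention mask to cover the new context token.
        ones = torch.ones(attention_mask.size(0), 1,
                          dtype=attention_mask.dtype,
                          device=attention_mask.device)
        mask = torch.cat([ones, attention_mask], dim=1)
        return self.encoder(inputs_embeds=inputs_embeds,
                            attention_mask=mask).last_hidden_state


# Usage: the random vector stands in for a real location/time embedding.
tok = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["it's cold today"], return_tensors="pt")
social = torch.randn(1, 64)
model = SocialContextLM(social_dim=64)
states = model(batch["input_ids"], batch["attention_mask"], social)
```

Under this reading, the same sentence ("it's cold today") yields different contextual representations when paired with different social-context vectors, which is precisely the sensitivity the paper argues standard PLMs lack.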
Dec-13-2021, 17:43:12 GMT