dialect


AI: The Inverse Tower of Babel

#artificialintelligence

The Old Testament's 'Tower of Babel' story is an origin myth that tries to explain why humanity doesn't speak a single, universal language. According to the Bible, a united human race that spoke the same language arrived in the land of Shinar and decided to build a tower tall enough to reach heaven. Annoyed, once again, it can probably be said, by humanity's growing arrogance and budding hubris, God confounded humanity's speech, dividing its people into separate linguistic groups that couldn't understand one another. Just to ensure they didn't start comparing and contrasting their languages to reach some form of translation breakthrough, God dispersed humankind to all corners of the earth and set the stage for what is today a world of 6,500 languages. For God, it was a job well done, and the situation remained static for centuries. That was until tribes started trading with each other, armies started fighting one another, and diplomats initiated conflict resolution measures to try to end the wars that were often started due to misunderstandings of one kind or another.


AI: the Inverse Tower of Babel

#artificialintelligence

I've always been struck by the fact that the acronym for artificial intelligence in English, AI, is surprisingly similar to the first two characters for that word in simplified Chinese: '人工智能'. The first two characters, 人工, mean 'people' and 'work' individually, but when put together mean 'artificial', while '智能' means 'intelligent.' This is quite a fascinating linguistic experiment, and it's interesting that the two most widely used languages in the world came up with a similar acronym or character for one of the most important technologies ever invented by man. Perhaps there is some weird universal synergy going on, or maybe there's an easy answer hidden somewhere deep within the linguistic annals of these two languages. Either way, this got me thinking about language.


Challenges in Detoxifying Language Models

arXiv.org Artificial Intelligence

Large language models (LM) generate remarkably fluent text and can be efficiently adapted across NLP tasks. Measuring and guaranteeing the quality of generated text in terms of safety is imperative for deploying LMs in the real world; to this end, prior work often relies on automatic evaluation of LM toxicity. We critically discuss this approach, evaluate several toxicity mitigation strategies with respect to both automatic and human evaluation, and analyze consequences of toxicity mitigation in terms of model bias and LM quality. We demonstrate that while basic intervention strategies can effectively optimize previously established automatic metrics on the RealToxicityPrompts dataset, this comes at the cost of reduced LM coverage for both texts about, and dialects of, marginalized groups. Additionally, we find that human raters often disagree with high automatic toxicity scores after strong toxicity reduction interventions -- highlighting further the nuances involved in careful evaluation of LM toxicity.


Speech recognition works for kids, and it's about time – TechCrunch

#artificialintelligence

Speech recognition technology is finally working for kids. That wasn't the case back in 1999, when my colleagues at Scholastic Education and I launched a reading intervention program called READ 180. We'd hoped to incorporate voice-enabled capabilities: Children would read to a computer program, which would provide real-time feedback on their fluency and literacy. Teachers, in turn, would receive information about their students' progress. Unfortunately, our idea was 20 years ahead of the technology, and we moved ahead with READ 180 without speech-recognition capabilities.


Northern accents are dying out and could DISAPPEAR BY 2066

Daily Mail - Science & tech

From the approachable Geordie dialect to the instantly recognisable Liverpool lilt, many of England's most distinctive accents are from the north. But a new study has warned that northern accents could all but disappear in just 45 years. Using physics modelling, researchers from the Universities of Portsmouth and Cambridge predicted how accents are likely to change across England by 2066. Their findings suggest that northern accents could be replaced with 'posh' south eastern pronunciations. However, certain north-south differences are predicted to remain - we will continue to disagree about the pronunciation of 'bath', according to the researchers.


Scientists Are Using AI to Decode Whale Language

#artificialintelligence

When you dive into the ocean, the physiology of your body changes. As you go deeper into the water, your heart rate slows. In an environment that is seemingly hostile to its survival, the body becomes remarkably efficient at keeping you alive. The mammalian dive reflex, more romantically termed the "Master Switch of Life" by its discoverer, the physiologist Per Scholander, helped shape how we view our relationship to the water. If our bodies were so at home in the ocean, scientists wondered, what did that say about our evolutionary history?


wav2vec Unsupervised: Speech recognition without supervision

#artificialintelligence

Whether it's giving directions, answering questions, or carrying out requests, speech recognition makes life easier in countless ways. But today the technology is available for only a small fraction of the thousands of languages spoken around the globe. This is because high-quality systems need to be trained with large amounts of transcribed speech audio. Transcribed recordings of English-language novels, for example, will do little to help machines learn to understand a Basque speaker ordering food off a menu or a Tagalog speaker giving a business presentation. This is why we developed wav2vec Unsupervised (wav2vec-U), a way to build speech recognition systems that require no transcribed data at all.


AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News and Hate Speech Detection Dataset

arXiv.org Artificial Intelligence

Along with the COVID-19 pandemic, an "infodemic" of false and misleading information has emerged and has complicated the COVID-19 response efforts. Social networking sites such as Facebook and Twitter have contributed largely to the spread of rumors, conspiracy theories, hate, xenophobia, racism, and prejudice. To combat the spread of fake news, researchers around the world have made, and are still making, considerable efforts to build and share COVID-19 related research articles, models, and datasets. This paper releases "AraCOVID19-MFH", a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset. Our dataset contains 10,828 Arabic tweets annotated with 10 different labels. The labels have been designed to consider some aspects relevant to the fact-checking task, such as the tweet's check-worthiness, positivity/negativity, and factuality. To confirm our annotated dataset's practical utility, we used it to train and evaluate several classification models and reported the obtained results. Though the dataset is mainly designed for fake news detection, it can also be used for hate speech detection, opinion/news classification, dialect identification, and many other tasks.


How is Artificial Intelligence Challenging the Translation Industry?

#artificialintelligence

Language is perhaps the most defining factor of humankind. What makes humans different from other animals on the planet is our ability to speak and communicate through structured words and sentences. The language of a population is one of the most defining factors across countries and nationalities, regions, and cultures. It can reflect a population's history, sociocultural situation, and even geographic diversity. Since ancient times, people have tried to understand one another's languages. History traces back to Greeks and Romans traveling all across the world to discover, decipher and translate languages to find out the cultural, political, and social situations from one era to another.


A Linguistic Guide to Assassin's Creed: Valhalla

WIRED

Invading my own country has been one of the most surreal experiences of playing Assassin's Creed: Valhalla, and the variety of languages included in the game makes it one of the most thought-provoking. Assassin's Creed is an award-winning historical action game series known for putting players in the middle of transformative events in history. Valhalla is set during the Viking invasions of Britain, during which the main character, Eivor, and their brother Sigurd embark on a quest to conquer a new land. They travel by boat from their native country Norway to a place that is home to new Viking settlers, eager to forge their own legacy of glory. This gave me an outsider's perspective of my own country, eavesdropping on everyday conversations in busy settlements and deciphering the origin of war cries on mountainsides.