One good thing about being stuck at home during the pandemic is that a person can finally get into the habit of listening to "A Way with Words," a radio show that airs on Friday afternoons on New York's WNYE (91.5 FM; check local listings). The hosts, Martha Barnette and Grant Barrett, are the Click and Clack of word talk. Barnette is a writer who has studied Latin and Greek (her books include "A Garden of Words"), and Barrett is a linguist and lexicographer with an ear for contemporary slang. They make a perfect duo. The show is modelled after "Car Talk," though it is broadcast from San Diego, not Cambridge: the hosts laugh a lot, and when people call in they answer by saying, "You have a way with words," which is always nice to hear.
There are a variety of tools that can help researchers analyze large volumes of written material. In this post, I'll examine two of these tools: part-of-speech tagging and tone analysis. I'll also show how to use these methods to find patterns in a large set of Facebook posts created by members of Congress. Part-of-speech (POS) tagging is a process that labels each word in a sentence with an algorithm's best guess for the word's part of speech (for example, noun, adjective or verb). This is based on both the definition of each word and the context in which it appears.
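To make the idea concrete, here is a minimal toy sketch of POS tagging: a tiny hand-made lexicon plus suffix-based fallback rules. Real taggers (such as those used to analyze corpora like congressional Facebook posts) learn these patterns statistically from annotated text; the lexicon, tag names, and rules below are purely illustrative.

```python
# Toy part-of-speech tagger: a small lexicon plus suffix heuristics.
# Illustrative only -- production taggers are trained on annotated corpora
# and also use the surrounding context, not just the word itself.

LEXICON = {
    "the": "DET", "a": "DET", "dog": "NOUN", "cat": "NOUN",
    "runs": "VERB", "quickly": "ADV", "red": "ADJ",
}

def guess_by_suffix(word):
    """Fallback guess from common English suffixes (context ignored)."""
    if word.endswith("ly"):
        return "ADV"
    if word.endswith("ing") or word.endswith("ed"):
        return "VERB"
    if word.endswith("ous") or word.endswith("ful"):
        return "ADJ"
    return "NOUN"  # nouns are the most common open-class default

def tag(sentence):
    """Label each lowercase token with a best-guess part of speech."""
    return [(w, LEXICON.get(w, guess_by_suffix(w)))
            for w in sentence.lower().split()]

print(tag("The dog runs quickly"))
# → [('the', 'DET'), ('dog', 'NOUN'), ('runs', 'VERB'), ('quickly', 'ADV')]
```

The suffix fallback shows why context matters: a word like "building" would be tagged VERB here even when used as a noun, which is exactly the ambiguity that trained taggers resolve by looking at neighboring words.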
One of the main concerns with AI technologies today is the fear that they will propagate the various biases we already have in society. A recent Stanford study turned things around, however, and highlighted how AI can also turn the mirror onto society and shed light on the biases that exist within it. The study utilized word embeddings to map relationships and associations between words and, through that measure, the changes in gender and ethnic stereotypes over the last century in the United States. The algorithms were fed text from a huge canon of books, newspapers, and other sources, and the resulting word associations were compared with official census demographic data and markers of societal change, such as the women's movement. The researchers used the embeddings to single out specific occupations and adjectives that tended to be biased toward women or ethnic groups in each decade from 1900 to the present day.
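The core measurement can be sketched in a few lines: score a word by how much closer it sits to one group of identity words than another in the embedding space. The tiny 3-d vectors below are hand-made for illustration only; the Stanford study used embeddings trained on decades of real text, and its actual bias metric aggregates over many word pairs.

```python
import numpy as np

# Toy sketch of an embedding-based bias score: how much closer an
# occupation word sits to "she" than to "he" in vector space.
# Vectors are hand-constructed for illustration, not trained.

VECS = {
    "he":       np.array([1.0, 0.1, 0.0]),
    "she":      np.array([0.1, 1.0, 0.0]),
    "nurse":    np.array([0.2, 0.9, 0.3]),
    "engineer": np.array([0.9, 0.2, 0.3]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def gender_bias(word):
    """Positive → closer to 'she'; negative → closer to 'he'."""
    v = VECS[word]
    return cosine(v, VECS["she"]) - cosine(v, VECS["he"])

print(gender_bias("nurse"))     # positive in this toy space
print(gender_bias("engineer"))  # negative in this toy space
```

Tracking how such a score shifts when the embeddings are retrained on text from each successive decade is what lets the approach chart changing stereotypes over time.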
We've come a long way in the word embedding space since the introduction of Word2Vec (Mikolov et al., 2013). These days, it seems that every single machine learning practitioner can recite the "king minus man plus woman equals queen" mantra. Today, these interpretable word embeddings have become an essential part of many deep-learning-based NLP systems. Last October, Google AI introduced BERT: Bidirectional Encoder Representations from Transformers (paper, source). Seemingly, the researchers at Google have done it again: they've come up with a model to learn contextual word representations that redefined the state of the art for 11 NLP tasks, "even surpassing human performance in the challenging area of question answering."
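The "king minus man plus woman" mantra is literal vector arithmetic followed by a nearest-neighbor lookup. A minimal sketch with hand-constructed 2-d vectors (one axis loosely "royalty," one loosely "gender"; real embeddings are trained, high-dimensional, and far noisier):

```python
import numpy as np

# Toy word-analogy demo: king - man + woman should land nearest to queen.
# The 2-d vectors are hand-made for illustration, not trained embeddings.

VECS = {
    "king":  np.array([1.0, 1.0]),   # (royalty, gender)-style axes
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
}

def nearest(target, exclude):
    """Vocabulary word with highest cosine similarity to target,
    skipping the query words themselves (standard in analogy tasks)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max((w for w in VECS if w not in exclude),
               key=lambda w: cos(VECS[w], target))

result = VECS["king"] - VECS["man"] + VECS["woman"]
print(nearest(result, exclude={"king", "man", "woman"}))  # → queen
```

Excluding the query words from the search is the standard convention; without it, the nearest neighbor of the result vector is often one of the inputs. BERT's contribution is orthogonal to this arithmetic: it produces a different vector for the same word in different sentence contexts, rather than one fixed vector per word.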
We all have our favorite web sites and sources for keeping up to date with web design trends, inspiration, and ideas. The problem is, these can get pretty stale. Compare a few eCommerce product pages, and they all basically look the same: lots of white space for a clean look, thumbnails of products, maybe some filtering options at the left or top of the page. Sticking to these types of formulas does have its advantages. People know how it works immediately and can easily figure out how to search, filter, and sort.