Collaborating Authors

How to Transform Target Variables for Regression With Scikit-Learn


Data preparation is a big part of applied machine learning. Correctly preparing your training data can mean the difference between mediocre and extraordinary results, even with very simple linear algorithms. Performing data preparation operations, such as scaling, is relatively straightforward for input variables and has been made routine in Python via scikit-learn's Pipeline class. On regression predictive modeling problems, where a numerical value must be predicted, it can also be critical to scale and perform other data transformations on the target variable. This can be achieved in Python using the TransformedTargetRegressor class.
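As a minimal sketch of the idea described above, the snippet below wraps a Pipeline (which scales the inputs) in a TransformedTargetRegressor (which scales the target). The synthetic data, the choice of MinMaxScaler, and the LinearRegression model are assumptions made for illustration and do not come from the original tutorial.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic regression data (assumed for illustration): the target lives on a
# much larger scale than the inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 1000.0 * X[:, 0] + 500.0 * X[:, 1] + 5000.0 + rng.normal(scale=10.0, size=100)

# The Pipeline scales the input variables before fitting the model.
pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("model", LinearRegression()),
])

# TransformedTargetRegressor scales y before fitting and inverts the
# transform on the predictions, so results come back in the original units.
model = TransformedTargetRegressor(regressor=pipeline, transformer=MinMaxScaler())
model.fit(X, y)

predictions = model.predict(X[:5])
print(predictions)
```

Because the inverse transform is applied automatically at prediction time, error metrics computed on these predictions are in the original units of the target.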

How to Develop an Imbalanced Classification Model to Detect Oil Spills


Many imbalanced classification tasks require a skillful model that predicts a crisp class label, where both classes are equally important. An example of an imbalanced classification problem where a class label is required and both classes are equally important is the detection of oil spills or slicks in satellite images. The detection of a spill requires mobilizing an expensive response, and missing an event is equally expensive, causing damage to the environment. One way to evaluate imbalanced classification models that predict crisp labels is to calculate the separate accuracy on the positive class and the negative class, referred to as sensitivity and specificity. These two measures can then be combined using the geometric mean, referred to as the G-mean, which is insensitive to the skewed class distribution and correctly reports on the skill of the model on both classes. In this tutorial, you will discover how to develop a model to predict the presence of an oil spill in satellite images and evaluate it using the G-mean metric. In this project, we will use a standard imbalanced machine learning dataset referred to as the "oil spill" dataset, "oil slicks" dataset, or simply "oil."
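The G-mean described above can be sketched in a few lines. The helper name g_mean and the toy labels are hypothetical and chosen only for illustration; they are not taken from the oil spill dataset.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels 0/1.

    Hypothetical helper for illustration; assumes both classes appear in y_true.
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)  # accuracy on the positive class
    specificity = tn / (tn + fp)  # accuracy on the negative class
    return float(np.sqrt(sensitivity * specificity))


# Toy imbalanced example: 8 negatives, 2 positives.
y_true = [0] * 8 + [1] * 2

# A model that finds one of the two positives and raises two false alarms.
y_model = [0] * 6 + [1] * 2 + [1, 0]
model_score = g_mean(y_true, y_model)

# A naive model that always predicts the majority (negative) class.
y_naive = [0] * 10
naive_score = g_mean(y_true, y_naive)

print(f"model G-mean: {model_score:.3f}")
print(f"naive G-mean: {naive_score:.3f}")
```

The naive model is correct on 8 of 10 examples yet scores a G-mean of zero, because its sensitivity is zero. This is why the metric suits problems where both classes matter equally.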

Judging a Book by its Description : Analyzing Gender Stereotypes in the Man Bookers Prize Winning Fiction

The presence of gender stereotypes in many aspects of society is a well-known phenomenon. In this paper, we focus on studying and quantifying such stereotypes and bias in the Man Bookers Prize winning fiction. We consider 275 books shortlisted for the Man Bookers Prize between 1969 and 2017. The gender bias is analyzed by semantic modeling of book descriptions on Goodreads. This reveals the pervasiveness of gender bias and stereotypes in the books across different features, such as the occupations, introductions, and actions associated with the characters in the book.

Study of film scripts shows how sexist Hollywood REALLY is

Daily Mail - Science & tech

A new analysis of the characters and dialogue of nearly 1,000 film scripts has offered startling new insight into the persistence of stereotypes in the media. Using a new tool to study the content of language and interactions, researchers have found that female characters have far less representation than males, and less than half the amount of dialogue. The analysis also found that female characters tend to be about 5 years younger than males – and, while women may be portrayed in a positive light, this often comes in the context of family values. According to the researchers, these trends are created and reinforced through both conscious and unconscious choices by writers, and as a result, female roles typically are not central to the plot.

Automatic Transliteration Can Help Alexa Find Data Across Language Barriers : Alexa Blogs


As Alexa-enabled devices continue to expand into new countries, finding information across languages that use different scripts becomes a more pressing challenge. For example, a Japanese music catalogue may contain names written in English or the various scripts used in Japanese -- Kanji, Katakana, or Hiragana. When an Alexa customer, from anywhere in the world, asks for a certain song, album, or artist, we could have a mismatch between Alexa's transcription of the request and the script used in the corresponding catalogue. To address this problem, we developed a machine-learned multilingual named-entity transliteration system. Named-entity transliteration is the process of converting a name from one language script to another.
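The script mismatch described above can be illustrated with a small rule-based sketch. This is not the machine-learned transliteration system the post refers to; it only normalizes Katakana to Hiragana using the fixed offset between the two Unicode blocks. The function name and the example strings are hypothetical.

```python
def katakana_to_hiragana(text: str) -> str:
    """Map Katakana letters to their Hiragana counterparts.

    Hypothetical helper for illustration. Katakana U+30A1..U+30F6 sits exactly
    0x60 code points above Hiragana U+3041..U+3096; all other characters
    (Kanji, Latin letters, the prolonged sound mark) are left unchanged.
    """
    converted = []
    for ch in text:
        code = ord(ch)
        if 0x30A1 <= code <= 0x30F6:
            converted.append(chr(code - 0x60))
        else:
            converted.append(ch)
    return "".join(converted)


# Assumed example: the catalogue stores a name in Katakana, while the
# request was transcribed in Hiragana.
catalogue_entry = "サクラ"
request = "さくら"

exact_match = catalogue_entry == request
normalized_match = katakana_to_hiragana(catalogue_entry) == request

print(exact_match, normalized_match)
```

A fixed mapping like this only works between the two kana scripts. Matching names across Kanji, kana, and English has no such one-to-one rule, which is the gap a learned transliteration model is meant to fill.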