The objective of this course is to give you a holistic understanding of machine learning, covering the theory, application, and inner workings of supervised, unsupervised, and deep learning algorithms. In this series, we'll cover linear regression, K-Nearest Neighbors, Support Vector Machines (SVM), flat clustering, hierarchical clustering, and neural networks. For each major algorithm, we will first discuss the high-level intuition behind it and how it is logically meant to work. Next, we'll apply the algorithm in code to real-world datasets using a module such as Scikit-Learn. Finally, we'll dive into the inner workings of each algorithm by recreating it in code, from scratch, ourselves, including all of the math involved.
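As a taste of the "apply it with a module" step, here is a minimal sketch of fitting one of the listed algorithms, linear regression, with Scikit-Learn. The data are a tiny synthetic example (the course itself uses real-world datasets):

```python
# Minimal sketch: fitting linear regression with Scikit-Learn.
# The data below are synthetic (y = 2x + 1 exactly), purely for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])  # feature column
y = np.array([1.0, 3.0, 5.0, 7.0])          # target: 2x + 1

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # recovered slope, ~2.0
intercept = model.intercept_  # recovered intercept, ~1.0
prediction = model.predict([[4.0]])[0]  # ~9.0
```

The same `fit`/`predict` pattern carries over to the other Scikit-Learn estimators covered in the series (e.g. `KNeighborsClassifier`, `SVC`).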
Predictive modeling is increasingly being employed to assist human decision-makers. One purported advantage of replacing human judgment with computer models in high-stakes settings, such as sentencing, hiring, policing, college admissions, and parole decisions, is the perceived "neutrality" of computers. It is argued that because computer models hold no personal prejudice, the predictions they produce will be equally free from prejudice. There is growing recognition, however, that employing algorithms does not remove the potential for bias and can even amplify it, since the training data were inevitably generated by a process that is itself biased. In this paper, we provide a probabilistic definition of algorithmic bias. We propose a method to remove bias from predictive models by removing all information regarding the protected variables from the training data. Unlike previous work in this area, our framework is general enough to accommodate arbitrary data types, e.g., binary and continuous variables. Motivated by models currently in use in the criminal justice system that inform decisions on pre-trial release and parole, we apply our proposed method to a dataset on the criminal histories of individuals at the time of sentencing to produce "race-neutral" predictions of re-arrest. In the process, we demonstrate that the most common approach to creating "race-neutral" models, omitting race as a covariate, still results in racially disparate predictions. We then demonstrate that applying our proposed method to these data removes racial disparities from predictions with minimal impact on predictive accuracy.
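The abstract does not specify the removal procedure, so the following is only a hedged sketch of one common way to pursue the same goal: replacing each covariate with its residual after regressing it on the protected attribute, so that the resulting features carry no linear information about that attribute. This illustrates the general idea, not the paper's actual method; all data are synthetic.

```python
# Hedged illustration (NOT the paper's method): strip linear information
# about a protected attribute from a covariate via residualization.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
protected = rng.integers(0, 2, size=n).astype(float)  # binary protected variable
x = 2.0 * protected + rng.normal(size=n)              # covariate correlated with it

# Regress x on [1, protected] and keep the residual.
design = np.column_stack([np.ones(n), protected])
coef, *_ = np.linalg.lstsq(design, x, rcond=None)
x_residual = x - design @ coef

corr_before = np.corrcoef(x, protected)[0, 1]          # substantial correlation
corr_after = np.corrcoef(x_residual, protected)[0, 1]  # essentially zero
```

Note that residualization only removes *linear* dependence; handling arbitrary data types and nonlinear dependence, as the paper claims to, would require a more general construction.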
Machine learning has led to breakthroughs such as speech recognition and smart digital assistants like Alexa. Scammers are now using machine learning tools to mine social media data and target the executive organization chart with fraudulent emails that look and sound as though they came from someone inside the company. Cybercriminals have already collected more than $3 billion over the last three years by targeting 400 companies every day, according to recent findings from Symantec Corp. security researchers. "This is one of the biggest deals in the cybercriminal world today," said Vijay Thaware, security response lead for Symantec. Thaware and his Symantec colleague, threat analyst Ankit Singh, presented their findings on Wednesday during the first day of briefings at the Black Hat USA 2017 cybersecurity conference in Las Vegas.
Haranko, Karri (Aalto University) | Zagheni, Emilio (University of Washington) | Garimella, Kiran (École polytechnique fédérale de Lausanne (EPFL)) | Weber, Ingmar (Qatar Computing Research Institute)
Gender imbalances in work environments have been a long-standing concern. Identifying the existence of such imbalances is key to designing policies to help overcome them. In this work, we study gender trends in employment across various dimensions in the United States by analyzing anonymous, aggregate statistics extracted from LinkedIn's advertising platform. The data contain the number of male and female LinkedIn users with respect to (i) location, (ii) age, (iii) industry, and (iv) certain skills. We study which of these categories correlate most strongly with high relative male or female presence on LinkedIn. In addition to examining the summary statistics of the LinkedIn data, we model gender balance as a function of the different employee features using linear regression. Our results suggest that the gender gap, as measured using LinkedIn data, varies across all feature types, but the differences are most pronounced among industries and skills. A high correlation between the gender ratios in our LinkedIn dataset and data provided by the US Bureau of Labor Statistics serves as external validation of our results.
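The regression step described above can be sketched as follows. The feature names and numbers here are invented for illustration (the study uses aggregate LinkedIn advertising statistics over location, age, industry, and skills):

```python
# Hedged sketch: regress a gender-balance measure (fraction of female users)
# on employee features. All features and values below are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy design matrix: [scaled age, is_tech_industry, is_health_industry]
X = np.array([
    [0.2, 1, 0],
    [0.4, 1, 0],
    [0.3, 0, 1],
    [0.5, 0, 1],
    [0.6, 0, 0],
    [0.1, 0, 0],
])
# Fraction of female users in each aggregate cell (invented numbers).
y = np.array([0.30, 0.28, 0.72, 0.75, 0.50, 0.48])

model = LinearRegression().fit(X, y)
r_squared = model.score(X, y)  # how much gender balance the features explain
```

In this toy example the industry indicators explain most of the variation in the gender ratio, mirroring the paper's finding that differences are most pronounced among industries and skills.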
Exposure to frequent crime incidents has been found to have a negative bearing on the well-being of city residents, even those who are not themselves direct victims. We pursue the research question of whether naturalistic data shared on Twitter may provide a “lens” for understanding changes in the psychological attributes of urban communities (1) immediately following crime incidents and (2) due to long-term exposure to crime. We analyze half a million Twitter posts from the City of Atlanta in 2014, where the rate of violent crime is three times the national average. In a first study, we develop a statistical method to detect changes in social media psychological attributes in the immediate aftermath of a crime event. In a second study, we develop a regression model that uses historical (yearlong) crime to predict negative emotion, anxiety, anger, and sadness on Twitter. We do not find significant changes in social media affect immediately following crime in Atlanta. However, we do observe that historical crime significantly accounts for heightened negative emotion and anger in the future. Our findings have implications for gauging the utility of social media to infer longitudinal and population-scale patterns of urban well-being.
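The second study's setup can be sketched as an ordinary least-squares regression of a Twitter-derived affect score on historical crime. Everything below is synthetic and illustrative; the paper uses yearlong Atlanta crime records and affect measures derived from actual Twitter posts:

```python
# Hedged sketch: OLS regression of a negative-emotion score on a
# historical crime rate. All data are synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n_areas = 200
crime_rate = rng.uniform(0, 10, size=n_areas)  # yearlong crime rate per area
# Synthetic assumption: negative emotion rises with historical crime, plus noise.
negative_emotion = 0.5 + 0.1 * crime_rate + rng.normal(scale=0.2, size=n_areas)

model = LinearRegression().fit(crime_rate.reshape(-1, 1), negative_emotion)
slope = model.coef_[0]  # positive slope: higher crime, higher negative emotion
```

A positive, statistically significant slope in such a model corresponds to the paper's finding that historical crime accounts for heightened negative emotion.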