Auditing Google's Search Algorithm: Measuring News Diversity Across Brazil, the UK, and the US

Hernandes, Raphael, Corsi, Giulio

arXiv.org Artificial Intelligence

This study examines the influence of Google's search algorithm on news diversity by analyzing search results in Brazil, the UK, and the US. It explores how Google's system systematically favors a limited number of news outlets. Utilizing algorithm auditing techniques, the research measures source concentration with the Herfindahl-Hirschman Index (HHI) and Gini coefficient, revealing significant concentration trends. The study underscores the importance of conducting horizontal analyses across multiple search queries, as focusing solely on individual results pages may obscure these patterns. Factors such as popularity, political bias, and recency were evaluated for their impact on news rankings. Findings indicate a slight leftward bias in search outcomes and a preference for popular, often national outlets. This bias, combined with a tendency to prioritize recent content, suggests that Google's algorithm may reinforce existing media inequalities. By analyzing the largest dataset to date -- 221,863 search results -- this research provides comprehensive, longitudinal insights into how algorithms shape public access to diverse news sources.
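The two concentration metrics named in the abstract are simple to compute. The sketch below applies them to hypothetical per-outlet result counts (invented for illustration, not drawn from the study's dataset).

```python
# Minimal sketch of the two concentration metrics used in the study,
# applied to hypothetical per-outlet result counts.
def hhi(counts):
    """Herfindahl-Hirschman Index: sum of squared shares (1 = monopoly)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

def gini(counts):
    """Gini coefficient via the mean absolute difference (0 = perfect equality)."""
    n = len(counts)
    mu = sum(counts) / n
    return sum(abs(a - b) for a in counts for b in counts) / (2 * n * n * mu)

outlet_counts = [120, 60, 15, 3, 2]  # hypothetical appearances per outlet
print(hhi(outlet_counts))   # ≈ 0.456: highly concentrated
print(gini(outlet_counts))  # ≈ 0.586: unequal distribution
```

Both metrics rise as a few outlets capture most of the results, which is why they suit a horizontal analysis across many queries.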


How to unlearn a learned Machine Learning model?

Achour, Seifeddine

arXiv.org Artificial Intelligence

In contemporary times, machine learning (ML) has sparked a remarkable revolution across numerous domains, surpassing even the loftiest of human expectations. However, despite the astounding progress made by ML, the need to regulate its outputs and capabilities has become imperative. A viable approach to address this concern is by exerting control over the data used for its training, more precisely, by unlearning the model from undesired data. In this article, I will present an elegant algorithm for unlearning a machine learning model and visualize its abilities. Additionally, I will elucidate the underlying mathematical theory and establish specific metrics to evaluate both the unlearned model's performance on desired data and its level of ignorance regarding unwanted data.


Hypothesis Transfer Learning via Transformation Functions

Simon S. Du, Jayanth Koushik, Aarti Singh, Barnabas Poczos

Neural Information Processing Systems

We consider the Hypothesis Transfer Learning (HTL) problem where one incorporates a hypothesis trained on the source domain into the learning procedure of the target domain. Existing theoretical analysis either only studies specific algorithms or only presents upper bounds on the generalization error but not on the excess risk. In this paper, we propose a unified algorithm-dependent framework for HTL through a novel notion of transformation function, which characterizes the relation between the source and the target domains. We conduct a general risk analysis of this framework and, in particular, we show for the first time that, if two domains are related, HTL enjoys faster convergence rates of excess risks for Kernel Smoothing and Kernel Ridge Regression than those of the classical non-transfer learning settings. Experiments on real-world data demonstrate the effectiveness of our framework.


An Optimal House Price Prediction Algorithm: XGBoost

Sharma, Hemlata, Harsora, Hitesh, Ogunleye, Bayode

arXiv.org Artificial Intelligence

An accurate prediction of house prices is a fundamental requirement for various sectors including real estate and mortgage lending. It is widely recognized that a property's value is not solely determined by its physical attributes but is significantly influenced by its surrounding neighbourhood. Meeting the diverse housing needs of individuals while balancing budget constraints is a primary concern for real estate developers. To this end, we addressed the house price prediction problem as a regression task and thus employed various machine learning techniques capable of expressing the significance of independent variables. We made use of the housing dataset of Ames City in Iowa, USA to compare support vector regressor, random forest regressor, XGBoost, multilayer perceptron and multiple linear regression algorithms for house price prediction. Afterwards, we identified the key factors that influence housing costs. Our results show that XGBoost is the best performing model for house price prediction.
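XGBoost itself implements regularized gradient-boosted trees; the snippet below is a deliberately simplified pure-Python sketch of the core boosting idea (regression stumps fit to residuals), with invented house-size data, not the paper's Ames experiments.

```python
# Simplified sketch of gradient boosting with regression stumps: each round
# fits a one-split predictor to the current residuals and adds a damped copy.
def fit_stump(xs, residuals):
    """Best single-threshold split minimizing squared error on residuals."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]  # (threshold, left_value, right_value)

def boost(xs, ys, rounds=50, lr=0.3):
    pred = [sum(ys) / len(ys)] * len(xs)  # start from the mean prediction
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        t, lv, rv = fit_stump(xs, resid)
        pred = [p + lr * (lv if x <= t else rv) for x, p in zip(xs, pred)]
    return pred

xs = [800, 1200, 1500, 2000, 2400]        # hypothetical house sizes (sq ft)
ys = [100.0, 150.0, 180.0, 250.0, 300.0]  # hypothetical prices ($1000s)
print(boost(xs, ys))  # in-sample predictions approach the observed prices
```

The real library adds regularization, second-order gradients, deeper trees, and column subsampling, but the residual-fitting loop above is the mechanism that makes boosted trees strong on tabular data like housing records.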


The Machine Learning Series in Python: Level 1 - Couponos 99

#artificialintelligence

In this course, The Machine Learning Series in Python: Level 1, you will master the foundations of Machine Learning and practice building ML models with real-world case studies. We will start from scratch and explain: what Machine Learning is; the Machine Learning process of how to build an ML model; Regression: predicting a continuous number; Simple Linear Regression; Ordinary Least Squares; Multiple Linear Regression; R-Squared; and Adjusted R-Squared. We will also complete the following three practical activities: Real-World Case Study: Build a Multiple Linear Regression model; Real-World Case Study: Build a Logistic Regression model; Real-World Case Study: Build a K-Means Clustering model.
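Two of the listed topics, R-Squared and Adjusted R-Squared, can be sketched in a few lines; the observations and predictions below are hypothetical, not course material.

```python
# Sketch of R-squared and adjusted R-squared for a fitted regression model,
# using hypothetical observed values and model predictions.
def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, n_predictors):
    n = len(y)
    r2 = r_squared(y, y_hat)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

y = [3.0, 5.0, 7.0, 9.0]      # observed values
y_hat = [2.8, 5.1, 7.2, 8.9]  # model predictions
print(r_squared(y, y_hat))              # ≈ 0.995
print(adjusted_r_squared(y, y_hat, 1))  # ≈ 0.9925: penalised for predictor count
```

Adjusted R-squared shrinks toward zero as predictors are added without explanatory power, which is why it is preferred when comparing multiple regression models.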


Linear Regression: Mathematical Intuition

#artificialintelligence

Since the start of your data science journey, you have likely encountered this machine learning algorithm: Linear Regression is the basic, foremost algorithm we generally start with when analysing regression problems. As the word linear suggests, it assumes a linear relationship between the input variables (x) and the dependent output variable (y). Linear regression analysis performs the task of predicting the output variable by modelling the relationships between the independent variables (x) and the output. The best output is found by fitting a line that minimises the distance between the predicted values and the observed data, i.e., the line of best fit.
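The line of best fit described above can be computed directly with ordinary least squares; this is a minimal sketch on a small hypothetical data set.

```python
# Minimal sketch of simple linear regression via ordinary least squares:
# slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # independent variable x (hypothetical)
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # dependent variable y (hypothetical)
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # ≈ 1.99 and ≈ 0.09: best-fit line y = 1.99x + 0.09
```

The closed-form solution minimises the sum of squared vertical distances between the observed points and the fitted line.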


Understanding Conditional Variance and Conditional Covariance

#artificialintelligence

Conditional Variance and Conditional Covariance are concepts that are central to statistical modeling. In this article, we'll learn what they are, and we'll illustrate how to calculate them using a real-world data set. First, a quick refresher on what variance and covariance are. The variance of a random variable measures its variation around its mean. The covariance between two random variables measures how correlated their variations around their respective means are.
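The conditional versions restrict those calculations to the subset of observations where the conditioning variable takes a given value. A minimal sketch, using a small made-up data set rather than the article's real-world one:

```python
from statistics import mean

# Sketch of conditional variance and conditional covariance, estimated by
# grouping a small hypothetical data set on the conditioning variable X.
rows = [("a", 1.0, 2.0), ("a", 2.0, 4.0), ("a", 3.0, 6.0),
        ("b", 1.0, 5.0), ("b", 3.0, 1.0)]  # (x, y, z) triples

def conditional_variance(rows, group):
    """Var(Y | X = group): variance of Y within the matching subset."""
    ys = [y for g, y, _ in rows if g == group]
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def conditional_covariance(rows, group):
    """Cov(Y, Z | X = group): covariance of Y and Z within the subset."""
    pairs = [(y, z) for g, y, z in rows if g == group]
    my = mean(y for y, _ in pairs)
    mz = mean(z for _, z in pairs)
    return sum((y - my) * (z - mz) for y, z in pairs) / len(pairs)

print(conditional_variance(rows, "a"))    # Var(Y | X = "a") = 2/3
print(conditional_covariance(rows, "a"))  # Cov(Y, Z | X = "a") = 4/3
```

In practice the conditioning variable is often continuous, in which case one bins it or fits a model, but the group-then-compute idea is the same.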


Statistics (III) ANOVA in Data Science & Machine Learning

#artificialintelligence

For the last part of the Statistics series, we will cover ANOVA, post-hoc pairwise comparison, two-way ANOVA, and R-squared. Previously, our study focused on one or two groups of subjects. How can we handle multiple groups with multiple factors? For example, both the dose level and gender may impact the effectiveness of a vaccine. How can we determine whether the effect is statistically significant for particular combinations?
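The one-way ANOVA F statistic compares variation between group means to variation within groups; a sketch with hypothetical vaccine-response data (one group per dose level):

```python
from statistics import mean

# Sketch of the one-way ANOVA F statistic:
# F = (SS_between / (k - 1)) / (SS_within / (n - k)).
def one_way_anova_f(groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([y for g in groups for y in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

low = [2.0, 3.0, 2.5]    # hypothetical responses at a low dose
mid = [4.0, 5.0, 4.5]    # medium dose
high = [7.0, 8.0, 7.5]   # high dose
print(one_way_anova_f([low, mid, high]))  # large F suggests group means differ
```

A large F (compared against the F distribution with k-1 and n-k degrees of freedom) leads to rejecting the hypothesis that all group means are equal; two-way ANOVA extends this to two factors, such as dose and gender.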


Data Science Techniques: How to Predict the Sales With Multiple Linear Regression

#artificialintelligence

Linear regression is one of the most popular techniques in data science. It can help you make predictions in many different scenarios. Although it is a widespread technique, it is not a one-size-fits-all model because not all relationships in life are linear. "All models are wrong, but some are useful." You are interested in predicting physical and downloaded album sales from money spent on advertising. Your boss comes into the office and asks how many albums you would sell if you spent $100,000 on advertising.
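A multiple linear regression with two predictors can answer that kind of question; here is a pure-Python sketch via the normal equations, with invented advertising and airplay figures (not the article's data).

```python
# Sketch of multiple linear regression via the normal equations
# (X^T X) beta = X^T y, on invented data: (ad spend, airplay) -> album sales.
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_mlr(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    Xd = [[1.0] + row for row in X]  # prepend the intercept column
    p = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(p)]
    return solve(XtX, Xty)

X = [[10.0, 30.0], [20.0, 40.0], [30.0, 50.0], [40.0, 55.0]]  # spend, airplay
y = [60.0, 90.0, 120.0, 145.0]                                # album sales
beta = fit_mlr(X, y)
pred = beta[0] + beta[1] * 50.0 + beta[2] * 60.0  # predict for new inputs
print(beta, pred)  # beta ≈ [10, 2, 1], pred ≈ 170
```

With the fitted coefficients, the boss's question becomes a single dot product of the new inputs with beta; real analyses would also check residuals and the linearity assumption before trusting the answer.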