Attraction-Repulsion clustering with applications to fairness
del Barrio, Eustasio, Inouzhe, Hristo, Loubes, Jean-Michel
Cluster analysis, or clustering, is the task of dividing a set of objects into groups so that elements in the same group, or cluster, are more similar to one another, according to some dissimilarity measure, than to elements in different groups. There are two main types of algorithms for this task: partitioning algorithms, which split the data into k groups that usually minimize some optimality criterion, and agglomerative algorithms, which start with single observations and merge them into clusters according to some dissimilarity measure. These methods have been studied in an extensive literature; we refer to [12] and references therein for an overview. Clustering techniques used as unsupervised classification procedures are increasingly influential in people's lives, since they are used in credit scoring, article recommendation, risk assessment, spam filtering and sentencing recommendations in courts of law, among others. Hence controlling the outcome of such procedures, in particular ensuring that variables which should not be taken into account for moral or legal reasons play no role in the classification of the observations, has become an important field of research known as fair learning. We refer to [15], [3], [1] or [9] for an overview of such legal issues and of mathematical solutions to address them. For instance, avoiding discrimination with respect to sensitive characteristics such as sex, race or age cannot be achieved by the naive solution of simply ignoring the protected attribute. Indeed, if the data at hand reflect a real-world bias, machine learning algorithms can pick up on this behaviour and emulate it. More precisely, suppose we have data that include information about attributes that we know or suspect to be biased with respect to the protected class.
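To make the point about ignoring the protected attribute concrete, here is a minimal sketch in Python on synthetic data. It is not the method of the paper. The function names (`two_means_1d`, `protected_share`), the data-generating process and the choice of a one-dimensional 2-means partition are all assumptions made for illustration. The clustering only ever sees a proxy variable `x` that is correlated with the protected attribute `s`, yet the resulting clusters still split the data largely along `s`.

```python
import random


def two_means_1d(xs, iters=50):
    """Lloyd's algorithm with k=2 on one-dimensional data (illustrative only)."""
    c0, c1 = min(xs), max(xs)  # deterministic initial centres
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]
        g0 = [x for x, lab in zip(xs, labels) if lab == 0]
        g1 = [x for x, lab in zip(xs, labels) if lab == 1]
        # Update step: each centre becomes the mean of its cluster.
        new_c0, new_c1 = sum(g0) / len(g0), sum(g1) / len(g1)
        if (new_c0, new_c1) == (c0, c1):
            break
        c0, c1 = new_c0, new_c1
    return labels


def protected_share(labels, s, cluster):
    """Fraction of points in `cluster` whose protected attribute equals 1."""
    members = [si for lab, si in zip(labels, s) if lab == cluster]
    return sum(members) / len(members)


random.seed(0)
n = 200
# Hypothetical protected attribute, balanced between the two classes.
s = [i % 2 for i in range(n)]
# Hypothetical proxy variable: its mean is shifted by the protected class.
x = [4.0 * si + random.gauss(0.0, 1.0) for si in s]

# The clustering is "unaware": it is given x only, never s.
labels = two_means_1d(x)
share0 = protected_share(labels, s, 0)
share1 = protected_share(labels, s, 1)
print(f"share of s=1 in cluster 0: {share0:.2f}")
print(f"share of s=1 in cluster 1: {share1:.2f}")
```

In the full sample the share of `s = 1` is one half by construction, so a partition that was independent of the protected attribute would show shares close to 0.5 in both clusters. Under the assumptions above, the two shares should instead sit near 0 and 1.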
Apr-10-2019
- Country:
- North America > United States (0.14)
- Europe
- Spain > Castile and León
- Valladolid Province > Valladolid (0.04)
- France > Occitanie
- Haute-Garonne > Toulouse (0.04)
- Genre:
- Research Report (0.40)
- Industry:
- Law (1.00)
- Technology: