Paaßen, Benjamin
Fairness in KI-Systemen
Strotherm, Janine, Müller, Alissa, Hammer, Barbara, Paaßen, Benjamin
Zusammenfassung The more AI-assisted decisions affect people's lives, the more important the fairness of such decisions becomes. In this chapter, we provide an introduction to research on fairness in machine learning. We explain the main fairness definitions and strategies for achieving fairness using concrete examples and place fairness research in the European context. Our contribution is aimed at an interdisciplinary audience and therefore avoids mathematical formulation but emphasizes visualizations and examples. Machine Bias - There's software used across the country to predict future criminals.
Automatic Creativity Measurement in Scratch Programs Across Modalities
Kovalkov, Anastasia, Paaßen, Benjamin, Segal, Avi, Pinkwart, Niels, Gal, Kobi
Promoting creativity is considered an important goal of education, but creativity is notoriously hard to measure.In this paper, we make the journey fromdefining a formal measure of creativity that is efficientlycomputable to applying the measure in a practical domain. The measure is general and relies on coretheoretical concepts in creativity theory, namely fluency, flexibility, and originality, integratingwith prior cognitive science literature. We adapted the general measure for projects in the popular visual programming language Scratch.We designed a machine learning model for predicting the creativity of Scratch projects, trained and evaluated on human expert creativity assessments in an extensive user study. Our results show that opinions about creativity in Scratch varied widely across experts. The automatic creativity assessment aligned with the assessment of the human experts more than the experts agreed with each other. This is a first step in providing computational models for measuring creativity that can be applied to educational technologies, and to scale up the benefit of creativity education in schools.
An A*-algorithm for the Unordered Tree Edit Distance with Custom Costs
Paaßen, Benjamin
The unordered tree edit distance is a natural metric to compute distances between trees without intrinsic child order, such as representations of chemical molecules. While the unordered tree edit distance is MAX SNP-hard in principle, it is feasible for small cases, e.g. via an A* algorithm. Unfortunately, current heuristics for the A* algorithm assume unit costs for deletions, insertions, and replacements, which limits our ability to inject domain knowledge. In this paper, we present three novel heuristics for the A* algorithm that work with custom cost functions. In experiments on two chemical data sets, we show that custom costs make the A* computation faster and improve the error of a 5-nearest neighbor regressor, predicting chemical properties. We also show that, on these data, polynomial edit distances can achieve similar results as the unordered tree edit distance.
Reservoir Memory Machines as Neural Computers
Paaßen, Benjamin, Schulz, Alexander, Stewart, Terrence C., Hammer, Barbara
Differentiable neural computers extend artificial neural networks with an explicit memory without interference, thus enabling the model to perform classic computation tasks such as graph traversal. However, such models are difficult to train, requiring long training times and large datasets. In this work, we achieve some of the computational capabilities of differentiable neural computers with a model that can be trained extremely efficiently, namely an echo state network with an explicit memory without interference. This extension raises the computation power of echo state networks from strictly less than finite state machines to strictly more than finite state machines. Further, we demonstrate experimentally that our model performs comparably to its fully-trained deep version on several typical benchmark tasks for differentiable neural computers.
Adversarial Edit Attacks for Tree Data
Paaßen, Benjamin
Many machine learning models can be attacked with adversarial examples, i.e. inputs close to correctly classified examples that are classified incorrectly. However, most research on adversarial attacks to date is limited to vectorial data, in particular image data. In this contribution, we extend the field by introducing adversarial edit attacks for tree-structured data with potential applications in medicine and automated program analysis. Our approach solely relies on the tree edit distance and a logarithmic number of black-box queries to the attacked classifier without any need for gradient information. We evaluate our approach on two programming and two biomedical data sets and show that many established tree classifiers, like tree-kernel-SVMs and recursive neural networks, can be attacked effectively.
Embeddings and Representation Learning for Structured Data
Paaßen, Benjamin, Gallicchio, Claudio, Micheli, Alessio, Sperduti, Alessandro
Performing machine learning on structured data is complicated by the fact that such data does not have vectorial form. Therefore, multiple approaches have emerged to construct vectorial representations of structured data, from kernel and distance approaches to recurrent, recursive, and convolutional neural networks. Recent years have seen heightened attention in this demanding field of research and several new approaches have emerged, such as metric learning on structured data, graph convolutional neural networks, and recurrent decoder networks for structured data. In this contribution, we provide an high-level overview of the state-of-the-art in representation learning and embeddings for structured data across a wide range of machine learning fields.
Dynamic fairness - Breaking vicious cycles in automatic decision making
Paaßen, Benjamin, Bunge, Astrid, Hainke, Carolin, Sindelar, Leon, Vogelsang, Matthias
In recent years, machine learning techniques have been increasingly applied in sensitive decision making processes, raising fairness concerns. Past research has shown that machine learning may reproduce and even exacerbate human bias due to biased training data or flawed model assumptions, and thus may lead to discriminatory actions. To counteract such biased models, researchers have proposed multiple mathematical definitions of fairness according to which classifiers can be optimized. However, it has also been shown that the outcomes generated by some fairness notions may be unsatisfactory. In this contribution, we add to this research by considering decision making processes in time. We establish a theoretic model in which even perfectly accurate classifiers which adhere to almost all common fairness definitions lead to stable long-term inequalities due to vicious cycles. Only demographic parity, which enforces equal rates of positive decisions across groups, avoids these effects and establishes a virtuous cycle, which leads to perfectly accurate and fair classification in the long term.
The Continuous Hint Factory - Providing Hints in Vast and Sparsely Populated Edit Distance Spaces
Paaßen, Benjamin, Hammer, Barbara, Price, Thomas William, Barnes, Tiffany, Gross, Sebastian, Pinkwart, Niels
Intelligent tutoring systems can support students in solving multi-step tasks by providing hints regarding what to do next. However, engineering such next-step hints manually or via an expert model becomes infeasible if the space of possible states is too large. Therefore, several approaches have emerged to infer next-step hints automatically, relying on past students' data. In particular, the Hint Factory (Barnes & Stamper, 2008) recommends edits that are most likely to guide students from their current state towards a correct solution, based on what successful students in the past have done in the same situation. Still, the Hint Factory relies on student data being available for any state a student might visit while solving the task, which is not the case for some learning tasks, such as open-ended programming tasks. In this contribution we provide a mathematical framework for edit-based hint policies and, based on this theory, propose a novel hint policy to provide edit hints in vast and sparsely populated state spaces. In particular, we extend the Hint Factory by considering data of past students in all states which are similar to the student's current state and creating hints approximating the weighted average of all these reference states. Because the space of possible weighted averages is continuous, we call this approach the Continuous Hint Factory. In our experimental evaluation, we demonstrate that the Continuous Hint Factory can predict more accurately what capable students would do compared to existing prediction schemes on two learning tasks, especially in an open-ended programming task, and that the Continuous Hint Factory is comparable to existing hint policies at reproducing tutor hints on a simple UML diagram task.
Tree Edit Distance Learning via Adaptive Symbol Embeddings
Paaßen, Benjamin, Gallicchio, Claudio, Micheli, Alessio, Hammer, Barbara
Metric learning has the aim to improve classification accuracy by learning a distance measure which brings data points from the same class closer together and pushes data points from different classes further apart. Recent research has demonstrated that metric learning approaches can also be applied to trees, such as molecular structures, abstract syntax trees of computer programs, or syntax trees of natural language, by learning the cost function of an edit distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree. However, learning such costs directly may yield an edit distance which violates metric axioms, is challenging to interpret, and may not generalize well. In this contribution, we propose a novel metric learning approach for trees which learns an edit distance indirectly by embedding the tree nodes as vectors, such that the Euclidean distance between those vectors supports class discrimination. We learn such embeddings by reducing the distance to prototypical trees from the same class and increasing the distance to prototypical trees from different classes. In our experiments, we show that our proposed metric learning approach improves upon the state-of-the-art in metric learning for trees on six benchmark data sets, ranging from computer science over biomedical data to a natural-language processing data set containing over 300,000 nodes.
Tree Edit Distance Learning via Adaptive Symbol Embeddings: Supplementary Materials and Results
Paaßen, Benjamin
Metric learning has the aim to improve classification accuracy by learning a distance measure which brings data points from the same class closer together and pushes data points from different classes further apart. Recent research has demonstrated that metric learning approaches can also be applied to trees, such as molecular structures, abstract syntax trees of computer programs, or syntax trees of natural language, by learning the cost function of an edit distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree. However, learning such costs directly may yield an edit distance which violates metric axioms, is challenging to interpret, and may not generalize well. In this contribution, we propose a novel metric learning approach for trees which learns an edit distance indirectly by embedding the tree nodes as vectors, such that the Euclidean distance between those vectors supports class discrimination. We learn such embeddings by reducing the distance to prototypical trees from the same class and increasing the distance to prototypical trees from different classes. In our experiments, we show that our proposed metric learning approach improves upon the state-of-the-art in metric learning for trees on six benchmark data sets, ranging from computer science over biomedical data to a natural-language processing data set containing over 300,000 nodes.