Staab, Steffen
Fairness implications of encoding protected categorical attributes
Mougan, Carlos, Alvarez, Jose M., Patro, Gourab K, Ruggieri, Salvatore, Staab, Steffen
Protected attributes are often presented as categorical features that need to be encoded before being fed into a machine learning algorithm. Encoding these attributes is paramount, as it determines the way the algorithm will learn from the data. Categorical feature encoding has a direct impact on model performance and fairness. In this work, we compare the accuracy and fairness implications of the two most well-known encoders: one-hot encoding and target encoding. We distinguish between two types of induced bias that can arise while using these encodings and can lead to unfair models. The first type, irreducible bias, is due to direct group category discrimination; the second type, reducible bias, is due to large variance in less statistically represented groups. We take a deeper look at how regularization methods for target encoding can mitigate the induced bias while encoding categorical features. Furthermore, we tackle the problem of intersectional fairness, which arises when two protected categorical features are combined, leading to higher cardinality. This practice is a powerful feature engineering technique used for boosting model performance. We study its implications on fairness, as it can increase both types of induced bias.
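A minimal sketch (in Python, with pandas) of the two encoders compared above and of smoothing-based regularization for target encoding; the data, column names, and smoothing parameter are illustrative and not taken from the paper.

```python
import pandas as pd

# Toy data: a protected categorical attribute and a binary target.
df = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B", "C"],
    "y":     [1,   0,   1,   1,   0,   1],
})

# One-hot encoding: one binary column per category.
one_hot = pd.get_dummies(df["group"], prefix="group")

# Target encoding with additive smoothing (a common regularization):
# rare categories are pulled towards the global target mean, which is one way
# to reduce the "reducible bias" caused by high-variance, small groups.
m = 2.0                                # smoothing strength (illustrative)
global_mean = df["y"].mean()
stats = df.groupby("group")["y"].agg(["mean", "count"])
smoothed = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
df["group_te"] = df["group"].map(smoothed)

print(one_hot)
print(df)
```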
Box Embeddings for the Description Logic EL++
Xiong, Bo, Potyka, Nico, Tran, Trung-Kien, Nayyeri, Mojtaba, Staab, Steffen
Recently, various methods for representation learning on Knowledge Bases (KBs) have been developed. However, these approaches either only focus on learning the embeddings of the data-level knowledge (ABox) or exhibit inherent limitations when dealing with the concept-level knowledge (TBox), e.g., not properly modelling the structure of the logical knowledge. We present BoxEL, a geometric KB embedding approach that allows for better capturing logical structure expressed in the theories of Description Logic EL++. BoxEL models concepts in a KB as axis-parallel boxes exhibiting the advantage of intersectional closure, entities as points inside boxes, and relations between concepts/entities as affine transformations. We show theoretical guarantees (soundness) of BoxEL for preserving logical structure. Namely, the trained model of BoxEL embedding with loss 0 is a (logical) model of the KB. Experimental results on subsumption reasoning and a real-world application (protein-protein prediction) show that BoxEL outperforms traditional knowledge graph embedding methods as well as state-of-the-art EL++ embedding approaches.
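The following is a hedged sketch, not the authors' implementation, of the geometric reading described above: concepts as axis-parallel boxes, entities as points inside boxes, and subsumption as box containment; all numbers are invented.

```python
import numpy as np

# Illustrative 2-D boxes for two concepts, given as (lower corner, upper corner).
# These numbers are made up; BoxEL learns such parameters from the KB.
parent = (np.array([0.0, 0.0]), np.array([4.0, 4.0]))   # e.g. concept "Person"
child  = (np.array([1.0, 1.0]), np.array([2.0, 3.0]))   # e.g. concept "Parent"
entity = np.array([1.5, 2.0])                            # e.g. individual "mary"

def box_contains_box(outer, inner):
    """Subsumption C ⊑ D read geometrically: C's box lies inside D's box."""
    return bool(np.all(outer[0] <= inner[0]) and np.all(inner[1] <= outer[1]))

def box_contains_point(box, p):
    """Concept membership C(a): the entity point lies inside the concept box."""
    return bool(np.all(box[0] <= p) and np.all(p <= box[1]))

def boxes_intersect(b1, b2):
    """Axis-parallel boxes are closed under intersection (possibly empty)."""
    lo, hi = np.maximum(b1[0], b2[0]), np.minimum(b1[1], b2[1])
    return (lo, hi) if np.all(lo <= hi) else None

print(box_contains_box(parent, child))    # True: Parent ⊑ Person
print(box_contains_point(child, entity))  # True: Parent(mary)
```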
Wikidated 1.0: An Evolving Knowledge Graph Dataset of Wikidata's Revision History
Schmelzeisen, Lukas, Dima, Corina, Staab, Steffen
Wikidata is the largest general-interest knowledge base that is openly available. It is collaboratively edited by thousands of volunteer editors and has thus evolved considerably since its inception in 2012. In this paper, we present Wikidated 1.0, a dataset of Wikidata's full revision history, which encodes changes between Wikidata revisions as sets of deletions and additions of RDF triples. To the best of our knowledge, it constitutes the first large dataset of an evolving knowledge graph, a recently emerging research subject in the Semantic Web community. We introduce the methodology for generating Wikidated 1.0 from dumps of Wikidata, discuss its implementation and limitations, and present statistical characteristics of the dataset.
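A minimal sketch of the diff encoding described above, assuming each revision of an item is available as a set of RDF triples; the triples below are invented examples rather than actual dataset content.

```python
# Two consecutive revisions of an (invented) Wikidata item as sets of triples.
rev_old = {
    ("wd:Q42", "wdt:P31", "wd:Q5"),
    ("wd:Q42", "wdt:P69", "wd:Q691283"),
}
rev_new = {
    ("wd:Q42", "wdt:P31", "wd:Q5"),
    ("wd:Q42", "wdt:P106", "wd:Q36180"),
}

# The change between revisions is encoded as deletions and additions of triples.
deletions = rev_old - rev_new
additions = rev_new - rev_old
print("deleted:", deletions)
print("added:  ", additions)
```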
ProGS: Property Graph Shapes Language (Extended Version)
Seifer, Philipp, Lämmel, Ralf, Staab, Steffen
Property graphs constitute data models for representing knowledge graphs. They allow for the convenient representation of facts, including facts about facts, represented by triples in subject or object position of other triples. Knowledge graphs such as Wikidata are created by a diversity of contributors and a range of sources leaving them prone to two types of errors. The first type of error, falsity of facts, is addressed by property graphs through the representation of provenance and validity, making triples occur as first-order objects in subject position of metadata triples. The second type of error, violation of domain constraints, has not been addressed with regard to property graphs so far. In RDF representations, this error can be addressed by shape languages such as SHACL or ShEx, which allow for checking whether graphs are valid with respect to a set of domain constraints. Borrowing ideas from the syntax and semantics definitions of SHACL, we design a shape language for property graphs, ProGS, which allows for formulating shape constraints on property graphs including their specific constructs, such as edges with identities and key-value annotations to both nodes and edges. We define a formal semantics of ProGS, investigate the resulting complexity of validating property graphs against sets of ProGS shapes, compare with corresponding results for SHACL, and implement a prototypical validator that utilizes answer set programming.
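The sketch below does not use ProGS syntax or semantics; it only illustrates, on a toy property graph, the kind of constraint the language targets: a node shape requiring a property on the node and a qualified outgoing edge whose key-value annotations satisfy a condition.

```python
# Toy property graph: nodes and edges both carry identities and key-value maps.
nodes = {
    "n1": {"label": "City", "name": "Koblenz"},
    "n2": {"label": "Country", "name": "Germany"},
}
edges = {
    "e1": {"src": "n1", "dst": "n2", "label": "locatedIn", "props": {"since": 1946}},
}

def check_city_shape(node_id):
    """Toy shape: every City node needs a 'name' and at least one
    locatedIn edge to a Country node carrying a 'since' annotation."""
    node = nodes[node_id]
    if node.get("label") != "City":
        return True  # the shape only targets City nodes
    if "name" not in node:
        return False
    return any(
        e["src"] == node_id
        and e["label"] == "locatedIn"
        and nodes[e["dst"]].get("label") == "Country"
        and "since" in e["props"]
        for e in edges.values()
    )

print(all(check_city_shape(n) for n in nodes))  # True for this toy graph
```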
LaHAR: Latent Human Activity Recognition using LDA
Boukhers, Zeyd, Wete, Danniene, Staab, Steffen
Processing sequential multi-sensor data is becoming important in many tasks due to the dramatic increase in the availability of sensors that can acquire sequential data over time. Human Activity Recognition (HAR) is one of the fields that actively benefit from this availability. Unlike most approaches, which address HAR with predefined activity classes, this paper proposes a novel approach to discover latent HAR patterns in sequential data. To this end, we employ Latent Dirichlet Allocation (LDA), originally a topic modelling approach used in text analysis. To make the data suitable for LDA, we extract so-called "sensory words" from the sequential data. We carried out experiments on a challenging HAR dataset, demonstrating that LDA is capable of uncovering underlying structures in sequential data which provide a human-understandable representation of the data. The extrinsic evaluations reveal that LDA is capable of accurately clustering HAR data sequences when compared against the labelled activities.
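A hedged sketch of the pipeline described above, using scikit-learn's LDA on invented "sensory word" counts; the actual extraction of sensory words from raw sensor signals is only stubbed here by pre-tokenized strings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Each "document" is one sensor window, already discretized into sensory words
# (e.g. quantized accelerometer readings). The tokens below are invented.
windows = [
    "acc_low acc_low gyro_flat acc_low",
    "acc_high gyro_spin acc_high gyro_spin",
    "acc_low gyro_flat gyro_flat acc_low",
    "acc_high acc_high gyro_spin acc_high",
]

counts = CountVectorizer().fit_transform(windows)     # window-by-word counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

# Each row is a window's mixture over latent activity patterns.
print(lda.transform(counts).round(2))
```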
Understanding Social Networks using Transfer Learning
Sun, Jun, Staab, Steffen, Kunegis, Jérôme
A detailed understanding of users contributes to the understanding of the Web's evolution and to the development of Web applications. Although such a study is especially important for new Web platforms, it is often jeopardized by the sparsity of data and the resulting lack of knowledge about novel phenomena. Akin to the human transfer of experiences from one domain to the next, transfer learning, a subfield of machine learning, adapts knowledge acquired in one domain to a new domain. We systematically investigate how the concept of transfer learning may be applied to the study of users on newly created (emerging) Web platforms, and propose our transfer learning-based approach, TraNet. We show two use cases where TraNet is applied to tasks involving the identification of user trust and roles on different Web platforms. We compare the performance of TraNet with that of other approaches and find that our approach best transfers knowledge about users across platforms in the given tasks.
CLEARumor at SemEval-2019 Task 7: ConvoLving ELMo Against Rumors
Baris, Ipek, Schmelzeisen, Lukas, Staab, Steffen
This paper describes our submission to SemEval-2019 Task 7: RumourEval: Determining Rumor Veracity and Support for Rumors. We participated in both subtasks. The goal of subtask A is to classify the type of interaction between a rumorous social media post and a reply post as support, query, deny, or comment. The goal of subtask B is to predict the veracity of a given rumor. For subtask A, we implement a CNN-based neural architecture using ELMo embeddings of post text combined with auxiliary features and achieve an F1-score of 44.6%. For subtask B, we employ an MLP neural network leveraging our estimates for subtask A and achieve an F1-score of 30.1% (second place in the competition). We provide results and analysis of our system performance and present ablation experiments.
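A toy stand-in (not the submitted system) illustrating the subtask-A architecture described above: a 1-D CNN over precomputed ELMo token embeddings, concatenated with auxiliary features; all dimensions and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class StanceCNN(nn.Module):
    """Toy stand-in for a subtask-A model: a 1-D CNN over precomputed ELMo
    token embeddings, concatenated with auxiliary features. Dimensions and
    layer sizes are illustrative, not the paper's."""
    def __init__(self, elmo_dim=1024, aux_dim=10, n_classes=4):
        super().__init__()
        self.conv = nn.Conv1d(elmo_dim, 64, kernel_size=3, padding=1)
        self.out = nn.Linear(64 + aux_dim, n_classes)

    def forward(self, elmo, aux):
        # elmo: (batch, seq_len, elmo_dim), aux: (batch, aux_dim)
        h = torch.relu(self.conv(elmo.transpose(1, 2)))  # (batch, 64, seq_len)
        h = h.max(dim=2).values                          # global max pooling
        return self.out(torch.cat([h, aux], dim=1))      # support/query/deny/comment

logits = StanceCNN()(torch.randn(2, 20, 1024), torch.randn(2, 10))
print(logits.shape)  # torch.Size([2, 4])
```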
Learning Taxonomies of Concepts and not Words using Contextualized Word Representations: A Position Paper
Schmelzeisen, Lukas, Staab, Steffen
Taxonomies are semantic hierarchies of concepts. One limitation of current taxonomy learning systems is that they define concepts as single words. This position paper argues that contextualized word representations, which recently achieved state-of-the-art results on many competitive NLP tasks, are a promising method to address this limitation. We outline a novel approach for taxonomy learning that (1) defines concepts as synsets, (2) learns density-based approximations of contextualized word representations, and (3) can measure similarity and hypernymy among them.
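A speculative sketch of the third ingredient, under the assumption that a concept's occurrences are represented by contextualized embedding vectors: fit a density per synset and derive an asymmetric hypernymy score from it. The data and the scoring rule are invented for illustration, not taken from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Invented stand-ins for contextualized embeddings of two synsets' occurrences;
# in practice these would come from a model such as ELMo or BERT.
rng = np.random.default_rng(0)
occ_animal = rng.normal(0.0, 2.0, size=(200, 5))   # broader concept
occ_dog = rng.normal(0.5, 0.5, size=(200, 5))      # narrower concept

def fit_gaussian(occurrences):
    """Density-based approximation of a concept: a Gaussian over its occurrences."""
    return multivariate_normal(mean=occurrences.mean(axis=0),
                               cov=np.cov(occurrences, rowvar=False))

g_animal, g_dog = fit_gaussian(occ_animal), fit_gaussian(occ_dog)

# One possible (illustrative) hypernymy score: how well the broader density
# covers samples of the narrower one, relative to the reverse direction.
score_dog_isa_animal = g_animal.logpdf(occ_dog).mean() - g_dog.logpdf(occ_animal).mean()
print(score_dog_isa_animal > 0)  # True for this toy setup
```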
Predicting User Roles in Social Networks using Transfer Learning with Feature Transformation
Sun, Jun, Kunegis, Jérôme, Staab, Steffen
How can we recognise social roles of people, given a completely unlabelled social network? We present a transfer learning approach to network role classification based on feature transformations from each network's local feature distribution to a global feature space. Experiments are carried out on real-world datasets. (See manuscript for the full abstract.)
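A hedged sketch of the general idea, not the paper's TraNet transformation: map each network's raw node features to within-network quantiles so that a classifier trained on a labelled source network can be applied to an unlabelled target network. The features, labels, and choice of quantile normalization are illustrative.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.linear_model import LogisticRegression

def to_global_space(features):
    """Illustrative per-network transformation: map each raw feature
    (e.g. degree, clustering coefficient) to its within-network quantile,
    so values from different networks become comparable."""
    return rankdata(features, axis=0) / len(features)

rng = np.random.default_rng(0)
# Invented node features and role labels for a labelled source network ...
X_src, y_src = rng.normal(size=(100, 3)), rng.integers(0, 2, size=100)
# ... and for an unlabelled target network with a shifted feature scale.
X_tgt = rng.normal(loc=5.0, size=(50, 3))

clf = LogisticRegression().fit(to_global_space(X_src), y_src)
roles_tgt = clf.predict(to_global_space(X_tgt))   # predicted roles on the new network
print(roles_tgt[:10])
```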
Structural Dynamics of Knowledge Networks
Preusse, Julia, Kunegis, Jérôme, Thimm, Matthias, Staab, Steffen, Gottron, Thomas (all University of Koblenz-Landau)
We investigate the structural patterns of the appearance and disappearance of links in dynamic knowledge networks. Human knowledge is nowadays increasingly created and curated online, in a collaborative and highly dynamic fashion. The knowledge thus created is interlinked in nature, and an important open task is to understand its temporal evolution. In this paper, we study the underlying mechanisms of changes in knowledge networks which are of structural nature, i.e., which are a direct result of a knowledge network's structure. Concretely, we ask whether the appearance and disappearance of interconnections between concepts (items of a knowledge base) can be predicted using information about the network formed by these interconnections. In contrast to related work on this problem, we take into account the disappearance of links in our study, to account for the fact that the evolution of collaborative knowledge bases includes a high proportion of removals and reverts. We perform an empirical study on the best-known and largest collaborative knowledge base, Wikipedia, and show that traditional indicators of structural change used in the link analysis literature can be classified into four classes, which we show to indicate growth, decay, stability and instability of links. We finally use these methods to identify the underlying reasons for individual additions and removals of knowledge links.
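A small illustration (using networkx; the graph and the choice of indicators are illustrative, not the paper's experimental setup) of structural indicators that can be computed both for candidate link additions and for candidate removals.

```python
import networkx as nx

# Toy undirected knowledge network; nodes stand in for knowledge-base items.
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])

def structural_indicators(u, v):
    """Two classic link-analysis indicators (an illustrative choice, not the
    paper's full set): common neighbours and preferential attachment."""
    cn = len(list(nx.common_neighbors(G, u, v)))
    pa = G.degree(u) * G.degree(v)
    return {"common_neighbors": cn, "preferential_attachment": pa}

# Scored both for a non-edge (candidate appearance) and an existing edge
# (candidate disappearance), mirroring the study of both link dynamics.
print(structural_indicators("B", "D"))   # candidate link appearance
print(structural_indicators("A", "B"))   # candidate link disappearance
```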