Collaborating Authors

Search: Overviews

Latent gaze information in highly dynamic decision-tasks Artificial Intelligence

Digitization is penetrating more and more areas of life. Tasks are increasingly being completed digitally, and are therefore not only fulfilled faster, more efficiently but also more purposefully and successfully. The rapid developments in the field of artificial intelligence in recent years have played a major role in this, as they brought up many helpful approaches to build on. At the same time, the eyes, their movements, and the meaning of these movements are being progressively researched. The combination of these developments has led to exciting approaches. In this dissertation, I present some of these approaches which I worked on during my Ph.D. First, I provide insight into the development of models that use artificial intelligence to connect eye movements with visual expertise. This is demonstrated for two domains or rather groups of people: athletes in decision-making actions and surgeons in arthroscopic procedures. The resulting models can be considered as digital diagnostic models for automatic expertise recognition. Furthermore, I show approaches that investigate the transferability of eye movement patterns to different expertise domains and subsequently, important aspects of techniques for generalization. Finally, I address the temporal detection of confusion based on eye movement data. The results suggest the use of the resulting model as a clock signal for possible digital assistance options in the training of young professionals. An interesting aspect of my research is that I was able to draw on very valuable data from DFB youth elite athletes as well as on long-standing experts in arthroscopy. In particular, the work with the DFB data attracted the interest of radio and print media, namely DeutschlandFunk Nova and SWR DasDing. All resulting articles presented here have been published in internationally renowned journals or at conferences.

Fair ranking: a critical review, challenges, and future directions Artificial Intelligence

Ranking, recommendation, and retrieval systems are widely used in online platforms and other societal systems, including e-commerce, media-streaming, admissions, gig platforms, and hiring. In the recent past, a large "fair ranking" research literature has been developed around making these systems fair to the individuals, providers, or content that are being ranked. Most of this literature defines fairness for a single instance of retrieval, or as a simple additive notion for multiple instances of retrievals over time. This work provides a critical overview of this literature, detailing the often context-specific concerns that such an approach misses: the gap between high ranking placements and true provider utility, spillovers and compounding effects over time, induced strategic incentives, and the effect of statistical uncertainty. We then provide a path forward for a more holistic and impact-oriented fair ranking research agenda, including methodological lessons from other fields and the role of the broader stakeholder community in overcoming data bottlenecks and designing effective regulatory environments.

DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations Artificial Intelligence

AI-aided drug discovery (AIDD) is gaining increasing popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. In spite of its extensive use in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with \emph{noise}, which is inevitable in real world AIDD applications. In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction, which involves both macromolecule (protein target) and small-molecule (drug compound). In contrast to only providing fixed datasets, DrugOOD offers automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since the molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for \emph{graph OOD learning} problems. Extensive empirical studies have shown a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that can allow for OOD generalization under noise for AIDD.

Educational Timetabling: Problems, Benchmarks, and State-of-the-Art Results Artificial Intelligence

Educational Timetabling, in essence, consists in assigning teacher/student meetings to days, timeslots, and classrooms. Despite this apparent simplicity, experience teaches us that every single institution has its own rules, conventions, and fixations, thus making each specific problem almost unique. As a consequence, uncountably many different problem formulations have been proposed in the literature on Educational Timetabling, depending on the type of institution (high-school, university, or other), the type of meetings (lectures, exams,...), and the different settings, constraints, and objectives. Many papers in the literature tackle a specific problem using a selected search method. The authors normally claim the success of the application, though rarely dispelling the doubt over the readers that the method used was more the authors' "favorite" rather than the most suitable for the problem under consideration.

Surrogate-assisted distributed swarm optimisation for computationally expensive models Artificial Intelligence

Advances in parallel and distributed computing have enabled efficient implementation of the distributed swarm and evolutionary algorithms for complex and computationally expensive models. Evolutionary algorithms provide gradient-free optimisation which is beneficial for models that do not have such information available, for instance, geoscientific landscape evolution models. However, such models are so computationally expensive that even distributed swarm and evolutionary algorithms with the power of parallel computing struggle. We need to incorporate efficient strategies such as surrogate assisted optimisation that further improves their performance; however, this becomes a challenge given parallel processing and inter-process communication for implementing surrogate training and prediction. In this paper, we implement surrogate-based estimation of fitness evaluation in distributed swarm optimisation over a parallel computing architecture. Our results demonstrate very promising results for benchmark functions and geoscientific landscape evolution models. We obtain a reduction in computationally time while retaining optimisation solution accuracy through the use of surrogates in a parallel computing environment.

Reliable Causal Discovery with Improved Exact Search and Weaker Assumptions Machine Learning

Many of the causal discovery methods rely on the faithfulness assumption to guarantee asymptotic correctness. However, the assumption can be approximately violated in many ways, leading to sub-optimal solutions. Although there is a line of research in Bayesian network structure learning that focuses on weakening the assumption, such as exact search methods with well-defined score functions, they do not scale well to large graphs. In this work, we introduce several strategies to improve the scalability of exact score-based methods in the linear Gaussian setting. In particular, we develop a super-structure estimation method based on the support of inverse covariance matrix which requires assumptions that are strictly weaker than faithfulness, and apply it to restrict the search space of exact search. We also propose a local search strategy that performs exact search on the local clusters formed by each variable and its neighbors within two hops in the super-structure. Numerical experiments validate the efficacy of the proposed procedure, and demonstrate that it scales up to hundreds of nodes with a high accuracy.

Systems Challenges for Trustworthy Embodied Systems Artificial Intelligence

A new generation of increasingly autonomous and self-learning systems, which we call embodied systems, is about to be developed. When deploying these systems into a real-life context we face various engineering challenges, as it is crucial to coordinate the behavior of embodied systems in a beneficial manner, ensure their compatibility with our human-centered social values, and design verifiably safe and reliable human-machine interaction. We are arguing that raditional systems engineering is coming to a climacteric from embedded to embodied systems, and with assuring the trustworthiness of dynamic federations of situationally aware, intent-driven, explorative, ever-evolving, largely non-predictable, and increasingly autonomous embodied systems in uncertain, complex, and unpredictable real-world contexts. We are also identifying a number of urgent systems challenges for trustworthy embodied systems, including robust and human-centric AI, cognitive architectures, uncertainty quantification, trustworthy self-integration, and continual analysis and assurance.

Forecasting: theory and practice Machine Learning

Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.

Survey on English Entity Linking on Wikidata Artificial Intelligence

Wikidata is a frequently updated, community-driven, and multilingual knowledge graph. Hence, Wikidata is an attractive basis for Entity Linking, which is evident by the recent increase in published papers. This survey focuses on four subjects: (1) Which Wikidata Entity Linking datasets exist, how widely used are they and how are they constructed? (2) Do the characteristics of Wikidata matter for the design of Entity Linking datasets and if so, how? (3) How do current Entity Linking approaches exploit the specific characteristics of Wikidata? (4) Which Wikidata characteristics are unexploited by existing Entity Linking approaches? This survey reveals that current Wikidata-specific Entity Linking datasets do not differ in their annotation scheme from schemes for other knowledge graphs like DBpedia. Thus, the potential for multilingual and time-dependent datasets, naturally suited for Wikidata, is not lifted. Furthermore, we show that most Entity Linking approaches use Wikidata in the same way as any other knowledge graph missing the chance to leverage Wikidata-specific characteristics to increase quality. Almost all approaches employ specific properties like labels and sometimes descriptions but ignore characteristics such as the hyper-relational structure. Hence, there is still room for improvement, for example, by including hyper-relational graph embeddings or type information. Many approaches also include information from Wikipedia, which is easily combinable with Wikidata and provides valuable textual information, which Wikidata lacks.

False Data Injection Threats in Active Distribution Systems: A Comprehensive Survey Artificial Intelligence

With the proliferation of smart devices and revolutions in communications, electrical distribution systems are gradually shifting from passive, manually-operated and inflexible ones, to a massively interconnected cyber-physical smart grid to address the energy challenges of the future. However, the integration of several cutting-edge technologies has introduced several security and privacy vulnerabilities due to the large-scale complexity and resource limitations of deployments. Recent research trends have shown that False Data Injection (FDI) attacks are becoming one of the most malicious cyber threats within the entire smart grid paradigm. Therefore, this paper presents a comprehensive survey of the recent advances in FDI attacks within active distribution systems and proposes a taxonomy to classify the FDI threats with respect to smart grid targets. The related studies are contrasted and summarized in terms of the attack methodologies and implications on the electrical power distribution networks. Finally, we identify some research gaps and recommend a number of future research directions to guide and motivate prospective researchers.