Krzyziński, Mateusz
Mining United Nations General Assembly Debates
Grzyb, Mateusz, Krzyziński, Mateusz, Sobieski, Bartłomiej, Spytek, Mikołaj, Pieliński, Bartosz, Dan, Daniel, Wróblewska, Anna
The United Nations (UN) is an international organization founded in 1945 that now comprises 193 member states. It was established after World War II to prevent future conflicts and foster global peace and security. The UN is a global forum where countries discuss and address critical issues ranging from international security and economic development to climate change, human rights, and humanitarian aid. It operates through various organs, including the General Assembly and the Security Council, and through specialized agencies such as UNESCO and WHO. The UN plays a pivotal role in international cooperation and diplomacy, striving to maintain global stability and promote sustainable development. Within it, the United Nations General Assembly (UNGA) serves as the forum where all member states discuss and work together on international issues.
Interpretable Machine Learning for Survival Analysis
Langbein, Sophie Hanna, Krzyziński, Mateusz, Spytek, Mikołaj, Baniecki, Hubert, Biecek, Przemysław, Wright, Marvin N.
With the spread and rapid advancement of black-box machine learning models, the field of interpretable machine learning (IML), or explainable artificial intelligence (XAI), has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability, and fairness in sensitive areas such as clinical decision-making, the development of targeted therapies and interventions, and other medical or healthcare-related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand how and which features are influential for prediction or constitute risk factors. However, the lack of readily available IML methods may have deterred medical practitioners and public health policymakers from leveraging the full potential of machine learning for predicting time-to-event data. We present a comprehensive review of the limited existing work on IML methods for survival analysis within the context of the general IML taxonomy. In addition, we formally detail how commonly used IML methods, such as individual conditional expectation (ICE), partial dependence plots (PDP), accumulated local effects (ALE), different feature importance measures, and Friedman's H-interaction statistics, can be adapted to survival outcomes. An application of several IML methods to real data on under-5 mortality of Ghanaian children from the Demographic and Health Surveys (DHS) Program serves as a tutorial or guide for researchers on how to use these techniques in practice to facilitate understanding of model decisions or predictions.
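To make the adaptation of ICE and PDP to survival outcomes concrete, the sketch below computes both for a predicted survival function instead of a scalar prediction. Each ICE curve fixes one observation and varies a single feature over a grid. The PDP averages those ICE curves, separately at each time point t. This is a minimal illustration under stated assumptions, not the paper's implementation. The exponential proportional-hazards model, its coefficients `H0` and `BETA`, the data, and the helper names `survival`, `ice_curves`, and `partial_dependence` are all hypothetical.

```python
import math

# Hypothetical toy survival model (assumption, for illustration only):
# exponential baseline hazard with proportional effects, so
# S(t | x) = exp(-H0 * t * exp(b1*x1 + b2*x2)).
H0 = 0.1
BETA = (0.8, -0.5)


def survival(x, t):
    """Predicted survival probability S(t | x) of the toy model."""
    risk = math.exp(sum(b * xi for b, xi in zip(BETA, x)))
    return math.exp(-H0 * t * risk)


def ice_curves(data, feature, grid, t):
    """ICE at time t: for each observation, vary one feature over `grid`, keep the others fixed."""
    curves = []
    for x in data:
        curve = []
        for v in grid:
            x_mod = list(x)
            x_mod[feature] = v
            curve.append(survival(x_mod, t))
        curves.append(curve)
    return curves


def partial_dependence(data, feature, grid, times):
    """Time-dependent PDP: the average of the ICE curves, computed at each time point."""
    result = {}
    for t in times:
        curves = ice_curves(data, feature, grid, t)
        result[t] = [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]
    return result


data = [(0.0, 1.0), (1.0, 0.0), (-1.0, 2.0), (0.5, -1.0)]
grid = [-1.0, 0.0, 1.0]
pdp = partial_dependence(data, feature=0, grid=grid, times=[1.0, 5.0, 10.0])
for t, values in pdp.items():
    print(t, [round(v, 3) for v in values])
```

Because every prediction is a survival probability, the result is one PDP curve per time point. With the positive coefficient assumed for the first feature, each curve decreases along the grid, and curves at later time points lie lower.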
survex: an R package for explaining machine learning survival models
Spytek, Mikołaj, Krzyziński, Mateusz, Langbein, Sophie Hanna, Baniecki, Hubert, Wright, Marvin N., Biecek, Przemysław
Summary: Due to their flexibility and superior performance, machine learning models frequently complement and outperform traditional statistical survival models. However, their widespread adoption is hindered by a lack of user-friendly tools to explain their internal operations and prediction rationales. To tackle this issue, we introduce the survex R package, which provides a cohesive framework for explaining any survival model by applying explainable artificial intelligence techniques. The capabilities of the proposed software encompass understanding and diagnosing survival models, which can lead to their improvement. By revealing insights into the decision-making process, such as variable effects and importances, survex enables the assessment of model reliability and the detection of biases. Thus, transparency and responsibility may be promoted in sensitive areas, such as biomedical research and healthcare applications.
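One way to obtain the variable importances mentioned here is permutation importance with a time-dependent loss. The method shuffles one feature and measures how much the Brier score at a fixed time t increases. The Python sketch below is not the survex R API. It assumes a hypothetical exponential model, simulated uncensored event times, and a Brier score without inverse-probability-of-censoring weighting. Real survival data with censoring would need that correction. The function names are hypothetical.

```python
import math
import random

# Hypothetical toy model (assumption): S(t | x) = exp(-H0 * t * exp(beta . x)).
# The third coefficient is zero, so the model ignores feature 2 entirely.
H0 = 0.1
BETA = (0.8, -0.5, 0.0)


def survival(x, t):
    risk = math.exp(sum(b * xi for b, xi in zip(BETA, x)))
    return math.exp(-H0 * t * risk)


def brier_score(X, times, t):
    """Brier score at time t; simplification: assumes no censoring (no IPCW)."""
    return sum((float(T > t) - survival(x, t)) ** 2 for x, T in zip(X, times)) / len(X)


def permutation_importance(X, times, t, n_repeats=5, seed=0):
    """Mean increase in the Brier score at time t after shuffling each feature."""
    rng = random.Random(seed)
    base = brier_score(X, times, t)
    importance = {}
    for j in range(len(X[0])):
        losses = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by its shuffled value.
            X_perm = [x[:j] + (v,) + x[j + 1:] for x, v in zip(X, column)]
            losses.append(brier_score(X_perm, times, t))
        importance[j] = sum(loss - base for loss in losses) / n_repeats
    return base, importance


# Simulate uncensored event times from the toy model by inverse-transform sampling.
rng = random.Random(42)
X = [tuple(rng.uniform(-2, 2) for _ in range(3)) for _ in range(200)]
times = [
    -math.log(1.0 - rng.random()) / (H0 * math.exp(sum(b * xi for b, xi in zip(BETA, x))))
    for x in X
]
base, imp = permutation_importance(X, times, t=7.0)
print(round(base, 3), {j: round(v, 4) for j, v in imp.items()})
```

The assumed coefficient of the third feature is zero, so that feature's importance comes out as exactly zero. This serves as a quick sanity check for any permutation-based importance.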
Exploration of the Rashomon Set Assists Trustworthy Explanations for Medical Data
Kobylińska, Katarzyna, Krzyziński, Mateusz, Machowicz, Rafał, Adamek, Mariusz, Biecek, Przemysław
The machine learning modeling process conventionally culminates in selecting a single model that maximizes a chosen performance metric. However, this approach forgoes a deeper analysis of slightly inferior models. Particularly in medical and healthcare studies, where the objective extends beyond predictions to generating valuable insights, relying solely on a single model can lead to misleading or incomplete conclusions. This problem is especially pertinent for the set of models with performance close to the maximum, known as the $\textit{Rashomon set}$. Such a set can be large and may contain models that describe the data in different ways, which calls for a comprehensive analysis. This paper introduces a novel process to explore models in the Rashomon set, extending the conventional modeling approach. We propose the $\texttt{Rashomon\_DETECT}$ algorithm to detect models with different behavior. It is based on recent developments in the eXplainable Artificial Intelligence (XAI) field. To quantify differences in variable effects among models, we introduce the Profile Disparity Index (PDI), based on measures from functional data analysis. To illustrate the effectiveness of our approach, we showcase its application in predicting survival among hemophagocytic lymphohistiocytosis (HLH) patients, a foundational case study. Additionally, we benchmark our approach on other medical data sets, demonstrating its versatility and utility in various contexts. If differently behaving models are detected in the Rashomon set, their combined analysis leads to more trustworthy conclusions, which is of vital importance in high-stakes settings such as medicine.
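This abstract combines two ideas: a Rashomon set of near-optimal models and a disparity measure between variable profiles. Both can be illustrated with a greatly simplified sketch. It is not the paper's Rashomon_DETECT algorithm or its exact PDI definition. Here the Rashomon set is every model whose score lies within a tolerance `epsilon` of the best one. Disparity is assumed to be the fraction of grid intervals on which two profiles change in different directions. The model names, scores, profiles, the 0.25 threshold, and the helper names are made-up values for illustration.

```python
# Hypothetical candidate models, each reduced to (validation score, profile of one variable).
# Scores and profiles are made-up illustrations, not results from the paper.


def rashomon_set(scores, epsilon):
    """Names of models whose score is within epsilon of the best (higher is better)."""
    best = max(scores.values())
    return [name for name, s in scores.items() if s >= best - epsilon]


def sign(x):
    return (x > 0) - (x < 0)


def profile_disparity(f, g):
    """Fraction of grid intervals on which the two profiles change in different directions.

    Assumption: a simple derivative-sign measure used as a stand-in for the PDI.
    """
    diffs = [sign(f[k + 1] - f[k]) != sign(g[k + 1] - g[k]) for k in range(len(f) - 1)]
    return sum(diffs) / len(diffs)


scores = {"gbm": 0.81, "rf": 0.80, "cox": 0.79, "tree": 0.70}
profiles = {
    "gbm": [0.9, 0.8, 0.7, 0.6, 0.5],
    "rf": [0.9, 0.85, 0.75, 0.65, 0.55],
    "cox": [0.6, 0.65, 0.7, 0.65, 0.6],
    "tree": [0.5, 0.5, 0.5, 0.5, 0.5],
}

members = rashomon_set(scores, epsilon=0.025)
reference = max(members, key=scores.get)  # compare every member with the best model
disparities = {m: profile_disparity(profiles[reference], profiles[m]) for m in members if m != reference}
different = [m for m, d in disparities.items() if d > 0.25]
print(members, disparities, different)
# ['gbm', 'rf', 'cox'] {'rf': 0.0, 'cox': 0.5} ['cox']
```

In this toy setup, "cox" is nearly as accurate as the best model, but its profile rises and then falls. It is therefore flagged as behaving differently and would deserve a joint analysis rather than being discarded.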
HADES: Homologous Automated Document Exploration and Summarization
Wilczyński, Piotr, Żółkowski, Artur, Krzyziński, Mateusz, Wiśnios, Emilia, Pieliński, Bartosz, Giziński, Stanisław, Sienkiewicz, Julian, Biecek, Przemysław
This paper introduces HADES, a novel tool for the automatic comparative analysis of documents with similar structures. HADES is designed to streamline the work of professionals dealing with large volumes of documents, such as policy documents, legal acts, and scientific papers. The tool employs a multi-step pipeline that begins by processing PDF documents with topic modeling, summarization, and analysis of the most important words for each topic. The process concludes with an interactive web app whose visualizations facilitate comparison of the documents. HADES has the potential to significantly improve the productivity of professionals who handle high volumes of documents by reducing the time and effort required for comparative document analysis. Our package is publicly available on GitHub.
SurvSHAP(t): Time-dependent explanations of machine learning survival models
Krzyziński, Mateusz, Spytek, Mikołaj, Baniecki, Hubert, Biecek, Przemysław
Machine and deep learning survival models demonstrate similar or even improved time-to-event prediction capabilities compared to classical statistical learning methods, yet they are too complex to be interpreted by humans. Several model-agnostic explanations are available to overcome this issue; however, none directly explain the survival function prediction. In this paper, we introduce SurvSHAP(t), the first time-dependent explanation that allows for interpreting survival black-box models. It is based on SHapley Additive exPlanations, which have solid theoretical foundations and broad adoption among machine learning practitioners. The proposed methods aim to enhance precision diagnostics and support domain experts in making decisions. Experiments on synthetic and medical data confirm that SurvSHAP(t) can detect variables with a time-dependent effect, and that its aggregation is a better determinant of the importance of variables for a prediction than SurvLIME. SurvSHAP(t) is model-agnostic and can be applied to all models with functional output. We provide an accessible implementation of time-dependent explanations in Python at http://github.com/MI2DataLab/survshap.
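The core idea of a time-dependent explanation can be sketched by computing Shapley values of the predicted survival probability S(t | x) separately at each time point. This yields one contribution curve per variable. The minimal sketch below enumerates all coalitions exactly and uses an interventional value function that fills absent features from background rows. It relies on a hypothetical exponential model with made-up data. It is not the API of the survshap package, and the function names are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical toy model (assumption): S(t | x) = exp(-H0 * t * exp(beta . x)).
H0 = 0.1
BETA = (0.8, -0.5, 0.3)


def survival(x, t):
    risk = math.exp(sum(b * xi for b, xi in zip(BETA, x)))
    return math.exp(-H0 * t * risk)


def value(coalition, x, background, t):
    """v_t(S): mean prediction at time t with features in S taken from x, the rest from background rows."""
    total = 0.0
    for row in background:
        z = [x[j] if j in coalition else row[j] for j in range(len(x))]
        total += survival(z, t)
    return total / len(background)


def shapley_at_time(x, background, t):
    """Exact Shapley values of the survival prediction S(t | x), enumerating all coalitions."""
    p = len(x)
    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(p - size - 1) / math.factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}, x, background, t) - value(s, x, background, t))
    return phi


def survshap_like(x, background, times):
    """Shapley values computed separately at each time point: one curve per feature."""
    return {t: shapley_at_time(x, background, t) for t in times}


background = [(0.0, 0.0, 0.0), (1.0, 1.0, -1.0), (-1.0, 0.5, 1.0)]
x = (1.5, -1.0, 0.5)
explanation = survshap_like(x, background, times=[1.0, 5.0, 10.0])
for t, phi in explanation.items():
    print(t, [round(v, 4) for v in phi])
```

By the efficiency property of Shapley values, the contributions at each time point sum to the prediction for x minus the mean prediction over the background data. Exact enumeration takes time exponential in the number of features, so larger problems typically need an approximation such as sampling coalitions.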