Berlin
I ditched Google Search. Now I'm saving the planet with Ecosia instead
Ecosia was founded in 2009 by Christian Kroll, who felt compelled to act after seeing the effects of deforestation on a trip around the world. The result was a search engine that puts its advertising revenue towards tree-planting projects. Ecosia started off as a search engine, but has since expanded with a few other products, including Ecosia Browser (a Chromium-based web browser), Ecosia Chat (an AI chatbot powered by OpenAI's API), and Freetree (a browser extension that plants trees as you shop). Ecosia is a not-for-profit tech company based in Berlin, Germany, that dedicates all of its profits to the betterment of the planet. In addition to turning every web search into an opportunity to plant and protect trees, Ecosia invests in initiatives that further regenerative agriculture, renewable energy, and the fight against climate change.
Arctique: An artificial histopathological dataset unifying realism and controllability for uncertainty quantification, Vanessa E. Guarino
Uncertainty Quantification (UQ) is crucial for reliable image segmentation. Yet, while the field sees continual development of novel methods, a lack of agreed-upon benchmarks limits their systematic comparison and evaluation: current UQ methods are typically tested either on overly simplistic toy datasets or on complex real-world datasets that do not allow the true sources of uncertainty to be discerned. To unify controllability and complexity, we introduce Arctique, a procedurally generated dataset modeled after histopathological colon images. We chose histopathological images for two reasons: 1) their complexity in terms of intricate object structures and highly variable appearance, which yields challenging segmentation problems, and 2) their broad prevalence in medical diagnosis and the corresponding relevance of high-quality UQ. To generate Arctique, we established a Blender-based framework for 3D scene creation with intrinsic noise manipulation. Arctique contains up to 50,000 rendered images with precise masks as well as noisy label simulations. We show that by independently controlling the uncertainty in both images and labels, we can effectively study the performance of several commonly used UQ methods. Hence, Arctique serves as a critical resource for benchmarking and advancing UQ techniques and other methodologies in complex, multi-object environments, bridging the gap between realism and controllability. All code is publicly available, allowing re-creation and controlled manipulation of our shipped images as well as creation and rendering of new scenes.
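For a concrete sense of the kind of benchmark Arctique enables, the sketch below scores a segmentation model's uncertainty via pixel-wise predictive entropy at increasing, controlled noise levels. It is a minimal illustration only: `model`, `run_mc_dropout`, and `load_arctique` are hypothetical placeholders for the reader's own network and data loading, not functions shipped with the dataset.

```python
import numpy as np

def predictive_entropy(softmax_samples):
    """Pixel-wise predictive entropy from T stochastic forward passes.

    softmax_samples: array of shape (T, H, W, C) holding class probabilities,
    e.g. from running a segmentation network T times with MC dropout enabled.
    """
    mean_probs = softmax_samples.mean(axis=0)                     # (H, W, C)
    return -(mean_probs * np.log(mean_probs + 1e-12)).sum(-1)     # (H, W)

# Hypothetical benchmark loop: with image noise controlled by the generator,
# a well-calibrated UQ method should report higher mean uncertainty as the
# injected noise level grows.
for noise_level in [0.0, 0.25, 0.5, 1.0]:
    images = load_arctique(noise=noise_level)                     # placeholder loader
    samples = run_mc_dropout(model, images, passes=20)            # placeholder UQ method
    print(noise_level, predictive_entropy(samples).mean())
```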
Interaction-Force Transport Gradient Flows, Pavel Dvurechensky, Humboldt University of Berlin, Weierstrass Institute for Applied Analysis and Stochastics (Berlin, Germany), and HSE University
This paper presents a new gradient flow dissipation geometry over non-negative and probability measures. It is motivated by a principled construction that combines unbalanced optimal transport with interaction forces modeled by reproducing kernels. Using a precise connection between the Hellinger geometry and the maximum mean discrepancy (MMD), we propose the interaction-force transport (IFT) gradient flows and their spherical variant via an infimal convolution of the Wasserstein and spherical MMD tensors. We then develop a particle-based optimization algorithm based on the JKO-splitting scheme of the mass-preserving spherical IFT gradient flows. Finally, we provide both theoretical global exponential convergence guarantees and improved empirical simulation results for applying the IFT gradient flows to the sampling task of MMD minimization. Furthermore, we prove that the spherical IFT gradient flow enjoys the best of both worlds by providing a global exponential convergence guarantee for both the MMD and the KL energy.
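As background for the sampling task mentioned above, here is a minimal NumPy sketch of the plain MMD particle flow with a Gaussian kernel, i.e. the baseline dynamics whose convergence the IFT construction improves upon. It does not implement the IFT or JKO-splitting scheme itself, and the kernel bandwidth and step size are arbitrary illustrative choices.

```python
import numpy as np

def gaussian_kernel_grad(x, y, sigma):
    """Gradient of k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) w.r.t. x.

    x: (N, d), y: (M, d) -> (N, M, d) array of gradients.
    """
    diff = x[:, None, :] - y[None, :, :]                        # (N, M, d)
    k = np.exp(-np.sum(diff**2, axis=-1) / (2 * sigma**2))      # (N, M)
    return -diff / sigma**2 * k[..., None]

def mmd_flow_step(x, y, sigma=1.0, step=0.1):
    """One explicit Euler step of the plain MMD particle flow towards target samples y."""
    n, m = len(x), len(y)
    grad = (gaussian_kernel_grad(x, x, sigma).sum(axis=1) / n**2
            - gaussian_kernel_grad(x, y, sigma).sum(axis=1) / (n * m))
    return x - 2 * step * grad

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2)) * 3.0              # initial particles
y = rng.normal(loc=[5.0, 0.0], size=(500, 2))    # samples from the target measure
for _ in range(500):
    x = mmd_flow_step(x, y)
```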
RELICT: A Replica Detection Framework for Medical Image Generation
Aydin, Orhun Utku, Koch, Alexander, Hilbert, Adam, Rieger, Jana, Lohrke, Felix, Ishida, Fujimaro, Tanioka, Satoru, Frey, Dietmar
Despite the potential of synthetic medical data for augmenting and improving the generalizability of deep learning models, memorization in generative models can lead to unintended leakage of sensitive patient information and limit model utility. Thus, the use of memorizing generative models in the medical domain can jeopardize patient privacy. We propose a framework for identifying replicas, i.e., nearly identical copies of the training data, in synthetic medical image datasets. Our REpLIca deteCTion (RELICT) framework for medical image generative models evaluates image similarity using three complementary approaches: (1) voxel-level analysis, (2) feature-level analysis by a pretrained medical foundation model, and (3) segmentation-level analysis. Two clinically relevant 3D generative modelling use cases were investigated: non-contrast head CT (NCCT) with intracerebral hemorrhage (N=774) and time-of-flight MR angiography (TOF-MRA) of the Circle of Willis (N=1,782). Expert visual scoring was used as the reference standard to assess the presence of replicas. We report the balanced accuracy at the optimal threshold to assess replica classification performance. The reference visual rating identified 45 of 50 and 5 of 50 generated images as replicas for the NCCT and TOF-MRA use cases, respectively. Voxel-level and feature-level measures perfectly classified replicas with a balanced accuracy of 1 when an optimal threshold was selected for the NCCT use case. A perfect classification of replicas for the TOF-MRA case was not possible at any threshold, with the segmentation-level analysis achieving a balanced accuracy of 0.79. Replica detection is a crucial but neglected validation step for the development of generative models in medical imaging. The proposed RELICT framework provides a standardized, easy-to-use tool for replica detection and aims to facilitate responsible and ethical medical image synthesis.
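To illustrate the voxel-level idea (this is not the authors' released implementation; the similarity measure and threshold below are illustrative assumptions), a simple replica check compares each synthetic volume against its best-matching training volume by normalized cross-correlation:

```python
import numpy as np

def flatten_and_normalize(vol):
    """Zero-mean, unit-norm flattening, so a dot product between two volumes
    equals their normalized cross-correlation."""
    v = vol.astype(np.float64).ravel()
    v -= v.mean()
    return v / (np.linalg.norm(v) + 1e-12)

def max_train_similarity(synthetic_volumes, training_volumes):
    """For each synthetic volume, the highest correlation with any training volume."""
    train = np.stack([flatten_and_normalize(t) for t in training_volumes])   # (M, V)
    return np.array([(train @ flatten_and_normalize(s)).max()
                     for s in synthetic_volumes])

def flag_replicas(synthetic_volumes, training_volumes, threshold=0.95):
    """Indices of synthetic volumes suspected to be near-copies of training data.
    0.95 is an illustrative threshold, not the operating point reported in the paper."""
    sims = max_train_similarity(synthetic_volumes, training_volumes)
    return np.flatnonzero(sims > threshold)
```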
Learning to Execute: Efficiently Learning Universal Plan-Conditioned Policies in Robotics, Ingmar Schubert and Marc Toussaint, Learning and Intelligent Systems Group, TU Berlin, Germany
Applications of Reinforcement Learning (RL) in robotics are often limited by high data demand. On the other hand, approximate models are readily available in many robotics scenarios, making model-based approaches like planning a data-efficient alternative. Still, the performance of these methods suffers if the model is imprecise or wrong. In this sense, the respective strengths and weaknesses of RL and model-based planners are complementary. In the present work, we investigate how both approaches can be integrated into one framework that combines their strengths. We introduce Learning to Execute (L2E), which leverages information contained in approximate plans to learn universal policies that are conditioned on plans. In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning.
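A minimal sketch of what "plan-conditioned" can look like in code (architecture and sizes are illustrative assumptions, not the network used in the paper): the policy receives the current state together with an encoding of an approximate plan, and can then be trained with any standard RL algorithm that treats the plan as part of the observation.

```python
import torch
import torch.nn as nn

class PlanConditionedPolicy(nn.Module):
    """Toy universal plan-conditioned policy: the action depends on both the
    current state and an encoding of an (approximate) plan."""

    def __init__(self, state_dim, plan_dim, action_dim, hidden=256):
        super().__init__()
        # Encode the plan (a sequence of waypoints/subgoals) into a fixed-size vector.
        self.plan_encoder = nn.GRU(plan_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(state_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, plan):
        # state: (B, state_dim); plan: (B, plan_len, plan_dim)
        _, h = self.plan_encoder(plan)                 # h: (1, B, hidden)
        z = torch.cat([state, h.squeeze(0)], dim=-1)   # (B, state_dim + hidden)
        return self.head(z)
```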
Atlas: A Novel Pathology Foundation Model by Mayo Clinic, Charité, and Aignostics
Alber, Maximilian, Tietz, Stephan, Dippel, Jonas, Milbich, Timo, Lesort, Timothée, Korfiatis, Panos, Krügener, Moritz, Cancer, Beatriz Perez, Shah, Neelay, Möllers, Alexander, Seegerer, Philipp, Carpen-Amarie, Alexandra, Standvoss, Kai, Dernbach, Gabriel, de Jong, Edwin, Schallenberg, Simon, Kunft, Andreas, von Ankershoffen, Helmut Hoffer, Schaeferle, Gavin, Duffy, Patrick, Redlon, Matt, Jurmeister, Philipp, Horst, David, Ruff, Lukas, Müller, Klaus-Robert, Klauschen, Frederick, Norgan, Andrew
Recent advances in digital pathology have demonstrated the effectiveness of foundation models across diverse applications. In this report, we present Atlas, a novel vision foundation model based on the RudolfV approach. Our model was trained on a dataset comprising 1.2 million histopathology whole slide images, collected from two medical institutions: Mayo Clinic and Charité - Universitätsmedizin Berlin. Comprehensive evaluations show that Atlas achieves state-of-the-art performance across twenty-one public benchmark datasets, even though it is neither the largest model by parameter count nor by training dataset size.
"Oh LLM, I'm Asking Thee, Please Give Me a Decision Tree": Zero-Shot Decision Tree Induction and Embedding with Large Language Models
Knauer, Ricardo, Koddenbrock, Mario, Wallsberger, Raphael, Brisson, Nicholas M., Duda, Georg N., Falla, Deborah, Evans, David W., Rodner, Erik
Large language models (LLMs) provide powerful means to leverage prior knowledge for predictive modeling when data is limited. In this work, we demonstrate how LLMs can use their compressed world knowledge to generate intrinsically interpretable machine learning models, i.e., decision trees, without any training data. We find that these zero-shot decision trees can surpass data-driven trees on some small-sized tabular datasets and that embeddings derived from these trees perform on par with data-driven tree-based embeddings on average. Our knowledge-driven decision tree induction and embedding approaches therefore serve as strong new baselines for data-driven machine learning methods in the low-data regime.
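A sketch of the zero-shot idea (the prompt, feature names, and JSON schema below are made-up illustrations, and `call_llm` stands in for any chat-completion client): the LLM is asked to emit a small decision tree over the feature names alone, which can then be used both for prediction and, via the index of the leaf a sample lands in, as a one-hot tree-based embedding.

```python
import json

PROMPT = """Build a decision tree of depth at most 3 that predicts whether a patient
is at high risk, using only the features: age, pain_intensity, pain_duration_weeks,
activity_level. Answer with JSON only, where inner nodes look like
{"feature": "...", "threshold": <number>, "left": <node>, "right": <node>}
and leaves look like {"leaf": 0} or {"leaf": 1}."""

def induce_tree(call_llm):
    """Zero-shot induction: no training data, only the LLM's prior knowledge."""
    return json.loads(call_llm(PROMPT))

def predict(tree, x, path=()):
    """Route a sample (dict of feature values) through the tree.
    Returns (prediction, path); the path identifies the reached leaf."""
    if "leaf" in tree:
        return tree["leaf"], path
    branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
    return predict(tree[branch], x, path + (branch,))

def leaf_paths(tree, path=()):
    """Enumerate all leaves; the position of a sample's leaf in this list
    gives the index for a one-hot, tree-based embedding."""
    if "leaf" in tree:
        return [path]
    return (leaf_paths(tree["left"], path + ("left",))
            + leaf_paths(tree["right"], path + ("right",)))
```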
Evaluating Large Language Models with fmeval
Schwöbel, Pola, Franceschi, Luca, Zafar, Muhammad Bilal, Vasist, Keerthan, Malhotra, Aman, Shenhar, Tomer, Tailor, Pinal, Yilmaz, Pinar, Diamond, Michael, Donini, Michele
fmeval is an open source library for evaluating large language models (LLMs) across a range of tasks. It helps practitioners evaluate their model for task performance and along multiple responsible AI dimensions. This paper presents the library and its underlying design principles: simplicity, coverage, extensibility, and performance. We then describe how these principles shaped the scientific and engineering choices made when developing fmeval. A case study demonstrates a typical use case for the library: picking a suitable model for a question answering task. We close by discussing limitations and further work in the development of the library. fmeval can be found at https://github.com/aws/fmeval.
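The case study maps naturally onto fmeval's evaluation-algorithm and model-runner abstractions. The sketch below follows my reading of the project's README; the class names, `$prompt` content template, and JMESPath `output` selector may differ in the current release, so treat it as an assumption and check the repository. It scores one candidate Bedrock model on question answering; repeating it per candidate and comparing the dataset scores gives the model-selection workflow the paper describes.

```python
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy, QAAccuracyConfig
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner

# One candidate shown; repeat for each model under consideration.
runner = BedrockModelRunner(
    model_id="amazon.titan-text-express-v1",        # example candidate model id
    content_template='{"inputText": $prompt}',      # request body for this model family
    output="results[0].outputText",                 # JMESPath to the generated answer
)

eval_algo = QAAccuracy(QAAccuracyConfig())
# With no dataset_config given, fmeval falls back to its built-in QA datasets.
results = eval_algo.evaluate(model=runner, save=True)
for r in results:
    print(r.dataset_name, r.dataset_scores)
```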
Multi-Modal Dataset Creation for Federated Learning with DICOM Structured Reports
Tölle, Malte, Burger, Lukas, Kelm, Halvar, André, Florian, Bannas, Peter, Diller, Gerhard, Frey, Norbert, Garthe, Philipp, Groß, Stefan, Hennemuth, Anja, Kaderali, Lars, Krüger, Nina, Leha, Andreas, Martin, Simon, Meyer, Alexander, Nagel, Eike, Orwat, Stefan, Scherer, Clemens, Seiffert, Moritz, Seliger, Jan Moritz, Simm, Stefan, Friede, Tim, Seidler, Tim, Engelhardt, Sandy
Purpose: Federated training is often hindered by heterogeneous datasets due to divergent data storage options, inconsistent naming schemes, varied annotation procedures, and disparities in label quality. This is particularly evident in emerging multi-modal learning paradigms, where dataset harmonization, including a uniform data representation and filtering options, is of paramount importance. Methods: DICOM structured reports enable the standardized linkage of arbitrary information beyond the imaging domain and can be used within Python deep learning pipelines with highdicom. Building on this, we developed an open platform for data integration with interactive filtering capabilities that simplifies the process of assembling multi-modal datasets. Results: In this study, we extend our prior work by showing its applicability to additional and more divergent data types, as well as by streamlining datasets for federated training within an established consortium of eight university hospitals in Germany. We demonstrate its concurrent filtering ability by creating harmonized multi-modal datasets across all locations for predicting the outcome after minimally invasive heart valve replacement. The data include DICOM data (i.e. computed tomography images, electrocardiography scans) as well as annotations (i.e. calcification segmentations, point sets, and pacemaker dependency) and metadata (i.e. prosthesis and diagnoses). Conclusion: Structured reports bridge the traditional gap between imaging systems and information systems. Utilizing the inherent DICOM reference system, arbitrary data types can be queried concurrently to create meaningful cohorts for clinical studies. The graphical interface as well as example structured report templates will be made publicly available.
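As a small illustration of the "inherent DICOM reference system" the conclusion refers to (this is not the authors' platform: only plain pydicom is used, whereas their pipeline builds on highdicom, and the file name is a placeholder), one can walk an SR content tree and collect every referenced SOP instance, which is exactly the linkage used to pull images, segmentations, and metadata into one cohort.

```python
import pydicom

def referenced_sop_uids(sr_path):
    """Collect all SOP Instance UIDs referenced anywhere in an SR content tree."""
    ds = pydicom.dcmread(sr_path)
    uids = []

    def walk(items):
        for item in items:
            # IMAGE/COMPOSITE content items carry references to other DICOM objects.
            for ref in getattr(item, "ReferencedSOPSequence", []):
                uids.append(ref.ReferencedSOPInstanceUID)
            # Recurse into nested content items.
            walk(getattr(item, "ContentSequence", []))

    walk(getattr(ds, "ContentSequence", []))
    return uids

# Example: inspect which objects a structured report links to before deciding
# whether the corresponding study belongs in a cohort.
print(referenced_sop_uids("structured_report.dcm"))
```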