Brack, Manuel
The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation
Mundt, Martin, Ovalle, Anaelia, Friedrich, Felix, Pranav, A, Paul, Subarnaduti, Brack, Manuel, Kersting, Kristian, Agnew, William
In a popular analogy by Turing Award laureate Yann LeCun, machine intelligence has been compared to a cake: unsupervised learning forms the base, supervised learning adds the icing, and reinforcement learning is the cherry on top. We expand this "cake that is intelligence" analogy from a simple structural metaphor to the full life-cycle of AI systems, extending it to the sourcing of ingredients (data), the conception of recipes (instructions), the baking process (training), and the tasting and selling of the cake (evaluation and distribution). Leveraging this re-conceptualization, we describe the social ramifications entailed by each step and how they are bounded by statistical assumptions within machine learning. Although these technical foundations and social impacts are deeply intertwined, they are often studied in isolation, creating barriers that restrict meaningful participation. Our re-conceptualization paves the way to bridge this gap by mapping where technical foundations interact with social outcomes, highlighting opportunities for cross-disciplinary dialogue. Finally, we conclude with actionable recommendations at each stage of the metaphorical AI cake's life-cycle, empowering prospective AI practitioners, users, and researchers with increased awareness and the ability to engage in broader AI discourse.
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
Friedrich, Felix, Tedeschi, Simone, Schramowski, Patrick, Brack, Manuel, Navigli, Roberto, Nguyen, Huu, Li, Bo, Kersting, Kristian
Building safe Large Language Models (LLMs) across multiple languages is essential for ensuring both safe access and linguistic diversity. To this end, we introduce M-ALERT, a multilingual benchmark that evaluates the safety of LLMs in five languages: English, French, German, Italian, and Spanish. M-ALERT includes 15k high-quality prompts per language, totaling 75k, following the detailed ALERT taxonomy. Our extensive experiments on 10 state-of-the-art LLMs highlight the importance of language-specific safety analysis, revealing that models often exhibit significant inconsistencies in safety across languages and categories. For instance, Llama3.2 shows high unsafety in the category crime_tax for Italian but remains safe in other languages. Similar differences can be observed across all models. In contrast, certain categories, such as substance_cannabis and crime_propaganda, consistently trigger unsafe responses across models and languages. These findings underscore the need for robust multilingual safety practices in LLMs to ensure safe and responsible usage across diverse user communities.
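To make the evaluation setup concrete, the sketch below shows how per-language, per-category safety rates could be aggregated for one model. It is a minimal illustration in the spirit of M-ALERT: the prompt layout, the model's generate interface, and the judge function are hypothetical placeholders rather than the benchmark's actual API.

from collections import defaultdict

LANGUAGES = ["en", "fr", "de", "it", "es"]

def evaluate_safety(model, prompts_by_lang, judge):
    """Aggregate per-language, per-category safety rates for one model."""
    results = defaultdict(lambda: defaultdict(list))
    for lang in LANGUAGES:
        for category, prompt in prompts_by_lang[lang]:
            response = model.generate(prompt)  # model under test (assumed interface)
            results[lang][category].append(bool(judge(prompt, response)))
    # Safety rate = fraction of prompts answered safely per (language, category).
    return {lang: {cat: sum(v) / len(v) for cat, v in cats.items()}
            for lang, cats in results.items()}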
SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs
Härle, Ruben, Friedrich, Felix, Brack, Manuel, Deiseroth, Björn, Schramowski, Patrick, Kersting, Kristian
Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, but their outputs may not be aligned with user intent and may even contain harmful content. This paper presents a novel approach to detect and steer concepts such as toxicity before generation. We introduce the Sparse Conditioned Autoencoder (SCAR), a single trained module that extends the otherwise untouched LLM. SCAR ensures full steerability, towards and away from concepts (e.g., toxic content), without compromising the quality of the model's text generation on standard evaluation benchmarks. We demonstrate the effective application of our approach through a variety of concepts, including toxicity, safety, and writing style alignment. As such, this work establishes a robust framework for controlling LLM generations, ensuring their ethical and safe deployment in real-world applications.
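As a rough illustration of a single conditioned module attached to an otherwise frozen LLM, the sketch below implements a sparse autoencoder over one hidden state whose first latent unit serves as both concept detector and steering knob. Dimensions, the conditioning scheme, and the steering rule are simplifying assumptions, not the paper's exact design.

import torch
import torch.nn as nn

class SparseConditionedAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int, concept_dims: int = 1):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)
        self.concept_dims = concept_dims  # latent units tied to the concept

    def forward(self, hidden: torch.Tensor, steer: float | None = None):
        z = torch.relu(self.encoder(hidden))                  # sparse latent code
        concept_score = z[..., : self.concept_dims].mean(-1)  # detection signal
        if steer is not None:
            z = z.clone()
            z[..., : self.concept_dims] = steer               # push towards/away from the concept
        return self.decoder(z), concept_score                 # reconstruction fed back to the LLM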
Core Tokensets for Data-efficient Sequential Training of Transformers
Paul, Subarnaduti, Brack, Manuel, Schramowski, Patrick, Kersting, Kristian, Mundt, Martin
Deep networks are frequently tuned to novel tasks and continue learning from ongoing data streams. Such sequential training requires consolidating new and past information, a challenge predominantly addressed by retaining the most important data points, formally known as coresets. Traditionally, these coresets consist of entire samples, such as images or sentences. However, recent transformer architectures operate on tokens, leading to the famous assertion that an image is worth 16x16 words. Intuitively, not all of these tokens are equally informative or memorable. Going beyond coresets, we thus propose to construct a deeper-level data summary at the level of tokens. Our correspondingly named core tokensets both select the most informative data points and leverage feature attribution to store only their most relevant features. We demonstrate that core tokensets yield significant performance retention in incremental image classification, open-ended visual question answering, and continual image captioning with significantly reduced memory. In fact, we empirically find that a core tokenset comprising 1% of the data performs comparably to coresets that are at least twice and up to 10 times larger.
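A hypothetical sketch of the selection step is shown below: the most informative samples are kept first, and within each kept sample only the highest-attribution tokens are stored for later replay. The scoring and attribution inputs are placeholders; the paper's concrete criteria may differ.

import torch

def build_core_tokenset(token_embeddings, sample_scores, attributions,
                        sample_budget: int, token_budget: int):
    """token_embeddings: (N, T, D); sample_scores: (N,); attributions: (N, T)."""
    # 1) Keep the most informative samples (e.g., scored by loss or gradient norm).
    kept = torch.topk(sample_scores, sample_budget).indices
    memory = []
    for i in kept:
        # 2) Within each kept sample, store only the top-attributed tokens.
        top_tokens = torch.topk(attributions[i], token_budget).indices
        memory.append((int(i), top_tokens, token_embeddings[i, top_tokens]))
    return memory  # replayed during later tasks to mitigate forgetting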
T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings
Deiseroth, Björn, Brack, Manuel, Schramowski, Patrick, Kersting, Kristian, Weinbach, Samuel
Tokenizers are crucial for encoding information in Large Language Models, but their development has recently stagnated, and they contain inherent weaknesses. Major limitations include computational overhead, ineffective vocabulary use, and unnecessarily large embedding and head layers. Additionally, their performance is biased towards a reference corpus, leading to reduced effectiveness for underrepresented languages. To remedy these issues, we propose T-FREE, which directly embeds words through sparse activation patterns over character triplets, and does not require a reference corpus. T-FREE inherently exploits morphological similarities and allows for strong compression of embedding layers. In our exhaustive experimental evaluation, we achieve competitive downstream performance with a parameter reduction of more than 85% on these layers. Further, T-FREE shows significant improvements in cross-lingual transfer learning.
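The sketch below illustrates the general flavor of a tokenizer-free, trigram-based word embedding: each word activates a small set of rows of an embedding table via hashed character triplets, and the word vector is their aggregate. The hash function, table size, and aggregation are assumptions for illustration, not T-FREE's exact construction.

import hashlib
import torch
import torch.nn as nn

class TrigramEmbedding(nn.Module):
    def __init__(self, num_slots: int = 8192, d_model: int = 512):
        super().__init__()
        self.num_slots = num_slots
        self.table = nn.Embedding(num_slots, d_model)

    def _trigram_ids(self, word: str) -> torch.Tensor:
        padded = f"_{word.lower()}_"  # mark word boundaries
        trigrams = [padded[i:i + 3] for i in range(len(padded) - 2)]
        ids = [int(hashlib.md5(t.encode()).hexdigest(), 16) % self.num_slots
               for t in trigrams]
        return torch.tensor(sorted(set(ids)), dtype=torch.long)

    def forward(self, word: str) -> torch.Tensor:
        # Sparse activation pattern: aggregate the hashed trigram rows into one vector.
        return self.table(self._trigram_ids(word)).mean(dim=0)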
LlavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment
Helff, Lukas, Friedrich, Felix, Brack, Manuel, Kersting, Kristian, Schramowski, Patrick
We introduce LlavaGuard, a family of VLM-based safeguard models that offers a versatile framework for evaluating the safety compliance of visual content. Specifically, we designed LlavaGuard for dataset annotation and generative model safeguarding. To this end, we collected and annotated a high-quality visual dataset incorporating a broad safety taxonomy, which we use to tune VLMs on context-aware safety risks. As a key innovation, LlavaGuard's responses contain comprehensive information, including a safety rating, the violated safety categories, and an in-depth rationale. Further, its customizable taxonomy categories enable context-specific alignment of LlavaGuard to various scenarios. Our experiments highlight the capabilities of LlavaGuard in complex and real-world applications. We provide checkpoints ranging from 7B to 34B parameters, demonstrating state-of-the-art performance, with even the smallest models outperforming baselines like GPT-4. We make our dataset and model weights publicly available and invite further research to address the diverse needs of communities and contexts.
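The snippet below merely illustrates the kind of structured assessment such a safeguard returns and how it could be used to filter a dataset. The field names and JSON format are assumed for illustration and need not match the released model's schema.

import json
from dataclasses import dataclass

@dataclass
class SafetyAssessment:
    rating: str       # e.g., "Safe" or "Unsafe"
    category: str     # violated policy category, if any
    rationale: str    # free-text justification

def parse_assessment(vlm_output: str) -> SafetyAssessment:
    """Parse a JSON-formatted safeguard response into a typed record."""
    data = json.loads(vlm_output)
    return SafetyAssessment(rating=data.get("rating", "Unknown"),
                            category=data.get("category", "None"),
                            rationale=data.get("rationale", ""))

def keep_image(vlm_output: str) -> bool:
    # Dataset curation: retain only images the safeguard rates as safe.
    return parse_assessment(vlm_output).rating.lower() == "safe"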
DeiSAM: Segment Anything with Deictic Prompting
Shindo, Hikaru, Brack, Manuel, Sudhakaran, Gopika, Dhami, Devendra Singh, Schramowski, Patrick, Kersting, Kristian
Large-scale, pre-trained neural networks have demonstrated strong capabilities in various tasks, including zero-shot image segmentation. To identify concrete objects in complex scenes, humans instinctively rely on deictic descriptions in natural language, i.e., referring to something depending on the context, such as "the object that is on the desk and behind the cup". However, deep learning approaches cannot reliably interpret such deictic representations due to their lack of reasoning capabilities in complex scenarios. To remedy this issue, we propose DeiSAM, a combination of large pre-trained neural networks with differentiable logic reasoners, for deictic promptable segmentation. Given a complex, textual segmentation description, DeiSAM leverages Large Language Models (LLMs) to generate first-order logic rules and performs differentiable forward reasoning on generated scene graphs. Subsequently, DeiSAM segments objects by matching them to the logically inferred image regions. As part of our evaluation, we propose the Deictic Visual Genome (DeiVG) dataset, containing paired visual input and complex, deictic textual prompts. Our empirical results demonstrate that DeiSAM is a substantial improvement over purely data-driven baselines for deictic promptable segmentation.
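A deliberately simplified, non-differentiable sketch of the matching step is given below: a deictic prompt is assumed to have already been translated (e.g., by an LLM) into a conjunction of relation constraints, which is then evaluated against scene-graph triples. The rule format is an illustrative stand-in for the paper's first-order logic rules.

def match_deictic_rule(scene_graph, conditions):
    """scene_graph: set of (subject, relation, object) triples.
    conditions: (relation, object) pairs the target must satisfy, e.g.
    [("on", "desk"), ("behind", "cup")] for "the object that is on the
    desk and behind the cup"."""
    candidates = {s for s, _, _ in scene_graph}
    for relation, obj in conditions:
        candidates &= {s for s, r, o in scene_graph if r == relation and o == obj}
    return candidates  # matched objects are handed to the segmentation model

scene = {("book", "on", "desk"), ("book", "behind", "cup"), ("lamp", "on", "desk")}
print(match_deictic_rule(scene, [("on", "desk"), ("behind", "cup")]))  # {'book'}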
MultiFusion: Fusing Pre-Trained Models for Multi-Lingual, Multi-Modal Image Generation
Bellagente, Marco, Brack, Manuel, Teufel, Hannah, Friedrich, Felix, Deiseroth, Björn, Eichenberg, Constantin, Dai, Andrew, Baldock, Robert, Nanda, Souradeep, Oostermeijer, Koen, Cruz-Salinas, Andres Felipe, Schramowski, Patrick, Kersting, Kristian, Weinbach, Samuel
The recent popularity of text-to-image diffusion models (DM) can largely be attributed to the intuitive interface they provide to users. The intended generation can be expressed in natural language, with the model producing faithful interpretations of text prompts. However, expressing complex or nuanced ideas in text alone can be difficult. To ease image generation, we propose MultiFusion, which allows users to express complex and nuanced concepts with arbitrarily interleaved inputs of multiple modalities and languages. MultiFusion leverages pre-trained models and aligns them for integration into a cohesive system, thereby avoiding the need for extensive training from scratch. Our experimental results demonstrate the efficient transfer of capabilities from individual modules to the downstream model. Specifically, the fusion of all independent components allows the image generation module to utilize multilingual, interleaved multimodal inputs despite being trained solely on monomodal data in a single language.
LEDITS++: Limitless Image Editing using Text-to-Image Models
Brack, Manuel, Friedrich, Felix, Kornmeier, Katharina, Tsaban, Linoy, Schramowski, Patrick, Kersting, Kristian, Passos, Apolinário
Text-to-image diffusion models have recently received increasing interest for their astonishing ability to produce high-fidelity images from solely text inputs. Subsequent research efforts aim to exploit and apply their capabilities to real image editing. However, existing image-to-image methods are often inefficient, imprecise, and of limited versatility: they either require time-consuming fine-tuning, deviate unnecessarily strongly from the input image, or lack support for multiple, simultaneous edits. To address these issues, we introduce LEDITS++, an efficient yet versatile and precise textual image manipulation technique. First, LEDITS++'s novel inversion approach requires neither tuning nor optimization and produces high-fidelity results with only a few diffusion steps. Second, our methodology supports multiple simultaneous edits and is architecture-agnostic. Third, we use a novel implicit masking technique that limits changes to relevant image regions. We propose the novel TEdBench++ benchmark as part of our exhaustive evaluation. Our results demonstrate the capabilities of LEDITS++ and its improvements over previous methods. The project page is available at https://leditsplusplus-project.static.hf.space .
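To give a rough sense of implicit masking, the sketch below restricts edit guidance at one diffusion step to the regions where the edit-conditioned noise estimate deviates most from the unconditional one. The quantile threshold and update rule are simplified assumptions, not the method's exact formulation.

import torch

def masked_edit_guidance(eps_uncond, eps_edit, scale: float = 7.0,
                         quantile: float = 0.9, reverse: bool = False):
    """eps_*: noise estimates of shape (C, H, W) at one diffusion step."""
    direction = eps_edit - eps_uncond
    if reverse:  # steer away from the edit concept instead of towards it
        direction = -direction
    # Implicit mask: keep only the most strongly affected pixels.
    magnitude = direction.abs().mean(dim=0, keepdim=True)
    threshold = torch.quantile(magnitude.flatten(), quantile)
    mask = (magnitude >= threshold).float()
    return eps_uncond + scale * mask * direction  # guidance applied only inside the mask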
AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
Deiseroth, Björn, Deb, Mayukh, Weinbach, Samuel, Brack, Manuel, Schramowski, Patrick, Kersting, Kristian
Generative transformer models have become increasingly complex, with large numbers of parameters and the ability to process multiple input modalities. Current methods for explaining their predictions are resource-intensive. Most crucially, they require prohibitively large amounts of extra memory, since they rely on backpropagation, which allocates almost twice as much GPU memory as the forward pass. This makes it difficult, if not impossible, to use them in production. We present AtMan, which provides explanations of generative transformer models at almost no extra cost. Specifically, AtMan is a modality-agnostic perturbation method that manipulates the attention mechanisms of transformers to produce relevance maps for the input with respect to the output prediction. Instead of using backpropagation, AtMan applies a parallelizable token-based search method based on cosine similarity neighborhoods in the embedding space. Our exhaustive experiments on text and image-text benchmarks demonstrate that AtMan outperforms current state-of-the-art gradient-based methods on several metrics while being computationally efficient. As such, AtMan is suitable for use in large model inference deployments.
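The sketch below conveys the perturbation idea in simplified form: attention to one input token is suppressed and the resulting drop in the target's log-likelihood is read out as that token's relevance. The suppress_attention hook and log_likelihood helper are hypothetical interfaces assumed for illustration; the token-based similarity search is omitted.

import torch

def relevance_by_attention_suppression(model, input_ids, target_ids,
                                       suppress_factor: float = 0.1):
    """Returns one relevance score per input token (higher = more relevant)."""
    with torch.no_grad():
        base = model.log_likelihood(input_ids, target_ids)  # unperturbed forward pass
        scores = []
        for i in range(input_ids.shape[-1]):
            # Scale down attention directed at token i in every layer (assumed hook).
            with model.suppress_attention(position=i, factor=suppress_factor):
                perturbed = model.log_likelihood(input_ids, target_ids)
            scores.append(float(base - perturbed))  # large drop => token i was important
    return torch.tensor(scores)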