Elstner, Jannes
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
Wollschläger, Tom, Elstner, Jannes, Geisler, Simon, Cohen-Addad, Vincent, Günnemann, Stephan, Gasteiger, Johannes
The safety alignment of large language models (LLMs) can be circumvented through adversarially crafted inputs, yet the mechanisms by which these attacks bypass safety barriers remain poorly understood. Prior work suggests that a single refusal direction in the model's activation space determines whether an LLM refuses a request. In this study, we propose a novel gradient-based approach to representation engineering and use it to identify refusal directions. Contrary to prior work, we uncover multiple independent directions and even multi-dimensional concept cones that mediate refusal. Moreover, we show that orthogonality alone does not imply independence under intervention, motivating the notion of representational independence that accounts for both linear and non-linear effects. Using this framework, we identify mechanistically independent refusal directions. We show that refusal mechanisms in LLMs are governed by complex spatial structures and identify functionally independent directions, confirming that multiple distinct mechanisms drive refusal behavior. Our gradient-based approach uncovers these mechanisms and can further serve as a foundation for future work on understanding LLMs.
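To make the intervention concrete: the following is a minimal sketch, in PyTorch, of the directional-ablation intervention from prior work on refusal directions (not this paper's gradient-based method), assuming a transformer whose decoder layers take the hidden states as their first positional argument; `model`, `refusal_dir`, and the layer path are placeholders.

```python
import torch

def ablate_direction(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of the activations along a candidate refusal
    direction: h <- h - (h . d) d, applied at every token position."""
    d = direction / direction.norm()
    return hidden - (hidden @ d)[..., None] * d

def make_ablation_pre_hook(direction: torch.Tensor):
    """Forward pre-hook that ablates the direction from a layer's input."""
    def hook(module, args):
        return (ablate_direction(args[0], direction), *args[1:])
    return hook

# Hypothetical usage; the module path depends on the architecture:
# handle = model.layers[12].register_forward_pre_hook(
#     make_ablation_pre_hook(refusal_dir))
```

The orthogonality point is visible in this sketch: two directions can have zero dot product, yet ablating one may still change behavior along the other through downstream non-linearities, which is what representational independence is meant to rule out.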
Optimizing CAD Models with Latent Space Manipulation
Elstner, Jannes, Schönhof, Raoul G. C., Tauber, Steffen, Huber, Marco F.
When it comes to the optimization of CAD models in the automation domain, neural networks currently play only a minor role. Optimizing abstract features such as automation capability is challenging, since such features can be very difficult to simulate, are too complex for rule-based systems, and offer little to no data for machine-learning methods. On the other hand, image manipulation methods that can manipulate abstract features in images, such as StyleCLIP, have seen much success. They rely on the latent space of pretrained generative adversarial networks and could therefore also make use of the vast amount of unlabeled CAD data. In this paper, we show that such an approach is also suitable for optimizing abstract automation-related features of CAD parts. We achieve this by extending StyleCLIP to work with CAD models in the form of voxel models, which involves using a 3D StyleGAN and a custom classifier. Finally, we demonstrate our system's ability to optimize automation-related features by optimizing the grabability of various CAD models.
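As an illustration of the latent-space optimization idea, here is a minimal sketch, assuming a pretrained 3D StyleGAN `generator`, a differentiable `classifier` scoring the automation-related feature (e.g. grabability), and an inverted latent code `w_init`; all names and hyperparameters are placeholders, not the paper's actual implementation.

```python
import torch

def optimize_latent(generator, classifier, w_init, steps=200, lr=0.05, lam=0.1):
    """Gradient-ascend the classifier score in the generator's latent space;
    an L2 penalty keeps the edited latent close to the original part."""
    w0 = w_init.detach()
    w = w0.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        voxels = generator(w)        # (1, 1, D, H, W) voxel occupancy grid
        score = classifier(voxels)   # scalar, e.g. predicted grabability
        loss = -score + lam * (w - w0).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```

This mirrors StyleCLIP's latent-optimization variant, with the CLIP similarity term swapped for the custom classifier's score.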
Feature Visualization within an Automated Design Assessment leveraging Explainable Artificial Intelligence Methods
Schönhof, Raoul, Werner, Artem, Elstner, Jannes, Zopcsak, Boldizsar, Awad, Ramez, Huber, Marco
Not only the automation of manufacturing processes but also the automation of automation procedures itself is becoming increasingly relevant to automation research. In this context, automated capability assessment, mainly realized by deep learning systems operating on 3D CAD data, has been presented. Current assessment systems may be able to assess CAD data with regard to abstract features, e.g. the ability to automatically separate components from bulk goods, or the presence of gripping surfaces. Nevertheless, they suffer from being black-box systems: an assessment can be learned and generated easily, but without any geometrical indication of the reasons behind the system's decision. By utilizing explainable AI (xAI) methods, we attempt to open up this black box. Explainable AI methods have been used to assess whether a neural network has successfully learned a given task, or to analyze which features of an input might lead to an adversarial attack. These methods aim to derive additional insights into a neural network by analyzing patterns in a given input and their impact on the network output. Within the NeuroCAD project, xAI methods are used to identify the geometrical features that are associated with a certain abstract feature. In this work, sensitivity analysis (SA), layer-wise relevance propagation (LRP), Gradient-weighted Class Activation Mapping (Grad-CAM), and Local Interpretable Model-Agnostic Explanations (LIME) have been implemented in the NeuroCAD environment, allowing us not only to assess CAD models but also to identify the features that were relevant for the network's decision. In the medium term, this might enable the identification of regions of interest, supporting product designers in optimizing their models with regard to assembly processes.
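Of the four methods listed, Grad-CAM is the most compact to sketch. Below is a minimal 3D variant for a voxel classifier, assuming a PyTorch `model` with a single scalar output and a chosen convolutional `target_layer`; both names are placeholders rather than the NeuroCAD environment's actual components.

```python
import torch
import torch.nn.functional as F

def grad_cam_3d(model, target_layer, voxels):
    """Grad-CAM for a 3D voxel classifier: weight the target layer's feature
    maps by the spatially averaged gradient of the output score, then ReLU."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    score = model(voxels.detach().requires_grad_(True)).squeeze()
    score.backward()
    h1.remove()
    h2.remove()
    weights = grads["g"].mean(dim=(2, 3, 4), keepdim=True)  # GAP over D, H, W
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    # Upsample the coarse relevance map back to the input voxel resolution.
    return F.interpolate(cam, size=voxels.shape[2:], mode="trilinear",
                         align_corners=False)
```

The resulting map highlights which voxel regions drove the assessment, which is exactly the kind of geometrical indicator the abstract calls for.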