Who's asking? User personas and the mechanics of latent misalignment

Neural Information Processing Systems

Studies show that safety-tuned models may nevertheless divulge harmful information. In this work, we show that whether they do so depends significantly on who they are talking to, which we refer to as user persona. In fact, we find manipulating user persona to be more effective for eliciting harmful content than certain more direct attempts to control model refusal. We study both natural language prompting and activation steering as intervention methods and show that activation steering is significantly more effective at bypassing safety filters. We shed light on the mechanics of this phenomenon by showing that even when model generations are safe, harmful content can persist in hidden representations and can be extracted by decoding from earlier layers. We also show we can predict a persona's effect on refusal given only the geometry of its steering vector. Finally, we show that certain user personas induce the model to form more charitable interpretations of otherwise dangerous queries.
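To make the activation-steering intervention concrete, here is a minimal NumPy sketch of the common difference-in-means recipe: a steering vector is built from activations on persona-conditioned versus neutral prompts, then added to hidden states at inference. All arrays here are random toy stand-ins for an LLM's residual-stream activations; the sizes, prompt counts, and scaling factor are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden size (real models use thousands of dimensions)

# Toy stand-ins for one layer's activations on two prompt sets:
# persona-conditioned prompts vs. neutral prompts (hypothetical data).
acts_persona = rng.normal(size=(5, D))
acts_neutral = rng.normal(size=(5, D))

# A common recipe for a steering vector: difference of mean activations.
steer = acts_persona.mean(axis=0) - acts_neutral.mean(axis=0)

def apply_steering(hidden, vec, alpha=4.0):
    """Add the scaled steering vector to each token's hidden state."""
    return hidden + alpha * vec

hidden = rng.normal(size=(3, D))  # hidden states for 3 tokens
steered = apply_steering(hidden, steer)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The abstract's geometric claim suggests looking at the steering vector's
# direction; after steering, hidden states align with it on average.
print(np.mean([cosine(s, steer) for s in steered]))
```

The paper's finding that refusal behavior is predictable from the steering vector's geometry alone is what motivates inspecting cosine alignment rather than generations.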


Dissecting Chain-of-Thought: Compositionality through In-Context Filtering and Learning

Neural Information Processing Systems

Chain-of-thought (CoT) is a method that enables language models to handle complex reasoning tasks by decomposing them into simpler steps. Despite its success, the underlying mechanics of CoT are not yet fully understood. In an attempt to shed light on this, our study investigates the impact of CoT on the ability of transformers to in-context learn a simple-to-study yet general family of compositional functions: multi-layer perceptrons (MLPs). In this setting, we find that the success of CoT can be attributed to breaking down in-context learning of a compositional function into two distinct phases: focusing on and filtering data related to each step of the composition, and in-context learning the single-step composition function. Through both experimental and theoretical evidence, we demonstrate how CoT significantly reduces the sample complexity of in-context learning (ICL) and facilitates the learning of complex functions that non-CoT methods struggle with. Furthermore, we illustrate how transformers can transition from vanilla in-context learning to mastering a compositional function with CoT by simply incorporating additional layers that perform the necessary data-filtering for CoT via the attention mechanism. In addition to these test-time benefits, we show CoT helps accelerate pretraining by learning shortcuts to represent complex functions, and that filtering plays an important role in this process. These findings collectively provide insights into the mechanics of CoT, inviting further investigation of its role in complex reasoning tasks.
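The two-phase decomposition can be illustrated with a small NumPy experiment: fitting a two-layer ReLU MLP end-to-end with a single (misspecified) linear map versus fitting each step separately when the chain exposes the intermediate outputs. This is a toy least-squares analogue, assumed for illustration, not the paper's transformer/ICL setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
W1 = rng.normal(size=(d, d))  # ground-truth two-layer MLP weights
W2 = rng.normal(size=(d, d))
relu = lambda z: np.maximum(z, 0.0)

X = rng.normal(size=(200, d))
H = relu(X @ W1)              # intermediate step outputs (the "chain")
Y = H @ W2

# Non-CoT analogue: fit the full composition with one linear map.
# The ReLU makes this model misspecified, so residual error remains.
W_flat, *_ = np.linalg.lstsq(X, Y, rcond=None)
err_flat = np.mean((X @ W_flat - Y) ** 2)

# CoT analogue: the chain supervises each single step, so learning splits
# into two well-posed linear problems solved independently.
W1_hat, *_ = np.linalg.lstsq(X, X @ W1, rcond=None)  # step 1 (pre-activation)
W2_hat, *_ = np.linalg.lstsq(H, Y, rcond=None)       # step 2
err_cot = np.mean((relu(X @ W1_hat) @ W2_hat - Y) ** 2)

print(err_cot < err_flat)
```

The step-wise fit recovers the composition essentially exactly, while the flat fit cannot, mirroring the abstract's claim that CoT reduces the problem to learning single-step functions.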


ProgressGym: Alignment with a Millennium of Moral Progress

Neural Information Processing Systems

Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale.


Automating modeling in mechanics: LLMs as designers of physics-constrained neural networks for constitutive modeling of materials

Tacke, Marius, Busch, Matthias, Abdolazizi, Kian, Eichinger, Jonas, Linka, Kevin, Cyron, Christian, Aydin, Roland

arXiv.org Artificial Intelligence

Large language model (LLM)-based agentic frameworks increasingly adopt the paradigm of dynamically generating task-specific agents. We suggest that not only agents but also specialized software modules for scientific and engineering tasks can be generated on demand. We demonstrate this concept in the field of solid mechanics. There, so-called constitutive models are required to describe the relationship between mechanical stress and body deformation. Constitutive models are essential for both the scientific understanding and industrial application of materials. However, even recent data-driven methods of constitutive modeling, such as constitutive artificial neural networks (CANNs), still require substantial expert knowledge and human labor. We present a framework in which an LLM generates a CANN on demand, tailored to a given material class and dataset provided by the user. The framework covers LLM-based architecture selection, integration of physical constraints, and complete code generation. Evaluation on three benchmark problems demonstrates that LLM-generated CANNs achieve accuracy comparable to or greater than manually engineered counterparts, while also exhibiting reliable generalization to unseen loading scenarios and extrapolation to large deformations. These findings indicate that LLM-based generation of physics-constrained neural networks can substantially reduce the expertise required for constitutive modeling and represent a step toward practical end-to-end automation.
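The kind of physics-constrained building block such a framework would generate can be sketched in a few lines: a miniature CANN-style strain-energy ansatz over deformation invariants, with positivity of the weights enforced by reparameterization and terms that vanish in the undeformed state. The specific terms and the `strain_energy` function are hypothetical illustrations, not the paper's generated architectures.

```python
import numpy as np

def strain_energy(F, w):
    """Toy invariant-based strain energy W(F) with built-in constraints."""
    C = F.T @ F                 # right Cauchy-Green deformation tensor
    I1 = np.trace(C)            # first invariant
    J = np.linalg.det(F)        # volume ratio
    w = np.exp(w)               # reparameterize: weights positive by construction
    # Each term vanishes in the undeformed configuration F = I (I1 = 3, J = 1),
    # so the model satisfies W(I) = 0 regardless of the fitted weights.
    return w[0] * (I1 - 3.0) + w[1] * (I1 - 3.0) ** 2 + w[2] * (J - 1.0) ** 2

F_id = np.eye(3)
print(strain_energy(F_id, np.zeros(3)))  # → 0.0 in the undeformed state
```

Baking constraints like these into the architecture, rather than the loss, is what lets data-driven constitutive models generalize to unseen loading scenarios with little data.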




How Genes Have Harnessed Physics to Grow Living Things

WIRED

The same pulling force that causes "tears" in a glass of wine also shapes embryos. It's another example of how genes exploit mechanical forces for growth and development. Sip a glass of wine, and you will notice liquid continuously weeping down the wetted side of the glass. In 1855, James Thomson, brother of Lord Kelvin, explained that these wine "tears" or "legs" result from the difference in surface tension between alcohol and water. "This fact affords an explanation of several very curious motions," Thomson wrote.



"novel and insightful" with "a very intriguing and intuitive explanation for the mechanics of pruning" and "well

Neural Information Processing Systems

We thank the reviewers for their detailed, valuable reviews. We agree with the reviewers' concern and will ensure the final version includes the following data. These studies use a "more robust pruning regime" (ResNet20 results) and seem to apply less to "modern" regimes.

R3, generalization gap vs. test accuracy; train accuracy not reported: we will update our manuscript to clearly discuss training accuracies and plot the generalization gaps.

R3, "Pearson correlation and slope do not give an accurate characterization": we will move methodological details to the main body.

R3, hyperparameter choices ("These networks reach much lower accuracy than expected... L1/L2 regular-"): Section 2 justified our exposition's focus on less-regularized models, which is not unprecedented; it led to our exploring pruning of the last convolutional layers of VGG11/ResNet18.

R3, "[DSD] is worth a comparison" and "the claim... is hard to extract": as in DSD, we show that the parameters can reenter at zero or their original values (Figure D.2) while achieving the full

R3, "[15] is not found to improve generalization":