AITopics | tracr

tracr

Tracr: Compiled Transformers as a Laboratory for Interpretability

David Lindner

Neural Information Processing Systems | Feb-14-2026, 22:18:19 GMT

We show how to "compile" human-readable programs into standard decoder-only transformer models.

large language model, machine learning, selector, (22 more...)

Neural Information Processing Systems

Genre:

Overview (0.68)
Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Tracr: Compiled Transformers as a Laboratory for Interpretability

Neural Information Processing Systems | Dec-26-2025, 03:47:17 GMT

We show how to compile human-readable programs into standard decoder-only transformer models. Our compiler, Tracr, generates models with known structure. This structure can be used to design experiments. For example, we use it to study superposition in transformers that execute multi-step algorithms. Additionally, the known structure of Tracr-compiled models can serve as ground truth for evaluating interpretability methods. Commonly, because the programs learned by transformers are unknown, it is unclear whether an interpretation succeeded. We demonstrate our approach by implementing and examining programs including computing token frequencies, sorting, and parenthesis checking.

name change, tracr, transformer, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

Tracr: Compiled Transformers as a Laboratory for Interpretability

David Lindner

Neural Information Processing Systems | Oct-8-2025, 22:39:00 GMT

We show how to "compile" human-readable programs into standard decoder-only transformer models.

large language model, machine learning, selector, (21 more...)

Neural Information Processing Systems

Genre:

Overview (0.68)
Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

A Large Language Model-Supported Threat Modeling Framework for Transportation Cyber-Physical Systems

Salek, M Sabbir, Chowdhury, Mashrur, Munir, Muhaimin Bin, Cai, Yuchen, Hasan, Mohammad Imtiaz, Tine, Jean-Michel, Khan, Latifur, Rahman, Mizanur

arXiv.org Artificial Intelligence | Jul-29-2025

Existing threat modeling frameworks related to transportation cyber-physical systems (CPS) are often narrow in scope, labor-intensive, and require substantial cybersecurity expertise. To address these limitations, we introduce the Transportation Cybersecurity and Resiliency Threat Modeling Framework (TraCR-TMF), a large language model (LLM)-based threat modeling framework for transportation CPS that requires limited cybersecurity expert intervention. TraCR-TMF identifies threats, potential attack techniques, and relevant countermeasures for transportation CPS. Three LLM-based approaches support these identifications: (i) a retrieval-augmented generation approach requiring no cybersecurity expert intervention, (ii) an in-context learning approach with low expert intervention, and (iii) a supervised fine-tuning approach with moderate expert intervention. TraCR-TMF offers LLM-based attack path identification for critical assets based on vulnerabilities across transportation CPS entities. Additionally, it incorporates the Common Vulnerability Scoring System (CVSS) scores of known exploited vulnerabilities to prioritize threat mitigations. The framework was evaluated through two cases. First, the framework identified relevant attack techniques for various transportation CPS applications, 73% of which were validated by cybersecurity experts as correct. Second, the framework was used to identify attack paths for a target asset in a real-world cyberattack incident. TraCR-TMF successfully predicted exploitations, such as lateral movement of adversaries, data exfiltration, and data encryption for ransomware, as reported in the incident. These findings show the efficacy of TraCR-TMF in transportation CPS threat modeling, while reducing the need for extensive involvement of cybersecurity experts. To facilitate real-world adoption, all of our code is shared via an open-source repository.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.00831

Country:

Asia (0.67)
North America > United States > Texas (0.28)
North America > United States > South Carolina (0.28)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tracr: Compiled Transformers as a Laboratory for Interpretability

Neural Information Processing Systems | Jan-19-2025, 09:00:13 GMT

interpretability, tracr, transformer, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.49)

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

Jain, Samyak, Kirk, Robert, Lubana, Ekdeep Singh, Dick, Robert P., Tanaka, Hidenori, Grefenstette, Edward, Rocktäschel, Tim, Krueger, David Scott

arXiv.org Artificial Intelligence | Nov-21-2023

Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy. Despite its clear importance, there has been minimal work that explains how fine-tuning alters the underlying capabilities learned by a model during pretraining: does fine-tuning yield entirely novel capabilities or does it just modulate existing ones? We address this question empirically in synthetic, controlled settings where we can use mechanistic interpretability tools (e.g., network pruning and probing) to understand how the model's underlying capabilities are changing. We perform an extensive analysis of the effects of fine-tuning in these settings, and show that: (i) fine-tuning rarely alters the underlying model capabilities; (ii) a minimal transformation, which we call a 'wrapper', is typically learned on top of the underlying model capabilities, creating the illusion that they have been modified; and (iii) further fine-tuning on a task where such hidden capabilities are relevant leads to sample-efficient 'revival' of the capability, i.e., the model begins reusing this capability after only a few gradient steps. This indicates that practitioners can unintentionally remove a model's safety wrapper merely by fine-tuning it on, e.g., a superficially unrelated downstream task. We additionally perform analysis on language models trained on the TinyStories dataset to support our claims in a more realistic setup.

arxiv preprint arxiv, fine-tuning, spurious correlation, (14 more...)

arXiv.org Artificial Intelligence

2311.12786

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
(5 more...)

Genre: Research Report (0.81)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tracr: Compiled Transformers as a Laboratory for Interpretability

Lindner, David, Kramár, János, Farquhar, Sebastian, Rahtz, Matthew, McGrath, Thomas, Mikulik, Vladimir

arXiv.org Machine Learning | Nov-3-2023

We show how to "compile" human-readable programs into standard decoder-only transformer models. Our compiler, Tracr, generates models with known structure. This structure can be used to design experiments. For example, we use it to study "superposition" in transformers that execute multi-step algorithms. Additionally, the known structure of Tracr-compiled models can serve as ground-truth for evaluating interpretability methods. Commonly, because the "programs" learned by transformers are unknown it is unclear whether an interpretation succeeded. We demonstrate our approach by implementing and examining programs including computing token frequencies, sorting, and parenthesis checking. We provide an open-source implementation of Tracr at https://github.com/google-deepmind/tracr.
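
To make the idea of a "human-readable program" concrete, here is a minimal sketch of the token-frequency program mentioned in the abstract, written in plain Python in the style of RASP, the language Tracr compiles. This is a toy emulation and not the Tracr API: the helper names `select`, `selector_width`, and `token_frequencies` are illustrative assumptions, and nothing here is compiled into transformer weights.

```python
from typing import Callable, List, Sequence

# A selector is a boolean matrix indexed as [query position][key position].
Selector = List[List[bool]]


def select(keys: Sequence[str], queries: Sequence[str],
           predicate: Callable[[str, str], bool]) -> Selector:
    """Hypothetical helper: mark every (query, key) pair where predicate(key, query) holds."""
    return [[predicate(k, q) for k in keys] for q in queries]


def selector_width(selector: Selector) -> List[int]:
    """Hypothetical helper: count the selected key positions for each query position."""
    return [sum(row) for row in selector]


def token_frequencies(tokens: Sequence[str]) -> List[int]:
    """For each position, count how often that position's token occurs in the input."""
    same_token = select(tokens, tokens, lambda k, q: k == q)
    return selector_width(same_token)


if __name__ == "__main__":
    print(token_frequencies(list("hello")))  # prints [1, 1, 2, 2, 1]
```

The boolean matrix loosely plays the role of an attention pattern, which is why programs of this shape are a natural fit for compilation into attention layers.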

residual stream, selector, transformer, (16 more...)

arXiv.org Machine Learning

2301.05062

Genre:

Research Report (1.00)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
