Better Language Model Inversion by Compactly Representing Next-Token Distributions

Jun-14-2026, 16:10:46 GMT–Neural Information Processing Systems

Language model inversion seeks to recover hidden prompts using only language model outputs. This capability has implications for security and accountability in language model deployments; for example, it could be used to leak private information from the system message of an API-protected language model. We propose a new method, prompt inversion from logprob sequences (PILS), that recovers hidden prompts by gleaning clues from the model's next-token probabilities over the course of multiple generation steps. Our method is enabled by a key insight: the vector-valued outputs of a language model occupy a low-dimensional subspace. This enables us to losslessly compress the full next-token probability distribution over multiple generation steps using a linear map, allowing more output information to be used for inversion.
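The low-dimensional-subspace insight can be illustrated with a small numerical sketch. This is not the authors' implementation: it assumes a toy model whose logits are a linear function of a d-dimensional hidden state, and the names `W`, `H`, `B`, and the sizes `V`, `d`, `T` are hypothetical stand-ins. Under that assumption, every log-probability vector lies in the span of the columns of `W` plus the all-ones vector, so each V-dimensional vector can be stored as d + 1 coordinates and recovered with a linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, T = 1000, 16, 64  # hypothetical vocab size, hidden size, generation steps

W = rng.normal(size=(V, d))  # stand-in unembedding matrix (assumption)
H = rng.normal(size=(T, d))  # stand-in final hidden states, one per step

# Logits are linear in the hidden state; log-softmax subtracts a per-step scalar.
logits = H @ W.T  # shape (T, V)
m = logits.max(axis=1, keepdims=True)
logprobs = logits - (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True)))

# Basis for the subspace: the columns of W plus the all-ones vector.
B = np.hstack([W, np.ones((V, 1))])  # shape (V, d + 1)

# Compress each step's V-dimensional logprob vector to d + 1 coordinates.
coords, *_ = np.linalg.lstsq(B, logprobs.T, rcond=None)
compressed = coords.T  # shape (T, d + 1)

# Decompress with a single linear map and measure the reconstruction error.
reconstructed = compressed @ B.T
max_err = np.abs(reconstructed - logprobs).max()
rank = np.linalg.matrix_rank(logprobs)

print(compressed.shape, rank, max_err)
```

In this toy setting the reconstruction error is at the level of floating-point round-off, which is the sense in which the compression is lossless. An attacker with only API access would not have `W`; the sketch only shows why a compact linear representation of the full distribution exists.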

large language model, machine learning, natural language

Country:
- North America > United States (1.00)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Industry:
- Information Technology (0.93)
- Government > Regional Government
  - North America Government > United States Government (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.72)
  - Machine Learning > Neural Networks
    - Deep Learning (0.97)
