cipher

Aligning LLM Agents by Learning Latent Preference from User Edits

Neural Information Processing Systems

We study interactive learning of language agents based on user edits made to the agent's output. In a typical setting such as a writing assistant, the user interacts with a language agent to generate a response given a context, and may optionally edit the agent's response to personalize it based on their latent preference, in addition to improving its correctness. The edit feedback is naturally generated, making it a suitable signal for improving the agent's alignment with the user's preference and for reducing the cost of user edits over time. We propose a learning framework, PRELUDE, which infers a description of the user's latent preference from historic edit data and uses it to define a prompt policy that drives future response generation. This avoids fine-tuning the agent, which is costly, hard to scale with the number of users, and may even degrade the agent's performance on other tasks.
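The core idea of the abstract can be illustrated with a toy sketch: instead of fine-tuning, keep a per-context description of the user's inferred preference and prepend it to future prompts. In the paper the preference description is inferred by an LLM from the edits themselves; here the class name, the context keys, and the caller-supplied description are all illustrative stand-ins.

```python
class PreludePolicy:
    """Toy sketch of a prompt policy in the spirit of PRELUDE: learn a
    textual description of the user's latent preference from their edits,
    then inject it into future prompts instead of fine-tuning the agent."""

    def __init__(self):
        # Maps a context type (e.g. "email") to an inferred preference.
        self.preferences = {}

    def record_edit(self, context_type, inferred_preference):
        # In PRELUDE an LLM summarizes the user's edit into a preference
        # description; here the caller supplies that description directly.
        self.preferences[context_type] = inferred_preference

    def make_prompt(self, context_type, user_request):
        # Prepend the stored preference, if any, to steer generation.
        pref = self.preferences.get(context_type)
        if pref is None:
            return user_request
        return f"User preference: {pref}\n\nTask: {user_request}"
```

The point of the sketch is the data flow: edits update a lightweight per-user store, and alignment happens purely at prompt time.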


CIPHER: Scalable Time Series Analysis for Physical Sciences with Application to Solar Wind Phenomena

Kobayashi, Jasmine R., Martin, Daniela, Filho, Valmir P Moraes, O'Brien, Connor, Hong, Jinsu, Saikia, Sudeshna Boro, Lamdouar, Hala, Miles, Nathan D., Scoczynski, Marcella, Stone, Mavis, Sundaresan, Sairam, Jungbluth, Anna, Muñoz-Jaramillo, Andrés, Samara, Evangelia, Gallego, Joseph

arXiv.org Artificial Intelligence

Labeling or classifying time series is a persistent challenge in the physical sciences, where expert annotations are scarce, costly, and often inconsistent. Yet robust labeling is essential to enable machine learning models for understanding, prediction, and forecasting. We present the Clustering and Indexation Pipeline with Human Evaluation for Recognition (CIPHER), a framework designed to accelerate large-scale labeling of complex time series in physics. CIPHER integrates indexable Symbolic Aggregate approXimation (iSAX) for interpretable compression and indexing, density-based clustering (HDBSCAN) to group recurring phenomena, and a human-in-the-loop step for efficient expert validation. Representative samples are labeled by domain scientists, and these annotations are propagated across clusters to yield systematic, scalable classifications. We evaluate CIPHER on the task of classifying solar wind phenomena in OMNI data, a central challenge in space weather research, showing that the framework recovers meaningful phenomena such as coronal mass ejections and stream interaction regions. Beyond this case study, CIPHER highlights a general strategy for combining symbolic representations, unsupervised learning, and expert knowledge to address label scarcity in time series across the physical sciences. The code and configuration files used in this study are publicly available to support reproducibility.
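The SAX step underlying iSAX can be sketched in a few lines: z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA), then map each segment mean to a symbol via breakpoints that split the standard normal into equiprobable bins. This is a minimal illustration of the symbolization idea only; iSAX adds multi-resolution indexing on top, and the HDBSCAN and human-validation stages of CIPHER are omitted. Function and constant names are my own.

```python
import numpy as np

# Breakpoints splitting the standard normal into 4 equiprobable bins.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"

def sax_word(series, n_segments=8):
    """Convert a 1-D series to a SAX word: z-normalize, piecewise-
    aggregate into n_segments means, then discretize each mean."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)        # z-normalization
    x = x[: len(x) // n_segments * n_segments]    # trim to a multiple
    paa = x.reshape(n_segments, -1).mean(axis=1)  # segment means (PAA)
    return "".join(ALPHABET[i] for i in np.searchsorted(BREAKPOINTS, paa))
```

Because every series is reduced to a short word over a small alphabet, words can be indexed and compared cheaply, which is what makes clustering at scale tractable.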


All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language

Guo, Shiyuan, Sleight, Henry, Roger, Fabien

arXiv.org Artificial Intelligence

Detecting harmful AI actions is important as AI agents gain adoption. Chain-of-thought (CoT) monitoring is a method widely used to detect adversarial attacks and AI misalignment. However, attackers and misaligned models might evade CoT monitoring through ciphered reasoning: reasoning hidden in encrypted, translated, or compressed text. To assess this risk, we test whether models can perform ciphered reasoning. For each of 28 different ciphers, we fine-tune and prompt up to 10 models to reason in that cipher. Across the models we test, we find an asymmetry: model accuracy can drop significantly when reasoning in ciphered text, even though the models demonstrate comprehension of the same text by translating it accurately into English. Even frontier models struggle with lesser-known ciphers, although they can reason accurately in well-known ciphers like rot13. We show that ciphered reasoning capability correlates with cipher prevalence in pretraining data. We also identify scaling laws showing that ciphered reasoning capability improves slowly with additional fine-tuning data. Our work suggests that evading CoT monitoring using ciphered reasoning may be an ineffective tactic for current models, and offers guidance on constraining the development of this capability in future frontier models.

Modern large language models (LLMs) rely on chain-of-thought (CoT) (Wei et al., 2022) to achieve strong performance (Guo et al., 2025). CoT increases the proportion of model computation that occurs in natural language (Korbak et al., 2025), which allows automated systems to monitor model CoTs for misaligned behavior. CoT monitoring has been employed to supervise model behavior in tasks as diverse as reinforcement learning (RL) training of frontier models (Baker et al., 2025), AI control (Kutasov et al., 2025), frontier model evaluation (METR, 2025), agent monitoring (Meinke et al., 2024), and jailbreak safeguards (Sharma et al., 2025). In many of these settings, access to legible reasoning traces is critical; without it, monitors are much less capable (Baker et al., 2025).
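rot13, the best-known of the ciphers in the study, makes the setup concrete: the transform is trivial to apply and is its own inverse, yet text enciphered this way is no longer natural language that a monitor (or the model itself) can read directly. Python's standard `codecs` module ships a rot13 codec:

```python
import codecs

# rot13 rotates each letter 13 places; punctuation passes through.
plaintext = "Let's think step by step."
ciphertext = codecs.encode(plaintext, "rot13")
# rot13 is an involution: applying it twice recovers the plaintext.
recovered = codecs.encode(ciphertext, "rot13")
```

A model reasoning in `ciphertext` rather than `plaintext` hides its chain of thought from a plain-English monitor, which is exactly the evasion risk the paper measures.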


Happiness is Sharing a Vocabulary: A Study of Transliteration Methods

Jung, Haeji, Kim, Jinju, Kim, Kyungjin, Roh, Youjeong, Mortensen, David R.

arXiv.org Artificial Intelligence

Transliteration has emerged as a promising means of bridging the gap between languages in multilingual NLP, with especially strong results for languages using non-Latin scripts. We investigate the degree to which shared script, overlapping token vocabularies, and shared phonology contribute to the performance of multilingual models. To this end, we conduct controlled experiments using three kinds of transliteration (romanization, phonemic transcription, and substitution ciphers) as well as the original orthography. We evaluate each model on two downstream tasks -- named entity recognition (NER) and natural language inference (NLI) -- and find that romanization significantly outperforms the other input types in 7 out of 8 evaluation settings, largely consistent with our hypothesis that it is the most effective approach. We further analyze how each factor contributes to this success, and suggest that sharing longer (subword) tokens with the languages seen in pre-training leads to better utilization of the model.
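The substitution-cipher condition is easy to reproduce in miniature. A Caesar shift is one instance of a monoalphabetic substitution cipher: it preserves the structure of the text (word lengths, character-level regularities) while destroying vocabulary overlap with anything seen in pre-training, which is what makes it a useful control. The shift value and function name below are illustrative, not taken from the paper.

```python
import string

SHIFT = 3  # arbitrary choice for illustration
table = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[SHIFT:] + string.ascii_lowercase[:SHIFT]
    + string.ascii_uppercase[SHIFT:] + string.ascii_uppercase[:SHIFT],
)

def encipher(text):
    """Apply a fixed monoalphabetic substitution (Caesar shift)."""
    return text.translate(table)
```

Comparing model performance on `encipher(text)` against romanized or orthographic input isolates how much of transliteration's benefit comes from shared vocabulary rather than shared structure.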




ALICE: An Interpretable Neural Architecture for Generalization in Substitution Ciphers

Shen, Jeff, Smith, Lindsay M.

arXiv.org Artificial Intelligence

To enhance interpretability, we introduce a novel bijective decoding head that explicitly models permutations via the Gumbel-Sinkhorn method, enabling direct extraction of learned cipher mappings. Our architectural innovations and analysis methods are applicable beyond cryptograms and offer new insights into neural network generalization and interpretability. A cryptogram is a type of puzzle in which text is encrypted using a substitution cipher, and the solver's task is to recover the original plaintext by inferring the cipher used for the encryption. Solvers typically rely on prior knowledge of letter frequency distributions and common words. Originally developed for real encryption, cryptograms are now popular in newspapers and puzzle books as entertainment due to their simplicity. This simplicity, however, also makes them a unique testbed for understanding generalization and reasoning in neural networks. In a one-to-one monoalphabetic substitution cipher, each letter in a fixed alphabet is mapped to a unique substitute character; the cipher thus represents a bijective mapping over the alphabet. While other ciphers exist (e.g., the Vigenère cipher, the Playfair cipher), we focus here on one-to-one monoalphabetic substitution ciphers, as the problem space is extremely large yet remains structurally simple to interpret. We hereafter mean one-to-one monoalphabetic substitution cipher when we say "cipher", unless otherwise specified. More formally, let Σ be a finite alphabet of size V representing allowable characters (e.g., 26 for the English alphabet).
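The Sinkhorn operation at the heart of the Gumbel-Sinkhorn method is simple to state: exponentiate a score matrix and alternately normalize its rows and columns, which drives it toward a doubly-stochastic matrix; at low temperature the result approaches a hard permutation. The sketch below shows that core operation only, in NumPy; adding Gumbel noise to the logits (for sampling) and the surrounding architecture are omitted, and the function name is my own.

```python
import numpy as np

def sinkhorn(logits, n_iters=50, tau=1.0):
    """Sinkhorn normalization: map a square score matrix to an
    (approximately) doubly-stochastic matrix by alternating row and
    column normalization of exp(logits / tau)."""
    p = np.exp(logits / tau)
    for _ in range(n_iters):
        p = p / p.sum(axis=1, keepdims=True)  # normalize rows
        p = p / p.sum(axis=0, keepdims=True)  # normalize columns
    return p
```

Because the output is a relaxed permutation matrix, reading off the argmax of each row yields the learned letter-to-letter cipher mapping directly, which is what makes the decoding head interpretable.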


Dorabella Cipher as Musical Inspiration

Hauer, Bradley, Choi, Colin, Hindle, Abram, Smallwood, Scott, Kondrak, Grzegorz

arXiv.org Artificial Intelligence

The Dorabella cipher is an encrypted note written by the English composer Edward Elgar, which has defied decipherment attempts for more than a century. While most proposed solutions are English texts, we investigate the hypothesis that Dorabella represents enciphered music. We weigh the evidence for and against the hypothesis, devise a simplified music notation, and attempt to reconstruct a melody from the cipher. Our tools are n-gram models of music, which we validate on existing music corpora enciphered using monoalphabetic substitution. By applying our methods to Dorabella, we produce a decipherment with musical qualities, which is then transformed via artful composition into a listenable melody. Far from arguing that the end result represents the only true solution, we instead frame decipherment as part of the composition process.
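The validation idea can be sketched with a minimal bigram scorer: train counts on a corpus of symbol sequences (standing in for melodies in a simplified notation) and score candidate decipherments by their log-probability, so that a decipherment whose transitions resemble real music scores higher. The toy corpus, smoothing choice, and function name below are illustrative only.

```python
import math
from collections import Counter

corpus = ["CDECDE", "CDEG"]  # toy "melodies" as symbol strings

def bigram_logprob(sequence, corpus, alpha=1.0):
    """Log-probability of `sequence` under an add-alpha-smoothed bigram
    model trained on `corpus`."""
    vocab = {s for seq in corpus for s in seq} | set(sequence)
    counts, totals = Counter(), Counter()
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            counts[a, b] += 1
            totals[a] += 1
    lp = 0.0
    for a, b in zip(sequence, sequence[1:]):
        lp += math.log((counts[a, b] + alpha)
                       / (totals[a] + alpha * len(vocab)))
    return lp
```

Ranking candidate key assignments by such a score is the standard statistical attack on monoalphabetic substitution, here transplanted from letter sequences to note sequences.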