cipher

Aligning LLM Agents by Learning Latent Preference from User Edits

Neural Information Processing Systems

We study interactive learning of language agents based on user edits made to the agent's output. In a typical setting such as a writing assistant, the user interacts with a language agent to generate a response given a context, and may optionally edit the agent's response to personalize it based on their latent preference, in addition to improving its correctness. The edit feedback is naturally generated, making it a suitable signal for improving the agent's alignment with the user's preference and for reducing the cost of user edits over time. We propose a learning framework, PRELUDE, which infers a description of the user's latent preference from historic edit data and uses it to define a prompt policy that drives future response generation. This avoids fine-tuning the agent, which is costly, hard to scale with the number of users, and may even degrade the agent's performance on other tasks.
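The core idea of the abstract can be illustrated with a toy sketch: instead of fine-tuning, keep a per-context description of the user's inferred preference and prepend it to future prompts. In the paper the preference description is inferred by an LLM from the edits themselves; here the class name, the context keys, and the caller-supplied description are all illustrative stand-ins.

```python
class PreludePolicy:
    """Toy sketch of a prompt policy in the spirit of PRELUDE: learn a
    textual description of the user's latent preference from their edits,
    then inject it into future prompts instead of fine-tuning the agent."""

    def __init__(self):
        # Maps a context type (e.g. "email") to an inferred preference.
        self.preferences = {}

    def record_edit(self, context_type, inferred_preference):
        # In PRELUDE an LLM summarizes the user's edit into a preference
        # description; here the caller supplies that description directly.
        self.preferences[context_type] = inferred_preference

    def make_prompt(self, context_type, user_request):
        # Prepend the stored preference, if any, to steer generation.
        pref = self.preferences.get(context_type)
        if pref is None:
            return user_request
        return f"User preference: {pref}\n\nTask: {user_request}"
```

The point of the sketch is the data flow: edits update a lightweight per-user store, and alignment happens purely at prompt time.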


CIPHER: Scalable Time Series Analysis for Physical Sciences with Application to Solar Wind Phenomena

Kobayashi, Jasmine R., Martin, Daniela, Filho, Valmir P Moraes, O'Brien, Connor, Hong, Jinsu, Saikia, Sudeshna Boro, Lamdouar, Hala, Miles, Nathan D., Scoczynski, Marcella, Stone, Mavis, Sundaresan, Sairam, Jungbluth, Anna, Muñoz-Jaramillo, Andrés, Samara, Evangelia, Gallego, Joseph

arXiv.org Artificial Intelligence

Labeling or classifying time series is a persistent challenge in the physical sciences, where expert annotations are scarce, costly, and often inconsistent. Yet robust labeling is essential to enable machine learning models for understanding, prediction, and forecasting. We present the Clustering and Indexation Pipeline with Human Evaluation for Recognition (CIPHER), a framework designed to accelerate large-scale labeling of complex time series in physics. CIPHER integrates indexable Symbolic Aggregate approXimation (iSAX) for interpretable compression and indexing, density-based clustering (HDBSCAN) to group recurring phenomena, and a human-in-the-loop step for efficient expert validation. Representative samples are labeled by domain scientists, and these annotations are propagated across clusters to yield systematic, scalable classifications. We evaluate CIPHER on the task of classifying solar wind phenomena in OMNI data, a central challenge in space weather research, showing that the framework recovers meaningful phenomena such as coronal mass ejections and stream interaction regions. Beyond this case study, CIPHER highlights a general strategy for combining symbolic representations, unsupervised learning, and expert knowledge to address label scarcity in time series across the physical sciences. The code and configuration files used in this study are publicly available to support reproducibility.
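The SAX step underlying iSAX can be sketched in a few lines: z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA), then map each segment mean to a symbol via breakpoints that split the standard normal into equiprobable bins. This is a minimal illustration of the symbolization idea only; iSAX adds multi-resolution indexing on top, and the HDBSCAN and human-validation stages of CIPHER are omitted. Function and constant names are my own.

```python
import numpy as np

# Breakpoints splitting the standard normal into 4 equiprobable bins.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])
ALPHABET = "abcd"

def sax_word(series, n_segments=8):
    """Convert a 1-D series to a SAX word: z-normalize, piecewise-
    aggregate into n_segments means, then discretize each mean."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)        # z-normalization
    x = x[: len(x) // n_segments * n_segments]    # trim to a multiple
    paa = x.reshape(n_segments, -1).mean(axis=1)  # segment means (PAA)
    return "".join(ALPHABET[i] for i in np.searchsorted(BREAKPOINTS, paa))
```

Because every series is reduced to a short word over a small alphabet, words can be indexed and compared cheaply, which is what makes clustering at scale tractable.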


All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language

Guo, Shiyuan, Sleight, Henry, Roger, Fabien

arXiv.org Artificial Intelligence

Detecting harmful AI actions is important as AI agents gain adoption. Chain-of-thought (CoT) monitoring is a method widely used to detect adversarial attacks and AI misalignment. However, attackers and misaligned models might evade CoT monitoring through ciphered reasoning: reasoning hidden in encrypted, translated, or compressed text. To assess this risk, we test whether models can perform ciphered reasoning. For each of 28 different ciphers, we fine-tune and prompt up to 10 models to reason in that cipher. Across the models we test, we find an asymmetry: model accuracy can drop significantly when reasoning in ciphered text, even though the models demonstrate comprehension of the same text by translating it accurately into English. Even frontier models struggle with lesser-known ciphers, although they can reason accurately in well-known ciphers like rot13. We show that ciphered reasoning capability correlates with cipher prevalence in pretraining data. We also identify scaling laws showing that ciphered reasoning capability improves slowly with additional fine-tuning data. Our work suggests that evading CoT monitoring using ciphered reasoning may be an ineffective tactic for current models, and offers guidance on constraining the development of this capability in future frontier models.

Modern large language models (LLMs) rely on chain-of-thought (CoT) (Wei et al., 2022) to achieve strong performance (Guo et al., 2025). CoT increases the proportion of model computation that occurs in natural language (Korbak et al., 2025), which allows automated systems to monitor model CoTs for misaligned behavior. CoT monitoring has been employed to supervise model behavior in tasks as diverse as reinforcement learning (RL) training of frontier models (Baker et al., 2025), AI control (Kutasov et al., 2025), frontier model evaluation (METR, 2025), agent monitoring (Meinke et al., 2024), and jailbreak safeguards (Sharma et al., 2025). In many of these settings, access to legible reasoning traces is critical; without it, monitors are much less capable (Baker et al., 2025).
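rot13, the best-known of the ciphers in the study, makes the setup concrete: the transform is trivial to apply and is its own inverse, yet text enciphered this way is no longer natural language that a monitor (or the model itself) can read directly. Python's standard `codecs` module ships a rot13 codec:

```python
import codecs

# rot13 rotates each letter 13 places; punctuation passes through.
plaintext = "Let's think step by step."
ciphertext = codecs.encode(plaintext, "rot13")
# rot13 is an involution: applying it twice recovers the plaintext.
recovered = codecs.encode(ciphertext, "rot13")
```

A model reasoning in `ciphertext` rather than `plaintext` hides its chain of thought from a plain-English monitor, which is exactly the evasion risk the paper measures.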


Happiness is Sharing a Vocabulary: A Study of Transliteration Methods

Jung, Haeji, Kim, Jinju, Kim, Kyungjin, Roh, Youjeong, Mortensen, David R.

arXiv.org Artificial Intelligence

Transliteration has emerged as a promising means of bridging the gap between languages in multilingual NLP, with especially strong results for languages using non-Latin scripts. We investigate the degree to which shared script, overlapping token vocabularies, and shared phonology contribute to the performance of multilingual models. To this end, we conduct controlled experiments using three kinds of transliteration (romanization, phonemic transcription, and substitution ciphers) as well as the original orthography. We evaluate each model on two downstream tasks -- named entity recognition (NER) and natural language inference (NLI) -- and find that romanization significantly outperforms the other input types in 7 out of 8 evaluation settings, largely consistent with our hypothesis that it is the most effective approach. We further analyze how each factor contributes to this success, and suggest that sharing longer (subword) tokens with the languages seen in pre-training leads to better utilization of the model.
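The substitution-cipher condition is easy to reproduce in miniature. A Caesar shift is one instance of a monoalphabetic substitution cipher: it preserves the structure of the text (word lengths, character-level regularities) while destroying vocabulary overlap with anything seen in pre-training, which is what makes it a useful control. The shift value and function name below are illustrative, not taken from the paper.

```python
import string

SHIFT = 3  # arbitrary choice for illustration
table = str.maketrans(
    string.ascii_lowercase + string.ascii_uppercase,
    string.ascii_lowercase[SHIFT:] + string.ascii_lowercase[:SHIFT]
    + string.ascii_uppercase[SHIFT:] + string.ascii_uppercase[:SHIFT],
)

def encipher(text):
    """Apply a fixed monoalphabetic substitution (Caesar shift)."""
    return text.translate(table)
```

Comparing model performance on `encipher(text)` against romanized or orthographic input isolates how much of transliteration's benefit comes from shared vocabulary rather than shared structure.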




ALICE: An Interpretable Neural Architecture for Generalization in Substitution Ciphers

Shen, Jeff, Smith, Lindsay M.

arXiv.org Artificial Intelligence

To enhance interpretability, we introduce a novel bijective decoding head that explicitly models permutations via the Gumbel-Sinkhorn method, enabling direct extraction of learned cipher mappings. Our architectural innovations and analysis methods are applicable beyond cryptograms and offer new insights into neural network generalization and interpretability. A cryptogram is a type of puzzle in which text is encrypted using a substitution cipher, and the solver's task is to recover the original plaintext by inferring the cipher used for the encryption. Solvers typically rely on prior knowledge of letter frequency distributions and common words. Originally developed for real encryption, cryptograms are now popular in newspapers and puzzle books as entertainment due to their simplicity. This simplicity, however, also makes them a unique testbed for understanding generalization and reasoning in neural networks. In a one-to-one monoalphabetic substitution cipher, each letter in a fixed alphabet is mapped to a unique substitute character; the cipher thus represents a bijective mapping over the alphabet. While other ciphers exist (e.g., the Vigenère cipher, the Playfair cipher), we focus here on one-to-one monoalphabetic substitution ciphers, as the problem space is extremely large yet remains structurally simple to interpret. We hereafter mean one-to-one monoalphabetic substitution cipher when we say "cipher", unless otherwise specified. More formally, let Σ be a finite alphabet of size V representing allowable characters (e.g., 26 for the English alphabet).
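The Sinkhorn operation at the heart of the Gumbel-Sinkhorn method is simple to state: exponentiate a score matrix and alternately normalize its rows and columns, which drives it toward a doubly-stochastic matrix; at low temperature the result approaches a hard permutation. The sketch below shows that core operation only, in NumPy; adding Gumbel noise to the logits (for sampling) and the surrounding architecture are omitted, and the function name is my own.

```python
import numpy as np

def sinkhorn(logits, n_iters=50, tau=1.0):
    """Sinkhorn normalization: map a square score matrix to an
    (approximately) doubly-stochastic matrix by alternating row and
    column normalization of exp(logits / tau)."""
    p = np.exp(logits / tau)
    for _ in range(n_iters):
        p = p / p.sum(axis=1, keepdims=True)  # normalize rows
        p = p / p.sum(axis=0, keepdims=True)  # normalize columns
    return p
```

Because the output is a relaxed permutation matrix, reading off the argmax of each row yields the learned letter-to-letter cipher mapping directly, which is what makes the decoding head interpretable.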


Dorabella Cipher as Musical Inspiration

Hauer, Bradley, Choi, Colin, Hindle, Abram, Smallwood, Scott, Kondrak, Grzegorz

arXiv.org Artificial Intelligence

The Dorabella cipher is an encrypted note written by the English composer Edward Elgar, which has defied decipherment attempts for more than a century. While most proposed solutions are English texts, we investigate the hypothesis that Dorabella represents enciphered music. We weigh the evidence for and against the hypothesis, devise a simplified music notation, and attempt to reconstruct a melody from the cipher. Our tools are n-gram models of music, which we validate on existing music corpora enciphered using monoalphabetic substitution. By applying our methods to Dorabella, we produce a decipherment with musical qualities, which is then transformed via artful composition into a listenable melody. Far from arguing that the end result represents the only true solution, we instead frame decipherment as part of the composition process.
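The validation idea can be sketched with a minimal bigram scorer: train counts on a corpus of symbol sequences (standing in for melodies in a simplified notation) and score candidate decipherments by their log-probability, so that a decipherment whose transitions resemble real music scores higher. The toy corpus, smoothing choice, and function name below are illustrative only.

```python
import math
from collections import Counter

corpus = ["CDECDE", "CDEG"]  # toy "melodies" as symbol strings

def bigram_logprob(sequence, corpus, alpha=1.0):
    """Log-probability of `sequence` under an add-alpha-smoothed bigram
    model trained on `corpus`."""
    vocab = {s for seq in corpus for s in seq} | set(sequence)
    counts, totals = Counter(), Counter()
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            counts[a, b] += 1
            totals[a] += 1
    lp = 0.0
    for a, b in zip(sequence, sequence[1:]):
        lp += math.log((counts[a, b] + alpha)
                       / (totals[a] + alpha * len(vocab)))
    return lp
```

Ranking candidate key assignments by such a score is the standard statistical attack on monoalphabetic substitution, here transplanted from letter sequences to note sequences.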