AITopics

Technology: Information Technology > Artificial Intelligence (0.42)

Neural Information Processing Systems, Nov-21-2025, 16:22:26 GMT

Style Transfer from Non-Parallel Text by Cross-Alignment

This paper focuses on style transfer on the basis of non-parallel text. This is an instance of a broad family of problems including machine translation, decipherment, and sentiment modification. The key challenge is to separate the content from other aspects such as style. We assume a shared latent content distribution across different text corpora, and propose a method that leverages refined alignment of latent representations to perform style transfer. The transferred sentences from one style should match example sentences from the other style as a population. We demonstrate the effectiveness of this cross-alignment method on three tasks: sentiment modification, decipherment of word substitution ciphers, and recovery of word order.
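The decipherment task mentioned above can be made concrete with a toy setup. The following is a minimal sketch that assumes a tiny corpus and a random one-to-one word substitution key. The corpus, the helper names (`make_word_cipher`, `apply_key`, `invert_key`, `token_accuracy`) and the token-accuracy metric are illustrative assumptions, not the paper's code or evaluation protocol. The sketch shows how a non-parallel pair of corpora arises: one half of the data stays in plaintext, the other half is enciphered, and no sentence appears on both sides.

```python
import random
from typing import Dict, List


def make_word_cipher(vocab: List[str], seed: int = 0) -> Dict[str, str]:
    """Random one-to-one word substitution key: plaintext word -> cipher word."""
    rng = random.Random(seed)
    cipher_words = vocab[:]
    rng.shuffle(cipher_words)
    return dict(zip(vocab, cipher_words))


def apply_key(sentence: List[str], key: Dict[str, str]) -> List[str]:
    """Replace every token of a sentence according to `key`."""
    return [key[word] for word in sentence]


def invert_key(key: Dict[str, str]) -> Dict[str, str]:
    """Cipher word -> plaintext word (valid because the key is one-to-one)."""
    return {cipher: plain for plain, cipher in key.items()}


def token_accuracy(predicted: List[List[str]], reference: List[List[str]]) -> float:
    """Fraction of reference tokens reproduced exactly at the same position."""
    correct = total = 0
    for pred, ref in zip(predicted, reference):
        total += len(ref)
        correct += sum(p == r for p, r in zip(pred, ref))
    return correct / total if total else 0.0


# Hypothetical toy corpus.
corpus = [
    "the food was great".split(),
    "the service was slow".split(),
    "the staff was friendly".split(),
    "the room was dirty".split(),
]
vocab = sorted({word for sentence in corpus for word in sentence})
key = make_word_cipher(vocab, seed=1)

# Non-parallel data: the plaintext and ciphertext sides share no sentence.
plain_side = corpus[:2]
cipher_side = [apply_key(s, key) for s in corpus[2:]]

# Sanity check with the oracle inverse key: every token is recovered.
recovered = [apply_key(s, invert_key(key)) for s in cipher_side]
print(token_accuracy(recovered, corpus[2:]))  # 1.0
```

A learned model would see only `plain_side` and `cipher_side`. The oracle inverse key is used here purely to check that the metric behaves as expected.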

name change, non-parallel text, style transfer

Technology: Information Technology > Artificial Intelligence (0.42)

Neural Information Processing Systems, Nov-21-2025, 06:24:06 GMT

Style Transfer from Non-Parallel Text by Cross-Alignment

Tianxiao Shen, Tao Lei, Regina Barzilay, Tommi Jaakkola

This paper focuses on style transfer on the basis of non-parallel text.

arxiv preprint arxiv, machine learning, natural language

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

arXiv.org Artificial Intelligence, Sep-23-2025

Dorabella Cipher as Musical Inspiration

Hauer, Bradley, Choi, Colin, Hindle, Abram, Smallwood, Scott, Kondrak, Grzegorz

The Dorabella cipher is an encrypted note written by English composer Edward Elgar, which has defied decipherment attempts for more than a century. While most proposed solutions are English texts, we investigate the hypothesis that Dorabella represents enciphered music. We weigh the evidence for and against the hypothesis, devise a simplified music notation, and attempt to reconstruct a melody from the cipher. Our tools are n-gram models of music which we validate on existing music corpora enciphered using monoalphabetic substitution. By applying our methods to Dorabella, we produce a decipherment with musical qualities, which is then transformed via artful composition into a listenable melody. Far from arguing that the end result represents the only true solution, we instead frame the process of decipherment as part of the composition process.
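The validation step described above involves two parts: enciphering music with a monoalphabetic substitution and scoring candidate decipherments with an n-gram model. The sketch below illustrates both under stated assumptions. Melodies are plain note-name sequences, the model is an add-one-smoothed bigram model, and the key is found by exhaustive search, which is feasible only for a tiny alphabet. The note alphabet, training melodies, cipher symbols, and function names are hypothetical. The abstract does not give the authors' notation, model order, or search procedure.

```python
import itertools
import math
from collections import Counter
from typing import Callable, Dict, Sequence

START = "<s>"  # sentinel marking the beginning of a melody


def train_bigram(corpus: Sequence[Sequence[str]], alphabet: Sequence[str]) -> Callable[[str, str], float]:
    """Return log P(next | prev) from an add-one-smoothed bigram model over `alphabet`."""
    context_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for melody in corpus:
        padded = [START, *melody]
        for prev, nxt in zip(padded, padded[1:]):
            context_counts[prev] += 1
            pair_counts[(prev, nxt)] += 1
    vocab_size = len(alphabet)

    def logprob(prev: str, nxt: str) -> float:
        return math.log((pair_counts[(prev, nxt)] + 1) / (context_counts[prev] + vocab_size))

    return logprob


def score(melody: Sequence[str], logprob: Callable[[str, str], float]) -> float:
    """Total log-probability of a melody under the bigram model."""
    padded = [START, *melody]
    return sum(logprob(prev, nxt) for prev, nxt in zip(padded, padded[1:]))


def decipher_brute_force(ciphertext: Sequence[str], alphabet: Sequence[str],
                         logprob: Callable[[str, str], float]) -> Dict[str, str]:
    """Try every one-to-one key from cipher symbols to notes and keep the best-scoring one."""
    symbols = sorted(set(ciphertext))
    best_key: Dict[str, str] = {}
    best_score = -math.inf
    for notes in itertools.permutations(alphabet, len(symbols)):
        key = dict(zip(symbols, notes))
        candidate = score([key[c] for c in ciphertext], logprob)
        if candidate > best_score:
            best_key, best_score = key, candidate
    return best_key


alphabet = ["C", "E", "G"]
training_melodies = [list("CEGEC"), list("CEG"), list("GEC")]
logprob = train_bigram(training_melodies, alphabet)

# Encipher a melody with a monoalphabetic substitution (note -> symbol).
encipher_key = {"C": "2", "E": "3", "G": "1"}
ciphertext = [encipher_key[note] for note in "CEGEC"]

key = decipher_brute_force(ciphertext, alphabet, logprob)
print("".join(key[c] for c in ciphertext))  # CEGEC
```

With three symbols there are only 3! = 6 candidate keys. A realistic note alphabet makes exhaustive search impractical, so a heuristic search would be needed instead.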

cipher, machine learning, natural language

2509.1795

Country: North America > Canada > Alberta (0.28)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

Neural Information Processing Systems, Oct-3-2024, 01:19:42 GMT

Style Transfer from Non-Parallel Text by Cross-Alignment

Tianxiao Shen, Tao Lei, Regina Barzilay, Tommi Jaakkola

This paper focuses on style transfer on the basis of non-parallel text. This is an instance of a broad family of problems including machine translation, decipherment, and sentiment modification. The key challenge is to separate the content from other aspects such as style. We assume a shared latent content distribution across different text corpora, and propose a method that leverages refined alignment of latent representations to perform style transfer. The transferred sentences from one style should match example sentences from the other style as a population. We demonstrate the effectiveness of this cross-alignment method on three tasks: sentiment modification, decipherment of word substitution ciphers, and recovery of word order.

arxiv preprint arxiv, evaluation, style transfer

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)

arXiv.org Artificial Intelligence, Sep-3-2024

Determination of language families using deep learning

Lerner, Peter B.

Deep learning is currently used in LLMs (Large Language Models), for image identification, for the creation of deepfakes, and for the analysis of astrophysical and financial information (Krizhevsky, 2012), (Sutskever, 2014), (Vaswani, 2017), (Wang, 2015), (George, 2018), (Li, 2010). When the instruments of deep learning became widely available, it was decided that the decipherment of all dead languages was only a matter of time (see (Xusen, 2019) and op.

correlation, cypro-minoan, fingerprint

2409.02393

Country: Asia > Philippines (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Jun-11-2024

Decipherment-Aware Multilingual Learning in Jointly Trained Language Models

Lee, Grandee

The principle that governs unsupervised multilingual learning (UCL) in jointly trained language models (mBERT being a popular example) is still being debated. Many find it surprising that one can achieve UCL with multiple monolingual corpora. In this work, we anchor UCL in the context of language decipherment and show that the joint training methodology is a decipherment process pivotal for UCL. In a controlled setting, we investigate the effect of different decipherment settings on multilingual learning performance and consolidate existing views on the factors that contribute to multilinguality. From an information-theoretic perspective, we derive a limit on UCL performance and demonstrate the importance of token alignment in challenging decipherment settings caused by differences in data domain, language order, and tokenization granularity. Lastly, we apply lexical alignment to mBERT and investigate how aligning different lexicon groups contributes to downstream performance.
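The abstract does not spell out how its controlled decipherment settings are built. One common way to study such questions, assumed here purely for illustration, is to create a synthetic second language by remapping token IDs. A chosen fraction of tokens is left unmapped so that those tokens are shared between the two languages as anchors. The sketch below builds such a mapping and measures how much vocabulary two non-parallel corpora share. This overlap is the kind of signal a jointly trained model could exploit for token alignment. All names (`make_synthetic_language`, `to_synthetic`, `vocab_overlap`) and the toy corpus are hypothetical, not the paper's setup.

```python
import random
from typing import Dict, List, Sequence


def make_synthetic_language(vocab_size: int, anchor_fraction: float, seed: int = 0) -> Dict[int, int]:
    """Map each token ID of language A to a token ID of synthetic language B.

    Anchor tokens keep their ID (shared by both languages); every other token
    is shifted into the disjoint range [vocab_size, 2 * vocab_size).
    """
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    anchors = set(rng.sample(ids, round(anchor_fraction * vocab_size)))
    return {i: (i if i in anchors else i + vocab_size) for i in ids}


def to_synthetic(sentence: Sequence[int], mapping: Dict[int, int]) -> List[int]:
    """Rewrite a language-A sentence in the synthetic language B."""
    return [mapping[token] for token in sentence]


def vocab_overlap(corpus_a: Sequence[Sequence[int]], corpus_b: Sequence[Sequence[int]]) -> float:
    """Jaccard overlap between the token sets of two corpora."""
    vocab_a = {t for sent in corpus_a for t in sent}
    vocab_b = {t for sent in corpus_b for t in sent}
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


corpus = [[0, 1, 2, 3], [0, 4, 5, 3], [6, 1, 7, 3], [0, 8, 9, 3]]  # toy token-ID sentences
lang_a = corpus[:2]  # first half stays in language A
for fraction in (0.0, 0.5, 1.0):
    mapping = make_synthetic_language(vocab_size=10, anchor_fraction=fraction)
    lang_b = [to_synthetic(s, mapping) for s in corpus[2:]]  # second half becomes language B
    print(fraction, vocab_overlap(lang_a, lang_b))
```

With `anchor_fraction=0.0` the two sides share no token IDs at all. In that case, any cross-lingual transfer must come from structural regularities rather than from identical tokens.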

computational linguistic, linguistics, proceedings

2406.07231

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial Intelligence, Jun-2-2024

Deciphering Oracle Bone Language with Diffusion Models

Guan, Haisu, Yang, Huanxin, Wang, Xinyu, Han, Shengwei, Liu, Yongge, Jin, Lianwen, Bai, Xiang, Liu, Yuliang

Originating from China's Shang Dynasty approximately 3,000 years ago, the Oracle Bone Script (OBS) is a cornerstone in the annals of linguistic history, predating many established writing systems. Despite the discovery of thousands of inscriptions, a vast expanse of OBS remains undeciphered, casting a veil of mystery over this ancient language. The emergence of modern AI technologies presents a novel frontier for OBS decipherment, challenging traditional NLP methods that rely heavily on large textual corpora, a luxury not afforded by historical languages. This paper introduces a novel approach by adopting image generation techniques, specifically through the development of Oracle Bone Script Decipher (OBSD). Utilizing a conditional diffusion-based strategy, OBSD generates vital clues for decipherment, charting a new course for AI-assisted analysis of ancient languages. To validate its efficacy, extensive experiments were conducted on an oracle bone script dataset, with quantitative results demonstrating the effectiveness of OBSD. Code and decipherment results will be made available at https://github.com/guanhaisu/OBSD.
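The abstract describes OBSD only as a conditional diffusion-based strategy. For readers unfamiliar with diffusion models, the sketch below shows the standard DDPM-style forward (noising) process and how a single training example for a conditional denoiser is formed. The forward process is `x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps`, with `alpha_bar_t` the running product of `(1 - beta_s)`. The linear schedule, its endpoints, the toy vector `x0`, and the string condition are illustrative assumptions. The abstract does not specify OBSD's actual schedule, architecture, or conditioning input.

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Noise variances beta_1..beta_T increasing linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noisy_training_example(x0: Sequence[float], condition: object, alpha_bars: Sequence[float],
                           rng: random.Random) -> Tuple[Tuple[List[float], int, object], List[float]]:
    """Draw a timestep t and Gaussian noise, then form x_t ~ q(x_t | x_0).

    Returns the denoiser's inputs (x_t, t, condition) and its regression target (the noise).
    """
    t = rng.randrange(len(alpha_bars))
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bars[t]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return (x_t, t, condition), noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.0, 1.0, -1.0, 0.5]  # stand-in for a flattened target glyph image
(x_t, t, cond), eps = noisy_training_example(x0, "oracle-bone-glyph-id", alpha_bars, rng)
```

A trained denoiser would take `(x_t, t, cond)` and predict `eps`. Generation then runs the learned reverse process from pure noise, guided by the condition.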

chinese character, decipherment, diffusion model

2406.00684

Country: Asia > China (0.25)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

#artificialintelligence, Nov-4-2020, 06:45:12 GMT

Translating lost languages using machine learning

Recent research suggests that most languages that have ever existed are no longer spoken. Dozens of these dead languages are also considered to be lost, or "undeciphered": we don't know enough about their grammar, vocabulary, or syntax to actually understand their texts. Lost languages are more than a mere academic curiosity; without them, we miss an entire body of knowledge about the people who spoke them. Unfortunately, most of them have such minimal records that scientists can't decipher them using machine-translation algorithms like Google Translate. Some have no well-researched "relative" language to compare them with, and many lack traditional dividers such as white space and punctuation.
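Missing word dividers are a concrete obstacle, because a text must be split into words before any translation can happen. If a lexicon with word log-probabilities were known, the best segmentation could be found with dynamic programming. For a lost language that lexicon is exactly what is unavailable, so this sketch only illustrates the subproblem. The lexicon, its scores, and the function name `segment` are hypothetical. This is not the method used in the research being described.

```python
import math
from typing import Dict, List, Optional


def segment(text: str, word_logprob: Dict[str, float], max_word_len: int = 10) -> Optional[List[str]]:
    """Split `text` into lexicon words, maximizing the summed word log-probability.

    best[i] holds the best score for the prefix text[:i]; back[i] records where
    the last word of that prefix starts. Returns None if no segmentation exists.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_logprob and best[start] > -math.inf:
                candidate = best[start] + word_logprob[word]
                if candidate > best[end]:
                    best[end], back[end] = candidate, start
    if best[n] == -math.inf:
        return None
    # Follow the back-pointers from the end of the text to recover the words.
    words, end = [], n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


lexicon = {"the": -1.0, "them": -2.0, "man": -1.5, "an": -1.5, "ran": -1.5}
print(segment("themanran", lexicon))  # ['the', 'man', 'ran']
```

Here "the man ran" (score -4.0) beats "them an ran" (score -5.0). Ambiguities of this kind are why a segmenter needs some model of which words are likely.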

algorithm, artificial intelligence, natural language

#artificialintelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)

Genre: Research Report > New Finding (0.71)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.56)

#artificialintelligence, Oct-31-2020, 10:05:26 GMT

Translating Lost Languages Using Machine Learning - Liwaiwai

Recent research suggests that most languages that have ever existed are no longer spoken. Dozens of these dead languages are also considered to be lost, or "undeciphered": we don't know enough about their grammar, vocabulary, or syntax to actually understand their texts. Lost languages are more than a mere academic curiosity; without them, we miss an entire body of knowledge about the people who spoke them. Unfortunately, most of them have such minimal records that scientists can't decipher them using machine-translation algorithms like Google Translate. Some have no well-researched "relative" language to compare them with, and many lack traditional dividers such as white space and punctuation.

algorithm, artificial intelligence, natural language

#artificialintelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Genre: Research Report > New Finding (0.71)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.56)