Unsupervised Translation
Unsupervised Translation of Programming Languages
A transcompiler, also known as a source-to-source translator, is a system that converts source code from one high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g., COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules applied to the source code's abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language's conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive.
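Rule-based transcompilers of this kind operate directly on the abstract syntax tree. As a minimal illustration (not any particular tool's implementation), the sketch below uses Python's `ast` module to apply one handcrafted rewrite rule: replacing the `**` operator, which a C-like target language lacks, with a `pow(...)` call.

```python
import ast

class PowerRewrite(ast.NodeTransformer):
    """Toy rewrite rule: replace `a ** b` with `pow(a, b)`,
    the kind of construct a C-like target language expects."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite nested expressions first
        if isinstance(node.op, ast.Pow):
            return ast.copy_location(
                ast.Call(func=ast.Name(id="pow", ctx=ast.Load()),
                         args=[node.left, node.right], keywords=[]),
                node)
        return node

source = "area = side ** 2 + base ** 3"
tree = ast.fix_missing_locations(PowerRewrite().visit(ast.parse(source)))
print(ast.unparse(tree))  # area = pow(side, 2) + pow(base, 3)
```

A production transcompiler composes many such rules, and, as the abstract notes, the output still tends to be unidiomatic and to need manual repair.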
Review for NeurIPS paper: Unsupervised Translation of Programming Languages
Additional Feedback:
- It would have been great to see more discussion of the challenges of translation for programming languages vs. natural languages. What challenges are unique to source code? Just off the top of my head: APIs and libraries, variable names, compositionality, purity / side effects, scoping rules, and types.
- The problem of programming language translation is rarely the syntax of the language itself, but rather the semantic differences between the languages and, even worse, between their underlying libraries. I had hoped for a more elaborate discussion of this point. It looks like the name and signature of the method would be critical for the translation to be successful.
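The reviewer's point that semantics, not syntax, is the hard part can be made concrete with integer division: Python's `//` floors toward negative infinity, while integer `/` in C++ or Java truncates toward zero, so a symbol-for-symbol translation of the same-looking expression silently changes behavior. A small self-contained demonstration:

```python
# Python's // floors toward negative infinity; C++/Java integer division
# truncates toward zero. The syntax looks the same; the semantics differ.
a, b = -7, 2
python_div, python_mod = a // b, a % b   # floor division: -4, remainder 1
c_style_div = int(a / b)                 # truncation toward zero: -3
c_style_mod = a - c_style_div * b        # C-style remainder: -1
print(python_div, python_mod, c_style_div, c_style_mod)  # -4 1 -3 -1
```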
Review for NeurIPS paper: Unsupervised Translation of Programming Languages
Reviewers agree that this paper is a significant advance in the problem of language translation. One lingering concern is with the positioning of the paper. In particular, the introduction needs to do a better job in recognizing that this paper focuses on small self-contained units of code. In order to be useful in a software engineering context, a translation tool would have to address a number of problems that are not addressed by this work, such as major differences in the design patterns used by APIs in different languages. Without a proper acknowledgment of the limitations of the approach early in the paper, this paper could make it difficult to publish follow-up work.
Towards Unsupervised Representation Learning: Learning, Evaluating and Transferring Visual Representations
Unsupervised representation learning aims at finding methods that learn representations from data without annotation-based signals. Abstaining from annotations not only leads to economic benefits but may, and to some extent already does, result in advantages regarding the representation's structure, robustness, and generalizability to different tasks. In the long run, unsupervised methods are expected to surpass their supervised counterparts due to the reduction of human intervention and the inherently more general setup that does not bias the optimization towards an objective originating from specific annotation-based signals. While major advantages of unsupervised representation learning have recently been observed in natural language processing, supervised methods still dominate in vision domains for most tasks. In this dissertation, we contribute to the field of unsupervised (visual) representation learning from three perspectives: (i) Learning representations: We design unsupervised, backpropagation-free Convolutional Self-Organizing Neural Networks (CSNNs) that utilize self-organization- and Hebbian-based learning rules to learn convolutional kernels and masks, achieving deeper backpropagation-free models. Thereby, we observe that both backpropagation-based and backpropagation-free methods can suffer from an objective-function mismatch between the unsupervised pretext task and the target task, which can lead to decreased performance on the target task.
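The Hebbian-based learning rules mentioned above can be illustrated with Oja's rule, a classic local update that needs no backpropagated error signal. The toy sketch below (an illustration of the rule family, not the CSNN architecture itself) shows a single linear unit whose weights align with the data's high-variance direction:

```python
import random

random.seed(0)
# Toy data: high variance along the x-axis, low along the y-axis.
data = [(random.gauss(0, 3.0), random.gauss(0, 0.5)) for _ in range(4000)]

w = [0.6, 0.8]   # arbitrary unit-norm starting weights
lr = 0.005
for x0, x1 in data:
    y = w[0] * x0 + w[1] * x1            # unit activation (Hebbian "post")
    w[0] += lr * y * (x0 - y * w[0])     # Oja's rule: Hebbian term + decay
    w[1] += lr * y * (x1 - y * w[1])

norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
print(abs(w[0] / norm))   # near 1.0: aligned with the high-variance axis
```

The decay term keeps the weight norm bounded, so the update stays purely local: each weight change depends only on its own input and the unit's output.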
Unsupervised Machine Translation On Dravidian Languages
Koneru, Sai, Liu, Danni, Niehues, Jan
Unsupervised neural machine translation (UNMT) is especially beneficial for low-resource languages such as those from the Dravidian family. However, UNMT systems tend to fail in realistic scenarios involving actual low-resource languages. Recent works propose to utilize auxiliary parallel data and have achieved state-of-the-art results. In this work, we focus on unsupervised translation between English and Kannada, a low-resource Dravidian language. We additionally utilize a limited amount of auxiliary data between English and other related Dravidian languages. We show that unifying the writing systems is essential in unsupervised translation between the Dravidian languages. We explore several model architectures that use the auxiliary data in order to maximize knowledge sharing and enable UNMT for distant language pairs. Our experiments demonstrate that it is crucial to include auxiliary languages that are similar to our focal language, Kannada. Furthermore, we propose a metric to measure language similarity and show that it serves as a good indicator for selecting the auxiliary languages.
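Unifying the writing systems is feasible here because the Brahmic scripts used by the Dravidian languages occupy parallel Unicode blocks (e.g., Telugu at U+0C00-U+0C7F, Kannada at U+0C80-U+0CFF), so many characters map across scripts by a fixed codepoint offset. The sketch below is a rough illustration of that idea, not the paper's method; real systems use proper transliteration tables that handle the exceptions:

```python
# Rough script unification by codepoint offset: Kannada and Telugu share
# a parallel Unicode block layout, so shifting by the block distance maps
# many (not all) characters to their counterparts.
KANNADA_START, TELUGU_START = 0x0C80, 0x0C00

def kannada_to_telugu(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x0C80 <= cp <= 0x0CFF:          # inside the Kannada block
            out.append(chr(cp - KANNADA_START + TELUGU_START))
        else:
            out.append(ch)                   # leave other characters alone
    return "".join(out)

print(kannada_to_telugu("ಕನ್ನಡ"))  # the word "Kannada" rendered in Telugu script
```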
Unsupervised Pivot Translation for Distant Languages
Leng, Yichong, Tan, Xu, Qin, Tao, Li, Xiang-Yang, Liu, Tie-Yan
Unsupervised neural machine translation (NMT) has attracted a lot of attention recently. While state-of-the-art methods for unsupervised translation usually perform well between similar languages (e.g., English-German translation), they perform poorly between distant languages, because unsupervised alignment does not work well for distant languages. In this work, we introduce unsupervised pivot translation for distant languages, which translates a language to a distant language through multiple hops, and the unsupervised translation on each hop is relatively easier than the original direct translation. We propose a learning to route (LTR) method to choose the translation path between the source and target languages. LTR is trained on language pairs whose best translation path is available and is applied on the unseen language pairs for path selection. Experiments on 20 languages and 294 distant language pairs demonstrate the advantages of the unsupervised pivot translation for distant languages, as well as the effectiveness of the proposed LTR for path selection. Specifically, in the best case, LTR achieves an improvement of 5.58 BLEU points over the conventional direct unsupervised method.
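The path-selection problem that LTR addresses can be illustrated with a brute-force search over per-hop quality scores. The language pairs and numbers below are invented for illustration; LTR learns to predict the best route rather than enumerating and scoring every candidate:

```python
# Hypothetical per-hop unsupervised-translation quality scores (higher is
# better). LTR *learns* to pick the route; this sketch brute-forces it just
# to show the search problem. Codes: Danish (da), English (en),
# Spanish (es), Galician (gl).
score = {
    ("da", "en"): 0.9, ("en", "gl"): 0.5,
    ("da", "es"): 0.8, ("es", "gl"): 0.9,
    ("da", "gl"): 0.3,                      # weak direct unsupervised pair
}

def path_score(path):
    """Score a route as the product of its per-hop scores."""
    s = 1.0
    for hop in zip(path, path[1:]):
        s *= score.get(hop, 0.0)
    return s

candidates = [("da", "gl"), ("da", "en", "gl"), ("da", "es", "gl")]
best = max(candidates, key=path_score)
print(best)  # ('da', 'es', 'gl'): pivoting beats the direct translation
```

Multiplying per-hop scores encodes the intuition that errors compound across hops, so a pivot path only wins when each of its individual hops is substantially easier than the direct pair.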
Unsupervised Paraphrasing without Translation
Paraphrasing exemplifies the ability to abstract semantic content from surface forms. Recent work on automatic paraphrasing is dominated by methods leveraging Machine Translation (MT) as an intermediate step. This contrasts with humans, who can paraphrase without being bilingual. This work proposes to learn paraphrasing models from an unlabeled monolingual corpus only. To that end, we propose a residual variant of the vector-quantized variational auto-encoder. We compare with MT-based approaches on paraphrase identification, generation, and training augmentation. Monolingual paraphrasing outperforms unsupervised translation in all settings. Comparisons with supervised translation are more mixed: monolingual paraphrasing is interesting for identification and augmentation, while supervised translation is superior for generation.
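The quantization step at the heart of a residual vector-quantized auto-encoder can be sketched in a few lines: snap the encoder output to its nearest entry in a coarse codebook, then quantize the leftover residual with a second codebook. The codebook entries and encoder vector below are invented for illustration; in the actual model they are learned jointly with the encoder and decoder:

```python
# Two-stage (residual) vector quantization on toy 2-D vectors.
def nearest(codebook, v):
    """Return the codebook entry closest to v in squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(c, v)))

stage1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]     # coarse codes
stage2 = [(-0.25, 0.0), (0.25, 0.0), (0.0, 0.25), (0.0, -0.25)]  # refinements

z = (0.8, 0.15)                                   # hypothetical encoder output
q1 = nearest(stage1, z)                           # coarse code
residual = tuple(a - b for a, b in zip(z, q1))    # what the coarse code missed
q2 = nearest(stage2, residual)                    # quantize the residual
decoded = tuple(a + b for a, b in zip(q1, q2))    # sum of the two codes
print(q1, q2, decoded)
```

Each extra residual stage shrinks the quantization error while keeping every stage's codebook small, which is the appeal of the residual variant over a single large codebook.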