A Theory of Unsupervised Translation Motivated by Understanding Animal Communication
Neural networks are capable of translating between languages--in some cases even between two languages where there is little or no access to parallel translations, in what is known as Unsupervised Machine Translation (UMT). Given this progress, it is intriguing to ask whether machine learning tools can ultimately enable understanding animal communication, particularly that of highly intelligent animals. We propose a theoretical framework for analyzing UMT when no parallel translations are available and when it cannot be assumed that the source and target corpora address related subject domains or posses similar linguistic structure. We exemplify this theory with two stylized models of language, for which our framework provides bounds on necessary sample complexity; the bounds are formally proven and experimentally verified on synthetic data. These bounds show that the error rates are inversely related to the language complexity and amount of common ground. This suggests that unsupervised translation of animal communication may be feasible if the communication system is sufficiently complex.
Time Series Kernels based on Nonlinear Vector AutoRegressive Delay Embeddings
Kernel design is a pivotal but challenging aspect of time series analysis, especially in the context of small datasets. In recent years, Reservoir Computing (RC) has emerged as a powerful tool to compare time series based on the underlying dynamics of the generating process rather than the observed data. However, the performance of RC highly depends on the hyperparameter setting, which is hard to interpret and costly to optimize because of the recurrent nature of RC. Here, we present a new kernel for time series based on the recently established equivalence between reservoir dynamics and Nonlinear Vector AutoRegressive (NVAR) processes. The kernel is non-recurrent and depends on a small set of meaningful hyperparameters, for which we suggest an effective heuristic. We demonstrate excellent performance on a wide range of real-world classification tasks, both in terms of accuracy and speed. This further advances the understanding of RC representation learning models and extends the typical use of the NVAR framework to kernel design and representation of real-world time series data.
Estimating and Controlling for Equalized Odds via Sensitive Attribute Predictors
As the use of machine learning models in real world high-stakes decision settings continues to grow, it is highly important that we are able to audit and control for any potential fairness violations these models may exhibit towards certain groups. To do so, one naturally requires access to sensitive attributes, such as demographics, biological sex, or other potentially sensitive features that determine group membership. Unfortunately, in many settings, this information is often unavailable. In this work we study the well known equalized odds (EOD) definition of fairness. In a setting without sensitive attributes, we first provide tight and computable upper bounds for the EOD violation of a predictor. These bounds precisely reflect the worst possible EOD violation. Second, we demonstrate how one can provably control the worst-case EOD by a new post-processing correction method. Our results characterize when directly controlling for EOD with respect to the predicted sensitive attributes is - and when is not - optimal when it comes to controlling worst-case EOD. Our results hold under assumptions that are milder than previous works, and we illustrate these results with experiments on synthetic and real datasets.
Supplementary Material StreamNet: Memory-Efficient Streaming Tiny Deep Learning Inference on the Microcontroller
Figure 1: The system architecture of StreamNet TensorFlow Lite for Microcontrollers (TFLM) (1) tailors for the TinyML applications and adopts the interpreter-based approach to make cross-platform interoperability in the embedded system possible. However, TFLM's interpreter increases the performance overhead of the TinyML applications on MCUs. Unlike TFLM, StreamNet and MCUNetv2 replace the interpreter with a code generator. StreamNet is built on top of MCUNetv2 (2; 3) and adds the feature of the 1D and 2D stream processing (4; 5; 6; 7; 8; 9; 10). The code generator of StreamNet produces kernel implementations with fixed parameters at the compile time.
eb1e78328c46506b46a4ac4a1e378b91-AuthorFeedback.pdf
Thank you for your time in reading the paper and the positive feedback! Below are responses for each reviewer. Thank you for your detailed reading of the paper and positive feedback! With even more runs, we expect this distinction will be even clearer. This is the core of the transfer learning question, and a central part of our paper.
Language Model Tokenizers Introduce Unfairness Between Languages
Recent language models have shown impressive multilingual performance, even when not explicitly trained for it. Despite this, there are concerns about the quality of their outputs across different languages. In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked. The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases. These disparities persist even for tokenizers that are intentionally trained for multilingual support.