rpe
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
The attention module, which is a crucial component in Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original attention, leading to sub-quadratic or even linear-complexity Transformer architectures. However, we show that these methods cannot be applied to more powerful attention modules that go beyond the dot-then-exponentiate style, e.g., Transformers with relative positional encoding (RPE). Since in many state-of-the-art models, relative positional encoding is used as default, designing efficient Transformers that can incorporate RPE is appealing. In this paper, we propose a novel way to accelerate attention calculation for Transformers with RPE on top of the kernelized attention.
Softmax Transformers are Turing-Complete
Jiang, Hongjian, Hahn, Michael, Zetzsche, Georg, Lin, Anthony Widjaja
Hard attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, it is an open problem whether softmax attention Chain-of-Thought (CoT) transformers are Turing-complete. In this paper, we prove a stronger result that length-generalizable softmax CoT transformers are Turing-complete. More precisely, our Turing-completeness proof goes via the CoT extension of the Counting RASP (C-RASP), which correspond to softmax CoT transformers that admit length generalization. We prove Turing-completeness for CoT C-RASP with causal masking over a unary alphabet (more generally, for letter-bounded languages). While we show this is not Turing-complete for arbitrary languages, we prove that its extension with relative positional encoding is Turing-complete for arbitrary languages. We empirically validate our theory by training transformers for languages requiring complex (non-linear) arithmetic reasoning.
Monotone Classification with Relative Approximations
In monotone classification, the input is a multi-set $P$ of points in $\mathbb{R}^d$, each associated with a hidden label from $\{-1, 1\}$. The goal is to identify a monotone function $h$, which acts as a classifier, mapping from $\mathbb{R}^d$ to $\{-1, 1\}$ with a small {\em error}, measured as the number of points $p \in P$ whose labels differ from the function values $h(p)$. The cost of an algorithm is defined as the number of points having their labels revealed. This article presents the first study on the lowest cost required to find a monotone classifier whose error is at most $(1 + ε) \cdot k^*$ where $ε\ge 0$ and $k^*$ is the minimum error achieved by an optimal monotone classifier -- in other words, the error is allowed to exceed the optimal by at most a relative factor. Nearly matching upper and lower bounds are presented for the full range of $ε$. All previous work on the problem can only achieve an error higher than the optimal by an absolute factor.
RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being
Ferdousi, Rahatara, Hossain, M Anwar
The rise of large language models (LLMs) has created new possibilities for digital twins in healthcare. However, the deployment of such systems in consumer health contexts raises significant concerns related to hallucination, bias, lack of transparency, and ethical misuse. In response to recommendations from health authorities such as the World Health Organization (WHO), we propose Responsible Health Twin (RHealthTwin), a principled framework for building and governing AI-powered digital twins for well-being assistance. RHealthTwin processes multimodal inputs that guide a health-focused LLM to produce safe, relevant, and explainable responses. At the core of RHealthTwin is the Responsible Prompt Engine (RPE), which addresses the limitations of traditional LLM configuration. Conventionally, users input unstructured prompt and the system instruction to configure the LLM, which increases the risk of hallucination. In contrast, RPE extracts predefined slots dynamically to structure both inputs. This guides the language model to generate responses that are context aware, personalized, fair, reliable, and explainable for well-being assistance. The framework further adapts over time through a feedback loop that updates the prompt structure based on user satisfaction. We evaluate RHealthTwin across four consumer health domains including mental support, symptom triage, nutrition planning, and activity coaching. RPE achieves state-of-the-art results with BLEU = 0.41, ROUGE-L = 0.63, and BERTScore = 0.89 on benchmark datasets. Also, we achieve over 90% in ethical compliance and instruction-following metrics using LLM-as-judge evaluation, outperforming baseline strategies. We envision RHealthTwin as a forward-looking foundation for responsible LLM-based applications in health and well-being.