one-hot vector
5812f92450ccaf17275500841c70924a-Supplemental.pdf
We present a brief proof of the local optimality of one-hot encodings in the decision-theoretic framework presented in Section 3.2. We seek to prove that, under the assumptions of an identity reward matrix, tokens constrained to a unit hypercube, and additive Gaussian noise, one-hot tokens are an optimally robust communication strategy. We only seek to prove local optimality, as one may trivially generate multiple, equally optimal tokens by, for example, flipping all bits. The following derivation uses the Karush-Kuhn-Tucker (KKT) conditions, a generalization of Lagrange multipliers [17]. We maximize the function, subject to constraints. [Equations (13) and (14) are garbled in the source; the surviving fragments are the terms $T_j^\top T_i$ and $\|T_j\|^2$, and a stationarity condition in the multipliers $\vec{\mu}_i$ and $\vec{\lambda}_i$ with right-hand side $\vec{0}$.] We seek to show that one-hot vectors are an optimum, so we now show that one-hot vectors indeed respect the constraints and set the derivatives to zero.
Introducing Visual Scenes and Reasoning: A More Realistic Benchmark for Spoken Language Understanding
Wu, Di, Jiang, Liting, Fang, Ruiyu, Bianjing, Xie, Hongyan, Su, Haoxiang, Huang, Hao, He, Zhongjiang, Song, Shuangyong, Li, Xuelong
Spoken Language Understanding (SLU) consists of two sub-tasks: intent detection (ID) and slot filling (SF). Given its broad range of real-world applications, enhancing SLU for practical deployment is increasingly critical. Profile-based SLU addresses ambiguous user utterances by incorporating context awareness (CA), user profiles (UP), and knowledge graphs (KG) to support disambiguation, thereby advancing SLU research toward real-world applicability. However, existing SLU datasets still fall short in representing real-world scenarios. Specifically, (1) CA uses one-hot vectors for representation, which is overly idealized, and (2) models typically focus solely on predicting intents and slot labels, neglecting the reasoning process that could enhance performance and interpretability. To overcome these limitations, we introduce VRSLU, a novel SLU dataset that integrates both Visual images and explicit Reasoning. For over-idealized CA, we use GPT-4o and FLUX.1-dev to generate images reflecting users' environments and statuses, followed by human verification to ensure quality. For reasoning, GPT-4o is employed to generate explanations for predicted labels, which are then refined by human annotators to ensure accuracy and coherence. Additionally, we propose an instructional template, LR-Instruct, which first predicts labels and then generates corresponding reasoning. This two-step approach helps mitigate the influence of reasoning bias on label prediction. Experimental results confirm the effectiveness of incorporating visual information and highlight the promise of explicit reasoning in advancing SLU.
In-Situ Tweedie Discrete Diffusion Models
Li, Xiao, Zhang, Jiaqi, Zhang, Shuxiang, Chen, Tianshui, Lin, Liang, Wang, Guangrun
While diffusion models excel at generating continuous data such as images, adapting them to discrete tasks has relied on indirect approaches that either operate in continuous embedding spaces or use token masking mechanisms, both of which deviate from modeling the true discrete data distribution that can be theoretically guaranteed by Tweedie's formula. We propose in-situ Tweedie Discrete Diffusion (TDD), a framework that performs diffusion guaranteed by Tweedie's formula directly within the discrete one-hot space, hence "in-situ." Unlike prior methods that diffuse continuous embeddings or mask tokens, TDD directly corrupts one-hot vectors with Gaussian noise and performs iterative denoising through a timestep-conditioned cross-entropy objective rather than mean-squared-error reconstruction. At each denoising step, the model predicts class probabilities, applies argmax to obtain discrete predictions, converts them to one-hot vectors, and feeds them into the next iteration with progressively reduced noise. This process naturally unifies discriminative classification and generative modeling under a single framework. Experiments demonstrate that TDD achieves strong performance on both image classification and text generation tasks, with extensive ablation studies confirming the effectiveness of each design component. Our work establishes a principled approach to discrete diffusion that preserves the core characteristics of diffusion models while operating natively in discrete space.
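The denoising loop described in this abstract (predict class probabilities, take the argmax, convert to a one-hot vector, re-noise at a lower level) can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_denoiser` is a hypothetical stand-in for the trained, timestep-conditioned network, and the noise schedule `sigmas` and all function names are assumptions made for illustration. The stand-in uses the exact class posterior for a uniform prior over one-hot vectors under isotropic Gaussian noise, which is a softmax of the noisy vector divided by the noise variance.

```python
import math
import random


def one_hot(index, num_classes):
    """Return a one-hot list with a 1.0 at position `index`."""
    return [1.0 if k == index else 0.0 for k in range(num_classes)]


def add_gaussian_noise(vec, sigma, rng):
    """Corrupt each coordinate with independent N(0, sigma^2) noise."""
    return [v + rng.gauss(0.0, sigma) for v in vec]


def softmax(logits):
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_denoiser(noisy, sigma):
    """Hypothetical stand-in for the trained network (an assumption).

    With a uniform prior over one-hot vectors and isotropic Gaussian noise,
    the class posterior is softmax(noisy / sigma^2).
    """
    return softmax([x / (sigma ** 2) for x in noisy])


def iterative_denoise(noisy, sigmas, rng):
    """Run the predict -> argmax -> one-hot -> re-noise loop.

    `sigmas` is an assumed decreasing noise schedule; sigmas[0] is the noise
    level of the input `noisy`.
    """
    x = list(noisy)
    num_classes = len(x)
    prediction = None
    for step, sigma in enumerate(sigmas):
        probs = toy_denoiser(x, sigma)
        prediction = max(range(num_classes), key=probs.__getitem__)
        if step + 1 < len(sigmas):
            # Feed the discrete prediction back in at the next, lower noise level.
            x = add_gaussian_noise(
                one_hot(prediction, num_classes), sigmas[step + 1], rng
            )
    return prediction


rng = random.Random(0)
schedule = [0.05, 0.02, 0.01]
true_class = 2
corrupted = add_gaussian_noise(one_hot(true_class, 5), schedule[0], rng)
recovered = iterative_denoise(corrupted, schedule, rng)
```

In a real system the closed-form posterior would be replaced by a learned classifier trained with the cross-entropy objective the abstract mentions; the sketch only shows the control flow of the loop.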
Traveling Salesman-Based Token Ordering Improves Stability in Homomorphically Encrypted Language Models
Rho, Donghwan, Seo, Sieun, Sung, Hyewon, Min, Chohong, Ryu, Ernest K.
As users increasingly interact with large language models (LLMs) using private information, secure and encrypted communication becomes essential. Homomorphic encryption (HE) provides a principled solution by enabling computation directly on encrypted data. Although prior work has explored aspects of running LLMs under HE, the challenge of text generation, particularly next-token prediction, has received limited attention and remains a key obstacle to practical encrypted interaction. In this work, we propose a TSP-based token reordering strategy to address the difficulties of encrypted text generation, together with a post-processing step that further reduces approximation error. Theoretical analysis and experimental results demonstrate that our method prevents collapse, improves coherence in generated text, and preserves data privacy throughout. Overall, our contributions advance the feasibility of practical and privacy-preserving LLM inference.