Goto

Collaborating Authors

 semantic variable


On the Value of Cross-Modal Misalignment in Multimodal Representation Learning

arXiv.org Artificial Intelligence

Multimodal representation learning, exemplified by multimodal contrastive learning (MMCL) using image-text pairs, aims to learn powerful representations by aligning cues across modalities. This approach relies on the core assumption that the exemplar image-text pairs constitute two representations of an identical concept. However, recent research has revealed that real-world datasets often exhibit cross-modal misalignment. There are two distinct viewpoints on how to address this issue: one suggests mitigating the misalignment, and the other leveraging it. We seek here to reconcile these seemingly opposing perspectives, and to provide a practical guide for practitioners. Using latent variable models we thus formalize cross-modal misalignment by introducing two specific mechanisms: Selection bias, where some semantic variables are absent in the text, and perturbation bias, where semantic variables are altered -- both leading to misalignment in data pairs. Our theoretical analysis demonstrates that, under mild assumptions, the representations learned by MMCL capture exactly the information related to the subset of the semantic variables invariant to selection and perturbation biases. This provides a unified perspective for understanding misalignment. Based on this, we further offer actionable insights into how misalignment should inform the design of real-world ML systems. We validate our theoretical findings via extensive empirical studies on both synthetic data and real image-text datasets, shedding light on the nuanced impact of cross-modal misalignment on multimodal representation learning.


Parrot: Efficient Serving of LLM-based Applications with Semantic Variable

arXiv.org Artificial Intelligence

The rise of large language models (LLMs) has enabled LLM-based applications (a.k.a. AI agents or co-pilots), a new software paradigm that combines the strength of LLM and conventional software. Diverse LLM applications from different tenants could design complex workflows using multiple LLM requests to accomplish one task. However, they have to use the over-simplified request-level API provided by today's public LLM services, losing essential application-level information. Public LLM services have to blindly optimize individual LLM requests, leading to sub-optimal end-to-end performance of LLM applications. This paper introduces Parrot, an LLM service system that focuses on the end-to-end experience of LLM-based applications. Parrot proposes Semantic Variable, a unified abstraction to expose application-level knowledge to public LLM services. A Semantic Variable annotates an input/output variable in the prompt of a request, and creates the data pipeline when connecting multiple LLM requests, providing a natural way to program LLM applications. Exposing Semantic Variables to the public LLM service allows it to perform conventional data flow analysis to uncover the correlation across multiple LLM requests. This correlation opens a brand-new optimization space for the end-to-end performance of LLM-based applications. Extensive evaluations demonstrate that Parrot can achieve up to an order-of-magnitude improvement for popular and practical use cases of LLM applications.


Semantic Communication for Cooperative Multi-Task Processing over Wireless Networks

arXiv.org Artificial Intelligence

The same authors in [9] studied distributed relevant Applications involving machine-to-machine or human-tomachine information encoding for collaborative feature extraction to communications often have to prioritize task execution fulfill a task, leveraging distributed IB. Moreover, [10] offered over the exact reconstruction of transmitted information a framework for collaborative retrieval of the message using at the receiver. Unlike the traditional information multiple received semantic information and also expanded it theory established by Shannon, which emphasizes the accurate using reinforcement learning in [11]. To consider some physical transmission and reception of bits, the design of communication layer communication aspects, [12] contributed to resource systems for these applications takes a distinct approach, allocation in a multi-user system according to the single task drawing attention to task performance rather than fidelity in accuracy, channel conditions, and computing requests.


Semantic Latent Decomposition with Normalizing Flows for Face Editing

arXiv.org Artificial Intelligence

Navigating in the latent space of StyleGAN has shown effectiveness for face editing. However, the resulting methods usually encounter challenges in complicated navigation due to the entanglement among different attributes in the latent space. To address this issue, this paper proposes a novel framework, termed SDFlow, with a semantic decomposition in original latent space using continuous conditional normalizing flows. Specifically, SDFlow decomposes the original latent code into different irrelevant variables by jointly optimizing two components: (i) a semantic encoder to estimate semantic variables from input faces and (ii) a flow-based transformation module to map the latent code into a semantic-irrelevant variable in Gaussian distribution, conditioned on the learned semantic variables. To eliminate the entanglement between variables, we employ a disentangled learning strategy under a mutual information framework, thereby providing precise manipulation controls. Experimental results demonstrate that SDFlow outperforms existing state-of-the-art face editing methods both qualitatively and quantitatively. The source code is made available at https://github.com/phil329/SDFlow.