caesar
This is the most underrated sci-fi film franchise of the 21st century
AS A sci-fi fan, you learn not to dwell on the films that could have been. Whether it's Alejandro Jodorowsky's unmade Dune, Guillermo del Toro's cancelled take on At the Mountains of Madness, or the versions of Return of the Jedi that Davids Lynch and Cronenberg could have made, it's best not to torture yourself over cinematic what-ifs. That's why I had given up hope of there being a new instalment of the most underrated sci-fi film franchise of the 21st century so far. Though well received by critics and audiences alike, none of the four films have won Oscars or seem to have made much of an impact on pop culture. But then, earlier this month, we got confirmation that a fifth movie was on the way.
CAESAR: An Embodied Simulator for Generating Multimodal Referring Expression Datasets
Humans naturally use verbal utterances and nonverbal gestures to refer to various objects (known as $\textit{referring expressions}$) in different interactional scenarios. As collecting real human interaction datasets are costly and laborious, synthetic datasets are often used to train models to unambiguously detect relationships among objects. However, existing synthetic data generation tools that provide referring expressions generally neglect nonverbal gestures. Additionally, while a few small-scale datasets contain multimodal cues (verbal and nonverbal), these datasets only capture the nonverbal gestures from an exo-centric perspective (observer). As models can use complementary information from multimodal cues to recognize referring expressions, generating multimodal data from multiple views can help to develop robust models.
Is a robot programmed to prank you annoying? Yes
Is a robot programmed to prank you annoying? Feedback discovers a robot that can mimic Turkish ice cream vendors, who are known for playing tricks on their customers. Researchers concluded that customers, perhaps predictably, don't trust it Feedback is a grumpy sort, so we run a mile when faced with any kind of enforced fun. It is possible, therefore, that we would struggle to buy an ice cream in Turkey, because doing so requires enjoying, or at least tolerating, an extended prank. Turkish ice cream vendors are prone to playing tricks on their customers, like handing them a cone full of ice cream only to whisk it out of their grasp using sleight of hand.
Watermarking Diffusion Language Models
Gloaguen, Thibaud, Staab, Robin, Jovanoviฤ, Nikola, Vechev, Martin
We introduce the first watermark tailored for diffusion language models (DLMs), an emergent LLM paradigm able to generate tokens in arbitrary order, in contrast to standard autoregressive language models (ARLMs) which generate tokens sequentially. While there has been much work in ARLM watermarking, a key challenge when attempting to apply these schemes directly to the DLM setting is that they rely on previously generated tokens, which are not always available with DLM generation. In this work we address this challenge by: (i) applying the watermark in expectation over the context even when some context tokens are yet to be determined, and (ii) promoting tokens which increase the watermark strength when used as context for other tokens. This is accomplished while keeping the watermark detector unchanged. Our experimental evaluation demonstrates that the DLM watermark leads to a >99% true positive rate with minimal quality impact and achieves similar robustness to existing ARLM watermarks, enabling for the first time reliable DLM watermarking.
Defense against Prompt Injection Attacks via Mixture of Encodings
Zhang, Ruiyi, Sullivan, David, Jackson, Kyle, Xie, Pengtao, Chen, Mei
Large Language Models (LLMs) have emerged as a dominant approach for a wide range of NLP tasks, with their access to external information further enhancing their capabilities. However, this introduces new vulnerabilities, known as prompt injection attacks, where external content embeds malicious instructions that manipulate the LLM's output. Recently, the Base64 defense has been recognized as one of the most effective methods for reducing success rate of prompt injection attacks. Despite its efficacy, this method can degrade LLM performance on certain NLP tasks. To address this challenge, we propose a novel defense mechanism: mixture of encodings, which utilizes multiple character encodings, including Base64. Extensive experimental results show that our method achieves one of the lowest attack success rates under prompt injection attacks, while maintaining high performance across all NLP tasks, outperforming existing character encoding-based defense methods. This underscores the effectiveness of our mixture of encodings strategy for both safety and task performance metrics.
CAESAR: An Embodied Simulator for Generating Multimodal Referring Expression Datasets
Humans naturally use verbal utterances and nonverbal gestures to refer to various objects (known as \textit{referring expressions}) in different interactional scenarios. As collecting real human interaction datasets are costly and laborious, synthetic datasets are often used to train models to unambiguously detect relationships among objects. However, existing synthetic data generation tools that provide referring expressions generally neglect nonverbal gestures. Additionally, while a few small-scale datasets contain multimodal cues (verbal and nonverbal), these datasets only capture the nonverbal gestures from an exo-centric perspective (observer). As models can use complementary information from multimodal cues to recognize referring expressions, generating multimodal data from multiple views can help to develop robust models.
Caesar: A Low-deviation Compression Approach for Efficient Federated Learning
Yan, Jiaming, Liu, Jianchun, Xu, Hongli, Huang, Liusheng, Gong, Jiantao, Liu, Xudong, Hou, Kun
Compression is an efficient way to relieve the tremendous communication overhead of federated learning (FL) systems. However, for the existing works, the information loss under compression will lead to unexpected model/gradient deviation for the FL training, significantly degrading the training performance, especially under the challenges of data heterogeneity and model obsolescence. To strike a delicate trade-off between model accuracy and traffic cost, we propose Caesar, a novel FL framework with a low-deviation compression approach. For the global model download, we design a greedy method to optimize the compression ratio for each device based on the staleness of the local model, ensuring a precise initial model for local training. Regarding the local gradient upload, we utilize the device's local data properties (\ie, sample volume and label distribution) to quantify its local gradient's importance, which then guides the determination of the gradient compression ratio. Besides, with the fine-grained batch size optimization, Caesar can significantly diminish the devices' idle waiting time under the synchronized barrier. We have implemented Caesar on two physical platforms with 40 smartphones and 80 NVIDIA Jetson devices. Extensive results show that Caesar can reduce the traffic costs by about 25.54%$\thicksim$37.88% compared to the compression-based baselines with the same target accuracy, while incurring only a 0.68% degradation in final test accuracy relative to the full-precision communication.
Multiple-policy Evaluation via Density Estimation
Chen, Yilei, Pacchiano, Aldo, Paschalidis, Ioannis Ch.
We study the multiple-policy evaluation problem where we are given a set of $K$ policies and the goal is to evaluate their performance (expected total reward over a fixed horizon) to an accuracy $\epsilon$ with probability at least $1-\delta$. We propose an algorithm named $\mathrm{CAESAR}$ for this problem. Our approach is based on computing an approximate optimal offline sampling distribution and using the data sampled from it to perform the simultaneous estimation of the policy values. $\mathrm{CAESAR}$ has two phases. In the first we produce coarse estimates of the visitation distributions of the target policies at a low order sample complexity rate that scales with $\tilde{O}(\frac{1}{\epsilon})$. In the second phase, we approximate the optimal offline sampling distribution and compute the importance weighting ratios for all target policies by minimizing a step-wise quadratic loss function inspired by the DualDICE \cite{nachum2019dualdice} objective. Up to low order and logarithmic terms $\mathrm{CAESAR}$ achieves a sample complexity $\tilde{O}\left(\frac{H^4}{\epsilon^2}\sum_{h=1}^H\max_{k\in[K]}\sum_{s,a}\frac{(d_h^{\pi^k}(s,a))^2}{\mu^*_h(s,a)}\right)$, where $d^{\pi}$ is the visitation distribution of policy $\pi$, $\mu^*$ is the optimal sampling distribution, and $H$ is the horizon.