Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost

Xia, Haojun, Wu, Xiaoxia, Li, Jisen, Wu, Robert, Wang, Junxiong, Wang, Jue, Li, Chenxi, Singhal, Aman, Shah, Alay Dilipbhai, Ariyak, Alpay, Zhuang, Donglin, Zhou, Zhongzhu, Athiwaratkun, Ben, Zheng, Zhen, Song, Shuaiwen Leon

arXiv.org Artificial Intelligence

The KV cache is a dominant memory bottleneck for LLM inference. While 4-bit KV quantization preserves accuracy, 2-bit often degrades it, especially on long-context reasoning. We close this gap via an algorithm-system co-design for mixed-precision KV caching: Kitty. On the algorithm side, extensive experiments show that Dynamic Channel-wise Precision Boost -- which ranks Key-cache channels by sensitivity and keeps only a small fraction at higher precision -- keeps the accuracy drop near zero while approaching 2-bit memory. The main challenge is handling dynamic 4-bit channel boosts while keeping the page layout coalesced and the dequantization uniform, with no scattered reads or hard-coded masks. Kitty addresses these issues by decomposing each mixed-precision Key page into two tensors with unified 2-bit precision. Based on this, Kitty provides a page-centric KV layout, Triton-compatible page dequantization kernels, and a lightweight runtime pipeline that preserves coalescing and avoids divergence. Across seven tasks and two model families (Qwen3, LLaMA3), Kitty cuts KV memory by nearly 8x with negligible accuracy loss, enabling up to 8x larger batches and 2.1x-4.1x higher throughput under the same memory budget. We release the full implementation of Kitty at https://github.com/Summer-Summer/Kitty.
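The decomposition idea can be illustrated with a minimal NumPy sketch. This is not Kitty's implementation (the real system uses a page-centric layout and Triton kernels, and its channel-sensitivity ranking is not specified here); it only shows how a page with a few 4-bit "boosted" channels can be stored as two tensors of uniform 2-bit values, so dequantization needs no per-channel branching. The sensitivity proxy (per-channel value range) and function names are assumptions for illustration.

```python
import numpy as np

def quantize_key_page(K, boost_frac=0.06):
    """Quantize a float Key page (tokens x channels) to mostly 2 bits,
    boosting the most sensitive channels to 4 bits, then decompose the
    result into two uniform 2-bit tensors. Illustrative only."""
    C = K.shape[1]
    # Rank channels by a simple sensitivity proxy (value range);
    # the actual ranking criterion in Kitty may differ.
    sensitivity = K.max(axis=0) - K.min(axis=0)
    n_boost = max(1, int(boost_frac * C))
    boosted = np.argsort(sensitivity)[-n_boost:]
    bits = np.full(C, 2)
    bits[boosted] = 4
    # Per-channel asymmetric quantization to 2 or 4 bits.
    lo, hi = K.min(axis=0), K.max(axis=0)
    scale = (hi - lo) / (2 ** bits - 1)
    scale[scale == 0] = 1.0
    q = np.clip(np.round((K - lo) / scale), 0, 2 ** bits - 1).astype(np.uint8)
    # Decompose into two uniform 2-bit tensors: low 2 bits for every
    # channel, high 2 bits (nonzero only for boosted channels).
    q_low = q & 0b11
    q_high = (q >> 2) & 0b11
    return q_low, q_high, scale, lo, boosted

def dequantize_key_page(q_low, q_high, scale, lo):
    # Uniform reconstruction: q = q_low + 4 * q_high for every channel,
    # so the kernel never branches on which channels were boosted.
    q = q_low.astype(np.float32) + 4.0 * q_high.astype(np.float32)
    return q * scale + lo
```

Because unboosted channels quantize to values in [0, 3], their high-bit tensor is all zeros, and a single dequantization formula covers both precisions.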



Amplified Vulnerabilities: Structured Jailbreak Attacks on LLM-based Multi-Agent Debate

Qi, Senmao, Zou, Yifei, Li, Peng, Lin, Ziyi, Cheng, Xiuzhen, Yu, Dongxiao

arXiv.org Artificial Intelligence

Multi-Agent Debate (MAD), leveraging collaborative interactions among Large Language Models (LLMs), aims to enhance reasoning capabilities in complex tasks. However, the security implications of their iterative dialogues and role-playing characteristics, particularly susceptibility to jailbreak attacks eliciting harmful content, remain critically underexplored. This paper systematically investigates the jailbreak vulnerabilities of four prominent MAD frameworks built upon leading commercial LLMs (GPT-4o, GPT-4, GPT-3.5-turbo, and DeepSeek) without compromising internal agents. We introduce a novel structured prompt-rewriting framework specifically designed to exploit MAD dynamics via narrative encapsulation, role-driven escalation, iterative refinement, and rhetorical obfuscation. Our extensive experiments demonstrate that MAD systems are inherently more vulnerable than single-agent setups. Crucially, our proposed attack methodology significantly amplifies this fragility, increasing average harmfulness from 28.14% to 80.34% and achieving attack success rates as high as 80% in certain scenarios. These findings reveal intrinsic vulnerabilities in MAD architectures and underscore the urgent need for robust, specialized defenses prior to real-world deployment.


How Giant Robot Captured Asian America

The New Yorker

The first issue of the magazine Giant Robot I ever came across featured the Hong Kong actor Tony Leung Chiu-wai on the cover--this was enough to stand out on a crowded newsstand in the mid-nineteen-nineties. But what caught my attention were the teasers for a random assortment of other stories, about gangs, surfing, shaved ice, orgies. But who was I? I was a teen-ager and desperate to know. I suspected Giant Robot could help me figure it out. For anyone under the age of forty, this level of impressionability might sound a bit silly.


MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts

Gu, Tianle, Huang, Kexin, Luo, Ruilin, Yao, Yuanqi, Yang, Yujiu, Teng, Yan, Wang, Yingchun

arXiv.org Artificial Intelligence

Large Language Models (LLMs) can memorize sensitive information, raising concerns about potential misuse. LLM Unlearning, a post-hoc approach to remove this information from trained LLMs, offers a promising solution to mitigate these risks. However, previous practices face three key challenges: 1. Utility: successful unlearning often causes catastrophic collapse on unrelated tasks. 2. Efficiency: many methods either involve adding similarly sized models, which slows down unlearning or inference, or require retain data that are difficult to obtain. 3. Robustness: even effective methods may still leak data via extraction techniques. To address these challenges, we propose MEOW, a simple yet effective gradient descent-based unlearning method. Specifically, we use an offline LLM to generate a set of inverted facts. Then, we design a new metric, MEMO, to quantify memorization in LLMs. Finally, based on the signals provided by MEMO, we select the most appropriate set of inverted facts and finetune the model based on them. We evaluate MEOW on the commonly used unlearn benchmark, ToFU, with Llama2-7B-Chat and Phi-1.5B, and test it on both NLU and NLG tasks. Results demonstrate significant improvement of MEOW in forget quality without substantial loss in model utility. Meanwhile, MEOW does not exhibit significant degradation in NLU or NLG capabilities, and there is even a slight improvement in NLU performance.
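The selection step described above can be sketched in a few lines. The paper's MEMO metric is not defined in this abstract, so the sketch below uses a stand-in proxy (mean per-token log-probability, where higher means more strongly memorized) and hypothetical function names; it only illustrates the pipeline shape of scoring candidate inverted facts and keeping the top-ranked set for finetuning.

```python
def memo_proxy(token_logprobs):
    """Toy stand-in for the MEMO memorization metric: the mean
    log-probability the model assigns per token of a sequence.
    Higher values suggest the sequence is more strongly memorized."""
    return sum(token_logprobs) / len(token_logprobs)

def select_inverted_facts(candidates, k=2):
    """Rank candidate inverted facts by the memorization proxy and
    return the k most appropriate ones for unlearning finetuning.
    candidates: list of (fact_text, token_logprobs) pairs."""
    ranked = sorted(candidates, key=lambda c: memo_proxy(c[1]), reverse=True)
    return [fact for fact, _ in ranked[:k]]
```

The selected facts would then be used as finetuning targets in place of the original sensitive data; the actual selection signal in MEOW comes from MEMO, not this proxy.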


Aptly: Making Mobile Apps from Natural Language

Patton, Evan W., Kim, David Y. J., Granquist, Ashley, Liu, Robin, Scott, Arianna, Zamanova, Jennet, Abelson, Harold

arXiv.org Artificial Intelligence

We present Aptly, an extension of the MIT App Inventor platform enabling mobile app development via natural language powered by code-generating large language models (LLMs). Aptly complements App Inventor's block language with a text language designed to allow visual code generation via text-based LLMs. We detail the technical aspects of how the Aptly server integrates LLMs with a realtime collaboration function to facilitate the automated creation and editing of mobile apps given user instructions. The paper concludes with insights from a study of a pilot implementation involving high school students, which examines Aptly's practicality and user experience. The findings underscore Aptly's potential as a tool that democratizes app development and fosters technological creativity.


Do YOU know how your cat is feline? Experts reveal what your kitty's facial expressions really mean

Daily Mail - Science & tech

If you own a cat, it might feel like you've developed a shared language with your pet, whether it's a certain meow or a slow blink. But do you really know how your kitty is feeling? A study this week revealed that cats have almost 300 different facial expressions. This includes 126 facial expressions that suggest they're feeling friendly, and 102 that indicate they're in a grump. Here, MailOnline reveals how to tell if your cat is feeling content, anxious or even unfriendly, based on their facial expressions and body movements.


The cozy cat game that escaped from Valve

Engadget

Imagine a game that might be described as the opposite of Half-Life 2, Left 4 Dead or Counter-Strike: Global Offensive. These are first-person shooters set in war-torn, post-apocalyptic cities, so their inverse might be a third-person game with no weapons at all, set in a warm, buzzing metropolis of friendly characters, maybe starring an adorable cat. Weirdly, the result could look a lot like Little Kitty, Big City, the first project from former Valve designer Matt T. Wood. In nearly 17 years at Valve, Wood helped build and ship the company's most notable titles, including Left 4 Dead, Left 4 Dead 2, Portal 2, CS:GO and both episodes of Half-Life 2. He was a founding member of the CS:GO project and worked on that series for six years; he was pivotal in crafting Portal 2's co-op mode, and he created choreography and combat scenes in Half-Life and Left 4 Dead. Level design was one of his specialties.


Smart cat flap temporarily locks your kitty out if it detects it's holding prey

Daily Mail - Science & tech

While our cats may just think we are terrible, hairless hunters, their 'gifts' of a chewed-up mouse are not always welcome. To help prevent this messy occurrence, an entrepreneur has created a smart cat flap that will lock your kitty out of the house temporarily while they are holding their prey. Martin Rosinski, 37, was sick of being woken by his adorable serial killer Jinx, who would drag in rodents at night and meow loudly to alert her sleeping owners. In June 2021, the app technical director modified his microchip cat flap by installing a camera and artificial intelligence (AI) technology that detects the presence of prey. If prey is recognised, the cat flap is temporarily locked and a notification is sent to the owner's phone along with a video of the attempted entry.