Otters: An Energy-Efficient SpikingTransformer via Optical Time-to-First-Spike Encoding

Yan, Zhanglu, Mao, Jiayi, Liu, Qianhui, Li, Fanfan, Pan, Gang, Luo, Tao, Zhu, Bowen, Wong, Weng-Fai

arXiv.org Artificial Intelligence

Spiking neural networks with time-to-first-spike (TTFS) encoding promise extreme energy efficiency because each neuron fires at most once. However, this energy advantage is often unrealized because inference requires evaluating a temporal decay function and then multiplying by the synaptic weights. We fabricated a custom indium oxide optoelectronic synapse whose natural physical decay directly implements the required temporal function. By treating the device's analog output as the fused product of synaptic weight and temporal decay, optoelectronic synaptic TTFS (named Otters) eliminates these expensive digital operations. To use the Otters paradigm in complex architectures like the transformer, which are challenging to train directly due to spike sparsity, we introduce a novel quantized neural network-to-SNN conversion algorithm. This complete hardware-software co-design enables our model to achieve state-of-the-art accuracy across seven GLUE benchmark datasets and demonstrates a 1.77× improvement in energy efficiency over previous leading SNNs, based on a comprehensive analysis of compute, data movement, and memory access costs using energy measurements from a commercial 22nm process. Our work thus establishes a new paradigm for energy-efficient SNNs, translating fundamental device physics directly into powerful computational primitives. Large language models (LLMs) have demonstrated remarkable capabilities, yet their immense computational and energy costs hinder their deployment in resource-constrained environments such as edge devices (Lin et al., 2023; Jegham et al., 2025). This critical challenge has spurred research on more efficient, brain-inspired architectures, with spiking neural networks (SNNs) emerging as a promising candidate (Tang et al., 2025; Xing et al.).
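The fused weight-decay idea can be illustrated with a minimal sketch. The exponential decay, time constant, and function names below are illustrative assumptions, not the paper's actual device model (the real decay is set by the indium oxide device physics):

```python
import math

def ttfs_contribution_digital(weight, spike_time, tau=1.0):
    """Conventional digital TTFS: explicitly evaluate a decay
    function and multiply by the synaptic weight (two costly ops)."""
    return weight * math.exp(-spike_time / tau)

def ttfs_contribution_otters(device_readout):
    """Otters paradigm (conceptually): the optoelectronic synapse's
    analog output already equals weight x decay, so inference only
    needs to read it out -- no decay evaluation, no multiply."""
    return device_readout

def membrane_potential(weights, spike_times, tau=1.0):
    """Accumulate contributions from all incoming spikes."""
    return sum(ttfs_contribution_digital(w, t, tau)
               for w, t in zip(weights, spike_times))
```

In the digital path the decay evaluation and multiplication dominate the energy cost; in the Otters path both are absorbed into the device's physical response.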


OTTER: Effortless Label Distribution Adaptation of Zero-shot Models

Neural Information Processing Systems

Popular zero-shot models suffer due to artifacts inherited from pretraining. One particularly detrimental issue, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have incompatible requirements, such as access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task.


Otter transcribes my life, and I just can't quit it

PCWorld

Otter is an AI-powered transcription service and app, and I use it every time I interview someone. Even in a group setting, it's the perfect tool for a journalist: it records and transcribes what people are saying, identifies the speaker, and allows me to click on the transcribed text and hear the recorded audio, just to check up on it. Otter even offers AI services, so I can see an AI-generated summary of the conversation and what needs to happen next. Yesterday, my wife complained that the secretary of a non-profit she volunteers at had quit, forcing her to record the minutes of a meeting. So, why should you use Otter?


Otter.ai's Meeting Agent can schedule calls and write emails for you

Engadget

The next time you join a video call, Otter.ai is hoping its new AI tool will help make things run smoother. On Tuesday, the company introduced the Otter Meeting Agent. It's part of a suite of three new AI helpers designed to assist a variety of different users. The first of those, the voice-activated Meeting Agent, can schedule follow-up calls and draft emails for you. It can also answer questions based on information it finds in your company's meeting database.


OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction

Huang, Huang, Liu, Fangchen, Fu, Letian, Wu, Tingfan, Mukadam, Mustafa, Malik, Jitendra, Goldberg, Ken, Abbeel, Pieter

arXiv.org Artificial Intelligence

Vision-Language-Action (VLA) models aim to predict robotic actions based on visual observations and language instructions. Existing approaches require fine-tuning pre-trained vision-language models (VLMs) because visual and language features are independently fed into downstream policies, degrading the pre-trained semantic alignments. We propose OTTER, a novel VLA architecture that leverages these existing alignments through explicit, text-aware visual feature extraction. Instead of processing all visual features, OTTER selectively extracts and passes only task-relevant visual features that are semantically aligned with the language instruction to the policy transformer. This allows OTTER to keep the pre-trained vision-language encoders frozen, thereby preserving and utilizing the rich semantic understanding learned from large-scale pre-training and enabling strong zero-shot generalization. In simulation and real-world experiments, OTTER significantly outperforms existing VLA models, demonstrating strong zero-shot generalization to novel objects and environments. Video, code, checkpoints, and dataset: https://ottervla.github.io/.
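Text-aware visual feature extraction can be sketched as language tokens attending over frozen visual patch features, so that only text-aligned visual content flows to the policy. This is a hypothetical NumPy sketch, not OTTER's actual implementation; shapes and the function name are assumptions:

```python
import numpy as np

def text_aware_visual_features(visual_feats, text_feats):
    """Attend over frozen visual patch features [P, d] with
    language-token features [T, d] as queries; return [T, d]
    task-relevant visual features for the downstream policy."""
    scores = text_feats @ visual_feats.T                     # [T, P] alignment
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)                 # softmax over patches
    return attn @ visual_feats                               # text-aligned features

rng = np.random.default_rng(0)
feats = text_aware_visual_features(rng.normal(size=(16, 8)),   # 16 patches
                                   rng.normal(size=(4, 8)))    # 4 language tokens
```

Because the selection is computed from the pre-trained embeddings themselves, neither encoder needs gradient updates, which is what lets the semantic alignment survive.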


Otter: Generating Tests from Issues to Validate SWE Patches

Ahmed, Toufique, Ganhotra, Jatin, Pan, Rangeet, Shinnar, Avraham, Sinha, Saurabh, Hirzel, Martin

arXiv.org Artificial Intelligence

While there has been plenty of work on generating tests from existing code, there has been limited work on generating tests from issues. A correct test must validate the code patch that resolves the issue. In this work, we focus on the scenario where the code patch does not exist yet. This approach supports two major use-cases. First, it supports TDD (test-driven development), the discipline of "test first, write code later" that has well-documented benefits for human software engineers. Second, it also validates SWE (software engineering) agents, which generate code patches for resolving issues. This paper introduces Otter, an LLM-based solution for generating tests from issues. Otter augments LLMs with rule-based analysis to check and repair their outputs, and introduces a novel self-reflective action planning stage. Experiments show Otter outperforming state-of-the-art systems for generating tests from issues, in addition to enhancing systems that generate patches from issues. We hope that Otter helps make developers more productive at resolving issues and leads to more robust, well-tested code.


Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model

Yuan, Chenhan, Huang, Fei, Peng, Ru, Lu, Keming, Yu, Bowen, Zhou, Chang, Zhou, Jingren

arXiv.org Artificial Intelligence

Transformer-based large language models (LLMs) exhibit limitations such as generating unsafe responses, unreliable reasoning, etc. Existing inference intervention approaches attempt to mitigate these issues by finetuning additional models to produce calibration signals (such as rewards) that guide the LLM's decoding process. However, this solution introduces substantial time and space overhead due to the separate models required. This work proposes Non-disruptive parameter insertion (Otter), inserting extra parameters into the transformer architecture to predict calibration signals along with the original LLM output. Otter offers state-of-the-art performance on multiple demanding tasks while saving up to 86.5% extra space and 98.5% extra time. Furthermore, Otter seamlessly integrates with existing inference engines, requiring only a one-line code change, and the original model response remains accessible after the parameter insertion. Our code is publicly available at https://github.com/chenhan97/Otter
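The "non-disruptive" property can be illustrated with a minimal sketch: an inserted head reads the same hidden states as the language-model head, so the calibration signal comes out alongside logits that are bit-identical to the base model's. Shapes and names below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def forward_with_reward_head(hidden, lm_head, reward_head):
    """The original next-token logits are computed exactly as before;
    the inserted parameters produce a separate calibration output."""
    logits = hidden @ lm_head        # unchanged original path
    reward = hidden @ reward_head    # inserted head, extra output only
    return logits, reward

rng = np.random.default_rng(1)
h = rng.normal(size=(2, 8))          # hidden states for 2 positions
W_lm = rng.normal(size=(8, 100))     # frozen LM head (vocab of 100)
w_r = rng.normal(size=(8, 1))        # inserted reward parameters
logits, reward = forward_with_reward_head(h, W_lm, w_r)
```

Because the original path is untouched, the base model's response stays accessible after insertion, which is the property the abstract highlights.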


OTTER: Improving Zero-Shot Classification via Optimal Transport

Shin, Changho, Zhao, Jitian, Cromp, Sonia, Vishwakarma, Harit, Sala, Frederic

arXiv.org Artificial Intelligence

Popular zero-shot models suffer due to artifacts inherited from pretraining. A particularly detrimental artifact, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have incompatible requirements such as access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task. Theoretically, we characterize the improvement produced by our procedure under certain mild conditions and provide bounds on the error caused by misspecification. Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, and beating baselines like Prior Matching, often by significant margins, in 17 out of 21 datasets.


MLLMs-Augmented Visual-Language Representation Learning

Liu, Yanqing, Wang, Kai, Shao, Wenqi, Luo, Ping, Qiao, Yu, Shou, Mike Zheng, Zhang, Kaipeng, You, Yang

arXiv.org Artificial Intelligence

Visual-language pre-training (VLP) has achieved remarkable success in multi-modal tasks, largely attributed to the availability of large-scale image-text datasets. In this work, we demonstrate that multi-modal large language models (MLLMs) can enhance visual-language representation learning by improving data quality. Our approach is simple, utilizing MLLMs to extend multiple captions for each image. To prevent the bias introduced by MLLMs' hallucinations and intrinsic caption styles, we propose "text shearing" to maintain the same length for extended captions as that of the original captions. In image-text retrieval, our method consistently obtains 5.6%–35.0% and 16.8%–46.1% improvement on R@1 under the fine-tuning and zero-shot settings, respectively. Notably, we obtain zero-shot results that are comparable to fine-tuning on target datasets, which encourages more exploration of the versatile use of MLLMs.
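"Text shearing" as described reduces to capping each MLLM-extended caption at the original caption's length. The sketch below uses whitespace word counts as an illustrative simplification; the paper's actual length measure (e.g., tokenizer units) may differ:

```python
def text_shear(extended_caption, original_caption):
    """Truncate an MLLM-extended caption to the original caption's
    word count, limiting hallucinated content and style drift."""
    limit = len(original_caption.split())
    return " ".join(extended_caption.split()[:limit])
```

The cap keeps the extended captions stylistically comparable to the originals, so the extra captions add diversity without shifting the length distribution the encoder sees.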