Memory Efficient Meta-Learning with Large Images
Meta-learning approaches to few-shot classification are computationally efficient at test time, requiring just a few optimization steps or a single forward pass to learn a new task, but they remain highly memory-intensive to train. This limitation arises because a task's entire support set, which can contain up to 1000 images, must be processed before an optimization step can be taken. Harnessing the performance gains offered by large images thus requires either parallelizing the meta-learner across multiple GPUs, which may not be available, or trading off task size against image size when memory constraints apply. We improve on both options by proposing LITE, a general and memory-efficient episodic training scheme that enables meta-training on large tasks composed of large images on a single GPU. We achieve this by observing that the gradients for a task can be decomposed into a sum of gradients over the task's training images. This enables us to perform a forward pass on a task's entire training set but realize significant memory savings by back-propagating only a random subset of these images, which we show yields an unbiased approximation of the full gradient. We use LITE to train meta-learners and demonstrate new state-of-the-art accuracy on the real-world ORBIT benchmark and on 3 of the 4 parts of the challenging VTAB+MD benchmark relative to leading meta-learners. LITE also enables meta-learners to be competitive with transfer learning approaches but at a fraction of the test-time computational cost, thus serving as a counterpoint to the recent narrative that transfer learning is all you need for few-shot classification.
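The key observation above, that a task's gradient is a sum of per-image gradients, so back-propagating a rescaled random subset gives an unbiased estimate, can be illustrated numerically. The sketch below is ours, not the authors' code: it uses stand-in per-image gradient vectors and checks that averaging many LITE-style subset estimates recovers the full gradient.

```python
import numpy as np

# Illustrative sketch of LITE's gradient estimator (not the authors' code).
# For a task with N support images, the full gradient is the sum of per-image
# gradients g_1..g_N. LITE back-propagates only a random subset of H images
# and rescales by N / H, which is unbiased in expectation.

rng = np.random.default_rng(0)
N, H, D = 1000, 40, 8            # images per task, back-prop subset size, param dim
g = rng.normal(size=(N, D))      # stand-in per-image gradients
full_grad = g.sum(axis=0)

def lite_estimate(g, H, rng):
    """Sum of H per-image gradients drawn without replacement, rescaled by N/H."""
    idx = rng.choice(len(g), size=H, replace=False)
    return (len(g) / H) * g[idx].sum(axis=0)

# Averaging many independent estimates should approach the full gradient.
mean_est = np.mean([lite_estimate(g, H, rng) for _ in range(20000)], axis=0)
rel_err = np.linalg.norm(mean_est - full_grad) / np.linalg.norm(full_grad)
print(f"relative error of averaged estimate: {rel_err:.3f}")
```

In practice the memory saving comes from keeping the computation graph only for the H sampled images while the remaining N - H forward passes are done without gradient tracking; the rescaling keeps the estimator unbiased.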
Appendix: Memory Efficient Meta-Learning with Large Images
A. Applying LITE to meta-learners
[Figure A.1: CNAPs — the FiLM parameter generator configures the feature extractor for the task; the Euclidean distance from each query image embedding to each class prototype is computed, and the predicted class is the one with the minimum distance. The meta-testing flow is similar, with the exception of the loss computation.]
[Figure B.3: (Left) A FiLM layer operating on convolutional feature maps indexed by channel; (Right) FiLM layers added to the feature extractor.]
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > India (0.04)
Gear News of the Week: Withings Launches Its Pee Scanner, and Samsung Shows Off a Trifold Phone
Plus: Supercute kei cars from Honda and BYD, Insta360 has a cheaper 360 camera, and Nothing's latest phone won't be coming to the US, while the OnePlus 15 gets a launch date. A few weeks ago, bathroom and plumbing company Kohler debuted the Dekoda, a health and wellness sensor that lives on your toilet bowl and records signs of your gut health and hydration. Now, Withings has launched the U-Scan.
- North America > United States > California (0.14)
- North America > United States > New York (0.05)
- Asia > South Korea (0.05)
- (5 more...)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Communications > Mobile (0.72)
UniMIC: Token-Based Multimodal Interactive Coding for Human-AI Collaboration
Mao, Qi, Yang, Tinghan, Li, Jiahao, Li, Bin, Jin, Libiao, Lu, Yan
The rapid progress of Large Multimodal Models (LMMs) and cloud-based AI agents is transforming human-AI collaboration into bidirectional, multimodal interaction. However, existing codecs remain optimized for unimodal, one-way communication, resulting in repeated degradation under conventional compress-transmit-reconstruct pipelines. To address this limitation, we propose UniMIC, a Unified token-based Multimodal Interactive Coding framework that bridges edge devices and cloud AI agents. Instead of transmitting raw pixels or plain text, UniMIC employs compact tokenized representations as the communication medium, enabling efficient low-bitrate transmission while maintaining compatibility with LMMs. To further enhance compression, lightweight Transformer-based entropy models with scenario-specific designs (generic, masked, and text-conditioned) effectively minimize inter-token redundancy. Extensive experiments on text-to-image generation, text-guided inpainting, outpainting, and visual question answering show that UniMIC achieves substantial bitrate savings and remains robust even at ultra-low bitrates (<0.05 bpp), without compromising downstream task performance. These results establish UniMIC as a practical and forward-looking paradigm for next-generation multimodal interactive communication.
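The role of the entropy model above can be made concrete with a toy sketch (ours, not UniMIC's actual model): the cost of transmitting a token stream is the sum of -log2 p(token | context) under the model, so a context model that better predicts the next token directly lowers the bitrate.

```python
import math

# Toy illustration (not UniMIC's entropy model): estimated bits to transmit a
# token stream equal the cross-entropy of the stream under the model.

def estimated_bits(tokens, prob_model):
    """prob_model(prefix, tok) -> predicted probability of `tok` given `prefix`."""
    bits = 0.0
    for i, tok in enumerate(tokens):
        bits += -math.log2(prob_model(tokens[:i], tok))
    return bits

# Baseline: uniform over a 1024-entry codebook, so every token costs 10 bits.
uniform = lambda prefix, tok: 1.0 / 1024

# A (hypothetical) context model that expects token repeats, capturing
# inter-token redundancy the way a learned entropy model would.
def repeat_aware(prefix, tok):
    if prefix and prefix[-1] == tok:
        return 0.5
    return 0.5 / 1023  # remaining probability mass over the other tokens

tokens = [7, 7, 7, 7, 42, 42, 42, 42]
print(estimated_bits(tokens, uniform))       # 80.0 bits
print(estimated_bits(tokens, repeat_aware))  # far fewer bits for the repeats
```

A real entropy model replaces the hand-written `repeat_aware` rule with a Transformer that predicts the next-token distribution, and an arithmetic coder turns those probabilities into an actual bitstream.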
Dissecting the SWE-Bench Leaderboards: Profiling Submitters and Architectures of LLM- and Agent-Based Repair Systems
Martinez, Matias, Franch, Xavier
The rapid progress in Automated Program Repair (APR) has been driven by advances in AI, particularly large language models (LLMs) and agent-based systems. SWE-Bench is a recent benchmark designed to evaluate LLM-based repair systems using real issues and pull requests mined from 12 popular open-source Python repositories. Its public leaderboards -- SWE-Bench Lite and SWE-Bench Verified -- have become central platforms for tracking progress and comparing solutions. However, because the submission process does not require detailed documentation, the architectural design and origin of many solutions remain unclear. In this paper, we present the first comprehensive study of all submissions to the SWE-Bench Lite (79 entries) and Verified (99 entries) leaderboards, analyzing 80 unique approaches across dimensions such as submitter type, product availability, LLM usage, and system architecture. Our findings reveal the dominance of proprietary LLMs (especially Claude 3.5), the presence of both agentic and non-agentic designs, and a contributor base spanning from individual developers to large tech companies.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (4 more...)
- Workflow (1.00)
- Research Report > New Finding (1.00)
LITE: A Learning-Integrated Topological Explorer for Multi-Floor Indoor Environments
Chen, Junhao, Zhang, Zhen, Zhu, Chengrui, Hou, Xiaojun, Hu, Tianyang, Wu, Huifeng, Liu, Yong
This work focuses on multi-floor indoor exploration, which remains an open area of research. Compared to traditional methods, recent learning-based explorers have demonstrated significant potential due to their robust environmental learning and modeling capabilities, but most are restricted to 2D environments. In this paper, we propose a learning-integrated topological explorer, LITE, for multi-floor indoor environments. As the agent incrementally builds a floor-stair topology during exploration using a YOLO11-based instance segmentation model, it can transition between floors through a finite state machine. Additionally, we implement an attention-based 2D exploration policy that utilizes an attention mechanism to capture spatial dependencies between different regions, thereby determining the next global goal for more efficient exploration. Extensive comparison and ablation studies conducted on the HM3D and MP3D datasets demonstrate that our proposed 2D exploration policy significantly outperforms all baseline explorers in terms of exploration efficiency. Furthermore, experiments in several 3D multi-floor environments indicate that our framework is compatible with various 2D exploration methods, facilitating effective multi-floor indoor exploration.
I. INTRODUCTION
Autonomous exploration is a fundamental problem in the development of embodied intelligence and plays a crucial role in uncertain scenarios such as search and rescue [1], scene reconstruction [2], and extraterrestrial planetary exploration [3].
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Vision (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.46)
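The floor-transition finite state machine mentioned in the abstract could look something like the following minimal sketch; the state and event names here are hypothetical illustrations of the pattern, not taken from the paper.

```python
# Hypothetical sketch of a floor-transition FSM for multi-floor exploration
# (state/event names are ours, not from the paper): the agent explores the
# current floor in 2D, and once the floor is done and a stair is known from
# the floor-stair topology, it navigates to and climbs the stair, then
# resumes 2D exploration on the new floor.

EXPLORE_2D, GO_TO_STAIR, CLIMB_STAIR = "explore_2d", "go_to_stair", "climb_stair"

TRANSITIONS = {
    (EXPLORE_2D, "floor_done_stair_known"): GO_TO_STAIR,
    (GO_TO_STAIR, "stair_reached"): CLIMB_STAIR,
    (CLIMB_STAIR, "new_floor_reached"): EXPLORE_2D,
}

def step(state, event):
    """Advance the FSM; unknown (state, event) pairs keep the current state."""
    return TRANSITIONS.get((state, event), state)

state = EXPLORE_2D
for event in ["frontier_found", "floor_done_stair_known",
              "stair_reached", "new_floor_reached"]:
    state = step(state, event)
print(state)  # back on a fresh floor, exploring in 2D again
```

In the paper's framework, the 2D exploration policy would drive behavior inside the `explore_2d` state, while the stair detections from instance segmentation supply the events that trigger transitions.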
Reflections on the Nintendo Switch, the hybrid console that changed gaming
The Switch 2 is nearly here, which means the original Switch is entering its twilight years. It's been eight years since Nintendo released its revolutionary hybrid console, and while many fans have spent the last couple of those itching for the device to be replaced, now seems like an opportune time to look back at what its legacy may wind up being (while acknowledging that it still has some life ahead of it). Instead of bleating on myself, though, I turned to the rest of the Engadget staff to see what comes to mind when they think of the Switch, as just about everyone on the team has played with the console. We've collected our reflections below -- some take a bigger-picture view, some are more personal, some contradict others' experiences entirely. There's plenty more that went unsaid. But I think that's part of the Switch's beauty; it's a device that's resonated with so many, in so many different ways, in its near-decade on the market.
- Information Technology > Artificial Intelligence > Games (0.69)
- Information Technology > Hardware (0.47)