Kinetics-400 [1] is a large-scale action recognition dataset with trimmed video clips of around 10 seconds' duration. It is collected from realistic YouTube videos and covers 400 categories of human activities. In total, it contains around 240K training videos and 20K validation videos. Specifically, when training Kinetics-200/-400 from scratch, we adopt a cosine learning-rate decay schedule with an initial learning rate of 0.1. The initial learning rate is 0.005 and decays by 0.1 at epochs 20 and 40.
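The two schedules mentioned above can be written as plain functions of the epoch. The rates and milestones are the ones quoted (a cosine decay from 0.1, and a 0.005 base rate with ×0.1 steps at epochs 20 and 40); the function names, default arguments, and total-epoch count are illustrative, not from the paper.

```python
import math

def cosine_lr(epoch, total_epochs, base_lr=0.1):
    """Cosine learning-rate decay from base_lr down to 0 over total_epochs."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / total_epochs))

def step_lr(epoch, base_lr=0.005, gamma=0.1, milestones=(20, 40)):
    """Step decay: multiply the rate by gamma at each milestone epoch."""
    factor = gamma ** sum(epoch >= m for m in milestones)
    return base_lr * factor
```

In practice these would be wrapped in a framework scheduler; writing them out makes clear that the cosine schedule decays smoothly every epoch while the step schedule only changes at the two milestones.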
Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation
In the past few years, transformers have achieved promising performance on various computer vision tasks. Unfortunately, the immense inference overhead of most existing vision transformers prevents them from being deployed on edge devices such as cellphones and smart watches. Knowledge distillation is a widely used paradigm for compressing cumbersome architectures into compact students by transferring information.
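The distillation paradigm the paragraph refers to can be sketched in its classic soft-target form (Hinton-style temperature-softened cross-entropy between teacher and student) — a generic baseline, not the paper's fine-grained manifold variant; all names below are illustrative.

```python
import math

def softmax_t(logits, t):
    """Temperature-softened softmax over a list of logits."""
    m = max(logits)
    e = [math.exp((v - m) / t) for v in logits]
    s = sum(e)
    return [v / s for v in e]

def distillation_loss(student_logits, teacher_logits, t=4.0):
    """Soft-target KD loss: cross-entropy between the softened teacher
    and student distributions, scaled by t**2 as in Hinton et al."""
    p = softmax_t(teacher_logits, t)
    q = softmax_t(student_logits, t)
    return -t * t * sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

The loss is minimized when the student's softened distribution matches the teacher's, which is what "transferring information" means concretely in this setting.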
General Transform: A Unified Framework for Adaptive Transform to Enhance Representations
Budiutama, Gekko, Daimon, Shunsuke, Nishi, Hirofumi, Matsushita, Yu-ichiro
Discrete transforms, such as the discrete Fourier transform, are widely used in machine learning to improve model performance by extracting meaningful features. However, with numerous transforms available, selecting an appropriate one often depends on understanding the dataset's properties, making the approach less effective when such knowledge is unavailable. In this work, we propose General Transform (GT), an adaptive transform-based representation designed for machine learning applications. Unlike conventional transforms, GT learns a data-driven mapping tailored to the dataset and task of interest. Here, we demonstrate that models incorporating GT outperform conventional transform-based approaches across computer vision and natural language processing tasks, highlighting its effectiveness in diverse learning scenarios.
Keywords: machine learning, deep learning, feature extraction
1. Introduction
Deep neural networks have consistently pushed the boundaries of performance on tasks in computer vision, natural language processing, and beyond.
Corresponding author Email address: bgekko@quemix.com
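The paper's exact formulation of GT is not reproduced here. As a minimal sketch of the underlying idea — a transform whose mapping is learned rather than fixed — one can parameterize a convex combination of fixed candidate transforms (identity and DCT-II below) with trainable logits; gradient descent on a downstream loss would then pick the mixture. All function names and the candidate set are illustrative assumptions.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    s = sum(e)
    return [v / s for v in e]

def general_transform(x, logits):
    """Apply a learned convex combination of candidate transforms
    (here: identity and DCT-II) to the input vector x."""
    w = softmax(logits)
    candidates = [list(x), dct2(x)]
    return [sum(wi * c[j] for wi, c in zip(w, candidates))
            for j in range(len(x))]
```

With logits strongly favoring the first candidate, the mapping collapses to the identity; training would instead discover whatever mixture suits the data.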
Fraesormer: Learning Adaptive Sparse Transformer for Efficient Food Recognition
Zou, Shun, Zou, Yi, Zhang, Mingya, Luo, Shipeng, Chen, Zhihao, Gao, Guangwei
In recent years, Transformer has witnessed significant progress in food recognition. However, most existing approaches still face two critical challenges in lightweight food recognition: (1) the quadratic complexity and redundant feature representation from interactions with irrelevant tokens; (2) static feature recognition and single-scale representation, which overlook the unstructured, non-fixed nature of food images and the need for multi-scale features. To address these, we propose an adaptive and efficient sparse Transformer architecture (Fraesormer) with two core designs: Adaptive Top-k Sparse Partial Attention (ATK-SPA) and Hierarchical Scale-Sensitive Feature Gating Network (HSSFGN). ATK-SPA uses a learnable Gated Dynamic Top-K Operator (GDTKO) to retain critical attention scores, filtering low query-key matches that hinder feature aggregation. It also introduces a partial channel mechanism to reduce redundancy and promote expert information flow, enabling local-global collaborative modeling. HSSFGN employs a gating mechanism to achieve multi-scale feature representation, enhancing contextual semantic information. Extensive experiments show that Fraesormer outperforms state-of-the-art methods. Code is available at https://zs1314.github.io/Fraesormer.
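The top-k sparsification idea behind ATK-SPA can be sketched in a few lines: keep only the k largest query-key scores per query row and renormalize over the survivors, so low-scoring matches contribute exactly zero weight. In the paper, k comes from the learnable GDTKO; here it is a fixed parameter for illustration, and the function name is an assumption.

```python
import math

def topk_sparse_attention(scores, k):
    """For each query row of raw attention scores, keep only the k
    largest entries and softmax over them; all others get weight 0."""
    out = []
    for row in scores:
        keep = sorted(range(len(row)), key=lambda j: row[j],
                      reverse=True)[:k]
        exps = {j: math.exp(row[j]) for j in keep}
        z = sum(exps.values())
        out.append([exps.get(j, 0.0) / z for j in range(len(row))])
    return out
```

Compared with a dense softmax, the dropped entries are hard zeros rather than merely small, which is what removes the interactions with irrelevant tokens.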
ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval
Abdallah, Abdelrahman, Mozafari, Jamshid, Piryani, Bhawna, Jatowt, Adam
Retrieval-Augmented Generation (RAG) models have drawn considerable attention in modern open-domain question answering. The effectiveness of RAG depends on the quality of the top retrieved documents. However, conventional retrieval methods sometimes fail to rank the most relevant documents at the top. In this paper, we introduce ASRank, a new re-ranking method that scores retrieved documents using a zero-shot answer scent, relying on a pre-trained large language model to compute the likelihood that document-derived answers align with the answer scent. Our approach demonstrates marked improvements across several datasets, including NQ, TriviaQA, WebQA, ArchivalQA, HotpotQA, and Entity Questions. Notably, ASRank increases Top-1 retrieval accuracy on NQ from $19.2\%$ to $46.5\%$ for MSS and from $22.1\%$ to $47.3\%$ for BM25. It also shows strong retrieval performance compared to state-of-the-art methods (47.3 Top-1 by ASRank vs. 35.4 by UPR, both with BM25).
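Stripped of the model specifics, the re-ranking loop is: score each retrieved document by how likely its derived answer aligns with the answer scent, then sort descending. The sketch below takes the LLM likelihood as a caller-supplied callable — a stand-in assumption, not ASRank's actual model call — so the control flow can be shown without an API dependency.

```python
def rerank(docs, answer_scent, log_likelihood):
    """Re-rank documents by a caller-supplied scorer that estimates how
    well each document's derived answer aligns with the answer scent."""
    scored = [(log_likelihood(doc, answer_scent), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

Any scorer with the same signature slots in, which is also how one would A/B a cheap lexical scorer against the LLM-based one.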
Continual Learning Using Only Large Language Model Prompting
Qiu, Jiabao, Ke, Zixuan, Liu, Bing
We introduce CLOB, a novel continual learning (CL) paradigm wherein a large language model (LLM) is regarded as a black box. Learning is done incrementally via only verbal prompting. CLOB does not fine-tune any part of the LLM or add any trainable parameters to it. It is particularly suitable for LLMs that are accessible via APIs. We also propose a new CL technique, called CIS, based on incremental summarization that also overcomes the LLM's input length limit. Experiments show CIS outperforms baselines by a very large margin.
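A prompt-only incremental update in the spirit of CIS can be sketched as: fold each batch of new labeled examples into a running textual summary by asking the black-box LLM to rewrite it, capping the summary so it fits the model's input limit. The `llm` callable, prompt wording, and character limit below are all illustrative assumptions, not the paper's exact protocol.

```python
def cis_update(llm, summary, new_examples, limit=2000):
    """Incrementally fold new labeled examples into a running task
    summary via prompting only; no parameters are ever updated."""
    prompt = (
        "Current task knowledge:\n" + summary + "\n\n"
        "New labeled examples:\n" + "\n".join(new_examples) + "\n\n"
        "Rewrite the task knowledge to incorporate the new examples, "
        f"in at most {limit} characters."
    )
    # Truncate defensively in case the model overshoots the limit.
    return llm(prompt)[:limit]
```

Because only the summary is carried between tasks, the LLM's context window bounds memory usage regardless of how many tasks arrive — the property that lets CIS sidestep the input length limit.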