mote
MOTE-NAS: Multi-Objective Training-based Estimate for Efficient Neural Architecture Search
Neural Architecture Search (NAS) methods seek effective optimization toward performance metrics regarding model accuracy and generalization while facing challenges regarding search costs and GPU resources. Recent Neural Tangent Kernel (NTK) NAS methods achieve remarkable search efficiency based on a training-free model estimate; however, they overlook the non-convex nature of the DNNs in the search process. In this paper, we develop Multi-Objective Training-based Estimate (MOTE) for efficient NAS, retaining search effectiveness and achieving the new state-of-the-art in the accuracy and cost trade-off. To improve NTK and inspired by the Training Speed Estimation (TSE) method, MOTE is designed to model the actual performance of DNNs from macro to micro perspective by draw loss landscape and convergence speed simultaneously. Using two reduction strategies, the MOTE is generated based on a reduced architecture and a reduced dataset. Inspired by evolutionary search, our iterative ranking-based, coarse-to-fine architecture search is highly effective. Experiments on NASBench-201 show MOTE-NAS achieves 94.32% accuracy on CIFAR-10, 72.81% on CIFAR-100, and 46.38% on ImageNet-16-120, outperforming NTK-based NAS approaches. An evaluation-free (EF) version of MOTE-NAS delivers high efficiency in only 5 minutes, delivering a model more accurate than KNAS.
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Transferring visual-language knowledge from large-scale foundation models for video recognition has proved to be effective. To bridge the domain gap, additional parametric modules are added to capture the temporal information. However, zero-shot generalization diminishes with the increase in the number of specialized parameters, making existing works a trade-off between zero-shot and close-set performance. In this paper, we present MoTE, a novel framework that enables generalization and specialization to be balanced in one unified model. Our approach tunes a mixture of temporal experts to learn multiple task views with various degrees of data fitting. To maximally preserve the knowledge of each expert, we propose Weight Merging Regularization, which regularizes the merging process of experts in weight space.
This brain implant is smaller than a grain of rice
The wireless neural transmitter safely delivers brain signals like a microchip. Breakthroughs, discoveries, and DIY tips sent every weekday. Today's neural implants are smaller than ever, but often remain cumbersome and prone to complications . According to researchers at Cornell University, a new iteration detailed this week in the journal may offer a novel path forward for brain implants. Small enough to fit on a grain of rice, the microscale optoelectronic tetherless electrode (or MOTE) is vastly smaller than similar implants and its design could be adapted to work in other delicate areas of the body.
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
Concentration and excess risk bounds for imbalanced classification with synthetic oversampling
Ahmad, Touqeer, Kalan, Mohammadreza M., Portier, François, Stupfler, Gilles
Synthetic oversampling of minority examples using SMOTE and its variants is a leading strategy for addressing imbalanced classification problems. Despite the success of this approach in practice, its theoretical foundations remain underexplored. We develop a theoretical framework to analyze the behavior of SMOTE and related methods when classifiers are trained on synthetic data. We first derive a uniform concentration bound on the discrepancy between the empirical risk over synthetic minority samples and the population risk on the true minority distribution. We then provide a nonparametric excess risk guarantee for kernel-based classifiers trained using such synthetic data. These results lead to practical guidelines for better parameter tuning of both SMOTE and the downstream learning algorithm. Numerical experiments are provided to illustrate and support the theoretical findings
- North America > United States > California (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Netherlands (0.04)
- Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Beyond instruction-conditioning, MoTE: Mixture of Task Experts for Multi-task Embedding Models
Romero, Miguel, Ding, Shuoyang, Barret, Corey D., Dinu, Georgiana, Karypis, George
Dense embeddings are fundamental to modern machine learning systems, powering Retrieval-Augmented Generation (RAG), information retrieval, and representation learning. While instruction-conditioning has become the dominant approach for embedding specialization, its direct application to low-capacity models imposes fundamental representational constraints that limit the performance gains derived from specialization. In this paper, we analyze these limitations and introduce the Mixture of Task Experts (MoTE) transformer block, which leverages task-specialized parameters trained with Task-Aware Contrastive Learning (\tacl) to enhance the model ability to generate specialized embeddings. Empirical results show that MoTE achieves $64\%$ higher performance gains in retrieval datasets ($+3.27 \rightarrow +5.21$) and $43\%$ higher performance gains across all datasets ($+1.81 \rightarrow +2.60$). Critically, these gains are achieved without altering instructions, training data, inference time, or number of active parameters.
- North America > United States > New York (0.04)
- North America > United States > Minnesota (0.04)
- North America > Dominican Republic (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.69)
MoTE: Mixture of Task-specific Experts for Pre-Trained ModelBased Class-incremental Learning
Li, Linjie, Wu, Zhenyu, Ji, Yang
Class-incremental learning (CIL) requires deep learning models to continuously acquire new knowledge from streaming data while preserving previously learned information. Recently, CIL based on pre-trained models (PTMs) has achieved remarkable success. However, prompt-based approaches suffer from prompt overwriting, while adapter-based methods face challenges such as dimensional misalignment between tasks. While the idea of expert fusion in Mixture of Experts (MoE) can help address dimensional inconsistency, both expert and routing parameters are prone to being overwritten in dynamic environments, making MoE challenging to apply directly in CIL. To tackle these issues, we propose a mixture of task-specific experts (MoTE) framework that effectively mitigates the miscalibration caused by inconsistent output dimensions across tasks. Inspired by the weighted feature fusion and sparse activation mechanisms in MoE, we introduce task-aware expert filtering and reliable expert joint inference during the inference phase, mimicking the behavior of routing layers without inducing catastrophic forgetting. Extensive experiments demonstrate the superiority of our method without requiring an exemplar set. Furthermore, the number of tasks in MoTE scales linearly with the number of adapters. Building on this, we further explore the trade-off between adapter expansion and model performance and propose the Adapter-Limited MoTE. The code is available at https://github.com/Franklilinjie/MoTE.