mone
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Jain, Gagan, Hegde, Nidhi, Kusupati, Aditya, Nagrani, Arsha, Buch, Shyamal, Jain, Prateek, Arnab, Anurag, Paul, Sujoy
The visual medium (images and videos) naturally contains a large amount of information redundancy, offering a significant opportunity for more efficient processing. While Vision Transformer (ViT) based models scale effectively to large data regimes, they fail to capitalize on this inherent redundancy, leading to higher computational costs. Mixture of Experts (MoE) networks demonstrate scalability while maintaining the same inference-time costs, but they come with a larger parameter footprint. We present Mixture of Nested Experts (MoNE), which utilizes a nested structure for experts, wherein individual experts fall on an increasing compute-accuracy curve. Given a compute budget, MoNE learns to dynamically choose tokens in a priority order, so that redundant tokens are processed through cheaper nested experts. Using this framework, we achieve performance equivalent to the baseline models while reducing inference-time compute by more than two-fold. We validate our approach on standard image and video datasets: ImageNet-21K, Kinetics400, and Something-Something-v2. We further highlight MoNE's adaptability by showcasing its ability to maintain strong performance across different inference-time compute budgets on videos, using only a single trained model.
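The abstract describes the core mechanism: a router prioritizes tokens and sends only the most important ones through the full-size expert, while the rest go through cheaper nested slices of the same weights. The sketch below is a rough, hypothetical PyTorch illustration of that idea; the class names, the prefix-slicing scheme for the nested experts, and the fixed capacity split are assumptions of this example, not the paper's actual routing algorithm.

```python
import torch
import torch.nn as nn


class NestedExpertMLP(nn.Module):
    """One MLP whose cheaper 'nested experts' are prefix slices of its weights
    (an assumption of this sketch, not necessarily the paper's exact scheme)."""

    def __init__(self, dim=256, hidden=1024, fractions=(0.25, 0.5, 1.0)):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)
        self.fractions = fractions  # fraction of hidden width used by each nested expert

    def forward_nested(self, x, frac):
        h = int(self.fc1.out_features * frac)
        hidden = torch.relu(x @ self.fc1.weight[:h].t() + self.fc1.bias[:h])
        return hidden @ self.fc2.weight[:, :h].t() + self.fc2.bias


class MoNEBlockSketch(nn.Module):
    """Send high-priority tokens to the widest nested expert and the rest to
    cheaper slices, under a fixed per-expert token capacity (hypothetical split)."""

    def __init__(self, dim=256, capacity=(0.2, 0.3, 0.5)):
        super().__init__()
        self.router = nn.Linear(dim, 1)   # per-token priority score
        self.expert = NestedExpertMLP(dim)
        self.capacity = capacity          # share of tokens for the largest..smallest expert

    def forward(self, x):  # x: (num_tokens, dim); batch dimension omitted for brevity
        n = x.shape[0]
        order = self.router(x).squeeze(-1).argsort(descending=True)
        out = torch.empty_like(x)
        start = 0
        # widest slice gets the highest-priority tokens, then progressively cheaper slices
        for cap, frac in zip(self.capacity, reversed(self.expert.fractions)):
            count = min(n - start, max(1, round(cap * n)))
            idx = order[start:start + count]
            out[idx] = self.expert.forward_nested(x[idx], frac)
            start += count
        if start < n:  # any remainder from rounding goes to the cheapest slice
            idx = order[start:]
            out[idx] = self.expert.forward_nested(x[idx], self.expert.fractions[0])
        return out


tokens = torch.randn(16, 256)
print(MoNEBlockSketch()(tokens).shape)  # torch.Size([16, 256])
```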
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Mixture of Neuron Experts
Cheng, Runxi, Guan, Yuchen, Ding, Yucheng, Hu, Qingguo, Wei, Yongxian, Yuan, Chun, Shen, Yelong, Chen, Weizhu, Gong, Yeyun
In this work, we first explore whether the parameters activated by the MoE layer remain highly sparse at inference. We perform a sparsification study on several representative MoE models. For each expert, we rank parameters by the magnitude of their activations from the gate projection and progressively prune the activated subset. Pruning up to 60% of the parameters within that subset causes only negligible task-performance degradation; substantial drops occur only after more than 90% are removed. We further decompose experts into a neuron-granular MoE and visualize their activation values, finding that most neuron activations are near zero. This observation motivates us to select only high-activation neuron experts during pretraining. Based on this insight, we propose Mixture of Neuron Experts (MoNE). MoNE achieves neuron-granular expert selection by applying only a simple top-k selection within each expert; it incurs negligible latency and requires no additional routing parameters or inter-expert communication. Extensive experiments demonstrate that MoNE matches traditional MoE performance while activating only 50% of the MoE-layer parameters, and it consistently outperforms traditional MoE when compared at equal numbers of activated parameters. These results suggest that MoNE is a practical approach to improving parameter utilization and inference efficiency in MoE-like models.
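As a rough illustration of the neuron-granular selection described above, the sketch below keeps only the top-k most strongly activated neurons (by gate-projection magnitude) inside a gated MLP expert. The module name, SwiGLU-style layout, and masking scheme are assumptions of this example, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKNeuronExpert(nn.Module):
    """A gated-MLP expert that keeps only its k most strongly activated neurons
    per token (illustrative sketch; layout and masking are assumptions)."""

    def __init__(self, dim=256, hidden=1024, keep_ratio=0.5):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden, bias=False)
        self.up_proj = nn.Linear(dim, hidden, bias=False)
        self.down_proj = nn.Linear(hidden, dim, bias=False)
        self.k = int(hidden * keep_ratio)  # neurons kept per token

    def forward(self, x):  # x: (num_tokens, dim)
        gate = F.silu(self.gate_proj(x))
        # rank neurons by gate-activation magnitude and zero out everything below top-k
        topk = gate.abs().topk(self.k, dim=-1)
        mask = torch.zeros_like(gate).scatter_(-1, topk.indices, 1.0)
        hidden = gate * mask * self.up_proj(x)  # masked neurons contribute nothing downstream
        return self.down_proj(hidden)


expert = TopKNeuronExpert()
print(expert(torch.randn(8, 256)).shape)  # torch.Size([8, 256])
```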
MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
Zhang, Geng, Han, Yuxuan, Lou, Yuxuan, Zhao, Wangbo, Zhang, Yiqi, You, Yang
Mixture-of-Experts (MoE) enables efficient scaling of large language models by activating only a subset of experts per input token. However, deploying MoE-based models incurs significant memory overhead due to the need to retain all experts in memory. While structured pruning is a promising way to reduce memory costs, existing methods often show suboptimal performance and unstable degradation across three dimensions: model architectures, calibration data sources, and calibration sample sizes. This paper proposes Mixture-of-Novices-and-Experts (MoNE), a novel expert pruning method that replaces redundant experts with lightweight novices to achieve effective and robust model compression. MoNE evaluates expert redundancy based on two metrics: access frequency and output variance. Experts exhibiting low usage and stable outputs are pruned and replaced with lightweight novices (unbiased estimates of their original outputs), minimizing performance degradation. Extensive experiments demonstrate that MoNE consistently outperforms baseline methods with minimal accuracy degradation across the three dimensions, confirming its effectiveness and robustness. Notably, it improves the average zero-shot accuracy across nine downstream tasks by up to 2.71 points under a 25% pruning ratio and 3.61 points under 50% pruning. The code is available at https://github.com/zxgx/mode-pd.
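To make the redundancy criterion above concrete, the sketch below scores each expert by routing frequency and output variance on calibration data and swaps the lowest-scoring experts for constant "novice" modules that return the expert's mean calibration output. The score combination, the exact form of the novice, and all function names here are assumptions for illustration, not the published method.

```python
import torch
import torch.nn as nn


class Novice(nn.Module):
    """A lightweight stand-in returning the pruned expert's mean calibration output
    (one simple unbiased estimate; the paper's novice may differ)."""

    def __init__(self, mean_out):
        super().__init__()
        self.register_buffer("mean_out", mean_out)

    def forward(self, x):  # x: (num_tokens, dim)
        return self.mean_out.expand(x.shape[0], -1)


def prune_with_novices(experts, router_choices, expert_outputs, prune_ratio=0.25):
    """Replace the least-used, most-stable experts with Novice modules.

    experts:        list of expert nn.Modules
    router_choices: (num_calib_tokens,) long tensor of routed expert indices
    expert_outputs: dict expert_idx -> (n_i, dim) tensor of calibration outputs
                    (assumed to contain at least two samples for every expert)

    The redundancy score (frequency * output variance) is an assumption of this
    sketch; the paper combines access frequency and output variance in its own way.
    """
    num_experts = len(experts)
    freq = torch.bincount(router_choices, minlength=num_experts).float()
    freq = freq / freq.sum()
    variance = torch.stack(
        [expert_outputs[i].var(dim=0).mean() for i in range(num_experts)]
    )
    redundancy = freq * variance  # low frequency and low variance => pruning candidate
    to_prune = redundancy.argsort()[: int(prune_ratio * num_experts)].tolist()

    pruned = list(experts)
    for i in to_prune:
        pruned[i] = Novice(expert_outputs[i].mean(dim=0))
    return pruned, to_prune
```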
Millions of Californians are willing to donate organs, but relatively few do. Here's why
The scene at OneLegacy in Azusa on a recent Friday morning would have been familiar to anyone who's been in a hospital intensive care unit. Three adults were tucked into hospital beds, still and apparently asleep, with ventilators and other machines of artificial life doing the work that their bodies couldn't do. If you didn't know better, you'd think all the tubes and wires and boxes and screens were designed to save the lives of these patients, but it was too late for that. Instead, the machines were keeping oxygenated blood circulating through the soon-to-be-donated organs of three people who had recently been declared brain dead. OneLegacy, you see, is not a hospital. And the kidneys, livers and other organs in those three bodies could save the lives of up to 24 people.
- North America > United States > California > Los Angeles County > Los Angeles (0.05)
- Asia > Middle East > Iran (0.05)
- North America > United States > Tennessee (0.04)
- Health & Medicine > Therapeutic Area > Nephrology (1.00)
- Health & Medicine > Health Care Providers & Services (1.00)
Deep learning features encode interpretable morphologies within histological images - Scientific Reports
Convolutional neural networks (CNNs) are revolutionizing digital pathology by enabling machine learning-based classification of a variety of phenotypes from hematoxylin and eosin (H&E) whole slide images (WSIs), but the interpretation of CNNs remains difficult. Most studies have considered interpretability in a post hoc fashion, e.g. by presenting example regions with strongly predicted class labels. However, such an approach does not explain the biological features that contribute to correct predictions. To address this problem, here we investigate the interpretability of H&E-derived CNN features (the feature weights in the final layer of a transfer-learning-based architecture). While many studies have incorporated CNN features into predictive models, there has been little empirical study of their properties. We show such features can be construed as abstract morphological genes ("mones") with strong independent associations to biological phenotypes. Many mones are specific to individual cancer types, while others are found in multiple cancers, especially from related tissue types. We also observe that mone-mone correlations are strong and robustly preserved across related cancers. Importantly, linear mone-based classifiers can very accurately separate 38 distinct classes (19 tumor types and their adjacent normals, AUC = 97.1% ± 2.8% for each class prediction), and linear classifiers are also highly effective for universal tumor detection (AUC = 99.2% ± 0.12%). This linearity provides evidence that individual mones or correlated mone clusters may be associated with interpretable histopathological features or other patient characteristics. In particular, the statistical similarity of mones to gene expression values allows integrative mone analysis via expression-based bioinformatics approaches. We observe strong correlations between individual mones and individual gene expression values, notably mones associated with collagen gene expression in ovarian cancer. Mone-expression comparisons also indicate that immunoglobulin expression can be identified using mones in colon adenocarcinoma and that immune activity can be identified across multiple cancer types, and we verify these findings by expert histopathological review. Our work demonstrates that mones provide a morphological H&E decomposition that can be effectively associated with diverse phenotypes, analogous to the interpretability of transcription via gene expression values. Our work also demonstrates mones can be interpreted without using a classifier as a proxy.
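The general recipe, extracting transfer-learning features and fitting a linear probe, can be sketched in a few lines. The example below is only one plausible setup: the backbone choice, the use of penultimate-layer activations as "mone"-style features, and the classifier settings are assumptions of this sketch, not the study's exact architecture.

```python
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

# Hypothetical backbone: any ImageNet-pretrained CNN with its classification head removed.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()  # expose 2048-d penultimate features instead of logits
backbone.eval()


@torch.no_grad()
def extract_mone_features(tiles):
    """tiles: (n, 3, 224, 224) normalized H&E tile batch -> (n, 2048) feature array."""
    return backbone(tiles).cpu().numpy()


def fit_linear_probe(features, labels):
    """A plain linear classifier on the features, mirroring the linear separability claim."""
    return LogisticRegression(max_iter=1000).fit(features, labels)
```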
How AI Is Changing Your Job Hunt
A few years ago, Jason Freedman was confronted by a classic hiring challenge. The 10-person startup he had founded, an online commercial real estate service called 42Floors.com, needed to hire. Suddenly, it seemed, Freedman, who had myriad other duties as CEO, was spending hours at a time sifting through towering stacks of résumés. The solution appeared in the form of artificial intelligence software from a young company called Interviewed. It speeds the vetting process by providing online simulations of what applicants might do on their first day as an employee. The software does much more than grade multiple-choice questions.
- North America > United States > Utah (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Education (1.00)
- Law (0.95)
- Banking & Finance > Real Estate (0.88)