AITopics | moonshine

Moonshine: Distilling with Cheap Convolutions

Elliot J. Crowley, Gavin Gray, Amos J. Storkey

Neural Information Processing Systems | Feb-12-2026, 18:36:37 GMT

Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff.

artificial intelligence, convolution, machine learning

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Ontario > Toronto (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Moonshine: Distilling with Cheap Convolutions

Neural Information Processing Systems | Nov-20-2025, 22:09:02 GMT

Many engineers wish to deploy modern neural networks in memory-limited settings, but the development of flexible methods for reducing memory use is in its infancy, and little is known about the resulting cost-benefit trade-off. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation provides student network performance that is better than training that student architecture directly on data.
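As a rough illustration of where such memory savings can come from, the sketch below counts the weights in a standard 3x3 convolution and in a cheaper replacement built from a grouped 3x3 convolution followed by a 1x1 pointwise convolution. The abstract does not say which cheap convolutions are used, so the grouped-plus-pointwise block, the channel width of 64, and the group count of 8 are assumptions chosen for illustration. `conv_params` is a hypothetical helper, and biases and normalization parameters are ignored.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Weight count of a 2-D convolution without bias (hypothetical helper)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channels must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * kernel * kernel


channels, groups = 64, 8  # assumed values, not taken from the paper

# Teacher-style block: one full 3x3 convolution.
standard = conv_params(channels, channels, 3)

# Assumed student-style block: grouped 3x3 followed by a 1x1 pointwise mix.
cheap = conv_params(channels, channels, 3, groups=groups) + conv_params(
    channels, channels, 1
)

print(standard, cheap, round(standard / cheap, 2))  # 36864 8704 4.24
```

Under these assumptions the replacement block stores roughly a quarter of the weights. Whether accuracy survives such a swap is the empirical question the abstract addresses with distillation.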

distilling, moonshine, name change

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)

Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices

Evan King, Adam Sabra, Manjunath Kudlur, James Wang, Pete Warden

arXiv.org Artificial Intelligence | Sep-3-2025

We present the Flavors of Moonshine, a suite of tiny automatic speech recognition (ASR) models specialized for a range of underrepresented languages. Prevailing wisdom suggests that multilingual ASR models outperform monolingual counterparts by exploiting cross-lingual phonetic similarities. We challenge this assumption, showing that for sufficiently small models (27M parameters), training monolingual systems on a carefully balanced mix of high-quality human-labeled, pseudo-labeled, and synthetic data yields substantially superior performance. On average, our models achieve error rates 48% lower than the comparably sized Whisper Tiny model, outperform the 9x larger Whisper Small model, and in most cases match or outperform the 28x larger Whisper Medium model. These results advance the state of the art for models of this size, enabling accurate on-device ASR for languages that previously had limited support. We release Arabic, Chinese, Japanese, Korean, Ukrainian, and Vietnamese Moonshine models under a permissive open-source license.

artificial intelligence, machine learning, natural language

arXiv:2509.02523

Genre: Research Report (0.82)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.37)

Moonshine: Speech Recognition for Live Transcription and Voice Commands

Nat Jeffries, Evan King, Manjunath Kudlur, Guy Nicholson, James Wang, Pete Warden

arXiv.org Artificial Intelligence | Oct-22-2024

This paper introduces Moonshine, a family of speech recognition models optimized for live transcription and voice command processing. Moonshine is based on an encoder-decoder transformer architecture and employs Rotary Position Embedding (RoPE) instead of traditional absolute position embeddings. The model is trained on speech segments of various lengths without zero-padding, which makes the encoder more efficient at inference time. When benchmarked against OpenAI's Whisper tiny-en, Moonshine Tiny demonstrates a 5x reduction in compute requirements for transcribing a 10-second speech segment while incurring no increase in word error rates across standard evaluation datasets. These results highlight Moonshine's potential for real-time and resource-constrained applications.
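For readers unfamiliar with RoPE, the following is a minimal sketch of the general idea, not Moonshine's implementation: each consecutive pair of vector components is rotated by an angle proportional to the token position, so the dot product between a rotated query and a rotated key depends only on their relative offset. The interleaved pairing, the base of 10000, and the helper names `rope` and `dot` are assumptions made for illustration.

```python
import math


def rope(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    d = len(vec)
    if d % 2:
        raise ValueError("dimension must be even")
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)  # one rotation frequency per pair
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.5, -1.0, 2.0, 0.25]  # toy query vector
k = [1.5, 0.5, -0.75, 1.0]  # toy key vector

# Both pairs of positions are 2 apart, so the scores should agree.
near = dot(rope(q, 3), rope(k, 1))
far = dot(rope(q, 10), rope(k, 8))
print(abs(near - far) < 1e-9)  # True
```

Because the score depends only on the offset between positions, no learned table of absolute positions is required, which is the contrast with absolute position embeddings that the abstract draws.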

large language model, machine learning, natural language

arXiv:2410.15608

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Moonshine: Distilling Game Content Generators into Steerable Generative Models

Yuhe Nie, Michael Middleton, Tim Merino, Nidhushan Kanagaraja, Ashutosh Kumar, Zhan Zhuang, Julian Togelius

arXiv.org Artificial Intelligence | Aug-18-2024

Procedural Content Generation via Machine Learning (PCGML) has enhanced game content creation, yet challenges in controllability and limited training data persist. This study addresses these issues by distilling a constructive PCG algorithm into a controllable PCGML model. We first generate a large amount of content with a constructive algorithm and label it using a Large Language Model (LLM). We use these synthetic labels to condition two PCGML models for content-specific generation, a diffusion model and the five-dollar model. This neural network distillation process ensures that the generation aligns with the original algorithm while introducing controllability through plain text. We define this text-conditioned PCGML as a Text-to-game-Map (T2M) task, offering an alternative to prevalent text-to-image multi-modal tasks. We compare our distilled models with the baseline constructive algorithm. Our analysis of the variety, accuracy, and quality of our generation demonstrates the efficacy of distilling constructive methods into controllable text-conditioned PCGML models.

algorithm, dataset, short description

arXiv:2408.09594

Country:

North America > United States > New York (0.04)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Moonshine: Distilling with Cheap Convolutions

Elliot J. Crowley, Gavin Gray, Amos J. Storkey

Neural Information Processing Systems | Feb-14-2020, 11:43:42 GMT

cheap convolution, distilling, moonshine

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

BlockSwap: Fisher-guided Block Substitution for Network Compression

Jack Turner, Elliot J. Crowley, Gavin Gray, Amos Storkey, Michael O'Boyle

arXiv.org Machine Learning | Jun-10-2019

The desire to run neural networks on low-capacity edge devices has led to the development of a wealth of compression techniques. Moonshine (Crowley et al., 2018a) is a simple and powerful example of this: one takes a large pre-trained network and substitutes each of its convolutional blocks with a selected cheap alternative block, then distills the resultant network with the original. However, not all blocks are created equal; for a required parameter budget there may exist a potent combination of many different cheap blocks. In this work, we find these by developing BlockSwap: an algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential. We show that block-wise cheapening yields more accurate networks than single block-type networks across a spectrum of parameter budgets. Code is available at https://github.com/BayesWatch/

artificial intelligence, convolution, machine learning

arXiv:1906.04113

Country: North America (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

The LICORS Cabinet: Nonparametric Algorithms for Spatio-temporal Prediction

George D. Montanez, Cosma Rohilla Shalizi

arXiv.org Machine Learning | Sep-14-2016

Spatio-temporal data is intrinsically high-dimensional, so unsupervised modeling is only feasible if we can exploit structure in the process. When the dynamics are local in both space and time, this structure can be exploited by splitting the global field into many lower-dimensional "light cones". We review light cone decompositions for predictive state reconstruction, introducing three simple light cone algorithms. These methods allow for tractable inference of spatio-temporal data, such as full-frame video. The algorithms make few assumptions on the underlying process yet have good predictive performance and can provide distributions over spatio-temporal data, enabling sophisticated probabilistic inference.
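To make the light cone idea concrete, here is a minimal sketch for a field with one spatial dimension, stored as `field[t][x]`. It collects the past light cone of a point: every earlier value that could have influenced it if information travels at most `speed` cells per time step. The function name `past_light_cone`, the `horizon` and `speed` parameters, and the choice to raise an error at the boundary are assumptions for illustration; the paper's exact definitions and algorithms may differ.

```python
def past_light_cone(field, t, x, horizon, speed=1):
    """Collect values from the past light cone of (t, x) in field[t][x]."""
    cone = []
    for tau in range(1, horizon + 1):
        reach = speed * tau  # spatial half-width grows with the time lag
        row_t = t - tau
        if row_t < 0 or x - reach < 0 or x + reach >= len(field[row_t]):
            raise IndexError("light cone extends outside the field")
        cone.extend(field[row_t][x - reach : x + reach + 1])
    return cone


# Toy field with 5 time steps and 7 cells; each value encodes 10 * t + x.
field = [[10 * t + x for x in range(7)] for t in range(5)]

cone = past_light_cone(field, t=2, x=3, horizon=2)
print(cone)  # [12, 13, 14, 1, 2, 3, 4, 5]
```

Each point is summarized by 8 values here rather than by the whole 35-cell field, which is the kind of dimensionality reduction the abstract describes.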

artificial intelligence, light cone, machine learning

arXiv:1506.02686

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

A Master of Umbral Moonshine Toys With String Theory

WIRED | Aug-20-2016, 12:55:03 GMT

After the Eyjafjallajökull volcano erupted in Iceland in 2010, flight cancellations left Miranda Cheng stranded in Paris. While waiting for the ash to clear, Cheng, then a postdoctoral researcher at Harvard University studying string theory, got to thinking about a paper that had recently been posted online. Its three coauthors had pointed out a numerical coincidence connecting far-flung mathematical objects. "That smells like another moonshine," Cheng recalled thinking. "Could it be another moonshine?" She happened to have read a book about the "monstrous moonshine," a mathematical structure that unfolded out of a similar bit of numerology: In the late 1970s, the mathematician John McKay noticed that 196,884, the first important coefficient of an object called the j-function, was the sum of one and 196,883, the first two dimensions in which a giant collection of symmetries called the monster group could be represented.

artificial intelligence, moonshine, string theory

Country:

Europe > Iceland (0.55)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Europe > France (0.04)
Asia > Taiwan (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)
