Jackson, Adrian
MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
Xu, Tairan, Xue, Leyang, Lu, Zhan, Jackson, Adrian, Mai, Luo
This paper presents MoE-Gen, a high-throughput MoE inference system optimized for single-GPU execution. Existing inference systems rely on model-based or continuous batching strategies, originally designed for interactive inference, which produce excessively small batches for MoE's key modules (attention and expert modules) and thus poor throughput. To address this, we introduce module-based batching, which accumulates tokens in host memory and dynamically launches large batches on GPUs to maximize utilization. Additionally, we optimize the batch size for each module in an MoE to fully overlap GPU computation and communication, maximizing throughput. Evaluation demonstrates that MoE-Gen achieves 8-31x higher throughput than state-of-the-art systems employing model-based batching (FlexGen, MoE-Lightning, DeepSpeed), and offers even greater throughput improvements over continuous-batching systems (e.g., vLLM and Ollama) on popular MoE models (DeepSeek and Mixtral) across offline inference tasks. MoE-Gen's source code is publicly available at https://github.com/EfficientMoE/MoE-Gen.
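To make the module-based batching idea concrete, the following is a minimal, hypothetical Python sketch of the pattern the abstract describes: token activations are staged in host memory and flushed to the GPU as one large batch per module. The names (ModuleBatcher, submit, flush, module_batch_size) are illustrative assumptions, not MoE-Gen's actual API.

    import torch

    class ModuleBatcher:
        """Sketch of module-based batching: stage tokens on the host,
        then launch one large batch per module on the GPU."""

        def __init__(self, module, module_batch_size, device="cuda"):
            self.module = module.to(device)      # e.g. an attention or expert module
            self.batch_size = module_batch_size  # tuned per module to saturate the GPU
            self.device = device
            self.pending = []                    # token activations staged in host memory

        def submit(self, hidden_states):
            # Stage activations on the CPU; nothing runs on the GPU yet.
            self.pending.append(hidden_states.cpu())
            if sum(t.shape[0] for t in self.pending) >= self.batch_size:
                return self.flush()
            return None

        def flush(self):
            # Launch one large batch on the GPU to maximize utilization.
            if not self.pending:
                return None
            batch = torch.cat(self.pending, dim=0).to(self.device, non_blocking=True)
            self.pending.clear()
            with torch.no_grad():
                return self.module(batch)

For instance, wrapping an expert's feed-forward layer (a torch.nn.Linear) in ModuleBatcher would let tokens from many sequences share one large GEMM; the sketch omits the compute/communication overlap that the paper's batch-size optimization targets.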
FDM-Bench: A Comprehensive Benchmark for Evaluating Large Language Models in Additive Manufacturing Tasks
Eslaminia, Ahmadreza, Jackson, Adrian, Tian, Beitong, Stern, Avi, Gordon, Hallie, Malhotra, Rajiv, Nahrstedt, Klara, Shao, Chenhui
Fused Deposition Modeling (FDM) is a widely used additive manufacturing (AM) technique valued for its flexibility and cost-efficiency, with applications in industries including healthcare and aerospace. Recent developments have made affordable FDM machines widely accessible and encouraged adoption among diverse users. However, the design, planning, and production processes in FDM require specialized interdisciplinary knowledge, and managing its complex parameters and resolving print defects remain challenging. These technical complexities form the most critical barrier preventing individuals without technical backgrounds, and even professional engineers trained in other domains, from participating in AM design and manufacturing. Large Language Models (LLMs), with their advanced capabilities in text and code processing, offer the potential to address these challenges in FDM. However, existing research on LLM applications in this field is limited, typically focusing on specific use cases without providing comprehensive evaluations across multiple models and tasks. To this end, we introduce FDM-Bench, a benchmark dataset designed to evaluate LLMs on FDM-specific tasks. FDM-Bench enables a thorough assessment by including user queries across various experience levels and G-code samples that represent a range of anomalies. We evaluate two closed-source models (GPT-4o and Claude 3.5 Sonnet) and two open-source models (Llama-3.1-70B and Llama-3.1-405B) on FDM-Bench. A panel of FDM experts assesses the models' responses to user queries in detail. Results indicate that closed-source models generally outperform open-source models in G-code anomaly detection, whereas Llama-3.1-405B demonstrates a slight advantage over the other models in responding to user queries. These findings underscore FDM-Bench's potential as a foundational tool for advancing research on LLM capabilities in FDM.
Deep network series for large-scale high-dynamic range imaging
Aghabiglou, Amir, Terris, Matthieu, Jackson, Adrian, Wiaux, Yves
We propose a new approach to large-scale high-dynamic range computational imaging. Deep Neural Networks (DNNs) trained end-to-end can solve linear inverse imaging problems almost instantaneously. While unfolded architectures provide robustness to variations in the measurement setting, embedding large-scale measurement operators in DNN architectures is impractical. Alternative Plug-and-Play (PnP) approaches, where the denoising DNNs are blind to the measurement setting, have proven effective in addressing scalability and high-dynamic range challenges, but rely on highly iterative algorithms. We propose a residual DNN series approach, also interpretable as a learned version of matching pursuit, where the reconstructed image is a sum of residual images that progressively increase the dynamic range and are estimated iteratively by DNNs taking the back-projected data residual of the previous iteration as input. We demonstrate on radio-astronomical imaging simulations that a series of only a few terms provides reconstruction quality competitive with PnP, at a fraction of the cost.
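The series structure in the abstract maps directly onto a short loop. Below is a minimal Python sketch under the assumption that the forward measurement operator and its adjoint are available as callables A and At, and that dnns is a list of trained per-term networks; all names are illustrative, not the paper's code.

    import torch

    def residual_series_reconstruct(y, A, At, dnns):
        """Reconstruct an image as a sum of DNN-estimated residual terms.

        Each network receives the back-projected data residual of the current
        estimate and predicts the next residual image, progressively increasing
        the dynamic range of the reconstruction.
        """
        x = torch.zeros_like(At(y))          # start from an empty image
        for dnn in dnns:                     # a series of only a few terms
            back_projected = At(y - A(x))    # back-projected data residual
            x = x + dnn(back_projected)      # add the next residual image term
        return x

Because the loop runs once per trained term rather than for hundreds of PnP iterations, the cost advantage claimed in the abstract follows directly from this structure.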
Scalable precision wide-field imaging in radio interferometry: II. AIRI validated on ASKAP data
Wilber, Amanda G., Dabbech, Arwa, Terris, Matthieu, Jackson, Adrian, Wiaux, Yves
Accompanying Part I, this sequel presents a validation of the recently proposed AI for Regularisation in radio-interferometric Imaging (AIRI) algorithm on observations from the Australian Square Kilometre Array Pathfinder (ASKAP). The monochromatic AIRI-ASKAP images showcased in this work are formed using the same parallelised and automated imaging framework described in Part I: "uSARA validated on ASKAP data". Using a Plug-and-Play approach, AIRI differs from uSARA by substituting a trained denoising deep neural network (DNN) for the proximal operator in the regularisation step of the forward-backward algorithm during deconvolution. We build a shelf of trained DNN denoisers targeting the estimated image dynamic ranges of our selected data. Furthermore, we quantify the variations of AIRI reconstructions when selecting the nearest DNN on the shelf versus using a universal DNN with the highest dynamic range, opening the door to a more complete framework that not only delivers image estimation but also quantifies epistemic model uncertainty. We continue our comparative analysis of the source structure, diffuse flux measurements, and spectral index maps of selected target sources as imaged by AIRI and the algorithms in Part I, uSARA and WSClean. Overall, we see an improvement over uSARA and WSClean in the reconstruction of diffuse components in AIRI images. The scientific potential delivered by AIRI is evident in its further imaging precision, more accurate spectral index maps, and a significant acceleration in deconvolution time, whereby AIRI is four times faster than its sub-iterative sparsity-based counterpart uSARA.
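The Plug-and-Play forward-backward structure and the denoiser shelf described above can be sketched in a few lines of Python. This is a hypothetical illustration, not AIRI's implementation: A and At stand for the measurement operator and its adjoint, shelf is assumed to be a dict of denoisers keyed by the dynamic range they were trained for, and gamma is a step size.

    import torch

    def select_denoiser(shelf, estimated_dynamic_range):
        # Pick the shelf denoiser trained for the nearest dynamic range.
        nearest = min(shelf, key=lambda dr: abs(dr - estimated_dynamic_range))
        return shelf[nearest]

    def pnp_forward_backward(y, A, At, shelf, estimated_dynamic_range,
                             gamma=1.0, n_iter=200):
        """PnP deconvolution: the trained DNN denoiser replaces the proximal
        operator in the regularisation step of forward-backward iterations."""
        denoiser = select_denoiser(shelf, estimated_dynamic_range)
        x = At(y)                              # back-projected (dirty) image as init
        for _ in range(n_iter):
            grad = At(A(x) - y)                # gradient of the data-fidelity term
            x = denoiser(x - gamma * grad)     # denoising step as regularisation
        return x

The shelf-versus-universal comparison in the abstract amounts to swapping select_denoiser for a constant choice of the highest-dynamic-range network and measuring how the reconstructions differ.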
First AI for deep super-resolution wide-field imaging in radio astronomy: unveiling structure in ESO 137-006
Dabbech, Arwa, Terris, Matthieu, Jackson, Adrian, Ramatsoku, Mpati, Smirnov, Oleg M., Wiaux, Yves
We introduce the first AI-based framework for deep, super-resolution, wide-field radio-interferometric imaging, and demonstrate it on observations of the ESO 137-006 radio galaxy. The algorithmic framework for solving the inverse problem of image reconstruction builds on a recent "plug-and-play" scheme whereby a denoising operator is injected as an image regulariser in an optimisation algorithm, which alternates until convergence between denoising steps and gradient-descent data-fidelity steps. We investigate handcrafted and learned variants of high-resolution, high-dynamic range denoisers. We propose a parallel algorithm implementation relying on automated decompositions of the image into facets and of the measurement operator into sparse low-dimensional blocks, enabling scalability to large data and image dimensions. We validate our framework for image formation over a wide field of view containing ESO 137-006, from 19 gigabytes of MeerKAT data at 1053 and 1399 MHz. The recovered maps exhibit significantly more resolution and dynamic range than CLEAN, revealing collimated synchrotron threads close to the galactic core.
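As a rough illustration of the facet decomposition that underpins the scalability claim, the Python sketch below splits an image into tiles, applies a denoiser to each tile independently (each call could be dispatched to a separate worker), and reassembles the result. The facet size and the denoiser callable are assumptions; the paper's implementation additionally decomposes the measurement operator into sparse low-dimensional blocks and handles facet overlaps, which this sketch omits.

    import torch

    def denoise_by_facets(image, denoiser, facet_size=256):
        """Apply a denoiser facet-by-facet so very large images stay tractable.

        Note: a production implementation would overlap adjacent facets and
        blend them to avoid seams at tile boundaries.
        """
        out = torch.empty_like(image)
        h, w = image.shape[-2:]
        for top in range(0, h, facet_size):
            for left in range(0, w, facet_size):
                facet = image[..., top:top + facet_size, left:left + facet_size]
                out[..., top:top + facet_size, left:left + facet_size] = denoiser(facet)
        return out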