Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Neural Information Processing Systems

Deep learning (DL) frameworks take advantage of GPUs to improve the speed of DL inference and training. Ideally, DL frameworks should be able to fully utilize the computation power of GPUs such that the running time depends on the amount of computation assigned to GPUs. Yet, we observe that in scheduling GPU tasks, existing DL frameworks suffer from inefficiencies such as large scheduling overhead and unnecessary serial execution. To this end, we propose Nimble, a DL execution engine that runs GPU tasks in parallel with minimal scheduling overhead. Nimble introduces a novel technique called ahead-of-time (AoT) scheduling. Evaluation on a variety of neural networks shows that compared to PyTorch, Nimble speeds up inference and training by up to 22.34× and 3.61×, respectively.
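The core idea of ahead-of-time scheduling is that the (potentially expensive) dependency analysis and launch ordering happen once, before execution, so each later run only replays a pre-recorded schedule. The following is a minimal toy sketch of that idea, not Nimble's actual implementation; the names `Task`, `schedule_ahead_of_time`, and `replay` are illustrative.

```python
# Toy model of ahead-of-time (AoT) task scheduling: the launch order is
# computed once, ahead of execution, and each run replays the recorded
# schedule with no per-run dependency analysis.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    name: str
    deps: tuple = ()


@dataclass
class Schedule:
    order: list = field(default_factory=list)  # topological launch order


def schedule_ahead_of_time(tasks):
    """Run the scheduling logic exactly once, before execution."""
    done, order = set(), []

    def visit(t):
        if t.name in done:
            return
        for d in t.deps:      # launch dependencies first
            visit(d)
        done.add(t.name)
        order.append(t.name)

    for t in tasks:
        visit(t)
    return Schedule(order)


def replay(schedule, launch):
    """Per-iteration hot path: just replay the recorded order."""
    for name in schedule.order:
        launch(name)


# Two independent kernels followed by a join.
a = Task("conv1")
b = Task("conv2")
c = Task("add", deps=(a, b))

sched = schedule_ahead_of_time([a, b, c])
launched = []
replay(sched, launched.append)
print(launched)  # ['conv1', 'conv2', 'add']
```

In the real system the replayed operations are GPU kernel launches on pre-assigned CUDA streams; the sketch only shows why replay removes the scheduling work from the critical path.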


Supplementary Materials for Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning, Appendix A: Proofs on the Stream Assignment Algorithm of Nimble

Neural Information Processing Systems

In this section, we provide detailed proofs of the theorems presented in Section 4.2. We assume that the computation graph of a neural network is given. Here we define important concepts and terminology used in the following proofs. For any (u, v) ∈ E, either f(u) = f(v) or there exists a path P ⊆ E from u to v such that P ∩ Λ ≠ ∅. Prior to the proofs of Theorems 1 and 2, we state and prove Lemma 1 and Lemma 2. Lemma 1 is proved by contradiction.
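The stated property can be checked mechanically on a small graph. Below is a hedged sketch (not code from the paper) that assumes E is the edge set of the computation graph, f maps each node to a stream id, and Λ is a set of synchronization edges; it verifies that every cross-stream edge (u, v) is covered by some u→v path containing a synchronization edge.

```python
# Check the safety property: for every edge (u, v), either
# f(u) == f(v), or some path from u to v contains an edge in Lambda.
from collections import defaultdict


def has_sync_path(edges, sync_edges, u, v):
    """DFS from u to v; True if some u->v path uses a sync edge."""
    adj = defaultdict(list)
    for (a, b) in edges:
        adj[a].append(b)

    def dfs(node, seen_sync, visited):
        if node == v and seen_sync:
            return True
        for nxt in adj[node]:
            key = (nxt, seen_sync or ((node, nxt) in sync_edges))
            if key in visited:
                continue
            visited.add(key)
            if dfs(nxt, key[1], visited):
                return True
        return False

    return dfs(u, False, set())


def stream_assignment_is_safe(edges, f, sync_edges):
    return all(f[u] == f[v] or has_sync_path(edges, sync_edges, u, v)
               for (u, v) in edges)


# Diamond graph: "a" feeds "b" and "c" on different streams; "d" joins.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
f = {"a": 0, "b": 0, "c": 1, "d": 0}
sync = {("a", "c"), ("c", "d")}   # synchronize where streams diverge/rejoin
print(stream_assignment_is_safe(edges, f, sync))  # True
```

With `sync` empty the same assignment is reported unsafe, since the cross-stream edges (a, c) and (c, d) carry no synchronization.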







Thank you for the insightful comments and the opportunity to follow up

Neural Information Processing Systems

Thank you for the insightful comments and the opportunity to follow up. We compare PyTorch's native implementation to Nimble and measure its performance. Note that TensorRT and TVM currently do not support training. Figure 1: Speedup compared to TensorRT on inference workloads (batch size 1) using V100. Figure 2: Speedup compared to PyTorch on training using V100.


Review for NeurIPS paper: Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Neural Information Processing Systems

Weaknesses: This work is most applicable to networks with many small kernels, which may not be of broad interest in all cases. Nonetheless, it does help with training MobileNet and similar networks on desktop or server GPUs. I also feel that some parts of the paper overstate the contribution, either by only evaluating on these networks or by leaving out some optimized baselines. The biggest issues here are:

- For inference, you should compare against an optimized inference runtime such as TensorRT. This will likely do better than PyTorch or Caffe2 out of the box, even with TorchScript.




The biggest threat your nail salon has ever seen

FOX News

Nimble helps avoid the nail salon. Nail salons everywhere may soon face a serious competitor: Nimble, the robot manicurist. The company calls it the world's first smart home nail salon. It is a revolutionary device that lets you get a flawless manicure at home without any hassle. Nimble uses patented pioneering technology to scan, paint and dry your nails with one game-changing device.