device group
Failure-Resilient Distributed Inference with Model Compression over Heterogeneous Edge Devices
Li Wang, Liang Li, Lianming Xu, Xian Peng, Aiguo Fei
The distributed inference paradigm enables the computation workload to be distributed across multiple devices, facilitating the implementation of deep-learning-based intelligent services in extremely resource-constrained Internet of Things (IoT) scenarios. Yet performing complicated inference tasks over a cluster of IoT devices remains challenging: the devices are heterogeneous in computing/communication capacity and prone to crash or timeout failures. In this paper, we present RoCoIn, a robust cooperative inference mechanism for locally distributed execution of deep neural network-based inference tasks over heterogeneous edge devices. It creates a set of independent and compact student models that are learned from a large model using knowledge distillation for distributed deployment. In particular, the devices are strategically grouped to redundantly deploy and execute the same student model so that the inference process is resilient to any local failure, while a joint knowledge-partition and student-model-assignment scheme is designed to minimize the response latency of the distributed inference system in the presence of devices with diverse capacities. Extensive simulations corroborate the superior performance of RoCoIn for distributed inference compared to several baselines, demonstrating its efficacy in timely inference and failure resiliency.
- Asia > China > Beijing > Beijing (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe (0.04)
- Education (1.00)
- Information Technology > Smart Houses & Appliances (0.34)
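The abstract above does not spell out RoCoIn's distillation objective, but the student models it describes are typically trained with a soft-target loss. A minimal sketch of the standard Hinton-style knowledge-distillation loss (temperature `T` and the `T^2` scaling are assumptions, not details from the paper):

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature scaling: a higher T softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# A student that matches the teacher exactly incurs zero loss.
print(distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # 0.0
```

In RoCoIn's setting each device group would train its own compact student against the shared large teacher, so a loss of this form would be evaluated per student.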
Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data
Dehai Min, Nan Hu, Rihui Jin, Nuo Lin, Jiaoyan Chen, Yongrui Chen, Yu Li, Guilin Qi, Yun Li, Nijun Li, Qianren Wang
Augmenting Large Language Models (LLMs) for Question Answering (QA) with domain-specific data has attracted wide attention. However, domain data often exists in a hybrid format, including text and semi-structured tables, posing challenges for the seamless integration of information. Table-to-text generation offers a promising solution by transforming hybrid data into a uniformly text-formatted corpus. Although this technique has been widely studied by the NLP community, there is currently no comparative analysis of how corpora generated by different table-to-text methods affect the performance of QA systems. In this paper, we address this research gap in two steps. First, we integrate table-to-text generation into the framework of enhancing LLM-based QA systems with domain hybrid data. Then, we apply this framework to real-world industrial data and conduct extensive experiments on two types of QA systems (DSFT and RAG frameworks) with four representative methods: Markdown format, template serialization, the TPLM-based method, and the LLM-based method. Based on the experimental results, we draw some empirical findings and explore the underlying reasons behind the success of some methods. We hope the findings of this work will provide a valuable reference for the academic and industrial communities in developing robust QA systems.
- Asia > Singapore (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- (4 more...)
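Of the four serialization methods the abstract names, the first two are simple enough to sketch directly. A hypothetical illustration (the table, column names, and template string are invented for the example; the paper's actual serializers may differ):

```python
def to_markdown(headers, rows):
    """Serialize a table into a GitHub-flavored Markdown table."""
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    for row in rows:
        lines.append("| " + " | ".join(str(c) for c in row) + " |")
    return "\n".join(lines)

def to_template_text(headers, rows, template):
    """Serialize each row into a natural-language sentence via a fixed template."""
    return " ".join(template.format(**dict(zip(headers, row))) for row in rows)

headers = ["model", "params"]
rows = [["BERT-base", "110M"], ["BERT-large", "340M"]]
print(to_markdown(headers, rows))
print(to_template_text(headers, rows, "{model} has {params} parameters."))
```

Either output can then be fed into a fine-tuning corpus (the DSFT setting) or chunked and indexed for retrieval (the RAG setting); the TPLM- and LLM-based methods replace these hand-written rules with learned generators.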
Expediting Distributed DNN Training with Device Topology-Aware Graph Deployment
Shiwei Zhang, Xiaodong Yi, Lansong Diao, Chuan Wu, Siyu Wang, Wei Lin
Abstract--This paper presents TAG, an automatic system to derive an optimized DNN training graph and its deployment onto any device topology, for expedited training in device- and topology-heterogeneous ML clusters. We novelly combine both the DNN computation graph and the device topology graph as input to a graph neural network (GNN), and join the GNN with a search-based method to quickly identify optimized distributed training strategies. To reduce communication in a heterogeneous cluster, we further explore a lossless gradient compression technique and solve a combinatorial optimization problem to automatically apply the technique for training-time minimization. We evaluate TAG with various representative DNN models and device topologies, showing that it can achieve up to 4.56x training speed-up as compared to existing schemes. TAG can produce efficient deployment strategies for both unseen DNN models and unseen device topologies, without heavy fine-tuning.

Deep learning (DL) has powered a wide range of applications in various areas including computer vision [1], [2], natural language processing [3], [4], recommendation systems [5], etc. Recent deep neural network (DNN) models feature large parameter counts (e.g., BERT [6] with more than 340M parameters) to achieve superior performance [3], [6]. The decisions involved in distributing their training jointly form an exponentially large strategy space; current practice often falls back to heuristics that consider one aspect of the strategy space at a time [17], [18], resulting in less efficient or even infeasible solutions. Pioneering works on deploying DNN models typically assume a homogeneous cluster, e.g., training BERT using 8 NVIDIA V100 GPUs [7], and do not generalize beyond it, making them impractical for AI clouds with ever-changing resource configurations.
- North America > United States (0.04)
- Asia > China > Hong Kong (0.04)
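The TAG abstract mentions a lossless gradient compression technique without specifying it. As a generic sketch of the lossless idea (the use of DEFLATE via `zlib` is an assumption for illustration, not TAG's actual scheme), note that lossless codecs exploit redundancy in the byte stream, so sparse or repetitive gradients shrink substantially while round-tripping bit-for-bit:

```python
import struct
import zlib

def compress_gradients(grads):
    """Pack float32 gradients into bytes and apply lossless DEFLATE compression."""
    raw = struct.pack(f"{len(grads)}f", *grads)
    return zlib.compress(raw, 9)  # maximum compression level

def decompress_gradients(blob, count):
    """Invert compress_gradients; recovery is exact, not approximate."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"{count}f", raw))

# Sparse, repetitive gradients (common after pruning or ReLU) compress well.
grads = [0.0] * 900 + [0.5] * 100
blob = compress_gradients(grads)
assert decompress_gradients(blob, len(grads)) == grads  # exact round-trip
print(len(blob), "compressed bytes vs", 4 * len(grads), "raw bytes")
```

Because the round-trip is exact, such a codec can be toggled per communication link (the combinatorial choice the abstract refers to) without affecting training convergence.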
vSphere 8 Expands Machine Learning Support: Device Groups for NVIDIA GPUs and NICs
Data scientists and machine learning developers are building and training very large models these days, with correspondingly larger GPU memory needs. Many of these larger ML applications need more than one NVIDIA GPU device on the vSphere servers on which they operate, or they may need to communicate between separate GPUs over the local network. This can be done to expand the overall GPU framebuffer memory capacity, among other reasons. Servers now exist on the market with eight or more physical GPUs, and that number of GPUs per server will likely grow over time. With vSphere 8, you can add up to 8 virtual GPUs (vGPUs) to one VM.
- Information Technology > Hardware (0.63)
- Information Technology > Software (0.40)
- Information Technology > Hardware (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.91)
Can Microsoft get smarter? Inside the tech giant's massive bet on AI
Microsoft has so far released its artificial intelligence technologies largely through its well-known software platforms, such as the Cortana voice assistant on Windows 10, automated language translation in Microsoft Office, and AI-powered speech, vision, search and language technologies for developers on Microsoft Azure. Artificial intelligence specialists at the company are now working closely with its devices group, said Harry Shum, the executive vice president of Microsoft's AI and Research group, in a broader interview with GeekWire about the next phase of the company's AI initiatives. Without giving details, Shum said he expects some "very, very exciting devices" to result from the work by the company's AI engineers and devices group. Shum mentioned this as an aside, not to get the gadget blogs buzzing but to underscore the scope of what Microsoft is trying to do. As part of the massive engineering reorganization announced by CEO Satya Nadella last week, the company is attempting to bring artificial intelligence into everything it does.