 multiple device


Innovation abounds in device charging

MIT Technology Review

No longer peripheral accessories, chargers today are more powerful, portable, and proactive. Consumers can look forward to rapid innovations in the coming years. The changes may be less perceptible than in smartphones, tablets, or wearables, but chargers have also been quietly reinvented over the last decade. At one time a bulky mix of tangled cables and connectors, slow to perform and prone to overheating, they're now smaller, safer, and faster, thanks to a slew of technological advances. These advances include a switch to gallium nitride (GaN), which has now usurped silicon as the preferred semiconductor because it handles higher voltages, switches faster, and conducts more efficiently. Multi-port chargers, coupled with an industry-wide shift toward USB-C standardization, mean a single charger can handle multiple devices.


Efficient Algorithms for Device Placement of DNN Graph Operators

Neural Information Processing Systems

Modern machine learning workloads use large models, with complex structures, that are very expensive to execute. The devices that execute complex models are becoming increasingly heterogeneous as we see a flourishing of Domain Specific Architectures (DSAs) being offered as hardware accelerators in addition to CPUs. Recent work has shown that significant gains can be obtained with model parallelism, i.e., partitioning a neural network's computational graph onto multiple devices. In particular, this form of parallelism assumes a pipeline of devices, which is fed a stream of samples and yields high throughput for training and inference of DNNs. However, for such settings (large models and multiple heterogeneous devices), we require automated algorithms and toolchains that can partition the ML workload across devices.
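
To make the partitioning idea concrete, here is a minimal sketch, not the paper's algorithm. It assumes a much simpler setting than the abstract describes: the model is a plain chain of operators (not a general graph), all devices are identical, and communication cost is ignored. Under those assumptions, pipeline throughput is limited by the slowest stage, so the sketch splits the chain into contiguous stages that minimize the largest per-stage cost. The function name `partition_chain` and the example costs are hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple


def partition_chain(costs: List[float], num_devices: int) -> Tuple[float, List[List[int]]]:
    """Split a chain of operators into at most `num_devices` contiguous
    stages, minimizing the cost of the slowest stage (the bottleneck).

    Assumptions (illustrative only): chain-shaped model, identical
    devices, no communication cost.
    """
    n = len(costs)
    # prefix[i] holds the total cost of operators 0..i-1.
    prefix = [0.0]
    for c in costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(i: int, k: int) -> Tuple[float, Tuple[int, ...]]:
        # Best split of operators i..n-1 into at most k stages.
        # Returns (bottleneck cost, start index of each stage).
        if i == n:
            return 0.0, ()
        if k == 1:
            return prefix[n] - prefix[i], (i,)
        result: Tuple[float, Tuple[int, ...]] = (float("inf"), ())
        for j in range(i + 1, n + 1):
            stage_cost = prefix[j] - prefix[i]
            rest_cost, rest_starts = best(j, k - 1)
            candidate = max(stage_cost, rest_cost)
            if candidate < result[0]:
                result = (candidate, (i,) + rest_starts)
        return result

    bottleneck, starts = best(0, num_devices)
    bounds = list(starts) + [n]
    stages = [list(range(bounds[s], bounds[s + 1])) for s in range(len(starts))]
    return bottleneck, stages


if __name__ == "__main__":
    # Hypothetical per-operator costs (e.g., milliseconds per sample).
    bottleneck, stages = partition_chain([4, 2, 3, 1, 5, 2], num_devices=3)
    print("bottleneck:", bottleneck)
    print("stages:", stages)
```

Handling heterogeneous devices and general operator graphs, as the abstract calls for, requires more than this chain-only dynamic program; the sketch only illustrates why the slowest stage is the quantity to minimize.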


Avoiding Siri slipups and apologies for butt dials

FOX News

Voice assistants may cause confusion across devices. Tech expert Kurt Knutsson offers some solutions to fix it. When it comes to using voice assistants across multiple devices, things can get a bit tricky. "Mike" from St. George, Utah, found himself in a comical yet frustrating situation with his personal and work iPhones. Let's dive into his predicament and explore some solutions.


GAIA: A General AI Assistant for Intelligent Accelerator Operations

arXiv.org Artificial Intelligence

Particle accelerators are complex machines that consist of a large number of subsystems. Although many processes are automated and feedback systems are in place, experiments and machine supervision need to be performed by a group of operators. These operators usually have an accelerator physics background and mostly know how the technology works. They especially know how to set up and tune the machine parameters for certain working points and experiments using high-level graphical user interfaces, which are connected to low-level machine control software. Due to the complexity of the machine, some subsystems are taken care of by experts, whom the operators can turn to. This work shows that it is possible to support the day-to-day operation of a complex machine like a particle accelerator using a large language model (LLM), an object-oriented high-level machine control system framework, as well as a number of interfaces to knowledge bases such as the electronic logbook. The system is able to assist the operators on many levels, e.g., by producing Python scripts, which when executed perform a task defined by an input prompt to the LLM. To this end, the reasoning and action prompting paradigm (ReAct) [Yao et al., 2023] is implemented. This way a multi-expert system is realized, mimicking the real world, where the complex machine is operated by many subsystem experts.


Crazy flexible phone with a screen that can bend around your wrist

FOX News

Adaptive display can change its shape, mode and color according to your needs. Kurt "The CyberGuy" Knutsson explains. Imagine a phone that can bend to your will, literally. A phone that can transform from a flat screen to a wristband, or a stand, or anything you want. Sounds like science fiction, right?


Qualcomm's Snapdragon X Elite chips promise major PC performance

PCWorld

Qualcomm and its Snapdragon chips are officially back inside the PC. Today, Qualcomm formally launched the Snapdragon X Elite, the flagship platform of its Snapdragon X family that leverages its Oryon CPU core, and promises to double -- yes, double -- the performance of some of the most popular 13th-gen Core chips from AMD and Intel. Qualcomm promised the same with its earlier Snapdragon 8-series chips, and really didn't deliver. But after buying chip designer Nuvia in 2021, Qualcomm is trying again, hoping that its superpowered Arm chips can once again make Windows on Arm PCs a competitor to conventional X86 PCs when they launch in mid-2024. And they're talking some big numbers to prove it.


PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks

arXiv.org Artificial Intelligence

In this paper, we present PARTIME, a software library written in Python and based on PyTorch, designed specifically to speed up neural networks whenever data is continuously streamed over time, for both learning and inference. Existing libraries are designed to exploit data-level parallelism, assuming that samples are batched, a condition that is not naturally met in applications that are based on streamed data. In contrast, PARTIME starts processing each data sample as soon as it becomes available from the stream. PARTIME wraps the code that implements a feed-forward multi-layer network and distributes the layer-wise processing among multiple devices, such as Graphics Processing Units (GPUs). Thanks to its pipeline-based computational scheme, PARTIME allows the devices to perform computations in parallel. At inference time this results in scaling capabilities that are theoretically linear with respect to the number of devices. During the learning stage, PARTIME can leverage the non-i.i.d. nature of the streamed data, whose samples evolve smoothly over time, for efficient gradient computations. Experiments empirically compare PARTIME with classic non-parallel neural computations in online learning, distributing operations on up to 8 NVIDIA GPUs, and show significant speedups that are almost linear in the number of devices, mitigating the impact of the data transfer overhead.
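
The claim of theoretically linear scaling at inference time can be illustrated with a small schedule simulation. This is a sketch under stated assumptions and does not use PARTIME's API: it assumes one network stage per device, equal processing time for every stage, one new sample arriving per time step, and zero data transfer overhead. The names `pipeline_schedule` and `ideal_speedup` are hypothetical.

```python
from typing import List, Optional


def pipeline_schedule(num_samples: int, num_stages: int) -> List[List[Optional[int]]]:
    """For each time step, list the sample index each stage works on
    (None means the stage is idle).

    Assumptions (illustrative only): one stage per device, equal stage
    times, one sample arriving per step, no transfer overhead.
    """
    if num_samples <= 0 or num_stages <= 0:
        return []
    total_steps = num_samples + num_stages - 1
    schedule = []
    for t in range(total_steps):
        row: List[Optional[int]] = []
        for s in range(num_stages):
            # Stage s sees sample t - s: each sample advances one stage per step.
            sample = t - s
            row.append(sample if 0 <= sample < num_samples else None)
        schedule.append(row)
    return schedule


def ideal_speedup(num_samples: int, num_stages: int) -> float:
    """Ratio of sequential steps to pipelined steps under the same assumptions."""
    sequential_steps = num_samples * num_stages
    pipelined_steps = num_samples + num_stages - 1
    return sequential_steps / pipelined_steps


if __name__ == "__main__":
    for step, row in enumerate(pipeline_schedule(num_samples=3, num_stages=2)):
        print(step, row)
    print(ideal_speedup(num_samples=1000, num_stages=8))
```

As the stream grows long relative to the number of stages, the ratio approaches the number of stages, which is the idealized linear scaling; real measurements fall short of it because of transfer overhead and uneven stage times.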