hardware


A 10K Bounty Awaits Anyone Who Can Hack Ring Cameras to Stop Sharing Data With Amazon

WIRED

The Fulu Foundation, a nonprofit that pays out bounties for removing user-hostile features, is hunting for a way to keep Ring cameras from sending data to Amazon--without breaking the hardware. Usually, a feel-good story about finding a lost dog doesn't prompt fear and revulsion. But that was the reaction to a Super Bowl commercial from Amazon-owned security camera company Ring, which showed off a new feature called Search Party. Now the group is offering a $10,000 bounty to wrest back control of the user data Ring controls.


Musk's SpaceX applies to launch 1m satellites into orbit

BBC News

Elon Musk, the boss of SpaceX as well as Tesla and X, is the world's richest person. His company SpaceX has applied to launch one million satellites into Earth's orbit to power artificial intelligence (AI). The application claims "orbital data centres" are the most cost- and energy-efficient way to meet the growing demand for AI computing power. Traditionally, such centres are large warehouses full of powerful computers which process and store data. Musk's aerospace firm claims processing needs due to the expanding use of AI are already outpacing "terrestrial capabilities". The plan would drastically increase the number of SpaceX satellites in orbit.


Xbox is cooked. Why your next gaming console will be a PC

PCWorld

PCWorld observes that Xbox appears to be losing ground as gaming shifts toward PC-based living room experiences, with handheld PCs and cloud streaming leading this transformation. Valve reports that 20% of Steam Deck users dock their devices for TV gaming, while rumors suggest Microsoft's next Xbox may actually be a Windows-based PC rather than proprietary console hardware. This shift matters because it offers superior gaming experiences through services like GeForce Now on smart TVs and versatile handheld PCs that double as home consoles. While wandering the show floor at CES 2026, I was struck by the future of living room gaming. What was once the realm of gaming consoles seems to be turning into PC territory.


Commodore 64 Ultimate Review: An Astonishing Remake

WIRED

The reborn Commodore 64 is an astonishing remake--but daunting if you weren't there the first time around. Its "digital detox" approach is compelling. It's hard to overstate just how seismic an impact the Commodore 64 had on home computing. Launched in 1982, the 8-bit machine--iconic in its beige plastic shell with integrated keyboard--went on to become the best-selling personal computer of all time. Despite that success, manufacturer Commodore International folded in 1994, with the rights to the name floating around for years.


LoQT: Low-Rank Adapters for Quantized Pretraining

Neural Information Processing Systems

Despite advances using low-rank adapters and quantization, pretraining of large models on consumer hardware has not been possible without model sharding, offloading during training, or per-layer gradient updates. To address these limitations, we propose Low-Rank Adapters for Quantized Training (LoQT), a method for efficiently training quantized models. LoQT uses gradient-based tensor factorization to initialize low-rank trainable weight matrices that are periodically merged into quantized full-rank weight matrices. Our approach is suitable for both pretraining and fine-tuning models. We demonstrate this for language modeling and downstream task adaptation, finding that LoQT enables efficient training of models up to 7B parameters on a 24GB GPU. We also demonstrate the feasibility of training a 13B model using per-layer gradient updates on the same hardware.
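The periodic-merge idea at the core of LoQT can be sketched in a few lines: low-rank factors are trained on top of a frozen quantized matrix and occasionally folded back into it, after which the adapters reset. The NumPy code below is an illustrative stand-in, not the paper's implementation; the quantizer, rank, and update step are all assumptions.

```python
import numpy as np

def quantize(w, n_levels=16):
    # Uniform symmetric quantization to a small grid (a stand-in for the
    # low-bit quantizer used during quantized pretraining).
    scale = np.max(np.abs(w)) / (n_levels // 2 - 1) + 1e-12
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
d, r = 64, 4                                # full width and low rank
W_q = quantize(rng.normal(size=(d, d)))     # frozen quantized full-rank weights
A = rng.normal(scale=0.01, size=(d, r))     # trainable low-rank factors
B = np.zeros((r, d))                        # zero init so A @ B = 0 at start

def effective_weight():
    # The forward pass sees the quantized base plus the low-rank update.
    return W_q + A @ B

# ... optimizer steps would update only A and B (small memory footprint) ...
B += rng.normal(scale=0.01, size=B.shape)   # placeholder for gradient updates

# Periodic merge: fold the learned update into the quantized matrix,
# then reset the adapter and keep training.
W_q = quantize(W_q + A @ B)
B[:] = 0.0
```

Only the small factors carry optimizer state between merges, which is what makes training on a single consumer GPU plausible.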


Dynamic Sparsity Is Channel-Level Sparsity Learner

Neural Information Processing Systems

Sparse training has received a surge of interest in machine learning due to its tantalizing potential to cut costs across both the entire training process and inference. Dynamic sparse training (DST), a leading approach, can train deep neural networks at high sparsity from scratch to match the performance of their dense counterparts. However, most if not all prior DST works demonstrate their effectiveness on unstructured sparsity with highly irregular sparse patterns, which receive limited support on common hardware. This limitation hinders the usage of DST in practice. In this paper, we propose Channel-aware dynamic sparse (Chase), which for the first time seamlessly translates the promise of unstructured dynamic sparsity into GPU-friendly channel-level sparsity (not fine-grained N:M or group sparsity) during one end-to-end training process, without any ad-hoc operations. The resulting small sparse networks can be directly accelerated by commodity hardware, without any special sparsity-aware hardware accelerators. This appealing outcome is partially motivated by a hidden phenomenon of dynamic sparsity: off-the-shelf unstructured DST implicitly involves biased parameter reallocation across channels, with a large fraction of channels (up to 60%) being sparser than others.
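The biased-reallocation observation suggests a simple picture: measure the per-channel density of an unstructured mask and drop the sparsest channels outright, leaving a smaller matrix that ordinary dense kernels can run. The NumPy toy below illustrates that step only; the sparsity levels and pruning threshold are assumptions, not Chase's actual schedule.

```python
import numpy as np

rng = np.random.default_rng(0)
out_ch, in_ch = 32, 64
mask = rng.random((out_ch, in_ch)) < 0.1        # unstructured mask, ~90% sparse

# Per-output-channel density: unstructured DST tends to leave some
# channels far sparser than others (the biased reallocation noted above).
density = mask.mean(axis=1)

# Channel-level step: remove the sparsest quarter of channels entirely.
# Whole rows disappear, so the surviving matrix is dense-hardware friendly.
keep = density >= np.quantile(density, 0.25)
W = rng.normal(size=(out_ch, in_ch)) * mask
W_channel = W[keep]
```

The payoff is that `W_channel` needs no sparse formats or specialized kernels; it is simply a smaller dense weight.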


Towards Hardware-Aware Tractable Learning of Probabilistic Models

Neural Information Processing Systems

Smart portable applications increasingly rely on edge computing due to privacy and latency concerns. But guaranteeing always-on functionality comes with two major challenges: heavily resource-constrained hardware and dynamic application conditions. Probabilistic models present an ideal solution to these challenges: they are robust to missing data, allow for joint predictions, and have small data needs. In addition, ongoing efforts in the field of tractable learning have resulted in probabilistic models with strict inference efficiency guarantees. However, current notions of tractability are often limited to model complexity, disregarding the hardware's specifications and constraints. We propose a novel resource-aware cost metric that takes the hardware's properties into consideration when determining whether an inference task can be efficiently deployed. We use this metric to evaluate the performance versus resource trade-off relevant to the application of interest, and we propose a strategy that selects the device settings that can optimally meet users' requirements.
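One way to picture such a resource-aware cost metric: weight a model's operation counts by per-operation costs measured on each candidate device, rather than counting model edges alone. All operation counts, device names, and cost figures below are hypothetical illustrations.

```python
# Hypothetical operation counts for a probabilistic circuit, and assumed
# per-operation energy costs (in nJ) for two candidate edge devices.
model_ops = {"add": 1200, "mul": 900}
devices = {
    "mcu": {"add": 3.0, "mul": 9.0},
    "dsp": {"add": 1.0, "mul": 2.0},
}

def inference_cost(ops, dev_costs):
    # Resource-aware cost: weight each operation count by the device's
    # per-operation cost instead of counting circuit edges alone.
    return sum(n * dev_costs[op] for op, n in ops.items())

# Pick the device setting that best meets the (energy) requirement.
best = min(devices, key=lambda d: inference_cost(model_ops, devices[d]))
```

The same model can be cheap on one device and prohibitive on another, which is exactly what a complexity-only notion of tractability misses.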


GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration

Neural Information Processing Systems

Despite advances in scalable models, the inference tools used for Gaussian processes (GPs) have yet to fully capitalize on developments in computing hardware. We present an efficient and general approach to GP inference based on Blackbox Matrix-Matrix multiplication (BBMM). BBMM inference uses a modified batched version of the conjugate gradients algorithm to derive all terms for training and inference in a single call. BBMM reduces the asymptotic complexity of exact GP inference from O(n^3) to O(n^2). Adapting this algorithm to scalable approximations and complex GP models simply requires a routine for efficient matrix-matrix multiplication with the kernel and its derivative. In addition, BBMM uses a specialized preconditioner to substantially speed up convergence. In experiments we show that BBMM effectively uses GPU hardware to dramatically accelerate both exact GP inference and scalable approximations. Additionally, we provide GPyTorch, a software platform for scalable GP inference via BBMM, built on PyTorch.
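At the heart of BBMM is solving K x = y using only matrix-vector products with the kernel matrix. A plain, non-batched conjugate gradients sketch in NumPy conveys that access pattern; the actual method batches probe vectors into matrix-matrix products and adds a specialized preconditioner. The RBF kernel and jitter value here are illustrative choices.

```python
import numpy as np

def conjugate_gradients(K, y, iters=200, tol=1e-8):
    # Solve K x = y touching K only through matvecs: the "blackbox"
    # access pattern that BBMM batches across many right-hand sides.
    x = np.zeros_like(y)
    r = y - K @ x
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Kp = K @ p
        alpha = rs / (p @ Kp)
        x += alpha * p
        r -= alpha * Kp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 2))
# RBF kernel matrix with a small jitter, as in exact GP inference.
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-0.5 * sq) + 1e-3 * np.eye(n)
y = rng.normal(size=n)
alpha = conjugate_gradients(K, y)   # K^{-1} y, the GP predictive weights
```

Because only matvecs with K are required, the same routine works unchanged whether K is stored explicitly or computed on the fly on a GPU.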


Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning

Neural Information Processing Systems

For deployment, neural architecture search should be hardware-aware, in order to satisfy device-specific constraints (e.g., memory usage, latency, and energy consumption) and enhance model efficiency. Existing methods for hardware-aware NAS collect a large number of samples (e.g., accuracy and latency) from a target device and either build a lookup table or train a latency estimator. However, such an approach is impractical in real-world scenarios, since numerous devices with different hardware specifications exist, and collecting samples from all of them would incur prohibitive computational and monetary costs. To overcome these limitations, we propose the Hardware-adaptive Efficient Latency Predictor (HELP), which formulates device-specific latency estimation as a meta-learning problem, so that we can estimate a model's latency on an unseen device from only a few samples. To this end, we introduce novel hardware embeddings that embed any device by treating it as a black-box function that outputs latencies, and we meta-learn the hardware-adaptive latency predictor in a device-dependent manner using these embeddings. We validate HELP's latency estimation on unseen platforms, where it achieves high estimation performance with as few as 10 measurement samples, outperforming all relevant baselines. We also validate end-to-end NAS frameworks using HELP against ones without it, and show that it largely reduces the total time cost of the base NAS method in latency-constrained settings.
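The few-shot flavor of this setup can be illustrated with a toy: probe an unseen device with a handful of reference architectures, then fit a predictor to those measurements and query it for new architectures without further measurement. Here a least-squares fit on synthetic linear latencies stands in for the meta-learned predictor; the two-feature architecture encoding and all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: architectures are reduced to two features (say,
# normalized FLOPs and parameter count), and each device's true latency
# is an unknown linear function of them plus measurement noise.
def latency(arch, dev_w):
    return arch @ dev_w + 0.01 * rng.normal()

device_w = rng.uniform(0.5, 2.0, size=2)        # unknown device coefficients

# Black-box characterization of the unseen device: measure a handful of
# reference architectures and record their latencies.
anchors = rng.uniform(0.1, 1.0, size=(10, 2))   # 10 measurement samples
measured = np.array([latency(a, device_w) for a in anchors])

# Few-shot adaptation: a least-squares fit on the 10 probes stands in for
# the meta-learned predictor's device-specific update.
w_hat, *_ = np.linalg.lstsq(anchors, measured, rcond=None)

new_arch = np.array([0.7, 0.3])
pred = float(new_arch @ w_hat)   # predicted latency, no new measurement needed
```

Ten probes per device is cheap enough to repeat for every deployment target, which is the point of treating devices as black boxes.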


The Grand Illusion: The Myth of Software Portability and Implications for ML Progress

Neural Information Processing Systems

Pushing the boundaries of machine learning often requires exploring different hardware and software combinations. However, this ability to experiment with different systems can be at odds with the drive for efficiency, which has produced increasingly specialized AI hardware and incentivized consolidation around a narrow set of ML frameworks. Exploratory research can be further restricted if software and hardware are co-evolving, making it even harder to stray away from a given tooling stack. While this friction increasingly impacts the rate of innovation in machine learning, to our knowledge the lack of portability in tooling has not been quantified. In this work we ask: How portable are popular ML software frameworks? We conduct a large-scale study of the portability of mainstream ML frameworks across different hardware types. Our findings paint an uncomfortable picture -- frameworks can lose more than 40% of their key functions when ported to other hardware. Worse, even when functions are portable, the slowdown in their performance can be extreme. Collectively, our results reveal how costly straying from a narrow set of hardware-software combinations can be -- and thus how specialization incurs an exploration cost that can impede innovation in machine learning research.