AITopics | gpus

Collaborating Authors

gpus

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Orbital AI data centers could work, but they might ruin Earth in the process

EngadgetFeb-19-2026, 17:00:00 GMT

Samsung Galaxy Unpacked 2026 is Feb. 25 A single collision could cause a cascading effect in orbit. Elon Musk's plan to launch millions of AI satellites could be disastrous for the planet. At the start of the month, Elon Musk announced that two of his companies -- SpaceX and xAI -- were merging, and would jointly launch a constellation of 1 million satellites to operate as orbital data centers. Musk's reputation might suggest otherwise, but according to experts, such a plan isn't a complete fantasy. However, if executed at the scale suggested, some of them believe it would have devastating effects on the environment and the sustainability of low Earth Earth orbit.

artificial intelligence, cloud computing, satellite, (19 more...)

Engadget

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > California (0.04)
North America > Canada > British Columbia (0.04)
Asia > China (0.04)

Industry:

Information Technology > Services (1.00)
Government > Regional Government > North America Government > United States Government (0.95)
Aerospace & Defense (0.75)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Mobile (0.89)

Add feedback

Nvidia's Deal With Meta Signals a New Era in Computing Power

WIREDFeb-18-2026, 19:24:55 GMT

The days of tech giants buying up discrete chips are over. AI companies now need GPUs, CPUs, and everything in between. Ask anyone what Nvidia makes, and they're likely to first say "GPUs." For decades, the chipmaker has been defined by advanced parallel computing, and the emergence of generative AI and the resulting surge in demand for GPUs has been a boon for the company . But Nvidia's recent moves signal that it's looking to lock in more customers at the less compute-intensive end of the AI market--customers who don't necessarily need the beefiest, most powerful GPUs to train AI models, but instead are looking for the most efficient ways to run agentic AI software.

artificial intelligence, machine learning, natural language, (13 more...)

WIRED

Country:

North America > United States > California (0.15)
Europe > Slovakia (0.05)
Europe > Czechia (0.05)
Asia > China (0.05)

Industry: Information Technology > Hardware (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.39)

Add feedback

Causes and Effects of Unanticipated Numerical Deviations in Neural Network Inference Frameworks

Neural Information Processing SystemsFeb-16-2026, 13:50:46 GMT

Hardware-specific optimizations in machine learning (ML) frameworks can cause numerical deviations of inference results. Quite surprisingly, despite using a fixed trained model and fixed input data, inference results are not consistent across platforms, and sometimes not even deterministic on the same platform. We study the causes of these numerical deviations for convolutional neural networks (CNN) on realistic end-to-end inference pipelines and in isolated experiments. Results from 75 distinct platforms suggest that the main causes of deviations on CPUs are differences in SIMD use, and the selection of convolution algorithms at runtime on GPUs. We link the causes and propagation effects to properties of the ML model and evaluate potential mitigations. We make our research code publicly available.

artificial intelligence, deviation, machine learning, (20 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Austria > Tyrol > Innsbruck (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

Birder: Communication-Efficient 1-bit Adaptive Optimizer for Practical Distributed DNN Training

Neural Information Processing SystemsFeb-15-2026, 10:41:34 GMT

Therefore, from a system-level perspective, the design ethos of a system-efficient communication-compression algorithm is that we should guarantee that the compression/decompression of the algorithm is computationally light and takes less time, and it should also be friendly to efficient collective communication primitives.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Add feedback

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Neural Information Processing SystemsDec-26-2025, 10:49:47 GMT

Attention, as a core layer of the ubiquitous Transformer architecture, is the bottleneck for large language models and long-context applications.

artificial intelligence, large language model, natural language, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.60)

Add feedback

KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis

Neural Information Processing SystemsDec-26-2025, 02:52:35 GMT

As text-to-image (T2I) synthesis models increase in size, they demand higher inference costs due to the need for more expensive GPUs with larger memory, which makes it challenging to reproduce these models in addition to the restricted access to training datasets. Our study aims to reduce these inference costs and explores how far the generative capabilities of T2I models can be extended using only publicly available datasets and open-source models. To this end, by using the de facto standard text-to-image model, Stable Diffusion XL (SDXL), we present three key practices in building an efficient T2I model: (1) Knowledge distillation: we explore how to effectively distill the generation capability of SDXL into an efficient U-Net and find that self-attention is the most crucial part.

artificial intelligence, machine learning, proceedings, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Vision (0.57)

Add feedback

ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer

Neural Information Processing SystemsDec-25-2025, 20:42:03 GMT

Vision Transformers (ViTs) have shown impressive performance and have become a unified backbone for multiple vision tasks. However, both the attention mechanism and multi-layer perceptrons (MLPs) in ViTs are not sufficiently efficient due to dense multiplications, leading to costly training and inference. To this end, we propose to reparameterize pre-trained ViTs with a mixture of multiplication primitives, e.g., bitwise shifts and additions, towards a new type of multiplication-reduced model, dubbed $\textbf{ShiftAddViT}$, which aims to achieve end-to-end inference speedups on GPUs without requiring training from scratch. Specifically, all $\texttt{MatMuls}$ among queries, keys, and values are reparameterized using additive kernels, after mapping queries and keys to binary codes in Hamming space. The remaining MLPs or linear layers are then reparameterized with shift kernels.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.58)

Add feedback

SpecExec: Massively Parallel Speculative Decoding For Interactive LLM Inference on Consumer Devices

Neural Information Processing SystemsDec-24-2025, 06:32:05 GMT

As large language models gain widespread adoption, running them efficiently becomes a crucial task. Recent works on LLM inference use speculative decoding to achieve extreme speedups. However, most of these works implicitly design their algorithms for high-end datacenter hardware. In this work, we ask the opposite question: how fast can we run LLMs on consumer machines? Consumer GPUs can no longer fit the largest available models and must offload them to RAM or SSD. With parameter offloading, hundreds or thousands of tokens can be processed in batches within the same time as just one token, making it a natural fit for speculative decoding. We propose SpecExec (Speculative Execution), a simple parallel decoding method that can generate up to 20 tokens per target model iteration for popular LLM families. SpecExec takes the most probable continuations from the draft model to build a cache tree for the target model, which then gets validated in a single pass. Using SpecExec, we demonstrate inference of 50B+ parameter LLMs on consumer GPUs with RAM offloading at 4--6 tokens per second with 4-bit quantization or 2--3 tokens per second with 16-bit weights.

artificial intelligence, large language model, natural language, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Neural Information Processing SystemsDec-24-2025, 02:26:59 GMT

Deep learning (DL) frameworks take advantage of GPUs to improve the speed of DL inference and training. Ideally, DL frameworks should be able to fully utilize the computation power of GPUs such that the running time depends on the amount of computation assigned to GPUs. Yet, we observe that in scheduling GPU tasks, existing DL frameworks suffer from inefficiencies such as large scheduling overhead and unnecessary serial execution. To this end, we propose Nimble, a DL execution engine that runs GPU tasks in parallel with minimal scheduling overhead. Nimble introduces a novel technique called ahead-of-time (AoT) scheduling.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.43)

Add feedback

VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement

Neural Information Processing SystemsDec-24-2025, 00:13:26 GMT

We present Variable Experience Rollout (VER), a technique for efficiently scaling batched on-policy reinforcement learning in heterogenous environments (where different environments take vastly different times to generate rollouts) to many GPUs residing on, potentially, many machines. VER combines the strengths of and blurs the line between synchronous and asynchronous on-policy RL methods (SyncOnRL and AsyncOnRL, respectively). Specifically, it learns from on-policy experience (like SyncOnRL) and has no synchronization points (like AsyncOnRL) enabling high throughput.We find that VER leads to significant and consistent speed-ups across a broad range of embodied navigation and mobile manipulation tasks in photorealistic 3D simulation environments. Specifically, for PointGoal navigation and ObjectGoal navigation in Habitat 1.0, VER is 60-100% faster (1.6-2x speedup) than DD-PPO, the current state of art for distributed SyncOnRL, with similar sample efficiency. For mobile manipulation tasks (open fridge/cabinet, pick/place objects) in Habitat 2.0 VER is 150% faster (2.5x speedup) on 1 GPU and 170% faster (2.7x speedup) on 8 GPUs than DD-PPO.

artificial intelligence, machine learning, proceedings, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.54)

Add feedback