TPUs


Why Google's custom AI chips are shaking up the tech industry

New Scientist

Ironwood is Google's latest tensor processing unit

Nvidia's position as the dominant supplier of AI chips may be under threat from a specialised chip pioneered by Google, with reports suggesting companies like Meta and Anthropic are looking to spend billions on Google's tensor processing units.

The success of the artificial intelligence industry has been built in large part on graphics processing units (GPUs), a kind of computer chip that can perform many calculations in parallel, rather than one after the other like the central processing units (CPUs) that power most computers.

GPUs were originally developed, as the name suggests, to assist with computer graphics and gaming. "If I have a lot of pixels in a space and I need to do a rotation of this to calculate a new camera view, this is an operation that can be done in parallel, for many different pixels," says Francesco Conti at the University of Bologna in Italy. This ability to do calculations in parallel happened to be useful for training and running AI models, which often rely on calculations involving vast grids of numbers performed at the same time, called matrix multiplication.
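The parallelism described above can be seen in a minimal NumPy sketch: every element of a matrix product is an independent dot product, so a GPU or TPU is free to compute all of them at once, whereas a sequential loop computes them one by one.

```python
import numpy as np

# A camera-view rotation and a neural-network layer reduce to the same
# primitive: matrix multiplication, where each output element is an
# independent dot product that can be computed in parallel.
A = np.arange(6, dtype=np.float64).reshape(2, 3)   # 2x3 matrix
B = np.arange(12, dtype=np.float64).reshape(3, 4)  # 3x4 matrix

# Naive sequential triple loop: what a single CPU core would do,
# one multiply-accumulate after another.
C = np.zeros((2, 4))
for i in range(2):
    for j in range(4):
        for k in range(3):
            C[i, j] += A[i, k] * B[k, j]

# Vectorized form: all 8 output elements have no dependency on each
# other, so parallel hardware can compute them simultaneously.
assert np.allclose(C, A @ B)
```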


Leveraging Compute-in-Memory for Efficient Generative Model Inference in TPUs

Zhu, Zhantong, Li, Hongou, Ren, Wenjie, Wu, Meng, Ye, Le, Huang, Ru, Jia, Tianyu

arXiv.org Artificial Intelligence

With the rapid advent of generative models, efficiently deploying these models on specialized hardware has become critical. Tensor Processing Units (TPUs) are designed to accelerate AI workloads, but their high power consumption necessitates innovations for improving efficiency. Compute-in-memory (CIM) has emerged as a promising paradigm with superior area and energy efficiency. In this work, we present a TPU architecture that integrates digital CIM to replace conventional digital systolic arrays in matrix multiply units (MXUs). We first establish a CIM-based TPU architecture model and simulator to evaluate the benefits of CIM for diverse generative model inference. Building upon the observed design insights, we further explore various CIM-based TPU architectural design choices. Compared to the baseline TPUv4i architecture, different design choices achieve up to 44.2% and 33.8% performance improvements for large language model and diffusion transformer inference, respectively, and a 27.3% reduction in MXU energy consumption. Generative models, such as large language models (LLMs) and diffusion models (DMs), have exhibited exceptional performance in generating content across various modalities. For example, LLMs have dominated NLP tasks, powering applications like ChatGPT [1].
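The systolic arrays that CIM would replace can be illustrated with a toy model. The sketch below simulates a weight-stationary array one cycle at a time: each cycle, one slice of the activations streams past the stationary weights and every processing element performs one multiply-accumulate. This is a simplified illustration of the general idea, not the actual MXU design.

```python
import numpy as np

def systolic_matmul(A, B):
    """Toy weight-stationary systolic matmul: weights B stay resident in
    the array, activations A stream through, and partial sums accumulate
    cycle by cycle. A simplified model, not Google's MXU design."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    # Cycle t streams activation column A[:, t] past stationary weight
    # row B[t, :]; every PE adds one product to its running partial sum.
    for t in range(k):
        C += np.outer(A[:, t], B[t, :])
    return C

rng = np.random.default_rng(0)
A = rng.random((4, 3))
B = rng.random((3, 5))
# After k cycles the accumulated partial sums equal the full product.
assert np.allclose(systolic_matmul(A, B), A @ B)
```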


Life-Cycle Emissions of AI Hardware: A Cradle-To-Grave Approach and Generational Trends

Schneider, Ian, Xu, Hui, Benecke, Stephan, Patterson, David, Huang, Keguo, Ranganathan, Parthasarathy, Elsworth, Cooper

arXiv.org Artificial Intelligence

Specialized hardware accelerators aid the rapid advancement of artificial intelligence (AI), and their efficiency impacts AI's environmental sustainability. This study presents the first comprehensive published life-cycle assessment (LCA) of an AI accelerator's greenhouse gas emissions, including the first published manufacturing emissions for an AI accelerator. Our analysis of five Tensor Processing Units (TPUs) encompasses all stages of the hardware lifespan - from raw material extraction, manufacturing, and disposal, to energy consumption during development, deployment, and serving of AI models. Using first-party data, it offers the most comprehensive evaluation to date of AI hardware's environmental impact. We include detailed descriptions of our LCA to act as a tutorial, road map, and inspiration for other computer engineers to perform similar LCAs to help us all understand the environmental impacts of our chips and of AI. A byproduct of this study is the new metric compute carbon intensity (CCI) that is helpful in evaluating AI hardware sustainability and in estimating the carbon footprint of training and inference. This study shows that CCI improves 3x from TPU v4i to TPU v6e. Moreover, while this paper's focus is on hardware, software advancements leverage and amplify these gains.
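The intuition behind an emissions-per-compute metric like CCI can be sketched as a back-of-envelope calculation. The formula and all numbers below are illustrative assumptions for exposition, not the paper's definition or data.

```python
def carbon_per_compute(total_emissions_kgco2e, delivered_compute):
    """Illustrative emissions-per-compute ratio: lifetime greenhouse gas
    emissions divided by useful compute delivered. The exact definition
    of CCI and the figures below are assumptions, not the paper's data."""
    return total_emissions_kgco2e / delivered_compute

# Hypothetical accelerator generation: embodied (manufacturing) plus
# operational emissions over its lifetime, in kgCO2e, against the
# compute it delivers (arbitrary units).
old_gen = carbon_per_compute(150 + 350, 2.0)

# A newer generation with similar lifetime emissions but 3x the
# delivered compute shows the kind of ~3x improvement the paper
# reports between TPU v4i and TPU v6e.
new_gen = carbon_per_compute(150 + 350, 6.0)
assert abs(old_gen / new_gen - 3.0) < 1e-9
```

The key design point is the denominator: dividing by delivered compute lets generations with very different performance be compared on sustainability directly.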


Scalable Machine Learning Training Infrastructure for Online Ads Recommendation and Auction Scoring Modeling at Google

Kurian, George, Sardashti, Somayeh, Sims, Ryan, Berger, Felix, Holt, Gary, Li, Yang, Willcock, Jeremiah, Wang, Kaiyuan, Quiroz, Herve, Salem, Abdulrahman, Grady, Julian

arXiv.org Artificial Intelligence

Large-scale Ads recommendation and auction scoring models at Google scale demand immense computational resources. While specialized hardware like TPUs have improved linear algebra computations, bottlenecks persist in large-scale systems. This paper proposes solutions for three critical challenges that must be addressed for efficient end-to-end execution in a widely used production infrastructure: (1) Input Generation and Ingestion Pipeline: Efficiently transforming raw features (e.g., "search query") into numerical inputs and streaming them to TPUs; (2) Large Embedding Tables: Optimizing conversion of sparse features into dense floating-point vectors for neural network consumption; (3) Interruptions and Error Handling: Minimizing resource wastage in large-scale shared datacenters. To tackle these challenges, we propose a shared input generation technique to reduce computational load of input generation by amortizing costs across many models. Furthermore, we propose partitioning, pipelining, and RPC (Remote Procedure Call) coalescing software techniques to optimize embedding operations. To maintain efficiency at scale, we describe novel preemption notice and training hold mechanisms that minimize resource wastage, and ensure prompt error resolution. These techniques have demonstrated significant improvement in Google production, achieving a 116% performance boost and an 18% reduction in training costs across representative models.
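The RPC coalescing idea mentioned above, batching many small embedding-row lookups into one call per shard to amortize per-call overhead, can be sketched as follows. All names here are hypothetical stand-ins, not Google's infrastructure.

```python
from collections import defaultdict

def coalesced_lookup(requests, fetch_batch):
    """Coalesce many small embedding-row lookups into one RPC per shard.

    `requests` is a list of (shard_id, row_id) pairs; `fetch_batch(shard,
    rows)` stands in for a single batched RPC returning one vector per
    row. A hypothetical sketch of the coalescing idea only.
    """
    by_shard = defaultdict(list)
    for shard, row in requests:
        by_shard[shard].append(row)
    results = {}
    for shard, rows in by_shard.items():
        # One RPC per shard instead of one per row: per-call overhead
        # (serialization, network round trip) is paid once per batch.
        for row, vec in zip(rows, fetch_batch(shard, rows)):
            results[(shard, row)] = vec
    return results

# Toy backend: the "embedding" for a row is just [row, row].
fetch = lambda shard, rows: [[r, r] for r in rows]
out = coalesced_lookup([(0, 1), (1, 5), (0, 3)], fetch)
assert out[(0, 3)] == [3, 3]
```

With the three requests above, the two rows on shard 0 travel in a single batched call rather than two separate ones.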


On-Device LLMs for SMEs: Challenges and Opportunities

Yee, Jeremy Stephen Gabriel, Ng, Pai Chet, Wang, Zhengkui, McLoughlin, Ian, Ng, Aik Beng, See, Simon

arXiv.org Artificial Intelligence

This paper presents a systematic review of the infrastructure requirements for deploying Large Language Models (LLMs) on-device within the context of small and medium-sized enterprises (SMEs), focusing on both hardware and software perspectives. From the hardware viewpoint, we discuss the utilization of processing units like GPUs and TPUs, efficient memory and storage solutions, and strategies for effective deployment, addressing the challenges of limited computational resources typical in SME settings. From the software perspective, we explore framework compatibility, operating system optimization, and the use of specialized libraries tailored for resource-constrained environments. The review is structured to first identify the unique challenges faced by SMEs in deploying LLMs on-device, followed by an exploration of the opportunities that both hardware innovations and software adaptations offer to overcome these obstacles. Such a structured review provides practical insights, contributing significantly to the community by enhancing the technological resilience of SMEs in integrating LLMs.


Reviews: Task-Driven Convolutional Recurrent Models of the Visual System

Neural Information Processing Systems

Post author feedback: I am very impressed by the fits at the bottom of the response. There was some discussion amongst the reviewers concerning the relationship between this and what is known about the actual circuits (e.g., inputs arrive to layers 4 and 5, then from layer 4 signals go to layers 2/3, etc.). It would be useful for the authors to relate this to those facts. Also, we discussed whether your model actually fits the data about the quantity of feedback vs. feedforward connections (as much or more feedback as feedforward). It would be useful to inform the reader as to whether your model accounts for this as well.


A Partial Replication of MaskFormer in TensorFlow on TPUs for the TensorFlow Model Garden

Purohit, Vishal, Jiang, Wenxin, Ravikiran, Akshath R., Davis, James C.

arXiv.org Artificial Intelligence

This paper undertakes the task of replicating the MaskFormer model -- a universal image segmentation model -- originally developed using the PyTorch framework, within the TensorFlow ecosystem, specifically optimized for execution on Tensor Processing Units (TPUs). Our implementation exploits the modular constructs available within the TensorFlow Model Garden (TFMG), encompassing elements such as the data loader, training orchestrator, and various architectural components, tailored and adapted to meet the specifications of the MaskFormer model. We address key challenges encountered during the replication, including non-convergence issues, slow training, adaptation of loss functions, and the integration of TPU-specific functionalities. We verify our reproduced implementation and present qualitative results on the COCO dataset. Although our implementation meets some of the objectives for end-to-end reproducibility, we encountered challenges in replicating the PyTorch version of MaskFormer in TensorFlow. This replication process is not straightforward and requires substantial engineering efforts.


Exploration of TPUs for AI Applications

Carrión, Diego Sanmartín, Prohaska, Vera

arXiv.org Artificial Intelligence

Tensor Processing Units (TPUs) are specialized hardware accelerators for deep learning developed by Google. This paper aims to explore TPUs in cloud and edge computing, focusing on their applications in AI. We provide an overview of TPUs, their general architecture, specifically their design in relation to neural networks, compilation techniques and supporting frameworks. Furthermore, we provide a comparative analysis of Cloud and Edge TPU performance against counterpart chip architectures. Our results show that TPUs can provide significant performance improvements in both cloud and edge computing. Additionally, this paper underscores the imperative need for further research in optimization techniques for efficient deployment of AI architectures on the Edge TPU and benchmarking standards for a more robust comparative analysis in edge computing scenarios. The primary motivation behind this push for research is that efficient AI acceleration, facilitated by TPUs, can lead to substantial savings in terms of time, money, and environmental resources.


HUGE: Huge Unsupervised Graph Embeddings with TPUs

Mayer, Brandon, Tsitsulin, Anton, Fichtenberger, Hendrik, Halcrow, Jonathan, Perozzi, Bryan

arXiv.org Artificial Intelligence

Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is increasing industrial and academic need to quickly analyze graphs with billions of nodes and trillions of edges. A common first step for network understanding is graph embedding, the process of creating a continuous representation of nodes in a graph. A continuous representation is often more amenable, especially at scale, for solving downstream machine learning tasks such as classification, link prediction, and clustering. A high-performance graph embedding architecture leveraging Tensor Processing Units (TPUs) with configurable amounts of high-bandwidth memory is presented that simplifies the graph embedding problem and can scale to graphs with billions of nodes and trillions of edges. We verify the embedding space quality.

Figure 1: HUGE can learn representations on extremely large graphs (billions of nodes) at Google.
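The general idea of graph embedding, learning one vector per node so that connected nodes end up with similar vectors, can be shown with a minimal dot-product model. This is a toy sketch of the concept only, not the HUGE system's algorithm.

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def train_node_embeddings(edges, num_nodes, dim=8, epochs=200, lr=0.05):
    """Minimal graph embedding: for each edge, pull the two endpoint
    vectors together; push one randomly sampled node pair apart.
    A toy illustration, not the HUGE system's actual method."""
    rng = np.random.default_rng(0)
    emb = rng.normal(scale=0.1, size=(num_nodes, dim))
    for _ in range(epochs):
        for u, v in edges:
            # Positive pair: gradient of -log(sigmoid(u . v)) increases
            # the dot product of connected nodes.
            g = 1.0 - sigmoid(emb[u] @ emb[v])
            emb[u] += lr * g * emb[v]
            emb[v] += lr * g * emb[u]
            # Negative sample: decrease similarity to a random node.
            n = int(rng.integers(num_nodes))
            g = -sigmoid(emb[u] @ emb[n])
            emb[u] += lr * g * emb[n]
            emb[n] += lr * g * emb[u]
    return emb

edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
emb = train_node_embeddings(edges, num_nodes=5)
```

The resulting vectors can then feed downstream tasks such as link prediction (score a candidate edge by the dot product of its endpoints) or node classification.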


Google touts AI supercomputer; Nvidia tops MLPerf 3.0 tests

#artificialintelligence

The war of words among AI supercomputer vendors escalated this week with Google claiming that its TPU-based system is faster and more efficient than Nvidia's A100-based entry, according to its own testing. Nvidia countered that its H100 system is faster based on testing conducted by the independent MLCommons using MLPerf 3.0. Google researchers reported that its Tensor Processing Unit-based supercomputer v4 is 1.2 to 1.7 times faster than Nvidia's 3-year-old A100 system and uses between 1.3 to 1.9 times less power. The MLPerf 3.0 benchmarks measured Nvidia's newer H100 against systems entered by 25 organizations, but Google's TPU-based v4 system was not one of them. A direct system-to-system comparison of the two companies' latest systems would have to be conducted by an independent organization running a variety of AI-based workloads for any benchmarks to be definitive, analysts said.