TPU

TinyCenterSpeed: Efficient Center-Based Object Detection for Autonomous Racing

Reichlin, Neil, Baumann, Nicolas, Ghignone, Edoardo, Magno, Michele

arXiv.org Artificial Intelligence

Perception within autonomous driving is nearly synonymous with Neural Networks (NNs). Yet, the domain of autonomous racing is often characterized by scaled, computationally limited robots used for cost-effectiveness and safety. For this reason, opponent detection and tracking systems typically resort to traditional computer vision techniques due to computational constraints. This paper introduces TinyCenterSpeed, a streamlined adaptation of the seminal CenterPoint method, optimized for real-time performance on 1:10 scale autonomous racing platforms. This adaptation is viable even on On-Board Computers (OBCs) powered solely by Central Processing Units (CPUs), as it incorporates the use of an external Tensor Processing Unit (TPU). We demonstrate that, compared to the Adaptive Breakpoint Detector (ABD), the current State-of-the-Art (SotA) in scaled autonomous racing, TinyCenterSpeed not only improves detection and velocity estimation by up to 61.38% but also supports multi-opponent detection and estimation. It achieves real-time performance with an inference time of just 7.88 ms on the TPU, while reducing CPU utilization 8.3-fold.
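The core idea behind center-based detectors such as CenterPoint is to predict a heatmap whose local maxima mark object centers, replacing box-based non-maximum suppression with a simple 3x3 peak check. A minimal NumPy sketch of that peak-extraction step (synthetic data; not the paper's actual code):

```python
import numpy as np

def extract_centers(heatmap, threshold=0.5):
    """Return (row, col) peaks that are local maxima above threshold.

    A peak is a cell >= all 8 neighbours (3x3 max-pooling "NMS"),
    the trick center-based detectors use instead of box-based NMS.
    """
    H, W = heatmap.shape
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    # Stack the nine 3x3-shifted views and take the max per cell.
    neigh = np.stack([padded[dy:dy + H, dx:dx + W]
                      for dy in range(3) for dx in range(3)])
    peaks = (heatmap >= neigh.max(axis=0)) & (heatmap > threshold)
    return [(int(r), int(c)) for r, c in np.argwhere(peaks)]

# Two synthetic "opponents" on a 32x32 bird's-eye-view grid.
hm = np.zeros((32, 32))
hm[8, 8] = 0.9
hm[20, 25] = 0.8
print(extract_centers(hm))  # two peaks: (8, 8) and (20, 25)
```

In a full detector the network would also regress a velocity vector at each peak, which is how TinyCenterSpeed obtains opponent speed estimates alongside positions.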


Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture

Elbtity, Mohammed, Chandarana, Peyton, Zand, Ramtin

arXiv.org Artificial Intelligence

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators, utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML accelerators, such as graphics processing units (GPUs), as they are designed specifically to perform the multiply-accumulate (MAC) operations required in the matrix-matrix and matrix-vector multiplies extensively present throughout the execution of deep neural networks (DNNs). Such improvements include maximizing data reuse and minimizing data transfer by leveraging the temporal dataflow paradigms provided by the systolic array architecture. While this design provides a significant performance benefit, current implementations are restricted to a single dataflow consisting of either input, output, or weight stationary architectures. This can limit the achievable performance of DNN inference and reduce the utilization of compute units. Therefore, the work herein consists of developing a reconfigurable dataflow TPU, called the Flex-TPU, which can dynamically change the dataflow per layer at run-time. Our experiments thoroughly test the viability of the Flex-TPU, comparing it to conventional TPU designs across multiple well-known ML workloads. The results show that our Flex-TPU design achieves a significant performance increase of up to 2.75x compared to a conventional TPU, with only minor area and power overheads.
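The "stationary" terminology above refers to which operand stays pinned inside each processing element (PE) while the others stream through. A small illustrative sketch of two of those loop orders for a matrix multiply (plain NumPy, not a hardware model):

```python
import numpy as np

def matmul_weight_stationary(A, W):
    """Weight-stationary: each weight W[k, j] stays fixed in its PE
    while the inputs A[:, k] stream past it (the classic TPU order)."""
    M, K = A.shape
    _, N = W.shape
    out = np.zeros((M, N))
    for k in range(K):
        for j in range(N):          # W[k, j] is pinned; inputs flow by
            out[:, j] += A[:, k] * W[k, j]
    return out

def matmul_output_stationary(A, W):
    """Output-stationary: each partial sum out[i, j] stays in its PE
    while both operands stream through."""
    M, _ = A.shape
    _, N = W.shape
    out = np.zeros((M, N))
    for i in range(M):
        for j in range(N):          # accumulator out[i, j] never moves
            out[i, j] = np.dot(A[i, :], W[:, j])
    return out

rng = np.random.default_rng(0)
A, W = rng.standard_normal((4, 5)), rng.standard_normal((5, 3))
assert np.allclose(matmul_weight_stationary(A, W), A @ W)
assert np.allclose(matmul_output_stationary(A, W), A @ W)
```

Both loop orders compute the same product; they differ in which values are reused in place, which is exactly what Flex-TPU proposes to select per layer rather than fixing at design time.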


Hardware Acceleration of Explainable Artificial Intelligence

Pan, Zhixin, Mishra, Prabhat

arXiv.org Artificial Intelligence

Machine learning (ML) is successful in achieving human-level artificial intelligence in various fields. However, it lacks the ability to explain an outcome due to its black-box nature. While recent efforts on explainable AI (XAI) have received significant attention, most existing solutions are not applicable in real-time systems, since they cast interpretability as an optimization problem, which leads to numerous iterations of time-consuming complex computations. Although there are existing hardware-based acceleration frameworks for XAI, they are implemented on FPGAs and designed for specific tasks, leading to high cost and limited flexibility. In this paper, we propose a simple yet efficient framework to accelerate various XAI algorithms with existing hardware accelerators. Specifically, this paper makes three important contributions. (1) The proposed method is the first attempt at exploring the effectiveness of the Tensor Processing Unit (TPU) to accelerate XAI. (2) Our proposed solution explores the close relationship between several existing XAI algorithms and matrix computations, and exploits the synergy between convolution and the Fourier transform, which takes full advantage of the TPU's inherent ability to accelerate matrix computations. (3) Our proposed approach can lead to real-time outcome interpretation. Extensive experimental evaluation demonstrates that the proposed approach, deployed on a TPU, can provide drastic improvement in interpretation time (39x on average) as well as energy efficiency (69x on average) compared to existing acceleration techniques.
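The "synergy between convolution and the Fourier transform" mentioned above is the convolution theorem: a convolution in the time/space domain becomes an element-wise product in the frequency domain, which maps naturally onto hardware built for dense tensor arithmetic. A minimal 1-D sketch showing the equivalence (illustrative only; the paper's actual kernels are not shown):

```python
import numpy as np

def conv_direct(x, h):
    """Full linear convolution computed from the definition."""
    n = len(x) + len(h) - 1
    y = np.zeros(n)
    for i, xi in enumerate(x):
        y[i:i + len(h)] += xi * h
    return y

def conv_fft(x, h):
    """Same convolution via the convolution theorem:
    conv(x, h) = IFFT(FFT(x) * FFT(h)), with both signals
    zero-padded to the full output length first."""
    n = len(x) + len(h) - 1
    return np.real(np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)))

x = np.array([1.0, 2.0, 3.0])
h = np.array([0.0, 1.0, 0.5])
assert np.allclose(conv_direct(x, h), conv_fft(x, h))
```

For large inputs the FFT route replaces an O(n^2) sliding-window loop with O(n log n) transforms plus an element-wise product, which is the kind of regular, batched arithmetic a TPU accelerates well.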


TPU-MLIR: A Compiler For TPU Using MLIR

Hu, Pengchao, Lu, Man, Wang, Lei, Jiang, Guoyue

arXiv.org Artificial Intelligence

Multi-level intermediate representations (MLIR) show great promise for reducing the cost of building domain-specific compilers by providing a reusable and extensible compiler infrastructure. This work presents TPU-MLIR, an end-to-end compiler based on MLIR that deploys pre-trained neural network (NN) models to a custom ASIC called a Tensor Processing Unit (TPU). TPU-MLIR defines two new dialects to implement its functionality: 1. a Tensor operation (TOP) dialect that encodes the deep learning graph semantics and is independent of the deep learning framework, and 2. a TPU kernel dialect that provides standard kernel computations on the TPU. An NN model is translated to the TOP dialect and then lowered to the TPU dialect for different TPUs according to the chip's configuration. We demonstrate how to use the MLIR pass pipeline to organize and perform optimizations for the TPU and generate machine code. The paper also presents a verification procedure to ensure the correctness of each transform stage.
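The two-dialect design separates "what the model computes" (TOP, framework-independent) from "how a particular chip runs it" (TPU kernels). A toy Python sketch of that lowering step, with hypothetical op and chip names (the real dialects are MLIR dialects, not Python dictionaries):

```python
# Hypothetical TOP -> TPU op mapping; op and chip names are invented
# for illustration and are not the actual TPU-MLIR identifiers.
KERNEL_TABLE = {
    "top.Conv":   "tpu.conv2d",
    "top.MatMul": "tpu.matmul",
    "top.Relu":   "tpu.activation",
}

def lower_top_to_tpu(top_op, chip_config):
    """Map a framework-independent TOP op to a chip-specific kernel op,
    attaching the target chip so later passes can pick tilings, etc."""
    kernel = KERNEL_TABLE[top_op["name"]]
    attrs = dict(top_op.get("attrs", {}))
    attrs["chip"] = chip_config["chip"]
    return {"name": kernel, "attrs": attrs}

# A two-op "graph" lowered for one target configuration.
graph = [{"name": "top.Conv", "attrs": {"stride": 1}},
         {"name": "top.Relu"}]
lowered = [lower_top_to_tpu(op, {"chip": "tpu_v1"}) for op in graph]
```

The point of the split is that the same TOP-level graph can be lowered to different TPU variants just by changing the chip configuration, without touching the framework-facing front end.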


Why Google's new AI chip is a big deal

#artificialintelligence

The Google team has developed a new AI model that can design complex chips in just hours, an incredibly difficult task that usually takes human engineers months to accomplish. Let's look into what this new artificial intelligence microchip is and the potential impact it could make in the technology industry. A microchip is a small electronic device that processes and stores electronic data. It consists of an integrated circuit fabricated on a very small piece of silicon.


Update Alert: TensorFlow 2.8

#artificialintelligence

Google released TensorFlow 2.8 yesterday, which adds a few major features and improvements, along with many bug fixes and security updates. The main focus of this release is extending the functionality of TensorFlow Lite. Highlights include broader TFLite support for TensorFlow operations; an experimental API that configures TensorFlow ops to run deterministically; a PluggableDevice architecture that offers a plugin mechanism for registering devices with TensorFlow without the need to change TensorFlow code; and more. You can view the full list of changes on the TensorFlow GitHub page (and download and install the latest version): TensorFlow 2.8.0. Let's take a closer look at some of these features. TensorFlow Lite (TFLite) is an open-source framework included with TensorFlow (essentially a lightweight version of TensorFlow) and is intended for mobile and IoT devices.