real-time inference
Real-Time Inference for Distributed Multimodal Systems under Communication Delay Uncertainty
Croisfelt, Victor, de Souza, João Henrique Inacio, Pandey, Shashi Raj, Soret, Beatriz, Popovski, Petar
Connected cyber-physical systems perform inference based on real-time inputs from multiple data streams. Uncertain communication delays across data streams challenge the temporal flow of the inference process. State-of-the-art (SotA) non-blocking inference methods rely on a reference-modality paradigm, requiring one modality input to be fully received before processing, while depending on costly offline profiling. We propose a novel, neuro-inspired non-blocking inference paradigm that primarily employs adaptive temporal windows of integration (TWIs) to dynamically adjust to stochastic delay patterns across heterogeneous streams while relaxing the reference-modality requirement. Our communication-delay-aware framework achieves robust real-time inference with finer-grained control over the accuracy-latency tradeoff. Experiments on the audio-visual event localization (AVEL) task demonstrate superior adaptability to network dynamics compared to SotA approaches.
Real-Time Inference for a Gamma Process Model of Neural Spiking
With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparamet- ric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-the- art. Via exploratory data analysis--using data with partial ground truth as well as two novel data sets--we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) de- tecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using mul- tiple channels.
QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning
Tong, Xinyang, Ding, Pengxiang, Wang, Donglin, Zhang, Wenjie, Cui, Can, Sun, Mingyang, Fan, Yiguo, Zhao, Han, Zhang, Hongyin, Dang, Yonghao, Huang, Siteng, Lyu, Shangke
This paper addresses the inherent inference latency challenges associated with deploying multimodal large language models (MLLM) in quadruped vision-language-action (QUAR-VLA) tasks. Our investigation reveals that conventional parameter reduction techniques ultimately impair the performance of the language foundation model during the action instruction tuning phase, making them unsuitable for this purpose. We introduce a novel latency-free quadruped MLLM model, dubbed QUART-Online, designed to enhance inference efficiency without degrading the performance of the language foundation model. By incorporating Action Chunk Discretization (ACD), we compress the original action representation space, mapping continuous action values onto a smaller set of discrete representative vectors while preserving critical information. Subsequently, we fine-tune the MLLM to integrate vision, language, and compressed actions into a unified semantic space. Experimental results demonstrate that QUART-Online operates in tandem with the existing MLLM system, achieving real-time inference in sync with the underlying controller frequency, significantly boosting the success rate across various tasks by 65%. Our project page is https://quart-online.github.io.
Integrating Edge-AI in Structural Health Monitoring domain
Mishra, Anoop, Gangisetti, Gopinath, Khazanchi, Deepak
Structural health monitoring (SHM) tasks like damage detection are crucial for decision-making regarding maintenance and deterioration. For example, crack detection in SHM is crucial for bridge maintenance as crack progression can lead to structural instability. However, most AI/ML models in the literature have low latency and late inference time issues while performing in real-time environments. This study aims to explore the integration of edge-AI in the SHM domain for real-time bridge inspections. Based on edge-AI literature, its capabilities will be valuable integration for a real-time decision support system in SHM tasks such that real-time inferences can be performed on physical sites. This study will utilize commercial edge-AI platforms, such as Google Coral Dev Board or Kneron KL520, to develop and analyze the effectiveness of edge-AI devices. Thus, this study proposes an edge AI framework for the structural health monitoring domain. An edge-AI-compatible deep learning model is developed to validate the framework to perform real-time crack classification. The effectiveness of this model will be evaluated based on its accuracy, the confusion matrix generated, and the inference time observed in a real-time setting.
Real-Time Inference for a Gamma Process Model of Neural Spiking
With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparamet- ric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-the- art. Via exploratory data analysis--using data with partial ground truth as well as two novel data sets--we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) de- tecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using mul- tiple channels.
Real-time Health Monitoring of Heat Exchangers using Hypernetworks and PINNs
Majumdar, Ritam, Jadhav, Vishal, Deodhar, Anirudh, Karande, Shirish, Vig, Lovekesh, Runkana, Venkataramana
We demonstrate a Physics-informed Neural Network (PINN) based model for real-time health monitoring of a heat exchanger, that plays a critical role in improving energy efficiency of thermal power plants. A hypernetwork based approach is used to enable the domain-decomposed PINN learn the thermal behavior of the heat exchanger in response to dynamic boundary conditions, eliminating the need to re-train. As a result, we achieve orders of magnitude reduction in inference time in comparison to existing PINNs, while maintaining the accuracy on par with the physics-based simulations. This makes the approach very attractive for predictive maintenance of the heat exchanger in digital twin environments.
CPU Real-time Face Detection With Python
Originally published on Towards AI the World's Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses. Is it possible to implement real-time performance object detection models without a GPU? MediaPipe face detection is a proof of concept that makes it possible to run single-class face detection in real-time on almost any CPU.
BRIGHT -- Graph Neural Networks in Real-Time Fraud Detection
Lu, Mingxuan, Han, Zhichao, Rao, Susie Xi, Zhang, Zitao, Zhao, Yang, Shan, Yinan, Raghunathan, Ramesh, Zhang, Ce, Jiang, Jiawei
Detecting fraudulent transactions is an essential component to control risk in e-commerce marketplaces. Apart from rule-based and machine learning filters that are already deployed in production, we want to enable efficient real-time inference with graph neural networks (GNNs), which is useful to catch multihop risk propagation in a transaction graph. However, two challenges arise in the implementation of GNNs in production. First, future information in a dynamic graph should not be considered in message passing to predict the past. Second, the latency of graph query and GNN model inference is usually up to hundreds of milliseconds, which is costly for some critical online services. To tackle these challenges, we propose a Batch and Real-time Inception GrapH Topology (BRIGHT) framework to conduct an end-to-end GNN learning that allows efficient online real-time inference. BRIGHT framework consists of a graph transformation module (Two-Stage Directed Graph) and a corresponding GNN architecture (Lambda Neural Network). The Two-Stage Directed Graph guarantees that the information passed through neighbors is only from the historical payment transactions. It consists of two subgraphs representing historical relationships and real-time links, respectively. The Lambda Neural Network decouples inference into two stages: batch inference of entity embeddings and real-time inference of transaction prediction. Our experiments show that BRIGHT outperforms the baseline models by >2\% in average w.r.t.~precision. Furthermore, BRIGHT is computationally efficient for real-time fraud detection. Regarding end-to-end performance (including neighbor query and inference), BRIGHT can reduce the P99 latency by >75\%. For the inference stage, our speedup is on average 7.8$\times$ compared to the traditional GNN.
How to Start a Career in AI
How do I start a career as a deep learning engineer? What are some of the key tools and frameworks used in AI? How do I learn more about ethics in AI? Everyone has questions, but the most common questions in AI always return to this: how do I get involved? Cutting through the hype to share fundamental principles for building a career in AI, a group of AI professionals gathered at NVIDIA's GTC conference in the spring offered what may be the best place to start. Each panelist, in a conversation with NVIDIA's Louis Stewart, head of strategic initiatives for the developer ecosystem, came to the industry from very different places. But the speakers -- Katie Kallot, NVIDIA's former head of global developer relations and emerging areas; David Ajoku, founder of startup aware.ai;
Design Patterns in Machine Learning Code and Systems
Design patterns are not just a way to structure code. They also communicate the problem addressed and how the code or component is intended to be used. Here are some patterns I've observed in machine learning code and systems, mostly from the Gang of Four design patterns book. Most developers have some familiarity with these patterns and having a basic understanding provides a shared vocabulary to discuss ideas on design and implementation. The factory pattern decouples objects, such as training data, from how they are created.