Goto

Collaborating Authors

 Liu, Shih-Chii


Modulating State Space Model with SlowFast Framework for Compute-Efficient Ultra Low-Latency Speech Enhancement

arXiv.org Artificial Intelligence

Deep learning-based speech enhancement (SE) methods often face significant computational challenges when they must meet low-latency requirements, because lower latency means more frames to process per second. This paper introduces the SlowFast framework, which aims to reduce computation costs specifically when low-latency enhancement is needed. The framework consists of a slow branch that analyzes the acoustic environment at a low frame rate, and a fast branch that performs SE in the time domain at the higher frame rate needed to match the required latency. Specifically, the fast branch employs a state space model whose state transition process is dynamically modulated by the slow branch. Experiments on an SE task with a 2 ms algorithmic latency requirement using the Voice Bank + Demand dataset show that our approach reduces computation cost by 70% compared to a baseline single-branch network with equivalent parameters, without compromising enhancement performance. Furthermore, by leveraging the SlowFast framework, we implemented a network that achieves an algorithmic latency of just 62.5 µs (one sample point at a 16 kHz sample rate) with a computation cost of 100 M MACs/s, while scoring a PESQ-NB of 3.12 and an SI-SNR of 16.62.
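To make the two-branch interplay concrete, here is a minimal PyTorch sketch of the general idea under stated assumptions: a diagonal state-space recurrence in the fast branch whose per-channel transition coefficients are produced by a low-rate slow branch. The module name `SlowFastSSM`, the GRU slow branch, the `stride` parameter, and all sizes are hypothetical illustrations, not the paper's exact architecture.

```python
# Illustrative sketch (not the authors' exact architecture): a fast diagonal
# state-space recurrence x_t = a_t * x_{t-1} + B u_t, where the decay a_t is
# modulated by a slow branch that runs once every `stride` fast frames.
import torch
import torch.nn as nn

class SlowFastSSM(nn.Module):
    def __init__(self, dim=64, stride=8):
        super().__init__()
        self.stride = stride                             # fast frames per slow frame
        self.slow = nn.GRU(dim, dim, batch_first=True)   # low-rate analysis branch
        self.to_decay = nn.Linear(dim, dim)              # slow features -> transition decay
        self.in_proj = nn.Linear(dim, dim)               # fast-branch input projection
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, u):                                # u: (batch, time, dim)
        B, T, D = u.shape
        # Slow branch sees a strided (low frame rate) view of the input.
        slow_feats, _ = self.slow(u[:, ::self.stride])   # (B, ceil(T/stride), D)
        # Per-slow-frame transition coefficient in (0, 1), one per channel.
        a_slow = torch.sigmoid(self.to_decay(slow_feats))
        # Upsample to the fast frame rate by repetition.
        a = a_slow.repeat_interleave(self.stride, dim=1)[:, :T]
        bu = self.in_proj(u)
        x = torch.zeros(B, D, device=u.device)
        ys = []
        for t in range(T):                               # x_t = a_t * x_{t-1} + B u_t
            x = a[:, t] * x + bu[:, t]
            ys.append(x)
        return self.out_proj(torch.stack(ys, dim=1))

y = SlowFastSSM()(torch.randn(2, 64, 64))                # smoke test
```

The point of the design is that the expensive environment analysis runs only once per `stride` fast frames, while the per-sample fast path stays a cheap elementwise recurrence.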


Leveraging Recurrent Neural Networks for Predicting Motor Movements from Primate Motor Cortex Neural Recordings

arXiv.org Artificial Intelligence

This paper presents an efficient deep learning solution for decoding motor movements from neural recordings in non-human primates. An Autoencoder Gated Recurrent Unit (AEGRU) model was adopted as the architecture for this task. The autoencoder is used only during the training stage to achieve better generalization. Together with our preprocessing techniques, the model achieved a 0.71 $R^2$ score, surpassing the baseline models in NeuroBench and ranking first for $R^2$ in the IEEE BioCAS 2024 Grand Challenge on Neural Decoding. Model pruning is also applied, reducing the multiply-accumulate (MAC) operations by 41.4% with little change in the $R^2$ score compared to the unpruned model.
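A minimal sketch of the training-only autoencoder idea follows, assuming the structure implied by the abstract: a GRU encoder shared by a regression head and a reconstruction decoder, with the decoder active only in training mode. Channel counts, layer choices, and the loss weighting are hypothetical.

```python
# Sketch of an autoencoder-regularized GRU regressor: the decoder reconstructs
# the input during training only and is dropped at inference.
import torch
import torch.nn as nn

class AEGRU(nn.Module):
    def __init__(self, n_ch=96, hidden=128, n_out=2):
        super().__init__()
        self.encoder = nn.GRU(n_ch, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_out)       # movement regression head
        self.decoder = nn.Linear(hidden, n_ch)     # used only while training

    def forward(self, x):                           # x: (B, T, n_ch) binned spikes
        h, _ = self.encoder(x)
        y = self.head(h)                            # predicted movement
        if self.training:                           # reconstruction as a regularizer
            return y, self.decoder(h)
        return y

model = AEGRU()
x = torch.randn(4, 50, 96)                          # toy spike-count sequences
y, x_rec = model(x)
target = torch.zeros_like(y)                        # placeholder movement targets
loss = nn.functional.mse_loss(y, target) \
     + 0.1 * nn.functional.mse_loss(x_rec, x)       # joint training objective
```

Because the decoder never runs at inference, the generalization benefit comes at zero deployment cost in MACs.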


Text-to-Events: Synthetic Event Camera Streams from Conditional Text Input

arXiv.org Artificial Intelligence

Event cameras are advantageous for tasks that require vision sensors with low-latency and sparse output responses. However, the development of deep network algorithms using event cameras has been slow because of the lack of large labelled event camera datasets for network training. This paper reports a method for creating new labelled event datasets by using a text-to-X model, where X is one or more output modalities; in this work, X is events. Our proposed text-to-events model produces synthetic event frames directly from text prompts. It uses an autoencoder that is trained to produce sparse event frames representing event camera outputs. By combining the pretrained autoencoder with a diffusion model architecture, the new text-to-events model is able to generate smooth synthetic event streams of moving objects. The autoencoder was first trained on an event camera dataset of diverse scenes. In the combined training with the diffusion model, the DVS gesture dataset was used. We demonstrate that the model can generate realistic event sequences of human gestures prompted by different text statements. The classification accuracy of the generated sequences, using a classifier trained on the real dataset, ranges from 42% to 92%, depending on the gesture group. The results demonstrate the capability of this method in synthesizing event datasets.
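The two-stage pipeline can be sketched as follows, with all shapes and loss terms being hypothetical stand-ins rather than the paper's exact design: stage one trains an autoencoder on sparse polarity frames; stage two (indicated only in comments) trains a text-conditioned diffusion model on the resulting latents.

```python
# Conceptual sketch of the two-stage pipeline: (1) an autoencoder trained to
# reconstruct sparse event frames, (2) a text-conditioned latent diffusion
# model (not shown in full) trained to generate sequences of its latents.
import torch
import torch.nn as nn

class EventFrameAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Event frames: 2 channels (ON/OFF polarity), mostly zero.
        self.enc = nn.Sequential(nn.Conv2d(2, 32, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(32, 4, 4, 2, 1))
        self.dec = nn.Sequential(nn.ConvTranspose2d(4, 32, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(32, 2, 4, 2, 1))

    def forward(self, frames):                      # (B, 2, H, W)
        z = self.enc(frames)                        # compact latent per frame
        return z, self.dec(z)

ae = EventFrameAE()
frames = (torch.rand(8, 2, 64, 64) < 0.05).float()  # sparse synthetic events
z, recon = ae(frames)
# Reconstruction loss with a sparsity penalty (an assumption, not the
# paper's exact objective):
loss = nn.functional.mse_loss(recon, frames) + 1e-3 * recon.abs().mean()
# Stage 2 (omitted): a diffusion model denoises sequences of z conditioned on
# a text embedding; the frozen decoder maps sampled latents back to frames.
```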


Event-Based Eye Tracking. AIS 2024 Challenge Survey

arXiv.org Artificial Intelligence

This survey reviews the AIS 2024 Event-Based Eye Tracking (EET) Challenge. The challenge task focuses on processing eye movements recorded with event cameras and predicting the pupil center of the eye. The challenge emphasizes efficient eye tracking with event cameras to achieve a good trade-off between task accuracy and efficiency. During the challenge period, 38 participants registered for the Kaggle competition, and 8 teams submitted a challenge factsheet. The novel and diverse methods from the submitted factsheets are reviewed and analyzed in this survey to advance future event-based eye tracking research.


Epilepsy Seizure Detection and Prediction using an Approximate Spiking Convolutional Transformer

arXiv.org Artificial Intelligence

Epilepsy is a common disease of the nervous system. Timely seizure prediction and intervention can significantly reduce accidental injury and protect patients' lives and health. This paper presents a neuromorphic Spiking Convolutional Transformer, named Spiking Conformer, to detect and predict epileptic seizure segments from long-term scalp electroencephalogram (EEG) recordings. We report evaluation results for the Spiking Conformer model on the Boston Children's Hospital-MIT (CHB-MIT) EEG dataset. By leveraging spike-based addition operations, the Spiking Conformer significantly reduces the classification computational cost compared to the non-spiking model. Additionally, we introduce an approximate spiking neuron layer that further reduces spike-triggered neuron updates by nearly 38% without sacrificing accuracy. Using raw EEG data as input, the proposed Spiking Conformer achieved an average sensitivity of 94.9% and specificity of 99.3% for the seizure detection task, and 96.8% and 89.5%, respectively, for the seizure prediction task, while needing >10x fewer operations than the non-spiking equivalent model.
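The following sketch illustrates one way an "approximate" spiking neuron layer can skip updates: input currents below a cutoff `eps` are treated as zero, so the membrane updates they would trigger can be skipped. This is a hedged illustration of the concept only; the paper's exact approximation rule may differ, and all constants here are assumptions.

```python
# Sketch of an approximate leaky integrate-and-fire (LIF) layer: small synaptic
# inputs are zeroed so their spike-triggered membrane updates can be skipped.
import torch

def approx_lif(currents, v_th=1.0, decay=0.9, eps=0.05):
    """currents: (T, B, N) synaptic input per timestep."""
    T, B, N = currents.shape
    v = torch.zeros(B, N)
    spikes, skipped = [], 0
    for t in range(T):
        i_t = currents[t]
        mask = i_t.abs() >= eps            # updates worth performing
        skipped += (~mask).sum().item()    # bookkeeping: skipped updates
        v = decay * v + i_t * mask         # leak always; integrate only large inputs
        s = (v >= v_th).float()
        v = v * (1 - s)                    # hard reset after a spike
        spikes.append(s)
    return torch.stack(spikes), skipped / (T * B * N)

out, skip_frac = approx_lif(torch.randn(100, 4, 32) * 0.2)
print(f"fraction of neuron updates skipped: {skip_frac:.2f}")
```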


Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training

arXiv.org Artificial Intelligence

Recurrent Neural Networks (RNNs) are useful for temporal sequence tasks. However, training RNNs involves dense matrix multiplications, which require hardware that can support a large number of arithmetic operations and memory accesses. Implementing online training of RNNs on the edge therefore calls for algorithms optimized for efficient deployment on hardware. Inspired by the spiking neuron model, the Delta RNN exploits temporal sparsity during inference by skipping updates of hidden states from inactivated neurons, i.e., those whose change of activation across two timesteps is below a defined threshold. This work describes a training algorithm for Delta RNNs that exploits temporal sparsity in the backward propagation phase to reduce the computational requirements of training on the edge. Because the computation graphs of forward and backward propagation during training are symmetric, the gradient computation for inactivated neurons can be skipped. Results show a reduction of $\sim$80% in matrix operations for training a 56k-parameter Delta LSTM on the Fluent Speech Commands dataset with negligible accuracy loss. Logic simulations of a hardware accelerator designed for the training algorithm show a 2-10X speedup in matrix computations for an activation sparsity range of 50%-90%. Additionally, we show that the proposed Delta RNN training is useful for online incremental learning on edge devices with limited computing resources.
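A minimal sketch of the delta principle and its symmetric backward sparsity follows, using plain autograd on a toy recurrence (threshold, sizes, and the simple `tanh` cell are hypothetical). Because the same binary mask multiplies the forward computation, only the masked-in deltas carry gradient, which is exactly the computation a sparse BPTT implementation skips.

```python
# Toy delta recurrence: only neurons whose activation change exceeds `theta`
# transmit an update; the mask that sparsifies the forward pass makes the
# corresponding gradient terms exactly zero in BPTT.
import torch

theta = 0.1
W = torch.randn(16, 16, requires_grad=True)
x = torch.randn(20, 1, 16)                   # toy input sequence
h = torch.zeros(1, 16)
h_last = torch.zeros(1, 16)                  # last transmitted activation values
for t in range(20):
    delta = h - h_last
    mask = (delta.abs() >= theta).float()    # active neurons this timestep
    sparse_delta = delta * mask              # zero where the change is small
    h_last = h_last + sparse_delta           # update transmitted values
    h = torch.tanh(x[t] + sparse_delta @ W)  # only active deltas incur MACs
loss = h.pow(2).sum()
loss.backward()
# Rows of W.grad belonging to neurons that were never active are exactly
# zero: the forward delta mask sparsifies the weight-gradient computation too.
```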


3ET: Efficient Event-based Eye Tracking using a Change-Based ConvLSTM Network

arXiv.org Artificial Intelligence

This paper presents a sparse Change-Based Convolutional Long Short-Term Memory (CB-ConvLSTM) model for event-based eye tracking, key for next-generation wearable healthcare technology such as AR/VR headsets. Utilizing a delta-encoded recurrent path that enhances activation sparsity, CB-ConvLSTM reduces arithmetic operations by approximately 4.7x without losing accuracy. The process of eye movements often reveals our mental processes and comprehension of the visual realm. Implementing eye tracking technology offers many possibilities in augmented reality/virtual reality (AR/VR) domains, enabling techniques like foveated rendering to offer a more compelling user experience. Eye tracking also has potential benefits in wearable healthcare applications; for instance, it can aid in identifying eye movement disorders associated with diseases like Parkinson's or Alzheimer's, thereby enabling early diagnosis and regular assessments [3], [4]. Eye tracking is a significant field in computer vision [8]-[10], yet it is relatively unexplored with event cameras due to the scarcity of relevant event-based datasets [11], [12]. Common approaches guide recent advances in event-based eye tracking algorithms, mirroring those of traditional computer vision: (1) the 3D model-based method locates key points corresponding to the image's geometrical features and fits a model to them.
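A rough sketch of a delta-encoded recurrent path inside a ConvLSTM-style cell is given below. The cell structure, channel counts, and threshold are hypothetical illustrations of the change-based idea, not the paper's exact CB-ConvLSTM; only the hidden-state path is delta-encoded here for brevity.

```python
# ConvLSTM-style cell whose recurrent input is a thresholded hidden-state
# change, so the gate convolutions see sparse activations.
import torch
import torch.nn as nn

class DeltaConvLSTMCell(nn.Module):
    def __init__(self, ch=16, theta=0.05):
        super().__init__()
        self.theta = theta
        self.conv = nn.Conv2d(2 * ch, 4 * ch, 3, padding=1)  # gates from [x, h]

    def forward(self, x, h, c, h_last):
        dh = h - h_last
        dh = dh * (dh.abs() >= self.theta)        # sparse recurrent update
        h_last = h_last + dh                      # last transmitted hidden state
        gates = self.conv(torch.cat([x, h_last], dim=1))
        i, f, g, o = gates.chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c, h_last

cell = DeltaConvLSTMCell()
x = torch.randn(1, 16, 32, 32)                    # toy event-frame features
h = c = h_last = torch.zeros(1, 16, 32, 32)
for _ in range(5):
    h, c, h_last = cell(x, h, c, h_last)
```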


Biologically-Inspired Continual Learning of Human Motion Sequences

arXiv.org Artificial Intelligence

This work proposes a model for continual learning on tasks involving temporal sequences, specifically human motions. It improves on a recently proposed brain-inspired replay model (BI-R) by building a biologically-inspired conditional temporal variational autoencoder (BI-CTVAE), which instantiates a latent mixture of Gaussians for class representation. We investigate a novel continual-learning-to-generate (CL2Gen) scenario in which the model generates motion sequences of different classes. The generative accuracy of the model is tested over a set of tasks. The final classification accuracy of BI-CTVAE on a human motion dataset after sequentially learning all action classes is 78%, which is 63% higher than without replay and only 5.4% lower than a state-of-the-art offline-trained GRU model.
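A compact sketch of the class-conditional mixture-of-Gaussians latent idea follows. Each class owns one learned Gaussian component, and replay samples for past classes are generated by sampling that component and decoding a motion sequence. The dimensions, the GRU decoder, and the conditioning scheme are placeholders, not the paper's exact BI-CTVAE.

```python
# Generative replay from a per-class Gaussian latent: sample the component of
# an old class, decode a motion sequence, and rehearse on it.
import torch
import torch.nn as nn

class MoGLatentGenerator(nn.Module):
    def __init__(self, n_classes=10, z_dim=32, out_dim=63):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(n_classes, z_dim))       # per-class means
        self.log_var = nn.Parameter(torch.zeros(n_classes, z_dim))  # per-class variances
        self.decoder = nn.GRU(z_dim, out_dim, batch_first=True)     # motion decoder

    def replay(self, labels, seq_len=30):
        mu = self.mu[labels]
        std = (0.5 * self.log_var[labels]).exp()
        z = mu + std * torch.randn_like(std)           # sample the class component
        z_seq = z.unsqueeze(1).repeat(1, seq_len, 1)   # condition every timestep
        motions, _ = self.decoder(z_seq)               # (B, seq_len, out_dim)
        return motions

gen = MoGLatentGenerator()
old_class_motions = gen.replay(torch.tensor([0, 3, 3]))  # rehearse past classes
```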


Spartus: A 9.4 TOp/s FPGA-based LSTM Accelerator Exploiting Spatio-temporal Sparsity

arXiv.org Artificial Intelligence

Long Short-Term Memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data such as speech recognition. However, it is difficult to deploy these networks on hardware with high throughput and low latency because their fully connected structure makes LSTM networks a memory-bound algorithm. Previous LSTM accelerators exploited either weight spatial sparsity or temporal activation sparsity. This paper proposes a new accelerator called "Spartus" that exploits spatio-temporal sparsity to achieve ultra-low-latency inference. The spatial sparsity is induced using our proposed pruning method, Column-Balanced Targeted Dropout (CBTD), which structures sparse weight matrices for a balanced workload. It achieved up to 96% weight sparsity with a negligible accuracy difference for an LSTM network trained on a TIMIT phone recognition task. To induce temporal sparsity in the LSTM, we create the DeltaLSTM by extending the previous DeltaGRU method to the LSTM network. This combined sparsity simultaneously saves on weight memory accesses and the associated arithmetic operations. Spartus was implemented on a Xilinx Zynq-7100 FPGA. The Spartus per-sample latency for a single DeltaLSTM layer of 1024 neurons averages 1 µs. Spartus achieved 9.4 TOp/s effective batch-1 throughput and 1.1 TOp/J energy efficiency, which are, respectively, 4X and 7X higher than the previous state-of-the-art.
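The column-balanced constraint is the part that maps directly to hardware: if every column of a weight matrix keeps the same number of nonzeros, each column-processing engine gets an equal workload. Below is a one-shot top-k sketch of that constraint; the real CBTD applies it as a stochastic targeted dropout during training, so this mask-only version is a simplification.

```python
# Column-balanced pruning sketch: every column keeps the same number of
# largest-magnitude weights, giving each hardware column engine equal work.
import torch

def column_balanced_mask(W, sparsity=0.96):
    rows, cols = W.shape
    keep = max(1, int(round(rows * (1 - sparsity))))   # nonzeros per column
    idx = W.abs().topk(keep, dim=0).indices            # strongest weights per column
    mask = torch.zeros_like(W)
    mask.scatter_(0, idx, 1.0)
    return mask

W = torch.randn(1024, 1024)
mask = column_balanced_mask(W)
W_pruned = W * mask
# Every column has exactly the same nonzero count (41 of 1024 at 96% sparsity).
assert int(mask.sum(dim=0).unique()) == 41
```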


Reducing state updates via Gaussian-gated LSTMs

arXiv.org Machine Learning

Recurrent neural networks can be difficult to train on long sequence data due to the well-known vanishing gradient problem. Some architectures incorporate methods to reduce RNN state updates, thereby allowing the network to preserve memory over long temporal intervals. To address these convergence problems, this paper proposes a timing-gated LSTM RNN model, called the Gaussian-gated LSTM (g-LSTM). The time gate controls when a neuron can be updated during training, enabling longer memory persistence and better error-gradient flow. This model captures long temporal dependencies better than an LSTM, and the time gate parameters can be learned even from non-optimal initialization values. Because the time gate limits the updates of the neuron state, the number of computes needed for the network update is also reduced. By adding a computational budget term to the training loss, we can obtain a network that further reduces the number of computes by at least 10x. Finally, by employing a temporal curriculum learning schedule for the g-LSTM, we can reduce the convergence time of the equivalent LSTM network on long sequences.
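A sketch of a Gaussian time gate modulating an LSTM update is shown below, assuming the form suggested by the abstract: a learnable per-neuron opening time mu and width sigma determine how strongly the state may change at timestep t, and a near-zero gate value means the update can be skipped. The parameterization here is illustrative, not the paper's exact one.

```python
# Gaussian time gate k_t = exp(-(t - mu)^2 / (2 sigma^2)) blended into an
# LSTM state update: neurons change state mainly near their gate opening.
import torch
import torch.nn as nn

class GaussianTimeGatedLSTM(nn.Module):
    def __init__(self, in_dim=8, hidden=32, T=100):
        super().__init__()
        self.cell = nn.LSTMCell(in_dim, hidden)
        self.mu = nn.Parameter(torch.rand(hidden) * T)              # gate centers
        self.sigma = nn.Parameter(torch.full((hidden,), T / 10.0))  # gate widths

    def forward(self, x):                    # x: (T, B, in_dim)
        B = x.shape[1]
        h = c = torch.zeros(B, self.mu.numel())
        for t in range(x.shape[0]):
            k = torch.exp(-0.5 * ((t - self.mu) / self.sigma) ** 2)  # in (0, 1]
            h_new, c_new = self.cell(x[t], (h, c))
            h = k * h_new + (1 - k) * h      # update only near the gate opening
            c = k * c_new + (1 - k) * c      # tiny k => the update can be skipped
        return h

out = GaussianTimeGatedLSTM()(torch.randn(100, 4, 8))
```

A computational budget term of the kind the abstract mentions could, for example, penalize the sum of gate openings so that most neurons update rarely.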