signal processing

Signal processing is key to embedded Machine Learning

When we hear about machine learning - whether it's about machines learning to play Go, or computers generating plausible human language - we often think about deep learning. Lots of unstructured data gets thrown into a complex neural network with billions of parameters, and after a very expensive training stage the model learns the task at hand. But this is not always a desirable approach. One of the most interesting places where we can run machine learning is on embedded or IoT devices. These devices already handle a vast amount of high-resolution sensor data, but often need to send that data to the cloud to be analyzed.

Graphon Pooling in Graph Neural Networks

Graph neural networks (GNNs) have been used effectively in different applications involving the processing of signals on irregular structures modeled by graphs. Relying on the use of shift-invariant graph filters, GNNs extend the operation of convolution to graphs. However, the operations of pooling and sampling are still not clearly defined, and the approaches proposed in the literature either modify the graph structure in a way that does not preserve its spectral properties, or require defining a policy for selecting which nodes to keep. In this work, we propose a new strategy for pooling and sampling on GNNs using graphons which preserves the spectral properties of the graph. To do so, we consider the graph layers in a GNN as elements of a sequence of graphs that converge to a graphon. In this way we have no ambiguity in the node labeling when mapping signals from one layer to the other and a spectral representation that is consistent throughout the layers. We evaluate this strategy in a synthetic and a real-world numerical experiment where we show that graphon pooling GNNs are less prone to overfitting and improve upon other pooling techniques, especially when the dimensionality reduction ratio between layers is large.

Continuous Silent Speech Recognition using EEG

In this paper we explore continuous silent speech recognition using electroencephalography (EEG) signals. We implemented a connectionist temporal classification (CTC) automatic speech recognition (ASR) model to translate to text the EEG signals recorded in parallel while subjects read English sentences in their mind, without producing any voice. Our results demonstrate the feasibility of using EEG signals for performing continuous silent speech recognition. We demonstrate our results for a limited English vocabulary consisting of 30 unique sentences.
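
The abstract names CTC but does not describe decoding. As a rough illustration only, the sketch below shows the standard CTC collapsing rule (merge repeated frame labels, then drop blanks) that maps a per-frame label sequence to text. The blank symbol `_` and the example frame labels are assumptions made for this sketch, not taken from the paper, and this is not the authors' model.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol chosen for this sketch


def ctc_collapse(frame_labels, blank=BLANK):
    """Apply the CTC collapsing rule to a sequence of per-frame labels."""
    # Step 1: merge consecutive repeats of the same label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove blanks, which only serve to separate genuine repeats.
    return "".join(label for label in merged if label != blank)


# Made-up greedy (argmax) frame labels for illustration.
frames = list("hh_e_ll_ll_oo")
decoded = ctc_collapse(frames)
print(decoded)
```

The blank between the two runs of `l` is what lets the double letter survive the merge step.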

Voice Separation with an Unknown Number of Multiple Speakers

We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

XMOS claims to lower AI cost and raise the performance bar - SmartCitiesElectronics.com

Artificial intelligence (AI), digital signal processing (DSP), control and I/O are delivered in a single device, the xcore.ai. It can be used by electronics manufacturers to integrate high-performance processing and intelligence economically into products. According to XMOS, xcore.ai is a new generation of embedded platform. It has fast processing and neural network capabilities to enable data to be processed locally and actions taken on-device within nanoseconds.

Multi-frequency calibration for DOA estimation with distributed sensors

Calibration and Direction-of-Arrival (DoA) estimation are major issues in array processing [2, 3]. The latter has been studied in several applications, e.g., radar, sonar, satellite, wireless communication and radio interferometric systems [4, 5], where we commonly use widely distributed sensor elements aiming to achieve high resolution. In all these sensor network applications, calibration is required as some parameters are not exactly known due to imperfect instrumentation or propagation conditions [6]. Let us note that calibration algorithms are distinguished by the presence [7] or absence [8] of one or more cooperative sources, named calibrator sources.

DIHARD II is Still Hard: Experimental Results and Discussions from the DKU-LENOVO Team

In this paper, we present the submitted system for the second DIHARD Speech Diarization Challenge from the DKU-LENOVO team. Our diarization system includes multiple modules, namely voice activity detection (VAD), segmentation, speaker embedding extraction, similarity scoring, clustering, resegmentation and overlap detection. For each module, we explore different techniques to enhance performance. Our final submission employs the ResNet-LSTM based VAD, the Deep ResNet based speaker embedding, the LSTM based similarity scoring and spectral clustering. Variational Bayes (VB) diarization is applied in the resegmentation stage and overlap detection also brings slight improvement. Our proposed system achieves 18.84% DER in Track1 and 27.90% DER in Track2. Although our systems have reduced the DERs by 27.5% and 31.7% relatively against the official baselines, we believe that the diarization task is still very difficult.
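
The abstract gives the final DERs and the relative reductions but not the baseline DERs. As a small worked example, and assuming the usual definition of relative reduction, (baseline - system) / baseline, the baselines implied by those figures can be back-computed. The function name is hypothetical, and the results are derived from the rounded numbers above rather than quoted from the challenge.

```python
def implied_baseline(system_der, relative_reduction):
    """Solve (baseline - system) / baseline = reduction for the baseline."""
    return system_der / (1.0 - relative_reduction)


# DER values (percent) and relative reductions as stated in the abstract.
track1_baseline = implied_baseline(18.84, 0.275)
track2_baseline = implied_baseline(27.90, 0.317)
print(f"Track1 implied baseline DER: {track1_baseline:.2f}%")
print(f"Track2 implied baseline DER: {track2_baseline:.2f}%")
```

Under that definition the implied baselines come out at roughly 26% and 41% DER.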

Action-Manipulation Attacks Against Stochastic Bandits: Attacks and Defense

Due to the broad range of applications of the stochastic multi-armed bandit model, understanding the effects of adversarial attacks and designing bandit algorithms robust to attacks are essential for the safe application of this model. In this paper, we introduce a new class of attack named action-manipulation attack. In this attack, an adversary can change the action signal selected by the user. We show that without knowledge of the mean rewards of arms, our proposed attack can manipulate the Upper Confidence Bound (UCB) algorithm, a widely used bandit algorithm, into pulling a target arm very frequently by spending only logarithmic cost. To defend against this class of attacks, we introduce a novel algorithm that is robust to action-manipulation attacks when an upper bound for the total attack cost is given. We prove that our algorithm has a pseudo-regret upper bounded by $\mathcal{O}(\max\{\log T,A\})$, where $T$ is the total number of rounds and $A$ is the upper bound of the total attack cost.
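
To make the setting concrete, here is a minimal sketch of the standard UCB1 index rule on Bernoulli arms, with an optional hook through which an adversary swaps the arm that is actually pulled while the learner credits the reward to the arm it chose. This is one simplified reading of the attack model. The `redirect` hook, the arm means, and the toy adversary are assumptions made for illustration. Unlike the attack in the abstract, this toy adversary is simply told which arm is worst, and it is not the paper's attack or defense.

```python
import math
import random


def run_ucb(means, horizon, redirect=None, seed=0):
    """Run UCB1 on Bernoulli arms with the given (hidden) means.

    redirect: optional hypothetical adversary mapping the chosen arm to the
    arm that is actually pulled. Returns (counts per chosen arm, number of
    manipulated rounds).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # how often the learner chose each arm
    sums = [0.0] * k   # reward the learner attributes to each arm
    manipulated = 0
    for t in range(1, horizon + 1):
        if t <= k:
            chosen = t - 1  # initialisation: choose every arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            chosen = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        pulled = redirect(chosen) if redirect else chosen
        if pulled != chosen:
            manipulated += 1
        reward = 1.0 if rng.random() < means[pulled] else 0.0
        counts[chosen] += 1
        sums[chosen] += reward  # the learner is unaware of the swap
    return counts, manipulated


MEANS = [0.2, 0.5, 0.9]  # made-up arm means
TARGET, WORST = 1, 0     # toy adversary is handed these indices
clean_counts, _ = run_ucb(MEANS, 3000)
attacked_counts, cost = run_ucb(
    MEANS, 3000, redirect=lambda arm: arm if arm == TARGET else WORST
)
print("clean:", clean_counts)
print("attacked:", attacked_counts, "manipulated rounds:", cost)
```

Because UCB chooses arms that look suboptimal only rarely, the adversary has to intervene only on those rare rounds, which gives some intuition for why the cost of such an attack can stay small.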

Efficient Trainable Front-Ends for Neural Speech Enhancement

Many neural speech enhancement and source separation systems operate in the time-frequency domain. Such models often benefit from making their Short-Time Fourier Transform (STFT) front-ends trainable. In current literature, these are implemented as large Discrete Fourier Transform matrices, which are prohibitively inefficient for low-compute systems. We present an efficient, trainable front-end based on the butterfly mechanism to compute the Fast Fourier Transform, and show its accuracy and efficiency benefits for low-compute neural speech enhancement models. We also explore the effects of making the STFT window trainable.
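
The contrast between a DFT-matrix front-end and a butterfly-based one can be sketched in a few lines. The example below is a plain, non-trainable illustration, assuming a frame length that is a power of two: a direct DFT costing O(N^2) operations and a recursive radix-2 FFT built from butterflies costing O(N log N), which produce the same spectrum. Function names are hypothetical and this is not the paper's trainable front-end.

```python
import cmath


def dft_direct(x):
    """Direct DFT, equivalent to multiplying by the full N x N DFT matrix."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def fft_butterfly(x):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_butterfly(x[0::2])
    odd = fft_butterfly(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # One butterfly: combine a pair of half-size outputs with a twiddle.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# One 8-sample frame of a sinusoid that completes two cycles per frame.
frame = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
direct = dft_direct(frame)
fast = fft_butterfly(frame)
max_diff = max(abs(a - b) for a, b in zip(direct, fast))
print("max difference between the two transforms:", max_diff)
```

In a trainable variant, the twiddle factors would be the natural candidates for learnable parameters, though how the paper parameterises them is not stated in the abstract.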

Hands-On Machine Learning on Google Cloud Platform: Implementing smart and efficient analytics using Cloud ML Engine: Giuseppe Ciaburro, V Kishore Ayyadevara, Alexis Perrier: 9781788393485: Amazon.com: Books

Alexis Perrier is a data science consultant with a background in signal processing and stochastic algorithms. A former Parisian, Alexis is now actively involved in the D.C. data science community as an instructor, blogger, and presenter. Alexis is also an avid jazz and classical music fan, a book lover and proud owner of a real chalk blackboard on which he regularly tries to share his fascination with mathematical equations with his 3 children. He holds a Master's in Mathematics from Université Pierre et Marie Curie Paris VI, a Ph.D. in Signal Processing from Telecom ParisTech and currently resides in Washington D.C. Giuseppe Ciaburro holds a PhD in environmental technical physics and two master's degrees. His research focused on machine learning applications in the study of urban sound environments.