McLoughlin, Ian
On-Device LLMs for SMEs: Challenges and Opportunities
Yee, Jeremy Stephen Gabriel, Ng, Pai Chet, Wang, Zhengkui, McLoughlin, Ian, Ng, Aik Beng, See, Simon
This paper presents a systematic review of the infrastructure requirements for deploying Large Language Models (LLMs) on-device within the context of small and medium-sized enterprises (SMEs), focusing on both hardware and software perspectives. From the hardware viewpoint, we discuss the utilization of processing units like GPUs and TPUs, efficient memory and storage solutions, and strategies for effective deployment, addressing the challenges of limited computational resources typical in SME settings. From the software perspective, we explore framework compatibility, operating system optimization, and the use of specialized libraries tailored for resource-constrained environments. The review is structured to first identify the unique challenges faced by SMEs in deploying LLMs on-device, followed by an exploration of the opportunities that both hardware innovations and software adaptations offer to overcome these obstacles. Such a structured review provides practical insights that can strengthen the technological resilience of SMEs as they integrate LLMs.
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
Cai, Pengfei, Song, Yan, Jiang, Nan, Gu, Qing, McLoughlin, Ian
A significant challenge in sound event detection (SED) is the effective utilization of unlabeled data, given the limited availability of labeled data due to high annotation costs. Semi-supervised algorithms rely on labeled data to learn from unlabeled data, and their performance is constrained by the quality and size of the former. In this paper, we introduce the Prototype based Masked Audio Model (PMAM) algorithm for self-supervised representation learning in SED, to better exploit unlabeled data. Specifically, semantically rich frame-level pseudo labels are constructed from a Gaussian mixture model (GMM) based prototypical distribution modeling. These pseudo labels supervise the learning of a Transformer-based masked audio model, in which binary cross-entropy loss is employed instead of the widely used InfoNCE loss, to provide independent loss contributions from different prototypes; this is important in real scenarios in which multiple labels may apply to unsupervised data frames. A final stage of fine-tuning with just a small amount of labeled data yields a very high-performing SED model. On like-for-like tests using the DESED task, our method achieves a PSDS1 score of 62.5%, surpassing current state-of-the-art models and demonstrating the superiority of the proposed technique.
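A minimal sketch of the pseudo-labelling idea described above, assuming stand-in frame embeddings: a GMM's components act as prototypes, its posteriors become soft frame-level targets, and binary cross-entropy gives each prototype an independent loss contribution (hypothetical Python, not the authors' implementation):

    # Sketch of PMAM-style prototype pseudo-labelling (hypothetical, not the paper's code).
    import numpy as np
    import torch
    import torch.nn.functional as F
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    frames = rng.normal(size=(1000, 64)).astype(np.float32)   # stand-in frame embeddings

    # Fit a GMM whose components play the role of prototypes; posteriors give
    # per-prototype soft assignments for every frame.
    gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(frames)
    targets = torch.from_numpy(gmm.predict_proba(frames).astype(np.float32))  # (frames, prototypes)

    # A masked audio model would predict prototype activations for masked frames.
    # BCE treats each prototype independently (unlike InfoNCE), so several
    # prototypes can be "on" for the same frame.
    logits = torch.randn(1000, 8, requires_grad=True)         # stand-in model outputs
    loss = F.binary_cross_entropy_with_logits(logits, targets)
    loss.backward()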
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Gao, Zhifu, Zhang, Shiliang, McLoughlin, Ian, Yan, Zhijie
Transformers have recently dominated the ASR field. Although able to yield good performance, they involve an autoregressive (AR) decoder that generates tokens one by one, which is computationally inefficient. To speed up inference, non-autoregressive (NAR) methods, e.g. single-step NAR, were designed to enable parallel generation. However, due to an independence assumption within the output tokens, the performance of single-step NAR is inferior to that of AR models, especially with a large-scale corpus. There are two challenges to improving single-step NAR: firstly, to accurately predict the number of output tokens and extract hidden variables; secondly, to enhance modeling of the interdependence between output tokens. To tackle both challenges, we propose a fast and accurate parallel transformer, termed Paraformer. This utilizes a continuous integrate-and-fire (CIF) based predictor to predict the number of tokens and generate hidden variables. A glancing language model (GLM) sampler then generates semantic embeddings to enhance the NAR decoder's ability to model context interdependence. Finally, we design a strategy to generate negative samples for minimum word error rate training to further improve performance. Experiments using the public AISHELL-1 and AISHELL-2 benchmarks and an industrial-level 20,000-hour task demonstrate that the proposed Paraformer attains performance comparable to the state-of-the-art AR transformer, with more than 10x speedup.
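The continuous integrate-and-fire mechanism can be sketched in a few lines: per-frame weights are accumulated, and a token embedding is emitted each time the accumulator crosses a threshold, so the predicted token count is roughly the sum of the weights. This is an illustrative re-implementation of the general CIF idea, not Paraformer's code:

    # Minimal CIF sketch (illustrative; not Paraformer's implementation).
    import torch

    def cif(encoder_out, alpha, threshold=1.0):
        """Integrate per-frame weights `alpha`; fire one token embedding whenever
        the accumulator crosses `threshold`. Trailing residue is dropped for brevity."""
        acc = 0.0
        frame_acc = torch.zeros(encoder_out.size(1))
        tokens = []
        for t in range(encoder_out.size(0)):
            a = float(alpha[t])
            if acc + a < threshold:
                acc += a
                frame_acc = frame_acc + a * encoder_out[t]
            else:
                spill = threshold - acc            # part of this frame closes the token
                tokens.append(frame_acc + spill * encoder_out[t])
                acc = a - spill                    # remainder starts the next token
                frame_acc = acc * encoder_out[t]
        return torch.stack(tokens), len(tokens)

    enc = torch.randn(50, 256)                     # 50 frames of encoder output
    alpha = torch.sigmoid(torch.randn(50))         # per-frame firing weights, in (0, 1)
    hidden, n_tokens = cif(enc, alpha)
    print(hidden.shape, n_tokens)                  # token count is roughly alpha.sum()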
A Light-weight Deep Learning Model for Remote Sensing Image Classification
Pham, Lam, Le, Cam, Ngo, Dat, Nguyen, Anh, Lampert, Jasmin, Schindler, Alexander, McLoughlin, Ian
In this paper, we present a high-performance and light-weight deep learning model for Remote Sensing Image Classification (RSIC), the task of identifying the aerial scene of a remote sensing image. To this end, we first evaluate various benchmark convolutional neural network (CNN) architectures: MobileNet V1/V2, ResNet 50/151V2, InceptionV3/InceptionResNetV2, EfficientNet B0/B7, DenseNet 121/201, and ConvNeXt Tiny/Large. The best performing models are then selected to train a compact model in a teacher-student arrangement. The knowledge distillation from the teacher aims to achieve high performance with significantly reduced complexity. Extensive experiments on the NWPU-RESISC45 benchmark show that our proposed teacher-student models outperform state-of-the-art systems and have the potential to be applied on a wide range of edge devices.
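The teacher-student step is standard knowledge distillation; below is a minimal sketch of a Hinton-style distillation loss (the paper's exact recipe may differ; the 45-class shape merely mirrors NWPU-RESISC45):

    # Generic knowledge-distillation loss sketch (not necessarily the paper's recipe).
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        """Blend hard-label cross-entropy with KL divergence to the teacher's
        temperature-softened output distribution."""
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
        return alpha * hard + (1.0 - alpha) * soft

    student = torch.randn(8, 45, requires_grad=True)   # student logits; 45 scene classes
    teacher = torch.randn(8, 45)                       # teacher (or teacher-ensemble) logits
    labels = torch.randint(0, 45, (8,))
    distillation_loss(student, teacher, labels).backward()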
Towards More Accurate Automatic Sleep Staging via Deep Transfer Learning
Phan, Huy, Chรฉn, Oliver Y., Koch, Philipp, Lu, Zongqing, McLoughlin, Ian, Mertins, Alfred, De Vos, Maarten
Although large annotated sleep databases are publicly available and might be used to train automated scoring algorithms, it can still be a challenge to develop an optimal algorithm for an individual sleep study, which may have few subjects or rely on a different recording setup. Both directly applying a learned algorithm and retraining the algorithm on a rather small database are suboptimal, and state-of-the-art sleep staging algorithms based on deep neural networks demand a large amount of data for training. This work presents a deep transfer learning approach to overcome the channel mismatch problem and enable transferring knowledge from a large dataset to a small cohort for automatic sleep staging. We start from a generic end-to-end deep learning framework for sequence-to-sequence sleep staging and derive two networks adhering to this framework as a device for transfer learning. The networks are first trained in the source domain (i.e. the large database). The pretrained networks are then finetuned in the target domain (i.e. the small cohort) to complete knowledge transfer. We employ the Montreal Archive of Sleep Studies (MASS) database, consisting of 200 subjects, as the source domain and study deep transfer learning on four different target domains: the Sleep Cassette and Sleep Telemetry subsets of the Sleep-EDF Expanded database, the Surrey-cEEGGrid database, and the Surrey-PSG database. The target domains are purposely adopted to cover different degrees of channel mismatch to the source domain. Our experimental results show significant performance improvement on automatic sleep staging on the target domains achieved with the proposed deep transfer learning approach, and we discuss the impact of various fine-tuning approaches. Index Terms: Automatic sleep staging, sequence-to-sequence, deep learning, transfer learning.
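The transfer recipe itself is the usual pretrain-then-finetune pattern; a generic PyTorch sketch follows, in which the checkpoint name and the toy network are hypothetical. Freezing the feature layers and retraining only the classifier is one of the fine-tuning variants one might compare:

    # Generic pretrain-then-finetune sketch (toy network; not the paper's architecture).
    import torch
    import torch.nn as nn

    # Stand-in for a sequence-to-sequence staging network: feature layers + classifier.
    model = nn.Sequential(
        nn.Sequential(nn.Linear(128, 64), nn.ReLU()),  # "feature" layers
        nn.Linear(64, 5),                              # 5 sleep stages
    )

    # Source-domain pretraining would produce a checkpoint such as this (hypothetical name):
    # model.load_state_dict(torch.load("pretrained_mass.pt"))

    # Fine-tuning variant: freeze the feature layers and retrain only the classifier,
    # which can be safer when the target cohort is small or the channel mismatch is
    # large; the alternative is fine-tuning the whole network at a low learning rate.
    for p in model[0].parameters():
        p.requires_grad = False
    optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)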
Spatio-Temporal Attention Pooling for Audio Scene Classification
Phan, Huy, Chรฉn, Oliver Y., Pham, Lam, Koch, Philipp, De Vos, Maarten, McLoughlin, Ian, Mertins, Alfred
Acoustic scenes are rich and redundant in their content. In this work, we present a spatio-temporal attention pooling layer coupled with a convolutional recurrent neural network to learn from patterns that are discriminative while suppressing those that are irrelevant for acoustic scene classification. The convolutional layers in this network learn invariant features from time-frequency input. The bidirectional recurrent layers are then able to encode the temporal dynamics of the resulting convolutional features. Afterwards, a two-dimensional attention mask is formed via the outer product of the spatial and temporal attention vectors.
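A small sketch of the described pooling, with stand-in tensors: separate temporal and spatial attention vectors are combined by an outer product into a 2-D mask that weights the convolutional recurrent output before pooling (illustrative only; in the real model the attention vectors come from learned attention layers):

    # Outer-product attention pooling sketch (illustrative; not the authors' code).
    import torch
    import torch.nn.functional as F

    feat = torch.randn(30, 40)                    # (time, channels): recurrent-layer output
    att_t = F.softmax(torch.randn(30), dim=0)     # temporal attention vector (stand-in)
    att_s = F.softmax(torch.randn(40), dim=0)     # spatial (channel) attention vector (stand-in)

    mask = torch.outer(att_t, att_s)              # 2-D attention mask via outer product
    pooled = (mask * feat).sum(dim=0)             # attention-weighted pooling over time -> (channels,)
    logits = torch.nn.Linear(40, 10)(pooled)      # stand-in scene classifier head
    print(logits.shape)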
Unifying Isolated and Overlapping Audio Event Detection with Multi-Label Multi-Task Convolutional Recurrent Neural Networks
Phan, Huy, Chรฉn, Oliver Y., Koch, Philipp, Pham, Lam, McLoughlin, Ian, Mertins, Alfred, De Vos, Maarten
We propose a multi-label multi-task framework based on a convolutional recurrent neural network to unify detection of isolated and overlapping audio events. The framework leverages the power of convolutional recurrent neural network architectures; convolutional layers learn effective features over which higher recurrent layers perform sequential modelling. Furthermore, the output layer is designed to handle arbitrary degrees of event overlap. At each time step in the recurrent output sequence, an output triple is dedicated to each event category of interest to jointly model event occurrence and temporal boundaries. That is, the network jointly determines whether an event of this category occurs, and when it occurs, by estimating onset and offset positions at each recurrent time step. We then introduce three sequential losses for network training: multi-label classification loss, distance estimation loss, and confidence loss. We demonstrate good generalization on two datasets: ITC-Irst for isolated audio event detection, and TUT-SED-Synthetic-2016 for overlapping audio event detection.
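The output design can be sketched directly from this description: at every recurrent time step, each event class gets a triple of (activity logit, onset distance, offset distance), trained with a multi-label classification loss plus a distance loss on active frames. Shapes and targets below are stand-ins, and the confidence loss is omitted for brevity:

    # Sketch of the per-class output triple at each time step (illustrative shapes).
    import torch
    import torch.nn.functional as F

    T, C = 100, 6                                     # recurrent time steps, event categories
    out = torch.randn(T, C, 3, requires_grad=True)    # one triple per class and step
    act_logit, onset, offset = out.unbind(-1)         # (activity, onset distance, offset distance)

    y_act = torch.randint(0, 2, (T, C)).float()       # ground-truth event activity
    y_on, y_off = torch.rand(T, C), torch.rand(T, C)  # stand-in normalised onset/offset targets

    cls_loss = F.binary_cross_entropy_with_logits(act_logit, y_act)   # multi-label classification
    dist_loss = (y_act * ((onset - y_on) ** 2                         # distance estimation loss,
                          + (offset - y_off) ** 2)).mean()            # counted on active frames only
    (cls_loss + dist_loss).backward()                                 # confidence loss omitted here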
Learning Compact Structural Representations for Audio Events Using Regressor Banks
Phan, Huy, Maass, Marco, Hertel, Lars, Mazur, Radoslaw, McLoughlin, Ian, Mertins, Alfred
We introduce a new learned descriptor for audio signals which is efficient for event representation. The entries of the descriptor are produced by evaluating a set of regressors on the input signal. The regressors are class-specific and trained using the random regression forests framework. Given an input signal, each regressor estimates the onset and offset positions of the target event. The estimation confidence scores output by a regressor are then used to quantify how well the target event aligns with the temporal structure of the corresponding category. Our proposed descriptor has two advantages. First, it is compact, i.e. the dimensionality of the descriptor is equal to the number of event classes. Second, we show that even simple linear classification models, trained on our descriptor, yield better accuracies on the audio event classification task than not only nonlinear baselines but also the state-of-the-art results.
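A rough sketch of the regressor-bank idea, assuming stand-in features and targets: one random regression forest per class predicts (onset, offset), and inter-tree agreement serves here as a stand-in for the paper's confidence scores, yielding a descriptor with one entry per class:

    # Regressor-bank descriptor sketch (hypothetical confidence measure, not the paper's).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n_classes, X = 5, rng.normal(size=(200, 32))      # stand-in event features
    forests = []
    for c in range(n_classes):
        y = rng.uniform(size=(200, 2))                # stand-in (onset, offset) targets per class
        forests.append(RandomForestRegressor(n_estimators=20, random_state=c).fit(X, y))

    def describe(x):
        """Descriptor with one entry per class: higher = trees agree more on onset/offset."""
        d = np.empty(n_classes)
        for c, f in enumerate(forests):
            per_tree = np.stack([t.predict(x[None, :]) for t in f.estimators_])  # (trees, 1, 2)
            d[c] = 1.0 / (1.0 + per_tree.std(axis=0).mean())
        return d

    print(describe(X[0]))                             # compact, n_classes-dimensional descriptor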
Evolutionary Clustering and Analysis of User Behaviour in Online Forums
Morrison, Donn, McLoughlin, Ian, Hogan, Alice, Hayes, Conor (all Digital Enterprise Research Institute)
In this paper we cluster and analyse temporal user behaviour in online communities. We adapt a simple unsupervised clustering algorithm to an evolutionary setting where we cluster users into prototypical behavioural roles based on features derived from their ego-centric reply-graphs. We then analyse changes in the role membership of the users over time, the change in role composition of forums over time and examine the differences between forums in terms of role composition. We perform this analysis on 200 forums from a popular national bulletin board and 14 enterprise technical support forums.
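One plausible reading of this pipeline as code: cluster each time window with a simple unsupervised algorithm, then align cluster identities across windows so that role labels stay comparable over time. The features, window handling, and centroid matching below are all hypothetical stand-ins, not the authors' implementation:

    # Evolutionary clustering sketch: per-window k-means with role alignment across windows.
    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from scipy.spatial.distance import cdist
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    windows = [rng.normal(size=(300, 6)) for _ in range(4)]   # stand-in ego-network features
    k, prev_centroids, roles = 4, None, []

    for X in windows:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        centroids, labels = km.cluster_centers_, km.labels_
        if prev_centroids is not None:
            # Match this window's clusters to last window's roles so that
            # "role r" keeps denoting the same behavioural prototype over time.
            row, col = linear_sum_assignment(cdist(prev_centroids, centroids))
            remap = {int(c): int(r) for r, c in zip(row, col)}
            labels = np.array([remap[int(l)] for l in labels])
            centroids = centroids[col]                        # reorder centroids by role
        prev_centroids = centroids
        roles.append(labels)                                  # role membership per window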