Dependency Parsing is More Parameter-Efficient with Normalization
Gajo, Paolo, Rosati, Domenic, Sajjad, Hassan, Barrón-Cedeño, Alberto
Dependency parsing is the task of inferring natural language structure, often approached by modeling word interactions via attention through biaffine scoring. This mechanism works like self-attention in Transformers, where scores are calculated for every pair of words in a sentence. However, unlike Transformer attention, biaffine scoring does not use normalization prior to taking the softmax of the scores. In this paper, we provide theoretical evidence and empirical results revealing that a lack of normalization necessarily results in overparameterized parser models, where the extra parameters compensate for the sharp softmax outputs produced by high variance inputs to the biaffine scoring function. We argue that biaffine scoring can be made substantially more efficient by performing score normalization. We conduct experiments on semantic and syntactic dependency parsing in multiple languages, along with latent graph inference on non-linguistic data, using various settings of a $k$-hop parser. We train $N$-layer stacked BiLSTMs and evaluate the parser's performance with and without normalizing biaffine scores. Normalizing allows us to achieve state-of-the-art performance with fewer samples and trainable parameters. Code: https://github.com/paolo-gajo/EfficientSDP
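The normalization argument above can be illustrated numerically. The sketch below computes biaffine scores with and without a Transformer-style 1/sqrt(d) scaling before the softmax; the dimensions, the single weight matrix, and the scaling factor are illustrative assumptions, not the paper's exact parser configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 5, 256                      # words in the sentence, hidden size
H = rng.normal(size=(n, d))        # head representations
D = rng.normal(size=(n, d))        # dependent representations
U = rng.normal(size=(d, d))        # biaffine weight matrix

scores = D @ U @ H.T               # raw biaffine scores, shape (n, n)
p_raw = softmax(scores)            # high-variance logits -> sharp softmax
p_norm = softmax(scores / np.sqrt(d))  # scaled, as in Transformer attention
```

Lowering the logit variance raises (or preserves) the entropy of every row of the softmax, which is the "softer outputs" effect the abstract attributes to normalization.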
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- South America > Paraguay > Asunción > Asunción (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
BiLSTM and Attention-Based Modulation Classification of Realistic Wireless Signals
Udaiwal, Rohit, Baishya, Nayan, Gupta, Yash, Manoj, B. R.
This work proposes a novel and efficient quadstream BiLSTM-Attention network, abbreviated as the QSLA network, for robust automatic modulation classification (AMC) of wireless signals. The proposed model exploits multiple representations of the wireless signal as inputs to the network, and the feature extraction process combines convolutional and BiLSTM layers for processing the spatial and temporal features of the signal, respectively. An attention layer is used after the BiLSTM layer to emphasize the important temporal features. The experimental results on the recent and realistic RML22 dataset demonstrate the superior performance of the proposed model, with an accuracy of up to approximately 99%. The model is compared with other benchmark models in the literature in terms of classification accuracy, computational complexity, memory usage, and training time to show the effectiveness of our proposed approach.
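The attention step described above (weighting each timestep of the BiLSTM output) can be sketched as simple attention pooling; the shapes and the dot-product scoring form are illustrative assumptions, not the QSLA network's exact layers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
T, d = 128, 64                      # timesteps, BiLSTM feature size
h = rng.normal(size=(T, d))         # stand-in for BiLSTM outputs
w = rng.normal(size=(d,))           # learned attention scoring vector

alpha = softmax(h @ w)              # one attention weight per timestep, (T,)
context = alpha @ h                 # attention-pooled feature vector, (d,)
```

Timesteps whose features align with the scoring vector receive larger weights, so the pooled `context` emphasizes the informative temporal positions.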
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- North America > United States > California > Monterey County > Pacific Grove (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- Asia > India > Assam > Guwahati (0.04)
LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data
Hsiao, Shun-Wen, Sun, Cheng-Yuan
Deception detection in human videos has recently attracted considerable attention and has many applications. AI models in this domain achieve high accuracy but tend to be non-interpretable black boxes. We introduce an attention-aware neural network that addresses challenges inherent in video data and deception dynamics. By continuously assessing visual, audio, and text features, the model pinpoints deceptive cues. We employ a multimodal fusion strategy that enhances accuracy; our approach yields a 92\% accuracy rate on a real-life trial dataset. Most importantly, the model indicates where attention is focused in the videos, providing valuable insights into deception cues. Hence, our method both detects deceit and elucidates the underlying process. We further enriched our study with an experiment in which students answered questions either truthfully or deceptively, resulting in a new dataset of 309 video clips, named ATSFace. Using this dataset, we also introduce a calibration method, inspired by Low-Rank Adaptation (LoRA), to refine individual-level deception detection accuracy.
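The LoRA mechanism the calibration method draws on can be sketched in a few lines: the pretrained weight is frozen and a low-rank update is added on top. The dimensions, rank, and scaling below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(2)
d_out, d_in, r, alpha = 32, 64, 4, 8

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection (init 0)

# Low-rank calibration: only A and B would be trained per individual.
W_adapted = W + (alpha / r) * (B @ A)
```

With `B` initialized to zero the adapted model starts out identical to the base model, and the update `B @ A` can never exceed rank `r`, which is what keeps per-individual calibration cheap.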
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
An End-to-End Breast Tumour Classification Model Using Context-Based Patch Modelling- A BiLSTM Approach for Image Classification
Tripathi, Suvidha, Singh, Satish Kumar, Lee, Hwee Kuan
Researchers working on computational analysis of Whole Slide Images (WSIs) in histopathology have primarily resorted to patch-based modelling due to the large resolution of each WSI, which makes it computationally infeasible to feed WSIs directly into machine learning models. However, because of this patch-based analysis, most current methods fail to exploit the underlying spatial relationships among the patches. In our work, we integrate these relationships along with feature-based correlations among the patches extracted from the particular tumorous region. For the given classification task, we use BiLSTMs to model both forward and backward contextual relationships. RNN-based models eliminate the limitation on sequence size by allowing variable-size images to be modelled within a deep learning framework. We also incorporate the effect of spatial continuity by exploring different scanning techniques for sampling patches. To establish the efficiency of our approach, we trained and tested our model on two datasets: microscopy images and WSI tumour regions. Compared with contemporary literature, we achieved better performance, with an accuracy of 90% on the microscopy image dataset. For the WSI tumour region dataset, we compared the classification results with deep learning networks such as ResNet, DenseNet, and InceptionV3 using a maximum-voting technique, achieving the highest accuracy of 84%. We found that BiLSTMs with CNN features perform much better at modelling patches in an end-to-end image classification network. Additionally, the variable dimensions of WSI tumour regions were used for classification without resizing, suggesting that our method is independent of tumour image size and can process large-dimensional images without losing resolution details.
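The "different scanning techniques" for turning a 2-D grid of patches into a BiLSTM input sequence can be illustrated with two common orderings; the grid size and the specific scan names are assumptions for illustration, not necessarily the orderings the paper evaluates.

```python
import numpy as np

def raster_scan(grid):
    # left-to-right on every row
    return [p for row in grid for p in row]

def snake_scan(grid):
    # alternate direction each row, so consecutive sequence elements
    # stay spatially adjacent (better spatial continuity)
    out = []
    for i, row in enumerate(grid):
        out.extend(row if i % 2 == 0 else row[::-1])
    return out

grid = np.arange(12).reshape(3, 4).tolist()   # 3x4 grid of patch indices
order_a = raster_scan(grid)
order_b = snake_scan(grid)
```

In the raster scan the jump from the end of one row to the start of the next breaks spatial adjacency; the snake scan avoids that jump, which is the kind of continuity effect the abstract refers to.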
- Research Report > New Finding (0.67)
- Research Report > Promising Solution (0.46)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
IITK at SemEval-2020 Task 10: Transformers for Emphasis Selection
Singhal, Vipul, Dhull, Sahil, Agarwal, Rishabh, Modi, Ashutosh
This paper describes the system proposed for addressing the research problem posed in Task 10 of SemEval-2020: Emphasis Selection For Written Text in Visual Media. We propose an end-to-end model that takes the text as input and, for each word, outputs the probability that the word should be emphasized. Our results show that transformer-based models are particularly effective in this task. We achieved the best Match_m score (described in section 2.2) of 0.810 and were ranked third on the leaderboard.
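A Match_m-style score can be sketched as the overlap between the top-m words by predicted emphasis probability and the top-m ground-truth emphasized words; the exact task metric averages over several settings, so this is an illustration of the idea only, with made-up probabilities.

```python
def match_m(pred_probs, gold_probs, m):
    # indices of the m highest-scoring words under each distribution
    top_pred = set(sorted(range(len(pred_probs)), key=lambda i: -pred_probs[i])[:m])
    top_gold = set(sorted(range(len(gold_probs)), key=lambda i: -gold_probs[i])[:m])
    # fraction of the top-m sets that agree
    return len(top_pred & top_gold) / m

pred = [0.9, 0.1, 0.7, 0.3]   # hypothetical per-word emphasis probabilities
gold = [1.0, 0.0, 0.2, 0.8]   # hypothetical ground-truth emphasis
score = match_m(pred, gold, 2)
```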
DENS-ECG: A Deep Learning Approach for ECG Signal Delineation
Peimankar, Abdolrahman, Puthusserypady, Sadasivan
Objectives: With the technological advancements in the field of tele-health monitoring, it is now possible to gather huge amounts of electro-physiological signals such as the electrocardiogram (ECG). It is therefore necessary to develop models/algorithms capable of analysing these massive amounts of data in real time. This paper proposes a deep learning model for real-time segmentation of heartbeats. Methods: The proposed algorithm, named DENS-ECG, combines a convolutional neural network (CNN) with a long short-term memory (LSTM) model to detect the onset, peak, and offset of different heartbeat waveforms such as the P-wave, QRS complex, T-wave, and No wave (NW). Using the ECG as input, the model learns to extract high-level features through the training process, which, unlike other classical machine learning based methods, eliminates the feature engineering step. Results: The proposed DENS-ECG model was trained and validated on a dataset of 105 ECGs of length 15 minutes each and achieved an average sensitivity and precision of 97.95% and 95.68%, respectively, using 5-fold cross validation. Additionally, the model was evaluated on an unseen dataset to examine its robustness in QRS detection, which resulted in a sensitivity of 99.61% and a precision of 99.52%. Conclusion: The empirical results show the flexibility and accuracy of the combined CNN-LSTM model for ECG signal delineation. Significance: This paper proposes an efficient and easy-to-use deep learning approach for heartbeat segmentation, which could potentially be used in real-time tele-health monitoring systems.
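The sensitivity and precision figures reported above follow from the standard definitions; the detection counts below are made up purely to illustrate how such percentages are computed, not taken from the paper.

```python
def sensitivity(tp, fn):
    # recall: fraction of true waveform events that were detected
    return tp / (tp + fn)

def precision(tp, fp):
    # fraction of detections that were correct
    return tp / (tp + fp)

# hypothetical QRS detection counts: true positives, false positives, false negatives
tp, fp, fn = 961, 5, 4
sens = sensitivity(tp, fn)
prec = precision(tp, fp)
```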
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York (0.04)
- Europe > Denmark > Capital Region > Kongens Lyngby (0.04)
Rhythm, Chord and Melody Generation for Lead Sheets using Recurrent Neural Networks
De Boom, Cedric, Van Laere, Stephanie, Verbelen, Tim, Dhoedt, Bart
Music that is generated by recurrent neural networks often lacks a sense of direction and coherence. We therefore propose a two-stage LSTM-based model for lead sheet generation, in which the harmonic and rhythmic templates of the song are produced first, after which, in a second stage, a sequence of melody notes is generated conditioned on these templates. A subjective listening test shows that our approach outperforms the baselines and increases perceived musical coherence.
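The two-stage idea can be sketched as: first sample a harmonic/rhythmic template, then sample melody notes conditioned on it. The chord vocabulary, note sets, and conditioning rule below are toy assumptions standing in for the paper's LSTM stages.

```python
import numpy as np

rng = np.random.default_rng(3)
CHORDS = ["C", "F", "G", "Am"]
# MIDI chord tones usable as melody notes under each chord (an assumption)
TONES = {"C": [60, 64, 67], "F": [65, 69, 72],
         "G": [67, 71, 74], "Am": [57, 60, 64]}

# Stage 1: harmonic/rhythmic template -- one (chord, notes-per-bar) pair per bar
template = [(rng.choice(CHORDS), int(rng.integers(2, 5))) for _ in range(4)]

# Stage 2: melody notes drawn conditioned on each bar's chord
melody = [int(rng.choice(TONES[chord])) for chord, n in template for _ in range(n)]
```

Conditioning the melody stage on the already-fixed template is what gives the generated line a global harmonic direction that a single unconditioned sequence model lacks.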
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Automatic Speech Transcription And Speaker Recognition Simultaneously Using Apple AI
Last year, Apple faced several controversies regarding its speech recognition technology. To provide quality control for the company's voice assistant Siri, Apple asked its contractors to regularly listen to confidential voice recordings as part of the "Siri Grading Program". The company later apologised and published a statement announcing changes to the Siri grading program. This year, the tech giant has been ramping up its speech recognition research to upgrade its voice assistant. Recently, researchers at Apple developed an AI model that can perform automatic speech transcription and speaker recognition simultaneously.
Multi-task Learning for Speaker Verification and Voice Trigger Detection
Sigtia, Siddharth, Marchi, Erik, Kajarekar, Sachin, Naik, Devang, Bridle, John
Automatic speech transcription and speaker recognition are usually treated as separate tasks even though they are interdependent. In this study, we investigate training a single network to perform both tasks jointly. We train the network in a supervised multi-task learning setup, where the speech transcription branch of the network is trained to minimise a phonetic connectionist temporal classification (CTC) loss while the speaker recognition branch of the network is trained to label the input sequence with the correct label for the speaker. We present a large-scale empirical study where the model is trained using several thousand hours of labelled training data for each task. We evaluate the speech transcription branch of the network on a voice trigger detection task while the speaker recognition branch is evaluated on a speaker verification task. Results demonstrate that the network is able to encode both phonetic \emph{and} speaker information in its learnt representations while yielding accuracies at least as good as the baseline models for each task, with the same number of parameters as the independent models.
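The multi-task setup described above combines one loss per branch of the shared network into a single objective. In the sketch below a per-frame cross-entropy stands in for the CTC loss (real CTC marginalizes over alignments), and the unweighted sum is a simplifying assumption.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_entropy(logits, target):
    # negative log-probability of the correct class
    return -np.log(softmax(logits)[target])

logits_phone = np.array([2.0, 0.5, -1.0])   # phonetic-branch output (3 classes)
logits_spk = np.array([0.1, 3.0])           # speaker-branch output (2 speakers)

loss_phone = cross_entropy(logits_phone, 0)  # stand-in for the CTC loss
loss_spk = cross_entropy(logits_spk, 1)      # speaker-label loss
total_loss = loss_phone + loss_spk           # joint multi-task objective
```

Backpropagating `total_loss` through a shared encoder is what lets one set of parameters encode both phonetic and speaker information, matching the abstract's finding of baseline-level accuracy at the parameter count of a single model.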
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.04)
- Asia (0.04)