labelled training data
cantnlp@LT-EDI-2024: Automatic Detection of Anti-LGBTQ+ Hate Speech in Under-resourced Languages
Wong, Sidney G.-J., Durward, Matthew
This paper describes our system for detecting homophobia and transphobia in social media comments, developed as part of the shared task at LT-EDI-2024. We took a transformer-based approach to develop our multiclass classification model for ten language conditions (English, Spanish, Gujarati, Hindi, Kannada, Malayalam, Marathi, Tamil, Tulu, and Telugu). We introduced synthetic and organic instances of script-switched language data during domain adaptation to mirror the linguistic realities of social media language as seen in the labelled training data. Our system ranked second for Gujarati and Telugu, with varying levels of performance for other language conditions. The results suggest that incorporating elements of paralinguistic behaviour such as script-switching may improve the performance of language detection systems, especially for under-resourced language conditions.
- Oceania > New Zealand (0.05)
- Europe > Bulgaria > Varna Province > Varna (0.05)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
cantnlp@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media Comments using Spatio-Temporally Retrained Language Models
Wong, Sidney G.-J., Durward, Matthew, Adams, Benjamin, Dunn, Jonathan
This paper describes our multiclass classification system developed as part of the LT-EDI@RANLP-2023 shared task. We used a BERT-based language model to detect homophobic and transphobic content in social media comments across five language conditions: English, Spanish, Hindi, Malayalam, and Tamil. We retrained a transformer-based cross-language pretrained language model, XLM-RoBERTa, with spatially and temporally relevant social media language data. We also retrained a subset of models with simulated script-mixed social media language data, with varied performance. We developed the best-performing seven-label classification system for Malayalam based on weighted macro-averaged F1 score (ranked first out of six), with variable performance for other language and class-label conditions. We found that including this spatio-temporal data improved classification performance for all language and task conditions when compared with the baseline. The results suggest that transformer-based language classification systems are sensitive to register-specific and language-specific retraining.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
- Oceania > New Zealand > South Island > Canterbury Region > Christchurch (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > India (0.04)
What You Need to Know About Machine Learning in 2023
Machine learning is the process of enabling computers to tackle tasks that have, until now, been carried out by people. Machine learning algorithms help automate self-driving cars, translate speech and execute many other tasks. Machine learning technology is driving an explosion in the field of artificial intelligence. Let us see what exactly machine learning is. Machine learning is a type of artificial intelligence that allows software applications to become more accurate at predicting outcomes without being explicitly programmed.
- Information Technology (0.70)
- Banking & Finance > Trading (0.33)
Supervised vs Unsupervised Learning Explained - Seldon
Machine learning is already an important part of how modern organisations and services function. Whether in social media platforms, healthcare, or finance, machine learning models are deployed in a variety of settings. But the steps needed to train and deploy a model will differ depending on the task at hand and the data that's available. Supervised and unsupervised learning are two different approaches to building machine learning models. They differ in the way the models are trained and the condition of the training data that's required.
Cost-effective speech-to-text with weakly- and semi-supervised training
Voice assistants equipped with speech-to-text technology have seen a major boost in performance and usage, thanks to powerful new machine learning methods based on deep neural networks. These methods follow a supervised learning approach, requiring large amounts of paired speech-text data to train the best-performing speech-to-text transcription models. After large amounts of relevant and diverse spoken utterances have been collected, the complex and intensive task of annotating and labelling the collected speech data awaits. To get a feel for a typical scenario, let's look at some estimates. On average, a typical user query, for example "Do you have the Christmas edition with Santa?", would last for about 3 seconds.
Twin Neural Network Regression is a Semi-Supervised Regression Algorithm
Wetzel, Sebastian J., Melko, Roger G., Tamblyn, Isaac
Twin neural network regression (TNNR) is a semi-supervised regression algorithm; it can be trained on unlabelled data points as long as other, labelled anchor data points are present. TNNR is trained to predict differences between the target values of two different data points rather than the targets themselves. By ensembling predicted differences between the targets of an unseen data point and all training data points, it is possible to obtain a very accurate prediction for the original regression problem. Since any loop of predicted differences should sum to zero, loops can be supplied to the training data, even if the data points themselves within loops are unlabelled. Semi-supervised training significantly improves TNNR performance, which is already state of the art.
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.14)
- North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > Canada > Ontario > Toronto (0.04)
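The difference-prediction and ensembling idea in the TNNR abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the twin neural network is replaced by a one-parameter linear difference model so the example stays self-contained, and the names `fit_difference_model`, `predict` and `loop_residual` are hypothetical, as is the toy data.

```python
from itertools import permutations


def fit_difference_model(xs, ys):
    """Fit w in (y_i - y_j) ~ w * (x_i - x_j) by least squares over all ordered pairs.

    Assumption: a linear stand-in for the twin neural network, for illustration only.
    """
    num = den = 0.0
    for i, j in permutations(range(len(xs)), 2):
        dx, dy = xs[i] - xs[j], ys[i] - ys[j]
        num += dx * dy
        den += dx * dx
    w = num / den
    # The returned function predicts the target difference y(a) - y(b).
    return lambda a, b: w * (a - b)


def predict(diff, x_new, xs, ys):
    """Ensemble: average (predicted difference + known target) over all labelled anchors."""
    estimates = [diff(x_new, x_j) + y_j for x_j, y_j in zip(xs, ys)]
    return sum(estimates) / len(estimates)


def loop_residual(diff, a, b, c):
    """Sum of predicted differences around the loop a -> b -> c -> a (ideally zero)."""
    return diff(a, b) + diff(b, c) + diff(c, a)


# Toy labelled anchors drawn from y = 2x + 1 (hypothetical data).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

diff = fit_difference_model(xs, ys)
y_hat = predict(diff, 4.0, xs, ys)
residual = loop_residual(diff, 0.5, 2.0, 3.5)
```

The loop check needs no labels for `a`, `b` or `c`, which is why loops over unlabelled points can serve as a training signal. In this linear stand-in the residual is zero by construction; for a neural difference model it would instead be penalised during training.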
Machine Learning
Machine learning algorithms all aim to learn and improve their accuracy as they process more datasets. One way to classify the tasks that machine learning algorithms solve is by how much feedback is presented to the system. In some scenarios, the computer is provided with a significant amount of labelled training data, which is called supervised learning. In other cases, no labelled data is provided, and this is known as unsupervised learning. Lastly, in semi-supervised learning, some labelled training data is provided, but most of the training data is unlabelled.
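As a rough illustration of the three regimes described above, the sketch below names the regime from how many training examples carry a label. The function `learning_regime` and the convention that `None` marks an unlabelled example are assumptions made for this example, not part of any library.

```python
def learning_regime(labels):
    """Name the learning regime implied by a list of labels.

    Assumption: None marks an unlabelled example; any other value is a label.
    """
    if not labels:
        raise ValueError("no training examples given")
    labelled = sum(1 for label in labels if label is not None)
    if labelled == len(labels):
        return "supervised"
    if labelled == 0:
        return "unsupervised"
    return "semi-supervised"


fully_labelled = ["cat", "dog", "cat"]
no_labels = [None, None, None]
mostly_unlabelled = ["cat", None, None, None, None]

regimes = [learning_regime(ls) for ls in (fully_labelled, no_labels, mostly_unlabelled)]
```

Here `regimes` comes out as `["supervised", "unsupervised", "semi-supervised"]`. Real datasets rarely split this cleanly, so the sketch only captures the labelling condition, not the choice of algorithm.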
Facebook Is Giving Away This Speech Recognition Model For Free
Researchers at Facebook AI recently introduced and open-sourced wav2vec 2.0, a new framework for self-supervised learning of representations from raw audio data. The company claims that this framework can enable automatic speech recognition models with just 10 minutes of transcribed speech data. Neural network models have gained much traction over the last few years due to their applications across various sectors. The models work with the help of vast quantities of labelled training data. However, most of the time, it is more challenging to gather labelled data than unlabelled data.
Automatic Speech Transcription And Speaker Recognition Simultaneously Using Apple AI
Last year, Apple faced several controversies over its speech recognition technology. For quality control of its voice assistant Siri, Apple had its contractors regularly listen to confidential voice recordings under the "Siri Grading Program". The company later apologised and published a statement announcing changes to the Siri grading program. This year, the tech giant has been stepping up its research on speech recognition technology to upgrade its voice assistant. Recently, researchers at Apple developed an AI model that can perform automatic speech transcription and speaker recognition simultaneously.
Machine Learning – Introduction to Supervised Learning Vinod Sharma's Blog
Supervised learning is a blessing in this era of machines. It helps to map inputs to outputs. It uses labelled training data, in the form of a set of training examples, to infer a function. The majority of practical machine learning to date uses supervised learning. AILabPage defines Machine Learning as "A focal point where business, data and experience meets emerging technology and decides to work together".
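To make the idea of inferring a function from labelled training examples concrete, here is a minimal sketch using a one-nearest-neighbour rule on a single numeric feature. The technique, the helper name `fit_nearest_neighbour` and the toy data are assumptions chosen for brevity, not something the post itself prescribes.

```python
def fit_nearest_neighbour(examples):
    """Return a function mapping an input to the label of its closest training example.

    `examples` is a list of (input, label) pairs: the labelled training data.
    """
    def predict(x):
        nearest_input, nearest_label = min(examples, key=lambda ex: abs(ex[0] - x))
        return nearest_label
    return predict


# Hypothetical labelled training data: a measurement and its class.
training = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]

classify = fit_nearest_neighbour(training)
label_low = classify(3.0)   # closest training input is 2.0
label_high = classify(7.0)  # closest training input is 8.0
```

The learned `classify` function is only as good as the labelled examples it was built from, which is why the quantity and quality of labelled training data matter so much in supervised learning.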