salminen
Data Augmentation for Fake Reviews Detection in Multiple Languages and Multiple Domains
With the growth of the Internet, buying habits have changed, and customers have become more dependent on the online opinions of other customers to guide their purchases. Identifying fake reviews thus became an important area for Natural Language Processing (NLP) research. However, developing high-performance NLP models depends on the availability of large amounts of training data, which are often not available for low-resource languages or domains. In this research, we used large language models to generate datasets to train fake review detectors. Our approach was used to generate fake reviews in different domains (book reviews, restaurant reviews, and hotel reviews) and different languages (English and Chinese). Our results demonstrate that our data augmentation techniques result in improved performance at fake review detection for all domains and languages. The accuracy of our fake review detection model can be improved by 0.3 percentage points on DeRev TEST, 10.9 percentage points on Amazon TEST, 8.3 percentage points on Yelp TEST and 7.2 percentage points on DianPing TEST using the augmented datasets.
Salminen
Online social media platforms generally attempt to mitigate hateful expressions, as these comments can be detrimental to the health of the community. However, automatically identifying hateful comments can be challenging. We manually label 5,143 hateful expressions posted to YouTube and Facebook videos among a dataset of 137,098 comments from an online news media outlet. We then create a granular taxonomy of different types and targets of online hate and train machine learning models to automatically detect and classify the hateful comments in the full dataset. Our contribution is twofold: 1) creating a granular taxonomy for hateful online comments that includes both types and targets of hateful comments, and 2) experimenting with machine learning, including Logistic Regression, Decision Tree, Random Forest, AdaBoost, and Linear SVM, to generate a multiclass, multilabel classification model that automatically detects and categorizes hateful comments in the context of online news media. We find that the best performing model is Linear SVM, with an average F1 score of 0.79 using TF-IDF features. We validate the model by testing its predictive ability, and, relatedly, provide insights on distinct types of hate speech taking place on social media.
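To make the "average F1 score" in the abstract above concrete, here is a minimal sketch of how a per-label F1 can be averaged in a multilabel setting. The abstract does not say whether the average is macro, micro, or weighted, so the macro average used here is an assumption. The label names and the toy data are hypothetical and are not taken from the paper's taxonomy or results.

```python
from typing import List, Sequence, Set


def per_label_f1(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]], label: str) -> float:
    """F1 for a single label, treating it as a yes/no decision per comment."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if label in t and label in p)
    fp = sum(1 for t, p in pairs if label not in t and label in p)
    fn = sum(1 for t, p in pairs if label in t and label not in p)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[Set[str]], y_pred: Sequence[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of the per-label F1 scores (macro average, an assumption)."""
    scores = [per_label_f1(y_true, y_pred, label) for label in labels]
    return sum(scores) / len(scores)


# Hypothetical labels and toy annotations; each comment may carry several labels.
LABELS = ["religion", "media", "violence"]
Y_TRUE = [{"religion"}, {"media", "violence"}, set(), {"violence"}]
Y_PRED = [{"religion"}, {"media"}, {"violence"}, {"violence"}]

if __name__ == "__main__":
    for label in LABELS:
        print(label, round(per_label_f1(Y_TRUE, Y_PRED, label), 3))
    print("macro F1:", round(macro_f1(Y_TRUE, Y_PRED, LABELS), 3))
```

A macro average weights every label equally, so rare hate categories count as much as frequent ones. A micro or weighted average would give a different number on imbalanced data.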
The future of IoT and machine learning – what role will humans play? - Industrial Internet Now
"What we are seeing today is that there typically exists a bit of a delay when companies start connecting assets and collecting information to be able to rely on machine learning algorithms and their accuracy," says Salminen. "The training of these algorithms requires large amounts of data and thus time. It takes time for any individual company to move through the cycle of starting with very basic use cases and moving onto more complex algorithms and dependencies, and eventually introducing machine learning." He recognizes companies that require warehouses โ or those whose supply chains do โ currently expect sophisticated IoT solutions from a production and manufacturing point of view. Salminen encourages companies who have examined the cost of IoT solutions for manufacturing or supply chain management over recent years to do so again. "Things are changing at such a pace that it is now very cost efficient even for smaller companies to deploy off-the-shelf IoT solutions for their supply chains as the price of hardware, connectivity and software has dramatically reduced over the last 5 years," he reasons.