Ryan, Conor
Do Weibo platform experts perform better at predicting stock market?
Ma, Ziyuan, Ryan, Conor, Buckley, Jim, Chochlov, Muslim
Sentiment analysis can be used for stock market prediction. However, existing research has not studied the impact of a user's financial background on sentiment-based forecasting of the stock market using artificial neural networks. In this work, a novel combination of neural networks is used for the assessment of sentiment-based stock market prediction, based on the financial background of the population that generated the sentiment. The state-of-the-art language processing model Bidirectional Encoder Representations from Transformers (BERT) is used to classify the sentiment and a Long-Short Term Memory (LSTM) model is used for time-series based stock market prediction. For evaluation, the Weibo social networking platform is used as a sentiment data collection source. Weibo users (and their comments respectively) are divided into Authorized Financial Advisor (AFA) and Unauthorized Financial Advisor (UFA) groups according to their background information, as collected by Weibo. The Hong Kong Hang Seng index is used to extract historical stock market change data. The results indicate that stock market prediction learned from the AFA group users is 39.67% more precise than that learned from the UFA group users and shows the highest accuracy (87%) when compared to existing approaches.
Interpretable Solutions for Breast Cancer Diagnosis with Grammatical Evolution and Data Augmentation
Hasan, Yumnah, de Lima, Allan, Amerehi, Fatemeh, de Bulnes, Darian Reyes Fernandez, Healy, Patrick, Ryan, Conor
Medical imaging diagnosis increasingly relies on Machine Learning (ML) models. This is a task that is often hampered by severely imbalanced datasets, where positive cases can be quite rare. Their use is further compromised by their limited interpretability, which is becoming increasingly important. While post-hoc interpretability techniques such as SHAP and LIME have been used with some success on so-called black box models, the use of inherently understandable models makes such endeavors more fruitful. This paper addresses these issues by demonstrating how a relatively new synthetic data generation technique, STEM, can be used to produce data to train models produced by Grammatical Evolution (GE) that are inherently understandable. STEM is a recently introduced combination of the Synthetic Minority Oversampling Technique (SMOTE), Edited Nearest Neighbour (ENN), and Mixup; it has previously been successfully used to tackle both between class and within class imbalance issues. We test our technique on the Digital Database for Screening Mammography (DDSM) and the Wisconsin Breast Cancer (WBC) datasets and compare Area Under the Curve (AUC) results with an ensemble of the top three performing classifiers from a set of eight standard ML classifiers with varying degrees of interpretability. We demonstrate that the GE-derived models present the best AUC while still maintaining interpretable solutions.
STEM Rebalance: A Novel Approach for Tackling Imbalanced Datasets using SMOTE, Edited Nearest Neighbour, and Mixup
Hasan, Yumnah, Amerehi, Fatemeh, Healy, Patrick, Ryan, Conor
Imbalanced datasets in medical imaging are characterized by skewed class proportions and scarcity of abnormal cases. When trained using such data, models tend to assign higher probabilities to normal cases, leading to biased performance. Common oversampling techniques such as SMOTE rely on local information and can introduce marginalization issues. This paper investigates the potential of using Mixup augmentation that combines two training examples along with their corresponding labels to generate new data points as a generic vicinal distribution. To this end, we propose STEM, which combines SMOTE-ENN and Mixup at the instance level. This integration enables us to effectively leverage the entire distribution of minority classes, thereby mitigating both between-class and within-class imbalances. We focus on the breast cancer problem, where imbalanced datasets are prevalent. The results demonstrate the effectiveness of STEM, which achieves AUC values of 0.96 and 0.99 in the Digital Database for Screening Mammography and Wisconsin Breast Cancer (Diagnostics) datasets, respectively. Moreover, this method shows promising potential when applied with an ensemble of machine learning (ML) classifiers.
Stochastic Model Predictive Controller for the Integration of Building Use and Temperature Regulation
Mady, Alie El-Din (University College Cork) | Provan, Gregory (University College Cork) | Ryan, Conor (University College Cork) | Brown, Kenneth (University College Cork)
The aim of a modern Building Automation System (BAS) is to enhance interactive control strategies for energy efficiency and user comfort. In this context, we develop a novel control algorithm that uses a stochastic building occupancy model to improve mean energy efficiency while minimizing expected discomfort. We compare by simulation our Stochastic Model Predictive Control (SMPC) strategy to the standard heating control method to empirically demonstrate a 4.3% reduction in energy use and 38.3% reduction in expected discomfort.