Ensemble Learning
Assessing the impact of emergency department short stay units using length-of-stay prediction and discrete event simulation
Cevik, Mucahit, Kavaklioglu, Can, Razak, Fahad, Verma, Amol, Basar, Ayse
Accurately predicting hospital length-of-stay at the time a patient is admitted to hospital may help guide clinical decision making and resource allocation. In this study we aim to build a decision support system that predicts hospital length-of-stay for patients admitted to general internal medicine from the emergency department. We conduct an exploratory data analysis and employ feature selection methods to identify the attributes that result in the best predictive performance. We also develop a discrete-event simulation model to assess the performances of the prediction models in a practical setting. Our results show that the recommendation performances of the proposed approaches are generally acceptable and do not benefit from the feature selection. Further, the results indicate that hospital length-of-stay could be predicted with reasonable accuracy (e.g., AUC value for classifying short and long stay patients is 0.69) using patient admission demographics, laboratory test results, diagnostic imaging, vital signs and clinical documentation.
How Good Are SOTA Fake News Detectors
Automatic fake news detection with machine learning can prevent the dissemination of false statements before they gain many views. Several datasets labeling statements as legitimate or false have been created since the 2016 United States presidential election for the prospect of training machine learning models. We evaluate the robustness of both traditional and deep state-of-the-art models to gauge how well they may perform in the real world. We find that traditional models tend to generalize better to data outside the distribution it was trained on compared to more recently-developed large language models, though the best model to use may depend on the specific task at hand.
A Machine Learning Method for Predicting Traffic Signal Timing from Probe Vehicle Data
Ugirumurera, Juliette, Severino, Joseph, Bensen, Erik A., Wang, Qichao, Macfarlane, Jane
Traffic signals play an important role in transportation by enabling traffic flow management, and ensuring safety at intersections. In addition, knowing the traffic signal phase and timing data can allow optimal vehicle routing for time and energy efficiency, eco-driving, and the accurate simulation of signalized road networks. In this paper, we present a machine learning (ML) method for estimating traffic signal timing information from vehicle probe data. To the authors best knowledge, very few works have presented ML techniques for determining traffic signal timing parameters from vehicle probe data. In this work, we develop an Extreme Gradient Boosting (XGBoost) model to estimate signal cycle lengths and a neural network model to determine the corresponding red times per phase from probe data. The green times are then be derived from the cycle length and red times. Our results show an error of less than 0.56 sec for cycle length, and red times predictions within 7.2 sec error on average.
Merging satellite and gauge-measured precipitation using LightGBM with an emphasis on extreme quantiles
Tyralis, Hristos, Papacharalampous, Georgia, Doulamis, Nikolaos, Doulamis, Anastasios
Knowing the actual precipitation in space and time is critical in hydrological modelling applications, yet the spatial coverage with rain gauge stations is limited due to economic constraints. Gridded satellite precipitation datasets offer an alternative option for estimating the actual precipitation by covering uniformly large areas, albeit related estimates are not accurate. To improve precipitation estimates, machine learning is applied to merge rain gauge-based measurements and gridded satellite precipitation products. In this context, observed precipitation plays the role of the dependent variable, while satellite data play the role of predictor variables. Random forests is the dominant machine learning algorithm in relevant applications. In those spatial predictions settings, point predictions (mostly the mean or the median of the conditional distribution) of the dependent variable are issued. The aim of the manuscript is to solve the problem of probabilistic prediction of precipitation with an emphasis on extreme quantiles in spatial interpolation settings. Here we propose, issuing probabilistic spatial predictions of precipitation using Light Gradient Boosting Machine (LightGBM). LightGBM is a boosting algorithm, highlighted by prize-winning entries in prediction and forecasting competitions. To assess LightGBM, we contribute a large-scale application that includes merging daily precipitation measurements in contiguous US with PERSIANN and GPM-IMERG satellite precipitation data. We focus on extreme quantiles of the probability distribution of the dependent variable, where LightGBM outperforms quantile regression forests (QRF, a variant of random forests) in terms of quantile score at extreme quantiles. Our study offers understanding of probabilistic predictions in spatial settings using machine learning.
Morphological Classification of Extragalactic Radio Sources Using Gradient Boosting Methods
Darya, Abdollah Masoud, Fernini, Ilias, Vellasco, Marley, Hussain, Abir
The field of radio astronomy is witnessing a boom in the amount of data produced per day due to newly commissioned radio telescopes. One of the most crucial problems in this field is the automatic classification of extragalactic radio sources based on their morphologies. Most recent contributions in the field of morphological classification of extragalactic radio sources have proposed classifiers based on convolutional neural networks. Alternatively, this work proposes gradient boosting machine learning methods accompanied by principal component analysis as data-efficient alternatives to convolutional neural networks. Recent findings have shown the efficacy of gradient boosting methods in outperforming deep learning methods for classification problems with tabular data. The gradient boosting methods considered in this work are based on the XGBoost, LightGBM, and CatBoost implementations. This work also studies the effect of dataset size on classifier performance. A three-class classification problem is considered in this work based on the three main Fanaroff-Riley classes: class 0, class I, and class II, using radio sources from the Best-Heckman sample. All three proposed gradient boosting methods outperformed a state-of-the-art convolutional neural networks-based classifier using less than a quarter of the number of images, with CatBoost having the highest accuracy. This was mainly due to the superior accuracy of gradient boosting methods in classifying Fanaroff-Riley class II sources, with 3$\unicode{x2013}$4% higher recall.
Machine Learning-Based Diabetes Detection Using Photoplethysmography Signal Features
Oliveira, Filipe A. C., Dias, Felipe M., Toledo, Marcelo A. F., Cardenas, Diego A. C., Almeida, Douglas A., Ribeiro, Estela, Krieger, Jose E., Gutierrez, Marco A.
Diabetes is a prevalent chronic condition that compromises the health of millions of people worldwide. Minimally invasive methods are needed to prevent and control diabetes but most devices for measuring glucose levels are invasive and not amenable for continuous monitoring. Here, we present an alternative method to overcome these shortcomings based on non-invasive optical photoplethysmography (PPG) for detecting diabetes. We classify non-Diabetic and Diabetic patients using the PPG signal and metadata for training Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost) algorithms. We used PPG signals from a publicly available dataset. To prevent overfitting, we divided the data into five folds for cross-validation. By ensuring that patients in the training set are not in the testing set, the model's performance can be evaluated on unseen subjects' data, providing a more accurate assessment of its generalization. Our model achieved an F1-Score and AUC of $58.8\pm20.0\%$ and $79.2\pm15.0\%$ for LR and $51.7\pm16.5\%$ and $73.6\pm17.0\%$ for XGBoost, respectively. Feature analysis suggested that PPG morphological features contains diabetes-related information alongside metadata. Our findings are within the same range reported in the literature, indicating that machine learning methods are promising for developing remote, non-invasive, and continuous measurement devices for detecting and preventing diabetes.
Artificial-Intelligence-Based Hybrid Extended Phase Shift Modulation for the Dual Active Bridge Converter with Full ZVS Range and Optimal Efficiency
Li, Xinze, Zhang, Xin, Lin, Fanfan, Sun, Changjiang, Mao, Kezhi
Dual active bridge (DAB) converter is the key enabler in many popular applications such as wireless charging, electric vehicle and renewable energy. ZVS range and efficiency are two significant performance indicators for DAB converter. To obtain the desired ZVS and efficiency performance, modulation should be carefully designed. Hybrid modulation considers several single modulation strategies to achieve good comprehensive performance. Conventionally, to design a hybrid modulation, harmonic approach or piecewise approach is used, but they suffer from time-consuming model building process and inaccuracy. Therefore, an artificial-intelligence-based hybrid extended phase shift (HEPS) modulation is proposed. Generally, the HEPS modulation is developed in an automated fashion, which alleviates cumbersome model building process while keeping high model accuracy. In HEPS modulation, two EPS strategies are considered to realize optimal efficiency with full ZVS operation over entire operating ranges. Specifically, to build data-driven models of ZVS and efficiency performance, extreme gradient boosting (XGBoost), which is a state-of-the-art ensemble learning algorithm, is adopted. Afterwards, particle swarm optimization with state-based adaptive velocity limit (PSO-SAVL) is utilized to select the best EPS strategy and optimize modulation parameters. With 1 kW hardware experiments, the feasibility of HEPS has been verified, achieving optimal efficiency with maximum of 97.1% and full-range ZVS operation.
Crowd Safety Manager: Towards Data-Driven Active Decision Support for Planning and Control of Crowd Events
Krishnakumari, Panchamy, Hoogendoorn-Lanser, Sascha, Steenbakkers, Jeroen, Hoogendoorn, Serge
This paper presents novel technology and methodology aimed at enhancing crowd management in both the planning and operational phases. The approach encompasses innovative data collection techniques, data integration, and visualization using a 3D Digital Twin, along with the incorporation of artificial intelligence (AI) tools for risk identification. The paper introduces the Bowtie model, a comprehensive framework designed to assess and predict risk levels. The model combines objective estimations and predictions, such as traffic flow operations and crowdedness levels, with various aggravating factors like weather conditions, sentiments, and the purpose of visitors, to evaluate the expected risk of incidents. The proposed framework is applied to the Crowd Safety Manager project in Scheveningen, where the DigiTwin is developed based on a wealth of real-time data sources. One noteworthy data source is Resono, offering insights into the number of visitors and their movements, leveraging a mobile phone panel of over 2 million users in the Netherlands. Particular attention is given to the left-hand side of the Bowtie, which includes state estimation, prediction, and forecasting. Notably, the focus is on generating multi-day ahead forecasts for event-planning purposes using Resono data. Advanced machine learning techniques, including the XGBoost framework, are compared, with XGBoost demonstrating the most accurate forecasts. The results indicate that the predictions are adequately accurate. However, certain locations may benefit from additional input data to further enhance prediction quality. Despite these limitations, this work contributes to a more effective crowd management system and opens avenues for further advancements in this critical field.
Predictive Maintenance of Armoured Vehicles using Machine Learning Approaches
Sengupta, Prajit, Mehta, Anant, Rana, Prashant Singh
Armoured vehicles are specialized and complex pieces of machinery designed to operate in high-stress environments, often in combat or tactical situations. This study proposes a predictive maintenance-based ensemble system that aids in predicting potential maintenance needs based on sensor data collected from these vehicles. The proposed model's architecture involves various models such as Light Gradient Boosting, Random Forest, Decision Tree, Extra Tree Classifier and Gradient Boosting to predict the maintenance requirements of the vehicles accurately. In addition, K-fold cross validation, along with TOPSIS analysis, is employed to evaluate the proposed ensemble model's stability. The results indicate that the proposed system achieves an accuracy of 98.93%, precision of 99.80% and recall of 99.03%. The algorithm can effectively predict maintenance needs, thereby reducing vehicle downtime and improving operational efficiency. Through comparisons between various algorithms and the suggested ensemble, this study highlights the potential of machine learning-based predictive maintenance solutions.
Improving the Generalization Ability in Essay Coherence Evaluation through Monotonic Constraints
Zheng, Chen, Zhang, Huan, Zhao, Yan, Lai, Yuxuan
Coherence is a crucial aspect of evaluating text readability and can be assessed through two primary factors when evaluating an essay in a scoring scenario. The first factor is logical coherence, characterized by the appropriate use of discourse connectives and the establishment of logical relationships between sentences. The second factor is the appropriateness of punctuation, as inappropriate punctuation can lead to confused sentence structure. To address these concerns, we propose a coherence scoring model consisting of a regression model with two feature extractors: a local coherence discriminative model and a punctuation correction model. We employ gradient-boosting regression trees as the regression model and impose monotonicity constraints on the input features. The results show that our proposed model better generalizes unseen data. The model achieved third place in track 1 of NLPCC 2023 shared task 7. Additionally, we briefly introduce our solution for the remaining tracks, which achieves second place for track 2 and first place for both track 3 and track 4.