AITopics | Ensemble Learning

Collaborating Authors

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Rashomon effect in Educational Research: Why More is Better Than One for Measuring the Importance of the Variables?

Kuzilek, Jakub, Çavuş, Mustafa

arXiv.org Artificial IntelligenceDec-2-2024

This study explores how the Rashomon effect influences variable importance in the context of student demographics used for academic outcomes prediction. Our research follows the way machine learning algorithms are employed in Educational Data Mining, focusing on highlighting the so-called Rashomon effect. The study uses the Rashomon set of simple-yet-accurate models trained using decision trees, random forests, light GBM, and XGBoost algorithms with the Open University Learning Analytics Dataset. We found that the Rashomon set improves the predictive accuracy by 2-6%. Variable importance analysis revealed more consistent and reliable results for binary classification than multiclass classification, highlighting the complexity of predicting multiple outcomes. Key demographic variables imd_band and highest_education were identified as vital, but their importance varied across courses, especially in course DDD. These findings underscore the importance of model choice and the need for caution in generalizing results, as different models can lead to different variable importance rankings. The codes for reproducing the experiments are available in the repository: https://anonymous.4open.science/r/JEDM_paper-DE9D.

artificial intelligence, decision tree learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.12115

Country:

Asia > Middle East > Republic of Türkiye > Eskisehir Province > Eskisehir (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > Wales (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
(2 more...)

Add feedback

Predicting the Impact of Scope Changes on Project Cost and Schedule Using Machine Learning Techniques

Sadeghi, Soheila

arXiv.org Artificial IntelligenceDec-2-2024

In the dynamic landscape of project management, scope changes are an inevitable reality that can significantly impact project performance. These changes, whether initiated by stakeholders, external factors, or internal project dynamics, can lead to cost overruns and schedule delays. Accurately predicting the consequences of these changes is crucial for effective project control and informed decision-making. This study aims to develop predictive models to estimate the impact of scope changes on project cost and schedule using machine learning techniques. The research utilizes a comprehensive dataset containing detailed information on project tasks, including the Work Breakdown Structure (WBS), task type, productivity rate, estimated cost, actual cost, duration, task dependencies, scope change magnitude, and scope change timing. Multiple machine learning models are developed and evaluated to predict the impact of scope changes on project cost and schedule. These models include Linear Regression, Decision Tree, Ridge Regression, Random Forest, Gradient Boosting, and XGBoost. The dataset is split into training and testing sets, and the models are trained using the preprocessed data. Model robustness and generalization are assessed using cross-validation techniques. To evaluate the performance of models, we use Mean Squared Error (MSE) and R2. Residual plots are generated to assess the goodness of fit and identify any patterns or outliers. Hyperparameter tuning is performed to optimize the XGBoost model and improve its predictive accuracy. The study identifies the most influential project attributes in determining the magnitude of cost and schedule deviations caused by scope modifications. It is identified that productivity rate, scope change magnitude, task dependencies, estimated cost, actual cost, duration, and specific WBS elements are powerful predictors.

actual cost, magnitude, scope change, (17 more...)

arXiv.org Artificial Intelligence

2412.02041

Country:

North America > United States > Texas > Bexar County > San Antonio (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)

Genre: Research Report (1.00)

Industry: Construction & Engineering (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.50)

Add feedback

Voice Biomarker Analysis and Automated Severity Classification of Dysarthric Speech in a Multilingual Context

Yeo, Eunjung

arXiv.org Artificial IntelligenceNov-30-2024

Dysarthria, a motor speech disorder, severely impacts voice quality, pronunciation, and prosody, leading to diminished speech intelligibility and reduced quality of life. Accurate assessment is crucial for effective treatment, but traditional perceptual assessments are limited by their subjectivity and resource intensity. To mitigate the limitations, automatic dysarthric speech assessment methods have been proposed to support clinicians on their decision-making. While these methods have shown promising results, most research has focused on monolingual environments. However, multilingual approaches are necessary to address the global burden of dysarthria and ensure equitable access to accurate diagnosis. This thesis proposes a novel multilingual dysarthria severity classification method, by analyzing three languages: English, Korean, and Tamil.

classification, data mining, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2412.12111

Country:

North America > United States (0.14)
North America > Canada > Quebec > Montreal (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.82)
Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(5 more...)

Add feedback

MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences

Winata, Genta Indra, Anugraha, David, Susanto, Lucky, Kuwanto, Garry, Wijaya, Derry Tanti

arXiv.org Artificial IntelligenceNov-28-2024

Understanding the quality of a performance evaluation metric is crucial for ensuring that model outputs align with human preferences. However, it remains unclear how well each metric captures the diverse aspects of these preferences, as metrics often excel in one particular area but not across all dimensions. To address this, it is essential to systematically calibrate metrics to specific aspects of human preference, catering to the unique characteristics of each aspect. We introduce MetaMetrics, a calibrated meta-metric designed to evaluate generation tasks across different modalities in a supervised manner. MetaMetrics optimizes the combination of existing metrics to enhance their alignment with human preferences. Our metric demonstrates flexibility and effectiveness in both language and vision downstream tasks, showing significant benefits across various multilingual and multi-domain scenarios. MetaMetrics aligns closely with human preferences and is highly extendable and easily integrable into any application. This makes MetaMetrics a powerful tool for improving the evaluation of generation tasks, ensuring that metrics are more representative of human judgment across diverse contexts.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.02381

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
(4 more...)

Add feedback

Living off the Analyst: Harvesting Features from Yara Rules for Malware Detection

Gupta, Siddhant, Lu, Fred, Barlow, Andrew, Raff, Edward, Ferraro, Francis, Matuszek, Cynthia, Nicholas, Charles, Holt, James

arXiv.org Artificial IntelligenceNov-27-2024

A strategy used by malicious actors is to "live off the land," where benign systems and tools already available on a victim's systems are used and repurposed for the malicious actor's intent. In this work, we ask if there is a way for anti-virus developers to similarly re-purpose existing work to improve their malware detection capability. We show that this is plausible via YARA rules, which use human-written signatures to detect specific malware families, functionalities, or other markers of interest. By extracting sub-signatures from publicly available YARA rules, we assembled a set of features that can more effectively discriminate malicious samples from benign ones. Our experiments demonstrate that these features add value beyond traditional features on the EMBER 2018 dataset. Manual analysis of the added sub-signatures shows a power-law behavior in a combination of features that are specific and unique, as well as features that occur often. A prior expectation may be that the features would be limited in being overly specific to unique malware families. This behavior is observed, and is apparently useful in practice. In addition, we also find sub-signatures that are dual-purpose (e.g., detecting virtual machine environments) or broadly generic (e.g., DLL imports).

accuracy, information, selection, (14 more...)

arXiv.org Artificial Intelligence

2411.18516

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Maryland > Baltimore County (0.04)
North America > United States > Maryland > Baltimore (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Machine Learning and Multi-source Remote Sensing in Forest Carbon Stock Estimation: A Review

Nguyen, Autumn, Saha, Sulagna

arXiv.org Artificial IntelligenceNov-26-2024

Quantifying forest carbon is crucial for informing decisions and policies that will protect the planet. Machine learning (ML) and remote sensing (RS) techniques have been used to do this task more effectively, yet there lacks a systematic review on the most recent ML methods and RS combinations, especially with the consideration of forest characteristics. This study systematically analyzed 25 papers meeting strict inclusion criteria from over 80 related studies, identifying 28 ML methods and key combinations of RS data. Random Forest had the most frequently appearance (88% of studies), while Extreme Gradient Boosting showed superior performance in 75% of the studies in which it was compared with other methods. Sentinel-1 emerged as the most utilized remote sensing source, with multi-sensor approaches (e.g., Sentinel-1, Sentinel-2, and LiDAR) proving especially effective. Our findings provide grounds for recommending best practices in integrating machine learning and remote sensing for accurate and scalable forest carbon stock estimation.

estimation, lidar, ml method, (15 more...)

arXiv.org Artificial Intelligence

2411.17624

Country:

Asia > Pakistan (0.04)
Asia > Middle East > Iran (0.04)
Asia > India (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

Understanding GEMM Performance and Energy on NVIDIA Ada Lovelace: A Machine Learning-Based Analytical Approach

Xiaoteng, null, Liu, null, Halim, Pavly

arXiv.org Artificial IntelligenceNov-25-2024

Analytical framework for predicting General Matrix Multiplication (GEMM) performance on modern GPUs, focusing on runtime, power consumption, and energy efficiency. Our study employs two approaches: a custom-implemented tiled matrix multiplication kernel for fundamental analysis, and NVIDIA's CUTLASS library for comprehensive performance data collection across advanced configurations. Using the NVIDIA RTX 4070 as our experimental platform, we developed a Random Forest-based prediction model with multi-output regression capability. Through analysis of both naive tiled matrix multiplication with varying tile sizes (1 to 32) and 16,128 CUTLASS GEMM operations across diverse configurations, we identified critical performance patterns related to matrix dimensions, thread block configurations, and memory access patterns. Our framework achieved exceptional accuracy with an R^2 score of 0.98 for runtime prediction (mean error 15.57%) and 0.78 for power prediction (median error 5.42%). The system successfully predicts performance across matrix sizes, demonstrating robust scaling behavior. Our results show that optimal tile size selection can improve performance by up to 3.2x while reducing power consumption by 22% compared to baseline configurations. Analysis of shared memory utilization and SM occupancy reveals that tile sizes of 16x16 achieve the best balance between parallelism and resource usage. The implementation of our framework, including prediction models and analysis tools, is available as an open-source project at GPPerf [https://github.com/pavlyhalim/GPPerf].

artificial intelligence, machine learning, prediction, (18 more...)

arXiv.org Artificial Intelligence

2411.16954

Country: North America > United States > New York (0.05)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Hardware (0.82)
Energy (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.35)

Add feedback

BERT-Based Approach for Automating Course Articulation Matrix Construction with Explainable AI

Shiferaw, Natenaile Asmamaw, Leandre, Simpenzwe Honore, Sinha, Aman, Rout, Dillip

arXiv.org Artificial IntelligenceNov-21-2024

Course Outcome (CO) and Program Outcome (PO)/Program-Specific Outcome (PSO) alignment is a crucial task for ensuring curriculum coherence and assessing educational effectiveness. The construction of a Course Articulation Matrix (CAM), which quantifies the relationship between COs and POs/PSOs, typically involves assigning numerical values (0, 1, 2, 3) to represent the degree of alignment. In this study, We experiment with four models from the BERT family: BERT Base, DistilBERT, ALBERT, and RoBERTa, and use multiclass classification to assess the alignment between CO and PO/PSO pairs. We first evaluate traditional machine learning classifiers, such as Decision Tree, Random Forest, and XGBoost, and then apply transfer learning to evaluate the performance of the pretrained BERT models. To enhance model interpretability, we apply Explainable AI technique, specifically Local Interpretable Model-agnostic Explanations (LIME), to provide transparency into the decision-making process. Our system achieves accuracy, precision, recall, and F1-score values of 98.66%, 98.67%, 98.66%, and 98.66%, respectively. This work demonstrates the potential of utilizing transfer learning with BERT-based models for the automated generation of CAMs, offering high performance and interpretability in educational outcome assessment.

accuracy, alignment score, dataset, (16 more...)

arXiv.org Artificial Intelligence

2411.14254

Country:

Asia > India > Odisha (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)

Industry:

Education > Educational Setting (0.68)
Education > Curriculum (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

A Comparison of Machine Learning Algorithms for Predicting Sea Surface Temperature in the Great Barrier Reef Region

Quayesam, Dennis, Akubire, Jacob, Darkwah, Oliveira

arXiv.org Machine LearningNov-19-2024

Predicting Sea Surface Temperature (SST) in the Great Barrier Reef (GBR) region is crucial for the effective management of its fragile ecosystems. This study provides a rigorous comparative analysis of several machine learning techniques to identify the most effective method for SST prediction in this area. We evaluate the performance of ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), Random Forest, and Extreme Gradient Boosting (XGBoost) algorithms. Our results reveal that while LASSO and ridge regression perform well, Random Forest and XGBoost significantly outperform them in terms of predictive accuracy, as evidenced by lower Mean Squared Error (MSE), Mean Absolute Error (MAE), and Root Mean Squared Prediction Error (RMSPE). Additionally, XGBoost demonstrated superior performance in minimizing Kullback- Leibler Divergence (KLD), indicating a closer alignment of predicted probability distributions with actual observations. These findings highlight the efficacy of using ensemble methods, particularly XGBoost, for predicting sea surface temperatures, making them valuable tools for climatological and environmental modeling.

prediction, predictor, sea surface temperature, (12 more...)

arXiv.org Machine Learning

2411.15202

Country: Oceania > Australia > Queensland (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.78)

Add feedback

Random Forest-Supervised Manifold Alignment

Rhodes, Jake S., Rustad, Adam G.

arXiv.org Machine LearningNov-18-2024

Manifold alignment is a type of data fusion technique that creates a shared low-dimensional representation of data collected from multiple domains, enabling cross-domain learning and improved performance in downstream tasks. This paper presents an approach to manifold alignment using random forests as a foundation for semi-supervised alignment algorithms, leveraging the model's inherent strengths. We focus on enhancing two recently developed alignment graph-based by integrating class labels through geometry-preserving proximities derived from random forests. These proximities serve as a supervised initialization for constructing cross-domain relationships that maintain local neighborhood structures, thereby facilitating alignment. Our approach addresses a common limitation in manifold alignment, where existing methods often fail to generate embeddings that capture sufficient information for downstream classification. By contrast, we find that alignment models that use random forest proximities or class-label information achieve improved accuracy on downstream classification tasks, outperforming single-domain baselines. Experiments across multiple datasets show that our method typically enhances cross-domain feature integration and predictive performance, suggesting that random forest proximities offer a practical solution for tasks requiring multimodal data alignment.

artificial intelligence, information fusion, machine learning, (15 more...)

arXiv.org Machine Learning

2411.15179

Country:

Africa > Mali (0.06)
North America > United States > Utah > Utah County > Provo (0.05)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.87)

Add feedback