AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
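As a minimal illustration of the definition above, the sketch below combines three deliberately weak classifiers by majority vote. The threshold rules and the sample data are hypothetical, chosen only so that each rule errs on a different input; majority voting is one of several ways to combine learners.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Return the label predicted by most classifiers for input x."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

def accuracy(predict, samples):
    """Fraction of (input, label) pairs that predict() gets right."""
    return sum(predict(x) == y for x, y in samples) / len(samples)

# Hypothetical task: the true label is "x >= 5" for x in 0..9.
samples = [(x, x >= 5) for x in range(10)]

# Hypothetical weak learners; each one is wrong on exactly one input.
weak = [
    lambda x: x >= 4,       # wrong on x = 4
    lambda x: x >= 6,       # wrong on x = 5
    lambda x: 5 <= x <= 8,  # wrong on x = 9
]

individual = [accuracy(clf, samples) for clf in weak]
ensemble = accuracy(lambda x: majority_vote(weak, x), samples)
```

Because the three rules make their mistakes on different inputs, the vote corrects each of them; an ensemble of learners that all fail on the same inputs would gain nothing.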

Top Python Libraries For Data Science with Free Courses

#artificialintelligence, Sep-26-2022, 11:16:23 GMT

Dask is a powerful open-source Python framework for parallel computing. It scales Python programs from single-core local workstations to huge distributed cloud clusters. Dask provides a familiar user experience by replicating the APIs of other PyData ecosystem libraries such as Pandas, Scikit-learn, and NumPy. It also offers low-level APIs that allow programmers to run custom algorithms in parallel.

data science, library, tutorial, (15 more...)

#artificialintelligence

Industry: Education > Educational Setting (0.40)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.34)
Information Technology > Data Science > Data Mining > Web Mining (0.31)

Automatic Identification and Classification of Share Buybacks and their Effect on Short-, Mid- and Long-Term Returns

Reintjes, Thilo

arXiv.org Artificial Intelligence, Sep-26-2022

This thesis investigates share buybacks, specifically share buyback announcements. It addresses how to recognize such announcements, the excess return of share buybacks, and the prediction of returns after a share buyback announcement. We illustrate two NLP approaches for the automated detection of share buyback announcements. Even with very small amounts of training data, we can achieve an accuracy of up to 90%. This thesis utilizes these NLP methods to generate a large dataset consisting of 57,155 share buyback announcements. By analyzing this dataset, this thesis aims to show that most companies that announce a share buyback underperform the MSCI World. A minority of companies, however, significantly outperform the MSCI World. This significant overperformance leads to a net gain when looking at the averages of all companies. If the benchmark index is adjusted for the respective size of the companies, the average overperformance disappears, and the majority underperforms by an even larger margin. However, it was found that companies that announce a share buyback with a volume of at least 1% of their market cap deliver, on average, a significant overperformance, even when using an adjusted benchmark. It was also found that companies that announce share buybacks in times of crisis fare better than the overall market. Additionally, the generated dataset was used to train 72 machine learning models. Through this, we were able to find many strategies that achieve an accuracy of up to 77% and generate large excess returns. A variety of performance indicators could be improved across six different time frames, and a significant overperformance was identified. This was achieved by training several models for different tasks and time frames and by combining these models, fusing weak learners into one strong learner to generate significant improvement.
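The abstract's claim that most companies underperform the benchmark while the average still shows a net gain can look contradictory. The sketch below shows how a skewed distribution of excess returns produces exactly that pattern. All return figures are hypothetical and are not taken from the thesis.

```python
from statistics import mean, median

def excess_returns(stock_returns, benchmark_return):
    """Excess return of each stock: its return minus the benchmark's return."""
    return [r - benchmark_return for r in stock_returns]

# Hypothetical one-year returns after a buyback announcement (not thesis data).
stock_returns = [0.02, 0.03, 0.04, 0.05, 0.90]
benchmark_return = 0.08  # hypothetical benchmark return over the same period

excess = excess_returns(stock_returns, benchmark_return)
share_underperforming = sum(e < 0 for e in excess) / len(excess)
average_excess = mean(excess)
median_excess = median(excess)
```

Here four of five companies trail the benchmark, so the median excess return is negative, yet a single large outperformer pulls the mean above zero.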

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2209.12863

Country:

North America > United States > New York > New York County > New York City (0.14)
Asia > Japan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(16 more...)

Genre: Research Report (1.00)

Industry:

Banking & Finance > Trading (1.00)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
(2 more...)

Cross-lingual Dysarthria Severity Classification for English, Korean, and Tamil

Yeo, Eun Jung, Choi, Kwanghee, Kim, Sunhee, Chung, Minhwa

arXiv.org Artificial Intelligence, Sep-26-2022

This paper proposes a cross-lingual classification method for English, Korean, and Tamil, which employs both language-independent features and language-unique features. First, we extract thirty-nine features from diverse speech dimensions such as voice quality, pronunciation, and prosody. Second, feature selection is applied to identify the optimal feature set for each language. A set of shared features and a set of distinctive features are distinguished by comparing the feature selection results of the three languages. Lastly, automatic severity classification is performed, utilizing the two feature sets. Notably, the proposed method removes different features for each language to prevent unique features from negatively affecting the other languages. Accordingly, the eXtreme Gradient Boosting (XGBoost) algorithm is employed for classification, due to its strength in imputing missing data. In order to validate the effectiveness of our proposed method, two baseline experiments are conducted: experiments using the intersection set of mono-lingual feature sets (Intersection) and experiments using the union set of mono-lingual feature sets (Union). According to the experimental results, our method achieves better performance with a 67.14% F1 score, compared to 64.52% for the Intersection experiment and 66.74% for the Union experiment. Further, the proposed method attains better performance than mono-lingual classification for all three languages, achieving 17.67%, 2.28%, and 7.79% relative percentage increases for English, Korean, and Tamil, respectively. The results indicate that commonly shared features and language-specific features must be considered separately for cross-language dysarthria severity classification.
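The shared, distinctive, Intersection, and Union feature sets described above are plain set operations on the per-language selection results. The sketch below shows them with hypothetical feature names. Masking unselected features as NaN is an assumption about how "removing" a feature per language might be realized, inferred from the abstract's mention of missing data; it is not confirmed as the authors' implementation.

```python
import math

# Hypothetical per-language feature selection results (names are illustrative).
selected = {
    "English": {"jitter", "shimmer", "speech_rate", "vowel_space"},
    "Korean": {"jitter", "speech_rate", "pause_ratio"},
    "Tamil": {"jitter", "speech_rate", "f0_range"},
}

# Features chosen for every language (also the Intersection baseline's set).
shared = set.intersection(*selected.values())
# Features chosen for at least one language (the Union baseline's set).
union = set.union(*selected.values())
# Features that matter only for a given language.
distinctive = {lang: feats - shared for lang, feats in selected.items()}

def mask_features(sample, language):
    """Keep the features selected for this language; mark the rest as NaN."""
    keep = selected[language]
    return {
        name: (value if name in keep else math.nan)
        for name, value in sample.items()
    }

# A hypothetical Korean sample with a value for every feature in the union.
sample = {name: 1.0 for name in sorted(union)}
masked = mask_features(sample, "Korean")
```

With this layout, one classifier can be trained on rows from all three languages, since each row carries values only for the features selected for its own language.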

artificial intelligence, classification, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2209.12942

Country:

Asia > Thailand > Chiang Mai > Chiang Mai (0.05)
Asia > South Korea > Seoul > Seoul (0.05)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Modelling the Frequency of Home Deliveries: An Induced Travel Demand Contribution of Aggrandized E-shopping in Toronto during COVID-19 Pandemics

Liu, Yicong, Wang, Kaili, Loa, Patrick, Habib, Khandker Nurul

arXiv.org Artificial Intelligence, Sep-21-2022

The dramatic growth of e-shopping will undoubtedly cause significant impacts on travel demand. As a result, transportation modellers' ability to model e-shopping demand is becoming increasingly important. This study developed models to predict households' weekly home delivery frequencies. We used both classical econometric and machine learning techniques to obtain the best model. It is found that socioeconomic factors such as having an online grocery membership, household members' average age, the percentage of male household members, the number of workers in the household, and various land-use factors influence home delivery demand. This study also compared the interpretations and performances of the machine learning models and the classical econometric model. Agreement is found in the variables' effects identified through the machine learning and econometric models. However, with similar recall accuracy, the ordered probit model, a classical econometric model, can accurately predict the aggregate distribution of household delivery demand. In contrast, both machine learning models failed to match the observed distribution.
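To make the "aggregate distribution" comparison concrete, the sketch below evaluates a textbook ordered probit: the probability of category k is Φ(τ_k − x'β) − Φ(τ_{k−1} − x'β), and averaging these probabilities over households gives the predicted category shares. The cutpoints, linear indices, and three-category setup are hypothetical; the abstract does not give the study's specification.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(linear_index, cutpoints):
    """P(y = k) for k = 0..len(cutpoints), given x'beta and increasing cutpoints."""
    cdf = [0.0] + [normal_cdf(c - linear_index) for c in cutpoints] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]

def aggregate_shares(linear_indices, cutpoints):
    """Average per-household probabilities into predicted category shares."""
    per_household = [ordered_probit_probs(v, cutpoints) for v in linear_indices]
    n = len(per_household)
    return [sum(p[k] for p in per_household) / n
            for k in range(len(cutpoints) + 1)]

# Hypothetical values: two cutpoints give three delivery-frequency categories.
cutpoints = [0.0, 1.0]
linear_indices = [-0.5, 0.2, 1.3]  # hypothetical x'beta for three households
shares = aggregate_shares(linear_indices, cutpoints)
```

The resulting shares can be compared directly with the observed share of households in each frequency category, which is the kind of distribution-level check the abstract refers to.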

artificial intelligence, household, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2209.10664

Country:

North America > Canada > Ontario > Toronto (0.51)
North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Transportation (1.00)
Retail (0.95)
Education > Educational Setting (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

ESTA: An Esports Trajectory and Action Dataset

Xenopoulos, Peter, Silva, Claudio

arXiv.org Artificial Intelligence, Sep-20-2022

Sports, due to their global reach and impact-rich prediction tasks, are an exciting domain in which to deploy machine learning models. However, data from conventional sports is often unsuitable for research use due to its size, veracity, and accessibility. To address these issues, we turn to esports, a growing domain that encompasses video games played in a capacity similar to conventional sports. Since esports data is acquired through server logs rather than peripheral sensors, esports provides a unique opportunity to obtain a massive collection of clean and detailed spatiotemporal data, similar to that collected in conventional sports. To parse esports data, we develop awpy, an open-source esports game log parsing library that can extract player trajectories and actions from game logs. Using awpy, we parse 8.6m actions, 7.9m game frames, and 417k trajectories from 1,558 game logs from professional Counter-Strike tournaments to create the Esports Trajectory and Actions (ESTA) dataset. ESTA is one of the largest and most granular publicly available sports data sets to date. We use ESTA to develop benchmarks for win prediction using player-specific information.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2209.09861

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
(14 more...)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Sports (1.00)
Leisure & Entertainment > Games > Computer Games (0.89)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(2 more...)

Accurate ADMET Prediction with XGBoost

Tian, Hao, Ketkar, Rajas, Tao, Peng

arXiv.org Artificial Intelligence, Sep-18-2022

The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery as they define efficacy and safety. In this work, we applied an ensemble of features, including fingerprints and descriptors, and a tree-based machine learning model, extreme gradient boosting, for accurate ADMET prediction. Our model performs well in the Therapeutics Data Commons ADMET benchmark group. For 22 tasks, our model is ranked first in 18 tasks and top 3 in 21 tasks. The trained machine learning models are integrated in ADMETboost, a web server that is publicly available at https://ai-druglab.smu.edu/admet.

descriptor, machine learning, prediction, (16 more...)

arXiv.org Artificial Intelligence

2204.07532

Country:

North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > Texas > Collin County > Frisco (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Feature Importance to Predict Mushrooms' Edibility in Python

#artificialintelligence, Sep-14-2022, 18:17:52 GMT

This article aims to use feature importance to assess whether all the columns within a dataset need to be used for prediction. Imagine you're enjoying a walk in the woods and you find some mushrooms on the side of the path. Wouldn't it be nice to input some of their features into an ML-powered application that can detect edible qualities with confidence? I'm personally not into mushroom hunting but I'm definitely into food, and I can already smell a nice dish of "tagliolini ai funghi" in front of me after a long walk. Mushrooms are fungi, part of a kingdom of their own separate from plants and animals.

algorithm, mushroom, random forest, (15 more...)

#artificialintelligence

Country: North America > United States (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.38)

Machine learning for real-time aggregated prediction of hospital admission for emergency patients - npj Digital Medicine

#artificialintelligence, Sep-14-2022, 14:35:53 GMT

Machine learning for hospital operations is under-studied. We present a prediction pipeline that uses live electronic health-records for patients in a UK teaching hospital’s emergency department (ED) to generate short-term, probabilistic forecasts of emergency admissions. A set of XGBoost classifiers applied to 109,465 ED visits yielded AUROCs from 0.82 to 0.90 depending on elapsed visit-time at the point of prediction. Patient-level probabilities of admission were aggregated to forecast the number of admissions among current ED patients and, incorporating patients yet to arrive, total emergency admissions within specified time-windows. The pipeline gave a mean absolute error (MAE) of 4.0 admissions (mean percentage error of 17%) versus 6.5 (32%) for a benchmark metric. Models developed with 104,504 later visits during the Covid-19 pandemic gave AUROCs of 0.68–0.90 and MAE of 4.2 (30%) versus a 4.9 (33%) benchmark. We discuss how we surmounted challenges of designing and implementing models for real-time use, including temporal framing, data preparation, and changing operational conditions.
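One standard way to turn patient-level admission probabilities into a forecast of the number of admissions is to treat each patient as an independent Bernoulli trial, which yields a Poisson-binomial distribution. The sketch below computes that distribution by dynamic programming, along with the mean absolute error used to score forecasts. The independence assumption and the example probabilities are illustrative; the abstract does not specify the aggregation method the authors used.

```python
def admission_count_distribution(probabilities):
    """Distribution of the number of admissions, assuming independent patients.

    Returns a list dist where dist[k] is P(exactly k patients are admitted).
    """
    dist = [1.0]  # with no patients, zero admissions is certain
    for p in probabilities:
        updated = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            updated[k] += mass * (1.0 - p)  # this patient is not admitted
            updated[k + 1] += mass * p      # this patient is admitted
        dist = updated
    return dist

def mean_absolute_error(forecasts, observed):
    """Average absolute difference between forecast and observed counts."""
    return sum(abs(f - o) for f, o in zip(forecasts, observed)) / len(forecasts)

# Hypothetical probabilities for three patients currently in the ED.
patient_probs = [0.9, 0.5, 0.2]
dist = admission_count_distribution(patient_probs)
expected_admissions = sum(k * mass for k, mass in enumerate(dist))
```

The full distribution, rather than only its mean, is what makes the forecast probabilistic: it supports statements such as the chance of at least two admissions among current patients.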

admission, prediction, real-time aggregated prediction, (13 more...)

#artificialintelligence

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.34)

FLInt: Exploiting Floating Point Enabled Integer Arithmetic for Efficient Random Forest Inference

Hakert, Christian, Chen, Kuan-Hsun, Chen, Jian-Jia

arXiv.org Artificial Intelligence, Sep-9-2022

In many machine learning applications, e.g., tree-based ensembles, floating point numbers are extensively utilized due to their expressiveness. Nowadays, data analysis of dynamic data masses on embedded devices is becoming available, but such systems often lack the hardware capabilities to process floating point numbers, which introduces large processing overheads. Even if such hardware is present in general computing systems, using integer operations instead of floating point operations promises to reduce operation overheads and improve performance. In this paper, we provide FLInt, a full precision floating point comparison for random forests that uses only integer and logic operations. To ensure that the same functionality is preserved, we formally prove the correctness of this comparison. Since random forests only require comparison of floating point numbers during inference, we implement FLInt in low level realizations and therefore eliminate the need for floating point hardware entirely, while keeping the model accuracy unchanged. The usage of FLInt basically boils down to a one-by-one replacement of conditions: For instance, a comparison statement in C: if(pX[3]<=(float)10.074347) becomes if((*(((int*)(pX))+3))<=((int)(0x41213087))). Experimental evaluation on X86 and ARMv8 desktop and server class systems shows that the execution time can be reduced by up to approximately 30% with our novel approach.
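The idea behind the replacement above is that the IEEE 754 bit pattern of a non-negative float32, read as an integer, sorts in the same order as the float itself. The sketch below demonstrates this property in Python. The sign handling in `ordered_key` is a simplification written for this illustration and is not taken from the paper, and NaN inputs are assumed not to occur.

```python
import struct

def to_float32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_bits(x):
    """Reinterpret the float32 encoding of x as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<f", x))[0]

def ordered_key(x):
    """Integer whose ordering matches the float32 ordering of x (no NaN)."""
    bits = float32_bits(x)
    # Non-negative floats already sort like their bit patterns. For negative
    # floats the magnitude bits grow as the value shrinks, so negate them.
    # This also maps -0.0 and +0.0 to the same key.
    return bits if bits >= 0 else -(bits & 0x7FFFFFFF)

def less_equal_int_only(feature, threshold):
    """Evaluate feature <= threshold by comparing integer keys only."""
    return ordered_key(feature) <= ordered_key(threshold)
```

In a generated decision tree the threshold is a constant, so its integer key can be computed once at code-generation time and embedded as a literal, which is what the hexadecimal constant in the C example does.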

artificial intelligence, implementation, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2209.04181

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)

Patient-specific modelling, simulation and real-time processing for respiratory diseases

Nousias, Stavros

arXiv.org Artificial Intelligence, Sep-8-2022

Asthma is a common chronic disease of the respiratory system causing significant disability and societal burden. It affects more than 300 million people worldwide, while more than 100 million people will likely have asthma by 2025. The cost of asthma varies greatly from nation to nation. The mean yearly cost is estimated at 1900 EUR in Europe and $3100 in the United States. Managing asthma involves controlling symptoms, preventing exacerbations, and maintaining lung function. Improved asthma control reduces the risk of exacerbations and lung function impairment while reducing the direct costs of asthma care and the indirect costs associated with reduced productivity. Understanding the complex dynamics of the pulmonary system and the lung's response to disease is fundamental to the advancement of asthma treatment. Computational models of the respiratory system seek to provide a theoretical framework to understand the interaction between structure and function. Their application can improve pulmonary medicine through a patient-specific approach to medicinal methodologies, optimizing delivery given the personalized geometry and personalized ventilation patterns. A three-fold objective is addressed within this dissertation. The first part refers to the comprehension of pulmonary pathophysiology and the mechanics of asthma and, subsequently, of constrictive pulmonary conditions in general. The second part refers to the design and implementation of tools that facilitate personalized medicine to improve delivery and effectiveness. Finally, the third part refers to the self-management of the condition, meaning that medical personnel and patients have access to tools and methods that allow the first party to easily track the course of the condition and the second party, i.e., the patient, to easily self-manage it, alleviating the significant burden on the health system.

data mining, environmental parameter and medication monitoring, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2207.01082

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.92)
Research Report > Promising Solution (0.67)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases > Asthma (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
(4 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
(11 more...)
