AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

Online Machine Learning Techniques for Coq: A Comparison

Zhang, Liao, Blaauwbroek, Lasse, Piotrowski, Bartosz, Černý, Prokop, Kaliszyk, Cezary, Urban, Josef

arXiv.org Artificial Intelligence, Apr-12-2021

We present a comparison of several online machine learning techniques for tactical learning and proving in the Coq proof assistant. This work builds on top of Tactician, a plugin for Coq that learns from proofs written by the user to synthesize new proofs. This learning happens in an online manner, meaning that Tactician's machine learning model is updated immediately every time the user performs a step in an interactive proof. This has important advantages compared to the more studied offline learning systems: (1) it provides the user with a seamless, interactive experience with Tactician, and (2) it takes advantage of locality of proof similarity, which means that proofs similar to the current proof are likely to be found close by. We implement two online methods, namely approximate $k$-nearest neighbors based on locality sensitive hashing forests and random decision forests. Additionally, we conduct experiments with gradient boosted trees in an offline setting using XGBoost. We compare the relative performance of Tactician using these three learning methods on Coq's standard library.
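To make the idea of an online learner concrete, here is a minimal sketch of a nearest-neighbor model that is updated one example at a time. It is not Tactician's implementation: it uses exact k-nearest neighbors with Jaccard similarity rather than locality sensitive hashing forests, and the class name `OnlineKNN`, the feature strings, and the tactic labels are all hypothetical.

```python
from collections import Counter


class OnlineKNN:
    """Exact k-NN over feature sets, updated one example at a time (illustrative)."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # (feature_set, label) pairs, newest last

    def learn(self, features, label):
        # Online update: storing the example is the whole training step.
        self.examples.append((frozenset(features), label))

    @staticmethod
    def _jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def predict(self, features):
        query = frozenset(features)
        # Rank stored examples by similarity; on ties, prefer more recent ones.
        ranked = sorted(
            enumerate(self.examples),
            key=lambda item: (self._jaccard(query, item[1][0]), item[0]),
            reverse=True,
        )
        votes = Counter(label for _, (_, label) in ranked[: self.k])
        return votes.most_common(1)[0][0] if votes else None


# Hypothetical proof-state features and tactic labels.
model = OnlineKNN(k=1)
model.learn({"goal:eq", "hyp:nat"}, "reflexivity")
model.learn({"goal:and", "hyp:bool"}, "split")
print(model.predict({"goal:eq"}))
```

Preferring recent examples on ties is one simple way to reflect the locality of proof similarity mentioned in the abstract; a real system would likely need an approximate index to keep lookups fast as the example store grows.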

proof state, random forest, tactician, (14 more...)

arXiv.org Artificial Intelligence

2104.05207

Country:

Europe > Italy (0.04)
South America > Brazil > Rio Grande do Norte > Natal (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(8 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Individual Explanations in Machine Learning Models: A Case Study on Poverty Estimation

Carrillo, Alfredo, Cantú, Luis F., Tejerina, Luis, Noriega, Alejandro

arXiv.org Artificial Intelligence, Apr-11-2021

A. Relevance of Model Explanations in Real-World Contexts. Complex estimation and decision-making tasks have traditionally been analyzed and judged by human experts. Hence, decisions could typically be complemented with human-interpretable justifications when needed, as experts can normally explain the line of thought that led to their own decision-making. However, in the past two decades, algorithmic decision-making has spread increasingly to many relevant societal contexts. Despite the notable enthusiasm for the potential benefit that this type of technology can bring, the underlying methods used are typically not inherently transparent, in the sense that they do not readily provide human-interpretable justifications for their decisions [1]. Moreover, in recent years the most successful algorithms, particularly in complex tasks like machine vision and natural language processing, have tended to rely on highly complex models, which has further increased the tension between accuracy and interpretability [2]. Relevant societal contexts where algorithmic decision systems have gained substantial traction include medical diagnosis and treatment [3], counter-terrorism [4], criminal justice [5], and risk assessments for credit and insurance [6]. In such impactful contexts, there is a legitimate need for providing human-interpretable explanations along with the estimations and decisions made. Indeed, lack of interpretability has become a barrier to the adoption of machine learning-based systems in many institutions and companies. Hence the value of complementing ML models with human-interpretable accounts of the statistical rationales behind their estimations, so that human decision-makers can more easily understand machine estimations, and even integrate their statistical rationales with qualitative information and human expert judgements.

explanation, household, interpretability, (16 more...)

arXiv.org Artificial Intelligence

2104.04148

Country:

South America (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Costa Rica (0.04)
North America > Central America (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (0.48)
Banking & Finance (0.46)
Law Enforcement & Public Safety (0.34)
Law (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.46)

Utilizing XGBoost training reports to improve your models

#artificialintelligence, Apr-8-2021, 16:26:32 GMT

In 2019, AWS unveiled Amazon SageMaker Debugger, a SageMaker capability that enables you to automatically detect a variety of issues that may arise while a model is being trained. SageMaker Debugger captures model state data at specified intervals during a training job. With this data, SageMaker Debugger can detect training issues or anomalies by leveraging built-in or user-defined rules. In addition to detecting issues during the training job, you can analyze the captured state data afterwards to evaluate model performance and identify areas for improvement. This task is made easier with the newly launched XGBoost training report feature.

training job, training report, xgboost training report, (13 more...)

#artificialintelligence

Industry:

Retail > Online (0.40)
Leisure & Entertainment > Sports (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)

Random forest regressor sklearn : Step By Step Implementation

#artificialintelligence, Apr-5-2021, 21:24:02 GMT

The RandomForestRegressor class has various hyperparameters, each with a default value, such as n_estimators=100, criterion='mse', max_depth=None, and min_samples_split=2. We can choose their optimal values using hyperparameter tuning techniques such as GridSearchCV and RandomizedSearchCV. In this article, we demonstrate an end-to-end implementation of a random forest regressor in sklearn. First, you import the package using the import statement. Second, we create the random forest regressor object.
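The steps described above can be sketched as follows. This is a minimal illustration rather than the article's own code: the synthetic dataset, the parameter grid, and the variable names are assumptions chosen to keep the example small and fast.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data stands in for a real dataset (assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Create the regressor object; unspecified hyperparameters keep their defaults.
regressor = RandomForestRegressor(random_state=0)

# A deliberately small, illustrative grid over three hyperparameters.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [None, 5],
    "min_samples_split": [2, 5],
}

# 3-fold cross-validated search; the best model is refit on all training data.
search = GridSearchCV(regressor, param_grid, cv=3)
search.fit(X_train, y_train)

best_params = search.best_params_
test_r2 = search.score(X_test, y_test)  # R^2 of the refit best model
print(best_params, round(test_r2, 3))
```

For larger grids, RandomizedSearchCV samples a fixed number of candidate settings instead of trying every combination, which usually costs less compute.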

implementation, random forest regressor sklearn, randomforestregressor class, (2 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)

XGBoost Algorithm: Long May She Reign!

#artificialintelligence, Apr-1-2021, 21:30:24 GMT

Decision Tree: Every hiring manager has a set of criteria such as education level, number of years of experience, and interview performance. A decision tree is analogous to a hiring manager interviewing candidates based on his or her own criteria. Bagging: Now imagine that, instead of a single interviewer, there is an interview panel where each interviewer has a vote. Bagging, or bootstrap aggregating, involves combining inputs from all interviewers for the final decision through a democratic voting process. Random Forest: It is a bagging-based algorithm with a key difference wherein only a subset of features is selected at random.
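The interview-panel analogy for bagging can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions: each "interviewer" is a one-feature threshold rule (a decision stump) instead of a full decision tree, the data and function names are hypothetical, and the random feature subsetting that distinguishes a random forest is not shown.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a threshold rule (one 'interviewer') to (x, label) pairs."""
    best = None
    for threshold, _ in sample:
        # The rule predicts 1 when x >= threshold; count its training mistakes.
        errors = sum((x >= threshold) != bool(label) for x, label in sample)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]


def bagging_fit(data, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: sample with replacement, same size as the original data.
        sample = [rng.choice(data) for _ in data]
        models.append(train_stump(sample))
    return models


def bagging_predict(models, x):
    # Democratic vote: every stump casts one vote, the majority wins.
    votes = Counter(int(x >= threshold) for threshold in models)
    return votes.most_common(1)[0][0]


# Hypothetical 1-D data: small values are class 0, large values are class 1.
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
panel = bagging_fit(data)
print(bagging_predict(panel, 1), bagging_predict(panel, 9))
```

Because each stump sees a different bootstrap sample, individual thresholds vary, and the vote smooths out the quirks of any single one.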

criteria, interviewer, xgboost algorithm, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Decision Trees, Random Forests & Gradient Boosting in R

#artificialintelligence, Apr-1-2021, 04:41:54 GMT

Would you like to build predictive models using machine learning? That's precisely what you will learn in this course "Decision Trees, Random Forests and Gradient Boosting in R." My name is Carlos Martínez, I have a Ph.D. in Management from the University of St. Gallen in Switzerland. I have presented my research at some of the most prestigious academic conferences and doctoral colloquiums at the University of Tel Aviv, Politecnico di Milano, University of Halmstad, and MIT. Furthermore, I have co-authored more than 25 teaching cases, some of them included in the case bases of Harvard and Michigan. This is a very comprehensive course that includes presentations, tutorials, and assignments. The course has a practical approach based on the learning-by-doing method in which you will learn decision trees and ensemble methods based on decision trees using a real dataset.

decision tree, random forest, university, (5 more...)

#artificialintelligence

Country:

North America > United States > Michigan (0.27)
Europe > Switzerland (0.27)
Europe > Sweden > Halland County > Halmstad (0.27)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.27)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

MOAI: A methodology for evaluating the impact of indoor airflow in the transmission of COVID-19

Oehmichen, Axel, Guitton, Florian, Wahl, Cedric, Foing, Bertrand, Tziamtzis, Damian, Guo, Yike

arXiv.org Machine Learning, Mar-31-2021

Epidemiology models play a key role in understanding and responding to the COVID-19 pandemic. In order to build those models, scientists need to understand contributing factors and their relative importance. A large strand of literature has identified the importance of airflow in mitigating droplet and far-field aerosol transmission risks. However, the specific factors contributing to higher or lower contamination in various settings have not been clearly defined and quantified. As part of the MOAI project (https://moaiapp.com), we are developing a privacy-preserving test and trace app to enable infection cluster investigators to get in touch with patients without having to know their identity. This approach allows involving users in the fight against the pandemic by contributing additional information in the form of anonymous research questionnaires. We first describe how the questionnaire was designed and how the synthetic data was generated, based on a review we carried out of the latest available literature. We then present a model to evaluate the risk exposure of a user for a given setting. We finally propose a temporal addition to the model to evaluate the risk exposure over time for a given user.

covid-19, exposure, transmission, (15 more...)

arXiv.org Machine Learning

2103.17096

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Ireland (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(2 more...)

Classifying the Unstructured IT Service Desk Tickets Using Ensemble of Classifiers

C, Ramya, P, Paramesh S., S, Shreedhara K

arXiv.org Artificial Intelligence, Mar-30-2021

Manual classification of IT service desk tickets may result in routing of the tickets to the wrong resolution group. Incorrect assignment of IT service desk tickets leads to reassignment of tickets, unnecessary resource utilization, and longer resolution times. Traditional machine learning algorithms can be used to automatically classify IT service desk tickets. Service desk ticket classifier models can be trained by mining the historical unstructured ticket descriptions and the corresponding labels. The model can then be used to classify a new service desk ticket based on its description. The performance of traditional classifier systems can be further improved by using various ensembles of classification techniques. This paper presents the three most popular ensemble methods, i.e., Bagging, Boosting, and Voting, for combining the predictions from different models to further improve the accuracy of the ticket classifier system. The performance of the ensemble classifier system is checked against the individual base classifiers using various performance metrics. The ensembles of classifiers performed well in comparison with the corresponding base classifiers. The advantages of building such automated ticket classifier systems are a simplified user interface, faster resolution time, improved productivity, customer satisfaction, and growth in business. Real-world service desk ticket data from a large enterprise IT infrastructure is used for our research.
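As a rough illustration of the voting ensemble mentioned in the abstract, the sketch below combines the predictions of several base classifiers by hard majority vote. It does not reproduce the paper's models: the keyword rules stand in for trained classifiers, and every name, label, and ticket text here is hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def by_keyword(keyword, label, default="other"):
    """Build a toy base classifier that fires on a single keyword."""
    return lambda text: label if keyword in text.lower() else default


# Hypothetical base classifiers; a real system would use trained models.
classifiers = [
    by_keyword("password", "access"),
    by_keyword("login", "access"),
    by_keyword("printer", "hardware"),
]


def classify(ticket):
    # Each base classifier votes once; the majority label is the prediction.
    return majority_vote([clf(ticket) for clf in classifiers])


print(classify("Cannot login after password reset"))
```

Hard voting only needs each model's predicted label; a soft-voting variant would average predicted class probabilities instead, which requires base models that can output them.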

classifier, classifier model, ticket, (13 more...)

arXiv.org Artificial Intelligence

2103.15822

Country:

Asia > India > Karnataka (0.05)
Asia > Singapore (0.04)

Genre: Research Report (0.83)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.97)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.95)
(2 more...)

Individually Fair Gradient Boosting

Vargo, Alexander, Zhang, Fan, Yurochkin, Mikhail, Sun, Yuekai

arXiv.org Machine Learning, Mar-30-2021

We consider the task of enforcing individual fairness in gradient boosting. Gradient boosting is a popular method for machine learning from tabular data, which arise often in applications where algorithmic fairness is a concern. At a high level, our approach is a functional gradient descent on a (distributionally) robust loss function that encodes our intuition of algorithmic fairness for the ML task at hand. Unlike prior approaches to individual fairness that only work with smooth ML models, our approach also works with non-smooth models such as decision trees. We show that our algorithm converges globally and generalizes. We also demonstrate the efficacy of our algorithm on three ML problems susceptible to algorithmic bias.

conference paper, fairness, individual fairness, (16 more...)

arXiv.org Machine Learning

2103.16785

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Michigan (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Banking & Finance > Credit (0.68)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest

Zhou, Siyu, Mentch, Lucas

arXiv.org Machine Learning, Mar-30-2021

Due to their long-standing reputation as excellent off-the-shelf predictors, random forests continue to remain a go-to model of choice for applied statisticians and data scientists. Despite their widespread use, however, until recently, little was known about their inner workings and about which aspects of the procedure were driving their success. Very recently, two competing hypotheses have emerged -- one based on interpolation and the other based on regularization. This work argues in favor of the latter by utilizing the regularization framework to reexamine the decades-old question of whether individual trees in an ensemble ought to be pruned. Despite the fact that default constructions of random forests use near full depth trees in most popular software packages, here we provide strong evidence that tree depth should be seen as a natural form of regularization across the entire procedure. In particular, our work suggests that random forests with shallow trees are advantageous when the signal-to-noise ratio in the data is low. In building up this argument, we also critique the newly popular notion of "double descent" in random forests by drawing parallels to U-statistics and arguing that the noticeable jumps in random forest accuracy are the result of simple averaging rather than interpolation.

ensemble, random forest, tree depth, (16 more...)

arXiv.org Machine Learning

2103.167

Country: North America > United States > New York (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
