AITopics | faircutforest


Imputing missing values with unsupervised random trees

Cortes, David

arXiv.org Machine Learning, Nov-21-2019

When designing statistical models from tabular data for supervised learning tasks such as regression or classification, it often happens that some of the observations available for fitting such models have missing values in one or more variables, usually due to poor data collection practices, loss of information, participants dropping out of a survey, or similar. Many methods such as [2] or [4] overcome this issue by using heuristics to handle missing information. Decision tree methods in particular, because they split on one variable at a time, are well suited to implicit handling of missing data without a priori imputation ([16]). Other methods, such as generalized linear models or support vector machines, cannot handle missing values in the same way, and when they are used on a dataset with missing entries, these entries have to either be dropped or imputed.

Typical strategies for imputing the missing entries include: replacing them with the column mean or median; determining the most similar observations (nearest neighbors) according to the non-missing variables and taking a simple or weighted average of the missing variable(s) from them ([11]); producing a latent representation of the data by some low-rank matrix factorization that minimizes errors on the non-missing entries, from which the missing entries are then reconstructed ([10]); and iterative imputation, which starts with some basic imputation for all values and then cycles through each variable, constructing a model to predict the missing values from the non-missing observations, replacing the earlier imputation with the model prediction, and repeating until convergence ([3], [18]).
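As a rough illustration of the two simplest strategies listed above (column-mean replacement and nearest-neighbor averaging), the following is a minimal sketch in plain Python. It is not the method proposed in the paper or the code of any cited reference. The function names (`impute_mean`, `impute_knn`), the toy data, the use of `None` to mark missing entries, and the distance (root mean squared difference over the columns observed in both rows) are all assumptions made for this example.

```python
import math


def column_means(rows):
    """Mean of each column, ignoring missing entries (marked as None)."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def impute_mean(rows):
    """Replace every missing entry with the mean of its column."""
    means = column_means(rows)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def shared_distance(a, b):
    """Root mean squared difference over columns observed in both rows.

    Returns None when the two rows share no observed column.
    (Illustrative choice of distance, not taken from the source.)
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def impute_knn(rows, k=2):
    """Fill each missing entry with the simple average of that column over
    the k nearest rows in which the column is observed."""
    means = column_means(rows)  # fallback when no neighbor is usable
    filled = [list(r) for r in rows]  # copy so the input is not modified
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                d = shared_distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            filled[i][j] = sum(nearest) / len(nearest) if nearest else means[j]
    return filled


# Toy data with two clusters of rows; None marks a missing entry.
data = [
    [1.0, 2.0, 10.0],
    [1.2, 2.2, 12.0],
    [5.0, 6.0, 50.0],
    [5.2, 6.4, 52.0],
    [5.1, None, 51.0],
    [1.1, 2.1, None],
]

mean_filled = impute_mean(data)
knn_filled = impute_knn(data, k=2)
print("mean:", mean_filled[4][1], mean_filled[5][2])
print("knn: ", knn_filled[4][1], knn_filled[5][2])
```

On this toy data the column means fall between the two clusters (about 3.74 and 35.0), while the nearest-neighbor version fills each gap from the rows that resemble the incomplete one (about 6.2 and 11.0). Iterative imputation would start from a fill like `impute_mean` and then refit a per-column model on each pass.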

faircutforest, imputation, iterative, (16 more...)

arXiv.org Machine Learning: 1911.06646

Country: North America > United States > California (0.05)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
