Model-based clustering of partial records

Mar-30-2021–arXiv.org Machine Learning

In practice, real data sets may have missing values or otherwise have only partially observed records, which complicates the application and validity of standard statistical methodology. Missingness may result from diverse causes, with an underlying mechanism of one of three types: missing completely at random (MCAR), missing at random (MAR), or not missing at random (NMAR) [16]. Under MCAR, the probability that a case (record, sample, observation) is missing feature (variable, attribute, dimension) values does not depend on either the observed or missing feature values. When the probability that a case is missing feature values may depend on the observed feature values, but not the missing feature values, the mechanism is MAR. In the more extreme and challenging case of NMAR, the probability that a case is missing feature values depends on both observed and missing feature values. Notably, if the data are MCAR, they are also MAR; if the data are not MAR, then they are NMAR. Strategies for analysis of data with missing values are often critically dependent on the missingness mechanism, and clustering is no exception. For clustering problems, the most common (and often expedient) treatment of missing values is either deletion, on a case or feature basis, or imputation [17], [18].
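The three mechanisms, and the two expedient treatments mentioned above, can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `simulate_missingness`, `delete_cases`, and `mean_impute`, the choice to blank out only the last feature, and the logistic form of the missingness probability. For MAR and NMAR the overall missing fraction is only roughly `rate`.

```python
import numpy as np


def simulate_missingness(X, mechanism, rate=0.3, seed=0):
    """Return a copy of X with NaNs placed in the last column.

    Hypothetical helper: only the last feature can go missing, and the
    first feature is always observed.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    offset = np.log(rate / (1.0 - rate))  # logit of the target rate

    if mechanism == "MCAR":
        # Constant probability: ignores observed and missing values alike.
        p = np.full(n, rate)
    elif mechanism in ("MAR", "NMAR"):
        # MAR: driven by an always-observed feature (column 0).
        # NMAR: driven by the very feature that goes missing (last column).
        driver = X[:, 0] if mechanism == "MAR" else X[:, -1]
        z = (driver - driver.mean()) / driver.std()
        p = 1.0 / (1.0 + np.exp(-(z + offset)))
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")

    out = X.copy()
    out[rng.random(n) < p, -1] = np.nan
    return out


def delete_cases(X):
    """Case deletion: drop every record with at least one missing value."""
    return X[~np.isnan(X).any(axis=1)]


def mean_impute(X):
    """Imputation: replace each missing value with its feature's observed mean."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(200, 3))
    for mech in ("MCAR", "MAR", "NMAR"):
        Xm = simulate_missingness(X, mech)
        print(mech, "records kept after case deletion:", delete_cases(Xm).shape[0])
```

Under the NMAR setting in this sketch, larger values of the last feature are more likely to be missing, so both case deletion and mean imputation tend to understate that feature's mean. This is one concrete way in which the choice of treatment depends on the missingness mechanism.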
