ptype: Probabilistic Type Inference
Ceritli, Taha; Williams, Christopher K. I.; Geddes, James
The data type, missing data, and anomalies can be defined in broad terms as follows. The data type is the common characteristic expected to be shared by the entries in a column, such as integers, strings, IP addresses, or dates. Missing data denotes the absence of a data value, which can be encoded in various ways. Anomalies are values whose types differ from both the given column type and the missing type. To model these types, we have developed Probabilistic Finite-State Machines (PFSMs) that can generate values from the corresponding domains. This, in turn, allows us to calculate the probability that a given data value was generated by a particular PFSM. We then combine these PFSMs in our model so that a data column x can be annotated via probabilistic inference: given a column of data, we can infer the column type and identify the rows with missing and anomalous values.
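The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `PFSM`, the helpers `one_or_more` and `from_strings`, the function `infer_column`, the two toy column types, the list of missing-value encodings, and the mixture weights are all hypothetical and fixed by hand for illustration. Each machine assigns a probability to a string, each row is treated as a mixture of the column type, the missing type, and the anomaly type, and the column type with the highest total log-likelihood is selected.

```python
import math
import string


class PFSM:
    """Toy probabilistic finite-state machine over characters.

    Transitions are deterministic given (state, char), so the probability
    of a string is the product of its transition probabilities times the
    stop probability of the final state.
    """

    def __init__(self, start, transitions, stop):
        self.start = start
        self.transitions = transitions  # {(state, char): (next_state, prob)}
        self.stop = stop                # {state: stop probability}

    def log_prob(self, value):
        state, logp = self.start, 0.0
        for ch in value:
            step = self.transitions.get((state, ch))
            if step is None:
                return -math.inf  # the machine cannot generate this string
            state, p = step
            logp += math.log(p)
        p_stop = self.stop.get(state, 0.0)
        return logp + math.log(p_stop) if p_stop > 0 else -math.inf


def one_or_more(alphabet, stop=0.2):
    """Machine generating one or more characters drawn uniformly from alphabet."""
    t = {}
    for ch in alphabet:
        t[(0, ch)] = (1, 1.0 / len(alphabet))
        t[(1, ch)] = (1, (1.0 - stop) / len(alphabet))
    return PFSM(0, t, {1: stop})


def from_strings(strings):
    """Trie-shaped machine giving equal probability to each distinct string."""
    counts = {}  # prefix -> number of listed strings passing through it
    for s in strings:
        for i in range(len(s) + 1):
            counts[s[:i]] = counts.get(s[:i], 0) + 1
    ends, t, stop = set(strings), {}, {}
    for prefix, n in counts.items():
        if prefix:
            parent = prefix[:-1]
            t[(parent, prefix[-1])] = (prefix, n / counts[parent])
        stop[prefix] = 1.0 / n if prefix in ends else 0.0
    return PFSM("", t, stop)


def infer_column(column, type_machines, missing, anomaly,
                 weights=(0.90, 0.05, 0.05)):
    """Return the most likely column type and a per-row label.

    Assumes every value is either a listed missing encoding or a non-empty
    string of printable ASCII, so each row has non-zero likelihood.
    """
    w_type, w_miss, w_anom = weights

    def parts(machine, x):
        return {
            "type": w_type * math.exp(machine.log_prob(x)),
            "missing": w_miss * math.exp(missing.log_prob(x)),
            "anomaly": w_anom * math.exp(anomaly.log_prob(x)),
        }

    # Column-level score: sum over rows of the log mixture likelihood.
    scores = {
        name: sum(math.log(sum(parts(m, x).values())) for x in column)
        for name, m in type_machines.items()
    }
    best = max(scores, key=scores.get)
    # Row-level label: the mixture component with the largest contribution.
    labels = []
    for x in column:
        row = parts(type_machines[best], x)
        labels.append(max(row, key=row.get))
    return best, labels


type_machines = {
    "integer": one_or_more(string.digits),
    "alphabetic": one_or_more(string.ascii_letters),
}
missing = from_strings(["", "NA", "NULL", "-"])  # assumed encodings
anomaly = one_or_more(string.printable)          # accepts any printable text

column = ["12", "7", "NA", "305", "x1", "42"]
best, labels = infer_column(column, type_machines, missing, anomaly)
print(best, labels)
```

Because the anomaly machine spreads its probability over all printable strings, it assigns any single value much less mass than a specific type machine that accepts it, so it only wins for rows that neither the column type nor the missing machine can generate.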
Nov-22-2019
- Country:
- Europe > United Kingdom > England > Greater London > London (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > United States > Iowa > Story County > Ames (0.04)
- North America > United States > Massachusetts (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Oceania > Australia > Australian Capital Territory > Canberra (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine > Therapeutic Area (1.00)
- Technology: