Applying Machine Learning to SEC Filings to Find Anomalous Companies
Contemporary machine learning algorithms are well-suited to the complex, high-dimensional data associated with accounting records. In this short note we apply a simple unsupervised algorithm to find anomalous companies: those whose accounting metrics don't match the statistical patterns implied by the bulk of the companies. To do this we use the SEC structured financial statements data set, a regularly updated collection containing the machine-readable numeric core of the financial disclosures that companies file with the SEC through its EDGAR system. We use each company's reported assets as a normalizing factor; while size is of course a variable of interest, we are looking for less obvious, scale-independent patterns and anomalies. Note that the axes and values in the graph above are in many ways arbitrary; the graph is simply a reasonable effort at representing in three dimensions the relative distances between points in the six-dimensional data space for the company filings.
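The normalize-then-flag idea can be illustrated with a minimal sketch. The note does not name its algorithm, so the robust z-score used here (median and median absolute deviation) is a swapped-in stand-in, not the method behind the graph. The company records, their values, the choice of only two metrics instead of six, and the 3.5 cutoff are all made up for illustration.

```python
from statistics import median

# Hypothetical records: made-up values, not taken from the SEC data set.
companies = {
    "A": {"Assets": 1000, "Liabilities": 600, "Revenues": 500},
    "B": {"Assets": 2000, "Liabilities": 1100, "Revenues": 1000},
    "C": {"Assets": 500, "Liabilities": 320, "Revenues": 240},
    "D": {"Assets": 800, "Liabilities": 480, "Revenues": 420},
    "E": {"Assets": 1500, "Liabilities": 2700, "Revenues": 150},
}
METRICS = ["Liabilities", "Revenues"]  # assumed metrics; the note uses six
THRESHOLD = 3.5  # assumed cutoff on the robust z-score


def normalize_by_assets(record, metrics):
    """Divide each metric by reported assets to get scale-independent ratios."""
    assets = record["Assets"]
    return [record[m] / assets for m in metrics]


def robust_scores(rows):
    """Score each row by its largest absolute robust z-score across columns."""
    columns = list(zip(*rows))
    centers = [median(col) for col in columns]
    # Median absolute deviation per column; the factor 1.4826 makes it
    # comparable to a standard deviation for normally distributed data.
    scales = [
        1.4826 * median(abs(v - c) for v in col)
        for col, c in zip(columns, centers)
    ]
    scores = []
    for row in rows:
        zs = [
            abs(v - c) / s if s > 0 else 0.0
            for v, c, s in zip(row, centers, scales)
        ]
        scores.append(max(zs))
    return scores


names = list(companies)
rows = [normalize_by_assets(companies[n], METRICS) for n in names]
scores = robust_scores(rows)
flagged = [n for n, s in zip(names, scores) if s > THRESHOLD]
print(flagged)
```

Dividing by assets means a large and a small company with the same balance-sheet proportions land on the same point, so only differences in proportions can drive the score. Medians are used instead of means so that the anomalies themselves do not distort the baseline they are measured against.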
Apr-20-2018, 19:10:17 GMT