Durkheim Project Data Analysis Report
arXiv.org Artificial Intelligence
This report describes the suicidality prediction models created under the DARPA DCAPS program in association with the Durkheim Project [http://durkheimproject.org/]. The models were built primarily from unstructured text (free-format clinician notes) drawn from several hundred patient records obtained from the Veterans Health Administration (VHA). They were constructed by applying a genetic programming algorithm to bag-of-words and bag-of-phrases datasets. The influence of additional structured data was explored but found to be minor. Given the small dataset size, it was possible to classify the cohorts with high fidelity (98%). Cross-validation suggests that the models are reasonably predictive, with an accuracy of 50% to 69% across five rotating folds and ensemble averages of 58% to 67%. One particularly noteworthy result is that word pairs can dramatically improve classification accuracy, but only when one of the words in the pair is already known to have high predictive value. By contrast, the set of all possible word pairs does not improve on a simple bag-of-words model.
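The distinction between "all word pairs" and "word pairs anchored on a known predictive word" can be illustrated with a small feature-construction sketch. It assumes that a word pair means two adjacent words in a note and that the high-value words arrive as a seed set from an earlier bag-of-words model. The function names (`tokenize`, `all_word_pairs`, `seeded_word_pairs`, `features`, `rotating_folds`) and the example note and seed word are hypothetical. The genetic-programming classifier itself is not shown, and `rotating_folds` is only one plausible way to rotate five held-out folds.

```python
import re
from typing import Iterable, Iterator


def tokenize(note: str) -> list[str]:
    """Lowercase a free-text note and split it into word tokens."""
    return re.findall(r"[a-z']+", note.lower())


def bag_of_words(tokens: list[str]) -> set[str]:
    """Presence/absence features: one feature per distinct word."""
    return set(tokens)


def all_word_pairs(tokens: list[str]) -> set[str]:
    """Every adjacent word pair (a 'bag of phrases'); assumes pairs are adjacent words."""
    return {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}


def seeded_word_pairs(tokens: list[str], seeds: Iterable[str]) -> set[str]:
    """Only adjacent pairs in which at least one word is in the (hypothetical) seed set."""
    seed_set = set(seeds)
    return {
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a in seed_set or b in seed_set
    }


def features(note: str, seeds: Iterable[str]) -> set[str]:
    """Bag of words augmented with the seeded word pairs only."""
    tokens = tokenize(note)
    return bag_of_words(tokens) | seeded_word_pairs(tokens, seeds)


def rotating_folds(n_records: int, k: int = 5) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists; each record is held out in exactly one fold."""
    indices = list(range(n_records))
    for fold in range(k):
        test = indices[fold::k]
        train = [i for i in indices if i % k != fold]
        yield train, test


if __name__ == "__main__":
    note = "Patient reports poor sleep and low appetite."  # hypothetical note
    tokens = tokenize(note)
    print(len(all_word_pairs(tokens)))                    # all adjacent pairs
    print(sorted(seeded_word_pairs(tokens, {"sleep"})))   # pairs touching the seed word
```

On this seven-word note, `all_word_pairs` produces six pair features while `seeded_word_pairs` keeps only `"poor sleep"` and `"sleep and"`. Restricting pairs this way adds far fewer features for the learner to search over, which is one plausible reason why seeded pairs could help on a few hundred records while the full pair set does not.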
Oct-24-2013
- Country:
  - North America > United States
    - District of Columbia > Washington (0.04)
  - Europe > United Kingdom
    - England > Greater London > London (0.04)
- Genre:
  - Research Report > Experimental Study (0.68)
- Technology: