Is De-identification of Electronic Health Records Possible? OR Can We Use Health Record Corpora for Research?

Dalianis, Hercules (DSV/KTH-Stockholm University) | Nilsson, Gunnar (Department of Neurobiology, Care Sciences and Society, Center for Family and Community Medicine, Karolinska Institutet) | Velupillai, Sumithra (DSV/KTH-Stockholm University)

AAAI Conferences 

Today an immense volume of electronic health records (EHRs) is being produced. These health records contain abundant information, in the form of both structured and unstructured data. It is estimated that EHRs contain on average around 60 percent structured information and 40 percent unstructured information, mostly free text (Dalianis et al., 2009). A modern health record is very complex and contains a large and diverse amount of data, such as the patient’s chief complaints, diagnoses and treatment, and very often an epicrisis, or discharge letter, together with ICD-10 codes (ICD-10, 2009). Moreover, the health record also contains information about the patient’s gender, age, times of health care visits, medication, measurement values, and general condition, as well as social situation and drinking and eating habits. Much of this information is written in natural language. At present, the information in a health record is almost never re-used, in particular the parts that are written in free text. We believe that the information contained in EHR data sets is an invaluable source for the development and evaluation of a number of applications, useful both for research purposes and for health practitioners. For instance, text mining tools for finding new or hidden relations between diagnoses or treatments and social situation, age and gender could be very useful for epidemiological or medical researchers. Moreover, information concerning the health process over time, per patient, clinic or hospital, can be extracted and used for further research. Another application is the use of this data as input for simulation of the health process and of future health needs. Using such large health record databases as corpora for generating generalized synonyms from specialized medical terminology constitutes another exciting application.
We can also foresee a text summarization system applied to an individual patient’s health record that draws on knowledge from all text records and conveys the information in the health record at the right level for the specific patient. The data can also be used for developing methods that give clinicians automatic assistance in their daily work, either by proposing ICD-10 codes for symptoms or diagnoses or by validating ICD-10 codes that have already been assigned manually.
