ISSN: 2376-130X
Duncan Wallace
In an era of “big data”, computational solutions based on large-scale machine learning (ML) have provided recourse for problems that would previously have proven very difficult to address. In recent years, ML approaches have been successfully applied to the analysis of patient symptom data in the context of disease diagnosis, at least where such data is well codified. However, much of the data present in Electronic Health Records (EHR) is unlikely to prove suitable for classic ML approaches. Furthermore, as large volumes of data are widely spread across both hospitals and individuals, a decentralised, computationally scalable methodology is a priority. Our research is based upon the early identification of a small subsection of patients who are frequent users of out-of-hours healthcare (OOHC) services. These are patients who have underlying conditions that will cause them to repeatedly require medical attention. OOHC services act as an ad hoc means of delivering telemedicine and treatment, where interactions occur without recourse to a full medical history of the patient in question. The medical histories of patients contacting an OOHC service may reside in several distinct EHR systems in multiple hospitals or surgeries, none of which is available to that service. As such, although a local solution is the better option for this problem, it follows that the data under investigation is incomplete, heterogeneous, and composed mostly of noisy textual notes compiled during routine OOHC activities. Through a range of machine learning methodologies, the aim of this research is to provide the means to identify, upon initial contact, patient cases that are likely to relate to such outliers. In particular, deep learning approaches were adopted in the development of a system for classifying these cases. A further aim of this research is to elucidate the discovery of frequent user cases by examining the exact terms that provide a strong indication of positive and negative case entries.