Authors:
J. Kirchhoff (Karlsruhe, DE)
F. Stumpe (Karlsruhe, DE)
N. Heilig (Karlsruhe, DE)
L. Flek (Marburg, DE)
J. Plepi (Marburg, DE)
H. Paulheim (Mannheim, DE)
Medical diagnosis is the process of predicting which disease a patient is likely to have, given a set of symptoms and observations. This requires extensive expert knowledge, in particular when covering a large variety of diseases. Such knowledge can be encoded in a knowledge graph encompassing diseases, symptoms, and diagnosis paths. Since both the knowledge itself and its encoding can be incomplete, refining the knowledge graph with additional information helps physicians make better predictions. At the same time, for deployment in a hospital, the diagnosis must be explainable and transparent. In this paper, we present an approach using diagnosis paths in a medical knowledge graph. We show that these graphs can be refined using latent representations with RDF2vec, while the final diagnosis is still made in an explainable way. Using both an intrinsic and an expert-based evaluation, we show that the embedding-based prediction approach is beneficial for refining the graph with additional valid conditions.
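The summary does not specify how a diagnosis path is encoded in the medicalvalues graph. As a minimal sketch, assume a path is a list of conditions on patient observations that must all hold for the disease to be suggested; the names `Condition`, `DiagnosisPath`, `OPS`, and the observation keys are hypothetical. The point is that evaluating such a path yields a per-condition trace, which is what makes the resulting diagnosis explainable.

```python
from __future__ import annotations

import operator
from dataclasses import dataclass, field

# Comparison operators a (hypothetical) condition may use.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}


@dataclass
class Condition:
    """One check on a patient observation, e.g. a lab value threshold."""
    observation: str  # hypothetical observation key, e.g. "crp_mg_l"
    op: str           # key into OPS
    value: object

    def holds(self, patient: dict) -> bool:
        # A missing observation never satisfies the condition.
        if self.observation not in patient:
            return False
        return OPS[self.op](patient[self.observation], self.value)


@dataclass
class DiagnosisPath:
    """Hypothetical diagnosis path: the disease is suggested if all conditions hold."""
    disease: str
    conditions: list[Condition] = field(default_factory=list)

    def evaluate(self, patient: dict) -> tuple[bool, list[str]]:
        # Check each condition once and keep a readable trace for the physician.
        results = [(c, c.holds(patient)) for c in self.conditions]
        trace = [f"{c.observation} {c.op} {c.value!r}: {'met' if ok else 'not met'}"
                 for c, ok in results]
        return all(ok for _, ok in results), trace


path = DiagnosisPath("disease:X", [
    Condition("crp_mg_l", ">", 5.0),
    Condition("fever", "==", True),
])
fired, trace = path.evaluate({"crp_mg_l": 12.0, "fever": True})
print(fired)  # True
for line in trace:
    print(line)  # "crp_mg_l > 5.0: met", then "fever == True: met"
```

Under this encoding, refining the graph with an additional valid condition would amount to appending another `Condition` to `path.conditions`; the explanation trace grows accordingly.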
In this paper, we have introduced the medicalvalues knowledge graph, which is used for medical diagnosis via so-called diagnosis paths. These paths allow for a transparent prediction of a patient’s disease. Since the paths are developed manually, they are notoriously incomplete. To tackle this incompleteness, we have introduced an approach which first enriches the medicalvalues knowledge graph into an augmented graph by connecting it to a large dataset of patient records. On that augmented graph, we have trained vector embeddings with RDF2vec, which are then used to predict completions.
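RDF2vec first turns the graph into a corpus of walks, each an alternating sequence of entities and predicates, and then trains a word2vec model on that corpus so that every entity gets a vector. The sketch below shows only the walk-extraction step on a hypothetical toy augmented graph (all entity and predicate names are illustrative, not taken from the medicalvalues graph), using plain uniform random walks; RDF2vec also supports other walk strategies.

```python
import random
from collections import defaultdict

# Hypothetical toy triples: one knowledge-graph fact plus patient-record facts
# linking the graph to instance data (names are illustrative only).
TRIPLES = [
    ("disease:X", "hasCondition", "cond:crp_high"),
    ("patient:1", "hasDiagnosis", "disease:X"),
    ("patient:1", "hasFinding", "cond:crp_high"),
    ("patient:1", "hasFinding", "cond:fever"),
    ("patient:2", "hasFinding", "cond:fever"),
]


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return adj


def random_walks(adj, start, num_walks, depth, rng):
    """Generate RDF2vec-style walks: entity, predicate, entity, ... sequences."""
    walks = []
    for _ in range(num_walks):
        walk, node = [start], start
        for _ in range(depth):
            edges = adj.get(node)
            if not edges:  # dead end: stop this walk early
                break
            predicate, node = rng.choice(edges)
            walk += [predicate, node]
        walks.append(walk)
    return walks


adj = build_adjacency(TRIPLES)
entities = sorted({s for s, _, _ in TRIPLES} | {o for _, _, o in TRIPLES})
rng = random.Random(42)  # fixed seed for reproducibility
corpus = [w for e in entities for w in random_walks(adj, e, 4, 3, rng)]
for walk in corpus[:3]:
    print(" ".join(walk))
```

Each walk would then be treated as a sentence and passed to a word2vec implementation (for example gensim's `Word2Vec`) to obtain one embedding per entity. Linking patients to the diseases and findings they share is what lets walks carry co-occurrence signal from the patient records into the embeddings.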
In both an internal validation and an expert evaluation, we have shown that such extensions can be predicted with high precision. This methodology of enriching the graph and deriving predictions from it is independent of the task and domain at hand. One key limitation of the approach is the external data used, which was gathered in intensive care units. Therefore, diseases that rarely lead to treatment in intensive care are not well covered. To augment diagnosis paths for as wide a range of diseases as possible, other external datasets should be considered as well. Here, connectors to clinical information systems (CIS) and laboratory information systems (LIS) may also provide large-scale instance data in the future, which can be exploited with the same methodology.
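The summary does not state how the embeddings are turned into predicted completions or how precision was computed. One simple option, assumed here and not necessarily the authors' prediction model, is to rank candidate conditions not yet linked to a disease by cosine similarity to the disease's vector, keep the top k, and report the fraction of those suggestions a reviewer accepts. The vectors below are hypothetical toy values, not trained embeddings or results.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_completions(vectors, disease, candidates, existing, k):
    """Rank conditions not yet linked to `disease` by embedding similarity."""
    scored = [(cosine(vectors[disease], vectors[c]), c)
              for c in candidates if c not in existing]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best first, ties broken by name
    return [c for _, c in scored[:k]]


def precision(predicted, judged_valid):
    """Fraction of predicted completions that a reviewer accepted."""
    if not predicted:
        return 0.0
    return sum(c in judged_valid for c in predicted) / len(predicted)


# Toy 2-d vectors standing in for trained RDF2vec embeddings (illustrative only).
vectors = {
    "disease:X": [1.0, 0.0],
    "cond:crp_high": [0.95, 0.05],
    "cond:fever": [0.9, 0.1],
    "cond:cough": [0.1, 0.9],
}
top = rank_completions(vectors, "disease:X",
                       ["cond:crp_high", "cond:fever", "cond:cough"],
                       existing={"cond:crp_high"}, k=1)
print(top)                             # ['cond:fever']
print(precision(top, {"cond:fever"}))  # 1.0
```

Excluding conditions already on the path keeps the ranking focused on genuinely new completions, and measuring precision over the top k mirrors an expert review where only the suggested additions are judged.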
So far, drugs are not represented in the medicalvalues knowledge graph. In the future, we would like to include them, both as part of a patient’s medical history (i.e., existing medication) and as possible treatments once a diagnosis is made. To that end, we plan to augment the graph with existing datasets on drugs and drug interactions.
Full paper: https://arxiv.org/pdf/2204.13329.pdf