XAIPath

Modern medicine produces massive amounts of data in many different formats (images of many types, signals, structured data, free text, genomics, …), making diagnosis and treatment planning increasingly complex for humans, who need to take all available data and influences into account, particularly complex data such as images or genomics. With almost all data now available in electronic format, digital medicine has become a reality. Histopathology is one of the last domains where work is still partially carried out with analogue devices, notably microscopes used to analyze stained tissue samples on glass slides. Digital Whole Slide Image (WSI) scanners and viewing on computer screens are increasingly replacing these analogue devices and open the domain up to more automatic data analysis in diagnostic workflows as well, not only in research, where digital data processing has become standard. Part of the challenge is the large size of these images, both for visualization and storage: standard images are around 100’000×100’000 pixels at a standard magnification of 40x. Such images are often viewed at multiple magnifications, both digitally and under a classical microscope.

Machine learning has changed many fields in the medical domain, first with the analysis of text and structured data and, over the last 20 years, increasingly also with images and signals to support decision making. Among machine learning techniques, deep learning in particular has led to extremely good results in many focused areas over the past ten years, for example in classifying chest x-rays or eye fundus images into a number of disease classes. For a few applications, human-level or even better performance has been reported, but mainly in retrospective data analysis.
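To illustrate the scale involved, the following is a minimal sketch (plain Python; the 256-pixel patch size is an illustrative assumption, not a project parameter) of how many fixed-size patches a slide of roughly 100’000×100’000 pixels decomposes into for patch-based analysis:

```python
import math

def patch_grid(width, height, patch=256, stride=256):
    """Count the patches needed to cover a slide of the given pixel size.

    `patch` and `stride` are hypothetical values chosen for illustration;
    with stride == patch the tiling is non-overlapping.
    """
    if width < patch or height < patch:
        return 0
    cols = math.ceil((width - patch) / stride) + 1
    rows = math.ceil((height - patch) / stride) + 1
    return cols * rows

# A standard 40x slide of ~100'000 x 100'000 pixels yields on the order of
# 150'000 patches of 256 x 256 pixels -- far too many to inspect naively.
print(patch_grid(100_000, 100_000))  # -> 152881 (391 x 391 patches)
```

Numbers of this magnitude are why WSI pipelines typically work patch by patch, often across several magnification levels of the slide pyramid.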
Prospective trials, with their potentially changing data and possible outliers, are more challenging, and results have often not been as good as in retrospective analyses, sometimes due to problems in the model itself or in the underlying data, leading to poor generalization. Compared to traditional machine learning with handcrafted features, deep learning also has the drawback that the model is essentially a black box, and the decisions taken cannot be explained easily. Decision trees and handcrafted features are often much easier to understand and also allow for finding problems with possibly incorrect decisions. To combine the outcomes of decision support with all other clinical data of a case, physicians thus require good tools for explaining the decisions, also to identify mistakes an algorithm can make when the data change or when the model has learned a spurious correlation from the training data. Interpretable/explainable AI has thus become a very important research domain that has developed many approaches, from heat maps highlighting the regions most important for a decision (as in Grad-CAM or LIME) to models that use human-understandable concepts, such as concept activation vectors or regression concept vectors. Most of these approaches are evaluated rather qualitatively, and only a few papers run user tests on explainable AI, even though much can be gained and learned from such tests and from more quantitative evaluations. All definitions of explainability or interpretability are based on what the actual end user understands, which means they need to be adapted to the varying user groups. User tests are normally the best means to this end. Uncertainty quantification is another means of understanding the outcomes of decision support, and we plan to use this aspect as well.
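The heat-map approaches mentioned above share a simple core idea: weight a convolutional layer's activation channels by how strongly they influence the class score. A framework-independent sketch of the Grad-CAM weighting step is shown below (NumPy only; the toy activation and gradient arrays stand in for tensors that would be captured from a real network during backpropagation):

```python
import numpy as np

def grad_cam_map(activations, gradients):
    """Grad-CAM core step on captured tensors of shape (channels, h, w):
    channel weights are the spatially averaged gradients, and the heat map
    is the ReLU of the weighted sum of activation channels.
    """
    weights = gradients.mean(axis=(1, 2))             # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted channel sum
    cam = np.maximum(cam, 0)                          # keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam  # (h, w) map, upsampled onto the input image for display

# Toy example: two channels on a 2x2 grid; channel 0 supports the class
# (positive gradients), channel 1 opposes it (negative gradients).
acts = np.array([[[1.0, 0.0], [0.0, 1.0]],
                 [[0.0, 2.0], [0.0, 0.0]]])
grads = np.array([[[1.0, 1.0], [1.0, 1.0]],
                  [[-1.0, -1.0], [-1.0, -1.0]]])
print(grad_cam_map(acts, grads))  # highlights only the two positive cells
```

In a WSI setting, a map like this would be computed per patch and stitched back onto the slide, which is one way such heat maps can be shown inside a clinical viewer.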
To properly evaluate a decision support tool with user tests, it is important to integrate it well with the clinical viewing station in use, so that clinicians work in the same environment as in their daily clinical routine. Otherwise, there is a risk that clinicians need to learn a new user interface or modify their behavior. Some histopathology image viewers, such as Sectra used in our partner hospital, have a plugin mechanism that allows for such an integration.

Lung cancer is the second most frequent cancer in both men and women and by far the deadliest, being responsible for about 20% of cancer deaths. Histopathology is the gold standard for detailed diagnoses in lung cancer. Several sub-types of small cell lung cancer and non-small cell lung cancer exist that need to be analyzed, and these sub-types give good indications for treatment planning and prognosis. This project will tackle several current challenges of clinical decision support using deep learning and interpretable deep learning. The use case is in the field of lung cancer research and histopathology data, a quickly changing and dynamic field in medical data analysis.

More info: https://data.snf.ch/grants/grant/220761