Our paper “DISTANT-CTO: A Zero Cost, Distantly Supervised Approach to Improve Low-Resource Entity Extraction Using Clinical Trials Literature”, by A. Dhrangadhariya and H. Müller, has been accepted for presentation at the BioNLP workshop at ACL 2022 (Dublin, 22-27 May).
The paper takes strides toward democratizing entity/span-level PICO recognition by automatically creating a large entity-annotated dataset from open-access resources. PICO extraction is a vital but laborious step in writing systematic reviews.
The approach uses gestalt pattern matching and a modified version of its match scoring to flexibly align structured terms from http://clinicaltrials.gov onto unstructured text and to select high-confidence match candidates.
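To give a feel for the alignment step, here is a minimal sketch of gestalt pattern matching using Python's standard `difflib.SequenceMatcher`. It is an illustration only, not the paper's implementation: the function name `align_term`, the `threshold` value, and the example sentence are assumptions made for this post, and the plain `ratio()` score stands in for the modified match scoring described in the paper.

```python
from difflib import SequenceMatcher


def align_term(term, text, threshold=0.8):
    """Fuzzily align a structured term onto free text.

    Returns (start, end, score) for the candidate span in `text`,
    or None if the similarity score falls below `threshold`.
    Hypothetical helper: the threshold and scoring are assumptions.
    """
    term_l, text_l = term.lower(), text.lower()

    # Gestalt (Ratcliff/Obershelp) matching: find the blocks shared
    # by the term and the text.
    matcher = SequenceMatcher(None, term_l, text_l, autojunk=False)
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]
    if not blocks:
        return None

    # Candidate span runs from the first to the last matched block.
    start = blocks[0].b
    end = blocks[-1].b + blocks[-1].size

    # Score the candidate span against the term; keep only
    # high-confidence candidates.
    score = SequenceMatcher(None, term_l, text_l[start:end], autojunk=False).ratio()
    if score < threshold:
        return None
    return start, end, score


sentence = "Patients with type II diabetes mellitus were randomized to metformin."
match = align_term("type 2 diabetes mellitus", sentence)
if match is not None:
    start, end, score = match
    print(sentence[start:end], round(score, 2))
```

In this example the structured term is written with "2" while the sentence uses "II", yet the span "type II diabetes mellitus" is still recovered. A simple span heuristic like this one can over-extend when stray characters match far from the true mention, which is one reason to filter candidates by score.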
The method is extensible and data-centric, which allows for continual extension of the dataset and for optimization of the automatic annotation method for recall-oriented PICO recognition.
