Abstract: Cervical cancer is the fourth most common cancer among women worldwide and is especially prevalent in low resource settings due to lack of screening and treatment options. Visual inspection with acetic acid (VIA) is a widespread and cost-effective screening method for cervical pre-cancer lesions, but accuracy depends on the experience level of the health worker. Digital cervicography, capturing images of the cervix, enables review by an off-site expert or potentially a machine learning algorithm. These reviews require images of sufficient quality. However, image quality varies greatly across users. A novel algorithm was developed to evaluate the sharpness of images captured with the MobileODT’s digital cervicography device (EVA System), in order to, eventually provide feedback to the health worker. The key challenges are that the algorithm evaluates only a single image of each cervix, it needs to be robust to the variability in cervix images and fast enough to run in real time on a mobile device, and the machine learning model needs to be small enough to fit on a mobile device’s memory, train on a small imbalanced dataset and run in real-time.
In this paper, the focus scores of a preprocessed image and a Gaussian-blurred version of the image are calculated using established methods and used as features. A feature selection metric is proposed to select the top features which were then used in a random forest classifier to produce the final focus score. The resulting model, based on nine calculated focus scores, achieved significantly better accuracy than any single focus measure when tested on a holdout set of images. The area under the receiver operating characteristics curve was 0.9459.
Keywords: Cervical cancer, low resource setting, VIA, digital cervicography, image focus, random forest classifier, feature selection