Introduction
Although effective methods for the early detection of cervical cancer were developed many years ago, the disease remains an important cause of mortality among women in a number of countries, accounting for around 8% of all cancer deaths in women [1]. In Poland, cervical cancer is the fourth most common cancer in women and the third leading cause of cancer mortality [2]. Conventional cytology testing with Papanicolaou staining has been shown to be an effective method of reducing the incidence of invasive cervical cancer [3]. The reduction was greater in countries that conducted organised screening [4, 5]. A recent pooled analysis of follow-up data from four randomised controlled trials conducted in Sweden (Swedescreen), the Netherlands (POBASCAM), England (ARTISTIC), and Italy (NTCC) demonstrated that, compared to cytology, human papillomavirus (HPV)-based screening provides 60–70% greater protection against invasive cervical carcinomas; the respective pooled rate ratio for invasive cervical carcinoma among women with a negative screening test at entry was 0.30 (95% CI 0.15–0.60) [4].
Consistent implementation of population screening requires appropriate methods of recruiting patients, adequate resources, and trained staff. Liquid-based cytology (LBC), recently introduced on a large scale, makes it possible both to evaluate cervical smears and to perform HPV-based screening. At the same time, LBC preparations are easier to evaluate automatically because overlapping cell clusters are rare compared with conventional cytological preparations. Recently, the Focal Point Slide Profiler (FSPS)® and the ThinPrep Imaging System® have been approved by the Food and Drug Administration as primary screening tools for cervical cytology smears. However, both of these systems are closed source and rely on LBC preparations made by a proprietary method. The usefulness of automatic systems has been proven [6]; however, their sensitivity and specificity can be improved and their costs reduced.
Recently, researchers have begun to improve automated systems using artificial neural networks, in particular the U-NET architecture [7].
Within this project, we built a prototype of a new device, together with software based on U-NET and CNN neural network architectures, that converts currently used optical microscopes into fully independent scanning systems for cytological samples. The device is intended to improve the effectiveness of cytological screening and the registration of cytological test results. The features of the software include digital backup as well as transmission and telemedicine evaluation.
To assess the quality of the device, we compared computer-assisted image analysis with manual reading in 2058 unselected liquid-based cervical cytology samples. The following outcome measures were evaluated: detection rates, relative sensitivity, sensitivity difference, and specificity difference.
Material and methods
The software uses a system of two neural networks: an artificial neural network (U-NET architecture) designed to recognise suspicious regions, and an enhanced CNN (VGG type) that determines the type of disorder, such as ASC-US, ASC-H, HSIL, LSIL, AGC, or cancer, in LBC cytological samples. After registration, each image was normalised according to the formula:
where:
I (x, y) – the value of a pixel in an image for position x, y,
D (x, y) – the value of a pixel in an image (“dark current” image) for the position x, y,
W (x, y) – the value of a pixel in an image (“white image”) for the position x, y.
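The three quantities defined above match the inputs of the standard flat-field correction, N(x, y) = (I(x, y) - D(x, y)) / (W(x, y) - D(x, y)). Assuming that this is the normalisation meant here (it is an assumption, not a formula confirmed by the text), a minimal NumPy sketch could look as follows. The function name `flat_field_normalise`, the `eps` guard, and the clipping to [0, 1] are illustrative choices.

```python
import numpy as np

def flat_field_normalise(image, dark, white, eps=1e-6):
    """Assumed flat-field correction: (I - D) / (W - D), clipped to [0, 1]."""
    image = np.asarray(image, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    # Guard against division by zero where the white and dark frames coincide.
    denom = np.maximum(white - dark, eps)
    return np.clip((image - dark) / denom, 0.0, 1.0)

# Toy 2 x 2 frames: the right-hand column is lit half as brightly as the left.
dark = np.full((2, 2), 10.0)
white = np.array([[210.0, 110.0], [210.0, 110.0]])
raw = np.array([[110.0, 60.0], [110.0, 60.0]])

# After correction both columns carry the same value, so the uneven
# illumination no longer shows up in the image.
normalised = flat_field_normalise(raw, dark, white)
```

In the toy example the raw pixel values differ between the two columns only because of the illumination, and the corrected image is uniform.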
After normalisation, the images were standardised and could be compared with images from other microscopes; in this way, problems such as those associated with uneven lighting are avoided. Next, an algorithm was applied to the recorded images to combine them into a single image showing the entire sample placed under the microscope. Algorithms such as SURF and SIFT were not used for licensing reasons. Instead, methods were developed and applied to fit the images to each other as well as possible by minimising the error between image positions. One LBC measurement contains an average of 250 images, each with dimensions of 2592 × 2048 pixels (width × height).
After these procedures, a sample image was ready for analysis by diagnosticians. Their task was to identify all areas with abnormalities such as ASC-US, ASC-H, HSIL, LSIL, AGC, or cancer, or to mark an area as suspected when no diagnosis could be made because the image was blurred or the analysis was ambiguous.
To develop the model for system evaluation, 7128 LBC samples were evaluated by trained cyto-screeners, scanned, and archived in the device. Cytological abnormalities such as ASC-US, ASC-H, LSIL, HSIL, AGC, and cancer were found in 254 (3.6%) cases. Detailed results of the evaluation of the 7128 samples are shown in Table 1.
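The image-fitting step described above is not specified in detail. As an illustration only, the sketch below aligns two equally sized tiles by exhaustively searching integer shifts and keeping the one with the lowest mean squared error over the overlapping region. The function name `best_offset`, the `max_shift` window, and the choice of mean squared error are assumptions, not the authors' method; real neighbouring tiles overlap only partly, and the search would be centred on the nominal stage position.

```python
import numpy as np

def best_offset(reference, tile, max_shift=5):
    """Find the integer shift (dy, dx) such that tile[i, j] best matches
    reference[i + dy, j + dx], by minimising the mean squared error."""
    ref = np.asarray(reference, dtype=np.float64)
    til = np.asarray(tile, dtype=np.float64)
    h, w = ref.shape
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Crop both images to the region they share under this shift.
            r = ref[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)]
            t = til[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
            err = float(np.mean((r - t) ** 2))
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best, best_err

# Synthetic check: cut two windows from one random "scene" with a known shift.
rng = np.random.default_rng(0)
scene = rng.random((60, 60))
reference = scene[10:50, 10:50]
tile = scene[13:53, 8:48]  # 3 rows down and 2 columns left of the reference
offset, error = best_offset(reference, tile)
```

With several hundred tiles per sample, pairwise offsets of this kind would still have to be reconciled into one consistent layout; that step is outside this sketch.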
Table 1. LBC samples evaluated by screener
Characteristics | n | % |
---|---|---|
ASC-US | 95 | 1.333 |
ASC-H | 41 | 0.575 |
LSIL | 74 | 1.038 |
HSIL | 36 | 0.505 |
AGC | 4 | 0.056 |
Adeno Ca | 0 | 0.0 |
Ca Plano | 3 | 0.042 |
HSIL + AGC | 1 | 0.014 |
No abnormality | 6874 | 96.436 |
Total | 7128 | 100.0 |
A set of 291 selected samples, comprising both normal cytological patterns and diagnosed abnormalities, was used to train the artificial neural networks (Table 2). For this task, a network based on the U-NET architecture was used [8, 9]. The network was implemented using TensorFlow 1.2 and Keras 2 and was trained on an Nvidia GeForce RTX 2080 Ti graphics card. This procedure allowed us to create the part of the application that is responsible for screening and preliminary evaluation of samples. A second neural network was then implemented to classify the fragments marked by the U-NET network. For this purpose, the VGG-16 network was used. This network was not trained from scratch: it was initialised with pretrained ImageNet weights and then adapted using the widely described transfer learning method [10]. Training from scratch would have significantly increased the time required and made the work more difficult. The VGG-16 network was trained on the diagnosticians' descriptions. For both the U-NET and VGG-16 networks, 15% of the images from the entire training pool were used to validate the training process. The entire system was then applied to tests on which it had not been trained in order to determine its accuracy.
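The text states that 15% of the training pool was held out for validation but does not say how the split was drawn. The sketch below shows one common option, a per-class (stratified) hold-out, using only the Python standard library. The function name `stratified_holdout`, the fixed seed, and the use of the Table 2 sample counts as toy labels are assumptions made for illustration; the study split images, not samples.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, fraction=0.15, seed=0):
    """Split indices into (train, validation), holding out roughly
    `fraction` of each class so rare classes are not lost by chance."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    train, validation = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Keep at least one validation item for any class with 2+ members.
        n_val = max(1, round(len(indices) * fraction)) if len(indices) > 1 else 0
        validation.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(validation)

# Toy labels built from the class counts in Table 2 (one label per sample).
counts = {"ASC-US": 30, "ASC-H": 18, "LSIL": 33, "HSIL": 8,
          "Ca Plano": 2, "No abnormality": 200}
labels = [name for name, n in counts.items() for _ in range(n)]
train_idx, val_idx = stratified_holdout(labels)
```

When several images come from the same slide, splitting at the slide level rather than the image level avoids near-duplicate images appearing on both sides of the split.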
Table 2. LBC samples used to train the system
Characteristics | n | % |
---|---|---|
ASC-US | 30 | 10.31 |
ASC-H | 18 | 6.18 |
LSIL | 33 | 11.34 |
HSIL | 8 | 2.75 |
AGC | 0 | 0 |
Adeno Ca | 0 | 0 |
Ca Plano | 2 | 0.69 |
HSIL + AGC | 0 | 0 |
No abnormality | 200 | 68.73 |
Total | 291 | 100 |
To assess the quality of the device, we compared the diagnoses of 2058 consecutive liquid-based cervical cytology samples obtained with the computer-assisted image analysis system with those obtained by manual reading. The study was approved by the Ethics Committee of Pomeranian Medical University in Szczecin, Poland.
Results
The detailed results of the evaluation of system sensitivity and specificity, based on a comparison of the diagnoses obtained by the computer-assisted image analysis system with those obtained by manual reading, are shown in Table 3.
Table 3
All 58 samples diagnosed as abnormal by cyto-screeners were also flagged as abnormal by the software. Similarly, there was 100% concordance between cyto-screeners and the software in the evaluation of the 2000 normal samples. We observed slight discordance between cyto-screeners and the software in the type of abnormality assigned. One case flagged by the software as LSIL was diagnosed by cyto-screeners as ASC-US. In one case the software indicated ASC-H, but cyto-screeners diagnosed HSIL, and in one case the software flagged cancer, but cyto-screeners diagnosed HSIL. In this category, concordance was 94.8%, that is, 55 of 58 abnormal samples (Table 3).
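The figures reported above can be reproduced with a few lines of arithmetic. The sketch below takes the cyto-screeners' reading as the reference standard and uses only the counts given in the text; the function name `screening_metrics` is illustrative.

```python
def screening_metrics(tp, fn, tn, fp):
    """Sensitivity and specificity of the software, with the
    cyto-screeners' diagnosis taken as the reference standard."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# 58 abnormal samples, all flagged; 2000 normal samples, none flagged.
metrics = screening_metrics(tp=58, fn=0, tn=2000, fp=0)

# Three of the 58 abnormal samples were assigned a different abnormality type.
abnormal, discordant = 58, 3
type_concordance = (abnormal - discordant) / abnormal

print(metrics)                    # {'sensitivity': 1.0, 'specificity': 1.0}
print(f"{type_concordance:.1%}")  # 94.8%
```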
Discussion
The automatic evaluation of cervical smears offers an opportunity to increase the efficiency of population screening for cervical precancerous lesions and early stages of cervical cancer. Medical professionals require devices with maximum sensitivity and specificity. It is particularly important to avoid false negative results, because in such cases a patient with a precancerous or cancerous lesion could remain undiagnosed.
Conventional cytological preparations are more difficult to evaluate automatically because overlapping cells must be segmented and cells must be separated from neutrophils and background debris. A number of methods that attempt to solve these problems have been developed [11–13]; however, only a few studies have been conducted on conventional smears. These studies often report a high percentage of cases excluded from further automatic analysis because of the difficulties mentioned above [14, 15]. Software developed for the evaluation of conventional cytological preparations is characterised by limited sensitivity [16].
The preparation of liquid-based cervical smears, which are characterised by reproducibility and good accuracy, avoids the problems typical of conventional cytology. The repeatability of the samples has enabled the development and implementation of automated assessment methods. Currently, researchers are working to achieve the highest possible sensitivity and specificity and to reduce costs. Wilbur et al. observed a sensitivity of 77% for HSIL and 86% for cancer with the commercially available AutoPap 300 system when compared with biopsy results [17]. This system is characterised by a false-positive rate of about 40% in reviewing normal slides. Zhang et al. demonstrated 88.1% sensitivity and 100% specificity in screening manual LBC slides with hematoxylin and eosin staining. They achieved a 93% accuracy for cytoplasm and an 87.3% measurement for nuclei [18]. Zhao et al. used an algorithm based on block images to evaluate whole-slide cervical cell images. They observed 95.0% sensitivity on the images tested in the study, while the specificity was 99.33% [19]. Bora et al. reported an accuracy of 98.11% and a precision of 98.38% in whole smears and 99.01% in single cells [20].
Recently, improvements in single-cell analysis and classification have been achieved by intensive image analysis methods [21] as well as by artificial neural networks (U-NET architecture) [22]. In this study, we developed and evaluated a system based on two neural networks: an artificial neural network (U-NET architecture) designed to recognise suspicious regions and an enhanced CNN (VGG type). The observed sensitivity and specificity for distinguishing normal from abnormal samples were 100%; however, these results were obtained on a relatively small number of abnormal samples. The system was also trained on a limited number of abnormal samples, and certain categories of cervical abnormalities were not represented in the training set; these will be included in further steps of system development. The preliminary results are very promising; however, the system must be evaluated on a large number of abnormal cases. At the same time, it should be noted that automation of the diagnostic process also raises ethical issues. Patients must be informed about these methods of diagnosis and give their consent. So far, devices of this type are used to support the work of the cytologist.
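The caveat about the small number of abnormal samples can be made concrete with a confidence interval. The sketch below computes a Wilson score interval for the observed proportions; the choice of the Wilson method and of z = 1.96 (approximately 95% coverage) is an assumption made for illustration and is not part of the reported analysis.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)

# Sensitivity: 58 of 58 abnormal samples flagged.
sens_low, sens_high = wilson_interval(58, 58)
# Specificity: 2000 of 2000 normal samples not flagged.
spec_low, spec_high = wilson_interval(2000, 2000)

print(f"sensitivity: {sens_low:.3f} to {sens_high:.3f}")
print(f"specificity: {spec_low:.3f} to {spec_high:.3f}")
```

Under these assumptions the lower bound for sensitivity is roughly 94%, whereas the lower bound for specificity stays above 99.8%, which shows how much more weakly 58 abnormal samples constrain the estimate than 2000 normal ones.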
Conclusions
The presented results indicate that artificial neural networks (ANNs) are highly effective in supporting diagnosticians. The use of ANNs is promising for increasing the effectiveness of cervical screening. The low cost of using neural networks further broadens the potential areas of application of the presented method. Further refinement of the neural networks on a larger sample is required to evaluate the software.