Introduction
The colon is one of the organs most commonly affected by malignancy. In particular, colon cancer is the third most common malignancy in both men and women [1–5]. Since 2020 approximately 2 million new cases have been diagnosed each year, and despite the availability of advanced screening tests, colorectal cancer (CRC) is the second most common cause of cancer death worldwide, accounting for approximately 1 million deaths [6, 7]. Unfortunately, according to the International Agency for Research on Cancer and the World Health Organization, the overall burden of CRC between 2020 and 2040 is expected to increase by about 56%, and the annual number of associated deaths by about 69% [6]. Furthermore, while it was thought to affect mostly countries that have adopted a western-type lifestyle, new studies suggest that CRC is an emerging public health issue even in other parts of the world, such as sub-Saharan Africa [8]. Therefore, early diagnosis and proper treatment are necessary to prevent deaths caused by CRC. Artificial intelligence (AI), and in particular its subtype called “deep learning” (DL), is one of the newest tools for achieving this goal [9–11].
The term AI in general refers to all technological applications that perform tasks which under ordinary circumstances require human skills, such as decision-making and visual perception [12–14]. DL is a subfield of AI, and its function is inspired by the function of the actual animal nervous system: artificial neurons are trained to find patterns in a limited amount of data in order to progressively extract higher-level features from them, for tasks such as histologic image diagnosis and classification. AI and DL have been massively adopted over the years in medicine to assist and/or add precision to diagnosis, patient stratification, drug discovery, and biomedical research in general [12–14]. CRC diagnosis is no exception [3, 15, 16]. In fact, the increasing interest in using DL to diagnose CRC is reflected in a parallel increase in the number of published studies on the subject. For instance, the first paper on applying DL to CRC management appeared on the PubMed database in 2015, and the number of published articles has risen markedly since: 1 in 2015, 3 in 2016, 5 in 2017, 57 in 2021, and 51 in 2022.
DL has been incorporated in all diagnostic processes regarding CRC, from histopathological classification and endoscopic tumour identification to radiological diagnosis through CT scans and CT colonoscopy, and further serological screening tests. This review aims to summarize the achievements of DL application in diagnosing CRC, and discuss the practical benefits along with the eventual drawbacks of applying such technology to CRC diagnosis.
DL and histopathological diagnosis of CRC
Pathology was among the first subfields of medicine to embrace AI to its benefit. Therefore, making a CRC diagnosis from a histological examination would be no exception to this rule. According to current diagnostic guidelines, histologic samples are required to be examined by pathologists to precisely diagnose the presence of malignancy and identify the exact histologic type and grade of the histologic lesion that is under investigation [3, 17, 18]. The usual protocol requires the application of a special stain onto the tissue samples, most commonly haematoxylin and eosin (H&E), before microscopic examination. However, both the tissue sample preparation and microscopic examination can be quite tedious and time-consuming [10, 19–21]. Moreover, the diagnostic verdicts made by pathologists may have significant variability among different examiners [22]. Aiming toward quicker and more precise diagnosis, DL can assist in the pathologic identification of CRC in various ways.
Most studies regarding DL-assisted diagnosis of CRC have been conducted with convolutional neural networks (CNN), a form of supervised DL with multiple layers of artificial neurons, in which the “neurons” of each layer feed into those of the next layer. To achieve diagnosis, it is necessary to train a CNN to identify the presence of malignant tissue. The result is either a binary decision (cancer or no cancer) or a diagnosis of the exact histological type of CRC [23–25]. Either way, 2 training strategies have been proposed, both with reported sensitivity over 90% [19, 25, 26]. The first trains the CNN with manually selected, pre-annotated images extracted from parts of pathology slides (whole slide images, WSI) with previously diagnosed CRC. The second trains it with a minimal load of information from annotated WSI, because current data from WSI appear to be relatively limited and because manual pre-annotation and selection of pathologic areas in WSI (either randomly or intentionally selected sub-regions) can be extremely time-consuming, like the traditional pathological examination as a whole [20, 21, 23, 27, 28]. Studies on both strategies have reported an accuracy of up to 96% [20, 26, 27]. Overall, depending on the dataset of reference, DL has been reported to achieve up to 99.12% sensitivity in diagnosing CRC [10, 23, 28]. Interestingly, apart from the extensively researched supervised CNN, Sari et al. proposed in 2019 a semi-supervised DL strategy with unsupervised final feature extraction from WSI of pathology slides, reporting sensitivity levels non-inferior to supervised CNN [29]. Further research could yield more data regarding this DL alternative.
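Because this review compares studies by sensitivity, accuracy, and negative predictive value, it may help to see how these figures follow from a classifier's raw outcomes. The following is a minimal sketch in Python, not taken from any of the cited studies; the function names and the toy labels are hypothetical, and a label of 1 is assumed to mean "CRC present".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # malignant samples correctly flagged
    fp: int  # benign samples wrongly flagged
    tn: int  # benign samples correctly cleared
    fn: int  # malignant samples missed


def count_outcomes(labels, predictions):
    """Tally the four outcome types for binary labels (1 = CRC, 0 = no CRC)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(labels, predictions):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def sensitivity(c):
    """Share of true CRC cases that the model detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c):
    """Share of non-CRC cases that the model correctly clears."""
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    """Share of all cases that the model classifies correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def negative_predictive_value(c):
    """Share of negative calls that are truly negative."""
    return c.tn / (c.tn + c.fn)


# Hypothetical toy data: 4 malignant and 6 benign image patches.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
counts = count_outcomes(labels, predictions)
print(sensitivity(counts), accuracy(counts))
```

Note that a high sensitivity does not by itself imply a high accuracy or specificity, which is one reason why figures from different studies are hard to compare unless the same metric and dataset are used.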
However, apart from merely diagnosing CRC from WSI, DL has enabled the extraction of more diagnostically valuable features from histologic slides. First of all, DL-assisted diagnostic protocols have been proposed to predict the underlying molecular status of CRC directly from the H&E slides, without the time cost of further testing and/or staining. This is crucial for optimal treatment selection based on accurate patient stratification and for determining the overall prognosis of a patient at the time of the primary diagnosis of CRC [30, 31]. Thanks to DL, it is feasible to detect whether the underlying CRC is related to KRAS, TP53, and BRAF mutations and to define the CRC subtype based on the underlying gene expression profile, with a reported area under the receiver operating characteristic curve (AUROC) from 0.73 to 0.86 depending on the tested mutation. This directs clinicians early to the appropriate treatment and follow-up options, even if testing for those mutations is not available at their laboratories and hospitals [32–35]. Furthermore, DL has been used to successfully test WSI for microsatellite instability (a state of genetic hypermutation caused by defective DNA mismatch repair, which requires administration of immunotherapy as a first-line treatment option) without further preparation of H&E-stained slides, with a reported AUROC of up to 0.865 and sensitivity of up to 76%, exceeding the AUROC and sensitivity of the control group of experienced pathologists [36–40]. What is more, DL via CNN has proven efficient in extracting further information regarding the cellular characteristics of CRC and spatial features of the tumour microenvironment, such as the stroma:tumour ratio and the presence and/or type of white blood cells infiltrating the peritumoral region, avoiding inter-observer variability.
To avoid artifacts caused by H&E staining, various researchers have suggested that infrared microscopy could be a non-inferior alternative that enhances details regarding the molecular background of the existing CRC and information concerning the spatial features of the tumour [41–44]. Both direct infrared microscopic observation of H&E slides and image editing through Fourier transformation have yielded quite promising outcomes, with a reported AUROC of up to 0.83 and sensitivity ranging from 69% to 83% depending on the tumour grade [22]. Data extracted by DL on tumour architecture, such as the characteristics of the submucosal adipose tissue, could also provide estimations regarding patient survival; in 2021 Wulczyn et al. developed a DL protocol able to predict patient survival with an accuracy of 87–95.5% [45–47].
Finally, DL has enabled the estimation of the possibility of distant lymph node metastasis directly from WSI; Schiele et al. in 2021 reported a CNN architecture using binary images from H&E slides and clinical data from patients with metastasized colon cancer, achieving a sensitivity of about 75.6% in predicting metastatic disease [48]. Interestingly, one year earlier Tsirikoglou et al. reported that it is also feasible to detect lymph node metastases in CRC with minimal or no clinical data on metastasis, using only the histological characteristics of the pathologic sample [49]. Further research could provide deeper insight into metastasis prediction in CRC. Overall, the involvement of DL and CNN in examining pathology slides has transformed CRC diagnostics.
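Several of the results above are reported as AUROC values. As a minimal sketch of what that number measures, the Python function below computes AUROC as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, counting ties as one half. The function name and the toy scores are hypothetical and are not drawn from any cited study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    labels: 1 for a positive case (e.g. mutation present), 0 otherwise.
    scores: model outputs, where a higher score means "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")

    correctly_ranked = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ranked += 1.0
            elif p == n:
                correctly_ranked += 0.5  # a tie counts as half a correct ranking
    return correctly_ranked / (len(positives) * len(negatives))


# Hypothetical toy data: three positive and three negative slides.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(auroc(labels, scores))
```

An AUROC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, so the reported range of 0.73 to 0.86 sits between the two. The pairwise loop is chosen for clarity; it is quadratic in the number of cases and would be replaced by a rank-based computation on large datasets.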
DL and endoscopic diagnosis of CRC
To obtain a histologic specimen of an underlying CRC it is necessary first to locate the exact area of the lesion and then to perform a biopsy on it. This is achieved through colonoscopy, an endoscopic procedure that is considered the gold standard for diagnosing CRC. Although the recommended timing of colonoscopy varies among current guidelines, it is commonly suggested as a check-up in adults above the age of 50 years, and the interval between colonoscopy sessions is shorter if a large number of lesions and/or lesions of considerable size (more than 10 mm) are diagnosed initially [1, 7, 50, 51]. In general, CRC presents as lesions in the colonic mucosa protruding into the enteric lumen, called polyps, and less frequently as mucosal ulcers [50–54]. Most polyps are benign. Malignant or pre-malignant polyps are called adenomas. Overall, larger lesions are easier to detect. However, it is crucial not to miss smaller lesions, particularly those smaller than 5 mm, of which up to 16% might be endoscopically missed; it is estimated that every 1% increase in adenoma detection rate (ADR) is associated with a 3% drop in CRC incidence [55]. The experience of the person who performs the colonoscopy is an inevitable factor contributing to successful adenoma diagnosis and localization. However, even experienced endoscopists cannot perform with 100% diagnostic success; in fact, up to 26% of polyps remain undiagnosed within a single colonoscopy session [51, 56]. In addition, CRC can develop during the guideline-recommended interval between 2 colonoscopy sessions, so that almost 58% of CRC patients have advanced disease at the time of diagnosis, even when the screening guidelines are followed [57]. Therefore, it is crucial to maximize ADR and polyp detection capacity and to minimize the existing diagnostic variability among endoscopists, and DL has made some significant contributions to this [51, 58].
Overall, DL can assist in accurate CRC diagnosis in 2 different ways: localizing a lesion and/or determining whether it is adenomatous or not, a challenging task even for experienced endoscopists. Most of the conducted studies have focused on polyp detection and localization, and for this purpose they have proposed numerous CNN protocols. The reported results are quite promising, despite the initial concern that CNNs were mostly trained with static images from recorded colonoscopies instead of real-time video: in all studies, DL has demonstrated a decrease in adenoma miss rate and an increase of up to 80% in ADR, which was higher when the CNN was compared to less experienced and/or trainee endoscopists, exactly as if DL functioned as a “second pair of eyes” for doctors [58–61]. The reported CNN models appear to have reached an accuracy ranging from 93.6% to 96% in detecting polyps in real time and a negative predictive value of up to 92.6% [16, 62–65]. In addition, DL has also been applied to improving polyp detection in water-exchange colonoscopy, a relatively new endoscopic technique that uses water currents instead of air-assisted bowel distension; according to a review published by Hsieh et al. in 2019, combining DL protocols and water exchange leads to an overall increase in ADR, providing a diagnostic modality that could be of particular benefit for less experienced and/or trainee doctors [66]. Apart from mere polyp detection, DL has the capacity to progress diagnostically one step further: in 2021 Minami et al. developed a DL protocol able to diagnose submucosal invasion directly from endoscopic data, with sensitivity up to 87.2% and AUROC 0.758; this information provides valuable insight for planning the operative management of CRC patients even in the early stages of CRC [45].
At the same time, quite promising results come from the application of CNNs in perfecting optical biopsy of polyps; in other words, differentiating adenomatous from non-adenomatous lesions. Beginning in 2017, Komeda et al. proposed a CNN architecture for primary optical biopsy, resulting in 70% accuracy and an AUROC of about 0.751 in a non-real-time polyp-detection study [67]. One step further was achieved in 2018 by Misawa et al., who reported a DL protocol of real-time polyp optical biopsy, with a sensitivity reaching up to 90% and AUROC of about 0.87 [56].
Furthermore, while the above studies were conducted with RGB-coloured images, conversion of colonoscopy images to grey-scale can be an interesting alternative; in 2021, Hsu et al. reported a CNN architecture trained with grey-scale colonoscopy images, resulting in higher real-time detection and diagnostic accuracy than with RGB images (95.1% versus 94.1%, respectively), taking advantage of the higher processing speed achieved when computers and DL receive grey-scale images [68].
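To give a sense of scale for the reported link between ADR and CRC incidence, the arithmetic can be sketched as follows. This is an illustration only: it assumes that the 3% drop per 1% ADR increase compounds over successive percentage points and holds across the whole range, neither of which is stated in the source, and the reported figure is an association rather than a proven causal effect. The function name is hypothetical.

```python
def projected_relative_incidence(adr_gain_points, drop_per_point=0.03):
    """Relative CRC incidence after raising ADR by `adr_gain_points`.

    Assumption (not from the source): each additional percentage point of
    ADR multiplies incidence by (1 - drop_per_point), i.e. the drops compound.
    A return value of 1.0 means no change in incidence.
    """
    if adr_gain_points < 0:
        raise ValueError("adr_gain_points must be non-negative")
    return (1.0 - drop_per_point) ** adr_gain_points


# Hypothetical scenario: ADR improves by 5 percentage points.
relative = projected_relative_incidence(5)
print(f"Projected incidence: {relative:.1%} of baseline")
```

Under these assumptions, a gain of 5 percentage points in ADR would correspond to an incidence of roughly 86% of baseline, which illustrates why even modest improvements in detection are considered clinically meaningful.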
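The speed advantage of grey-scale input can be illustrated with a minimal sketch in Python. It converts an RGB frame to a single luminance channel with the standard ITU-R BT.601 weights, so that each pixel carries one value instead of three. The source does not state which conversion Hsu et al. used, so the weighting, the function names, and the tiny frame below are assumptions made for illustration.

```python
# ITU-R BT.601 luma weights for red, green, and blue (assumed here; the
# cited study may have used a different conversion).
RED_WEIGHT, GREEN_WEIGHT, BLUE_WEIGHT = 0.299, 0.587, 0.114


def rgb_to_grey(pixel):
    """Map one (r, g, b) pixel with 0-255 channels to a 0-255 grey level."""
    r, g, b = pixel
    return round(RED_WEIGHT * r + GREEN_WEIGHT * g + BLUE_WEIGHT * b)


def frame_to_grey(frame):
    """Convert a frame given as rows of (r, g, b) tuples to rows of grey levels."""
    return [[rgb_to_grey(pixel) for pixel in row] for row in frame]


def values_in_rgb_frame(frame):
    """Number of scalar values a network must read from an RGB frame."""
    return sum(len(row) * 3 for row in frame)


def values_in_grey_frame(frame):
    """Number of scalar values a network must read from a grey-scale frame."""
    return sum(len(row) for row in frame)


# Hypothetical 2 x 2 frame standing in for a colonoscopy image.
rgb_frame = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
grey_frame = frame_to_grey(rgb_frame)
print(grey_frame)
print(values_in_rgb_frame(rgb_frame), values_in_grey_frame(grey_frame))
```

The threefold reduction in input values is what makes grey-scale frames cheaper to process, at the cost of discarding colour cues that may matter for some lesion types.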
Interestingly, DL can assist in improving CRC diagnosis through a parallel path, by dealing directly with the traditional technical parameters, such as the level of colon preparation and distension, the size of the inspected colon surface, and the quality of the existing endoscopic view. For this purpose, in 2020 Thakkar et al. proposed a CNN model that could provide real-time metrics regarding endoscopic technical parameters and interact with the performing endoscopist by giving real-time feedback regarding his/her actions for obtaining ideal procedure conditions [69]. The reported result was an increase in ADR; however, further research is required in this direction.
Finally, DL has also been applied to CRC diagnosis made by using an endoscopic capsule, a diagnostic alternative for patients unable to comply with colonoscopy requirements or when traditional endoscopy is contraindicated for them [70–72]. Overall, each swallowed capsule produces approximately 50,000-100,000 image frames, the evaluation of which is of course time-consuming and prone to diagnostic errors [73]. Beginning in 2021 and up to the time of this writing, various CNN diagnostic protocols have been developed, with AUROC up to 0.99 and sensitivity up to 96.9% [73].
In conclusion, DL appears to be a valuable ally in the accurate endoscopic diagnosis of CRC.
Application of DL in other diagnostic tests for CRC
This section will feature the outcome of applying DL protocols in other tests attempting to diagnose CRC. Overall, DL has been successfully implemented in the following:
1. CT colonography: CT colonography is an alternative but increasingly popular process for detecting polyps within the bowel lumen, especially for people unable to cooperate with and/or tolerate conventional colonoscopy and/or with strong contraindications against it. In this process, a depiction of the bowel lumen is reconstructed after extensive bowel scanning through a dedicated CT scan protocol [52, 57, 74–77]. Polyps are thus detected on X-ray-derived images, a process subject to the same limitations related to patient preparation as conventional colonoscopy. Attempts have been made to maximize ADR with DL in CT colonography; beginning in 2019, Cao et al. reported a CNN model able to characterize colon polyps as benign or not with an AUROC of about 0.945 but with a high standard deviation (more than 25%) [78–80]. In the subsequent years, Wesp et al. and Hedge et al. published trials with DL attempting to perform the same polyp characterization, achieving an AUROC of up to 0.83 and sensitivity of about 80%. Interestingly, all researchers pointed out that both the training sample and the patient sample datasets in their trials were small (less than 100 patients), suggesting that further research is required to establish the role of DL in optimizing CRC diagnosis through CT colonography after the initial promising results [80].
2. CT scan: CT scans are not a first-line test for diagnosing CRC compared to endoscopy and pathologic examination, but they are the gold standard for determining the extent of disease (staging) at the time of preliminary diagnosis [81]. Because a CT scan is usually performed on a CRC patient before any surgical intervention, the need to extract the maximal amount of diagnostic information from it has encouraged various researchers to explore the potential involvement of DL in data extraction from CT scans. In 2022, Wang et al. combined 3D reconstruction of CT scan images from patients in disease stages II and III, correlated with the serum levels of carcinoembryonic antigen (CEA), with a DL protocol to increase the rate of diagnosing extra-peritoneal CRC, reporting a sensitivity of about 95%. Further related research would be of particular interest in the pursuit of additional diagnostic data from CT scan images [82].
3. Optical coherence tomography (OCT): Although OCT has traditionally been used for diagnosing diseases of the retina, recent research data suggest that it could be extended to other tissues and organs by distinguishing benign from malignant tissue layers [83]. In particular, OCT has been applied to CRC diagnosis, and in 2020 the use of DL for real-time CRC diagnosis was reported, with a CNN trained on 26,000 OCT images. According to the published outcome, the CNN achieved 100% sensitivity and an AUROC of about 0.998, demonstrating promising potential to evolve into another competent tool for CRC optical diagnosis [84].
Applying DL for CRC optimization: current aspects and future challenges
Each problem requires a proper solution, and because CRC is a growing public health issue worldwide, it is unlikely to be tackled properly without implementing novel technologies. Applying the features made available by AI, and DL in particular, appears to have a deeply transforming effect on medicine and biomedical research in general, including CRC. Overall, DL protocols appear to optimize CRC diagnosis by increasing diagnostic accuracy, adding to the value of clinicians’ experience, minimizing diagnostic variability among doctors, and maximizing and/or expanding the extracted clinical data without performing additional conventional diagnostic tests on the existing diagnostic material (histologic samples, endoscopy images, etc.) [11, 28, 85].
However, although it might appear inevitable to conclude that clinicians could be replaced by computer-assisted AI systems, DL is better seen as a valuable helping hand in the effort to achieve fast CRC diagnosis. As well as helping trainee doctors gain experience in CRC diagnosis during their internship years, DL could first of all function as a quick and efficient confirmatory test after the initial traditional clinical investigation. In addition, adding DL to CRC diagnosis could be beneficial for health systems on a global scale. The number of CRC patients is rising without an equivalent increase in the number of doctors in the field, which increases the diagnostic workload for doctors and in turn leads to diagnostic mistakes [86, 87]. Furthermore, the number of new CRC cases tends to increase even in poorer countries, where neither experienced clinicians nor the patients can afford an extensive diagnostic work-up. DL can speed up the CRC diagnostic process (using computer-assisted data processing technology) with minimal misdiagnosis, allowing clinicians to save valuable time for other, probably more urgent clinical tasks, and can make its benefits available worldwide, even in less developed regions, with a cost-saving capacity that is worth considering.
On the other hand, it appears that DL is currently far from being accepted as a standard diagnostic modality for CRC screening despite the initial promising results of the existing trials. It is necessary to establish globally accepted standards of CNN training (each trial created diverse CNN training conditions) based on larger patient and/or image and data sets. Furthermore, especially for DL in bowel endoscopic investigation, it is necessary to standardize not only the DL protocols but also the performance technique that the DL is asked to reinforce because marked differences were noticed when the first related trials from Europe started to show up among the overwhelming majority that have been conducted in China [15, 58]. What is more, concerns have been widely expressed regarding the proper way of adapting medical training to the new conditions dictated by widely used DL systems, the potential danger of personal patient data leakage, and the economic impact that the increased computing power for performing DL projects on a wide scale would have on the current capacity of our health systems [11, 15, 58, 88]. Still, further research might also contribute to the improvement of the collateral technical issues.