Microbial Biomarkers for Colorectal Cancer Identified with Random Forest Model

  • Weili Sun1,
  • Lili Wang2,
  • Qiuyue Zhang2 and
  • Quanjiang Dong1,2,*
Exploratory Research and Hypothesis in Medicine 2020;():-

DOI: 10.14218/ERHM.2019.00026


Colorectal cancer (CRC) is one of the most common cancers and a leading cause of cancer-related death. Gut microbiota are part of a complex microbe-based ecosystem of the human body, and changes in the microbiota can lead to a variety of diseases. All currently used CRC detection methods, including endoscopy, the guaiac-based fecal occult blood test, and the fecal immunochemical test, have notable limitations. Therefore, novel screening methods that are accurate, inexpensive, and noninvasive are needed. Random forest models, as a superior machine-learning model, are increasingly used in research to select biomarkers. In this review, we summarize progress in the diagnosis of CRC based on random forest models of the gut microbiota. We conclude that some cancer-associated bacteria in the gut microbiota could be used as biomarkers for detecting early CRC. We also discuss how to select possible markers of colorectal diseases from the gut microbiota using the random forest model.


Gut microbiota, Colorectal cancer, Colorectal adenoma, Random forest model


Colorectal cancer (CRC) is one of the most common cancers and a leading cause of cancer-related death. The global annual incidence of CRC is approximately 1.2 million, and the annual death toll reaches 600,000.1 CRC is characterized by a long preclinical phase, progressing over years from early adenoma to invasive cancer.2,3 Diagnosis of early-stage CRC can significantly improve patient prognosis, with a survival rate higher than 80%,4 which makes early diagnosis of CRC more critical for better patient outcomes.

Screening methods for CRC, including the fecal occult blood test (FOBT), colonoscopy, determination of genetic mutations, and gut microbiota tests, have received widespread attention, although these methods differ significantly in specificity, accuracy, convenience, and universality with regard to clinical diagnosis. Colonoscopy, the gold standard for the diagnosis of CRC, can facilitate early prevention through adenoma resection; however, it is an invasive procedure that causes patient discomfort and reduces compliance with clinical examinations, relies on endoscopist skill, and requires bowel cleansing.5,6 The traditional guaiac-based FOBT, which is inexpensive and noninvasive, is the first-choice screening test for CRC. Both the guaiac-based FOBT and the fecal immunochemical test (FIT) can improve the positive rate of diagnosis through repeated testing over a long period of time. However, cancer bleeding is usually intermittent, which lowers the sensitivity of these two tests and leads to false-negative results.7 Because the abovementioned methods have many limitations, it is necessary to establish novel screening methods that are accurate, inexpensive, and noninvasive.

The human microbiota is a very large and complex microbial system. The gut microbiota includes bacteria, viruses, archaea, and fungi, but bacteria are the main component; thus, most studies of the microbiota focus on bacteria. The gut microbiota is composed of approximately 10^14 bacteria, which equates to 10 times the total number of human cells.8 It contains nearly 11,500 common bacteria.9 The total number of bacterial genes in the gut microbiota is 150 times the total number of human genes.10–12 Accordingly, the gut microbiota has been called a “forgotten organ”.10–12

Dysbacteriosis plays an important role in the pathogenesis of several diseases. For example, the gut microbiota is associated with many diseases, such as obesity, type 2 diabetes, and atherosclerosis.13–17 Moreover, it is well-known that genomic alterations of the APC/Wnt pathway can potentially lead to carcinoma.18 Recent studies have increasingly indicated that the gut microbiota is closely related to CRC,19 with many differences in gut microbiota between healthy people and patients with CRC. Studies have also shown that biomarkers from gut microbiota may be used for the diagnosis of CRC.20–23 This review focuses on the relationship between gut microbiota and CRC as well as the possibility that gut microbiota-related markers may be novel biological screening markers for CRC.

Random forest model

Gut microbiota can be used as a marker to screen for CRC, but the screening capacity of a single bacterial taxon is limited, and multibacterial models are needed to increase the screening capacity. If a model includes too many bacterial species, it will be difficult to apply in the clinical setting; therefore, we need to select the most discriminating bacteria. As an ensemble machine-learning algorithm, random forest (RF) is a classifier built from multiple decision trees, each established through random repeated sampling, whose votes are combined to obtain the final result.24 Current classifier algorithms mainly include Bayes net, simple logistic, TB-Lineage, machine learning, the sequential minimal optimization algorithm for support vector machines, and RF.25–32

In many applications, the RF algorithm achieves among the best accuracy of the available classifiers. Moreover, compared to other technologies, RF has many advantages in processing highly nonlinear biological data, such as anti-interference, simple optimization, and efficient parallel processing.33,34 The RF model considers nonlinear interactions among features, and can also carry out internal validation within the model to guard against overfitting.35 A recent article reported that an RF model successfully screened microbial markers for early cirrhotic liver cancer.36 Another study on hepatocellular carcinoma used the same method to obtain predictive biomarkers for advanced hepatocellular carcinoma.37 An RF model based on gut microbiota was established to successfully predict the variation of Bifidobacteria after probiotic treatment, and it revealed the effects of probiotics on intestinal flora.38 Thus, RF models, as a superior machine-learning model, are increasingly used in research for the selection of biomarkers. Therefore, we reviewed the recent progress in the diagnosis of colorectal diseases by using an RF model based on gut microbiota as well as the selection of possible predictive markers of colorectal diseases.
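The two ingredients described above, repeated random sampling and a final vote, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the method of any cited study: each "tree" is reduced to a single-split stump, and the relative-abundance values in TOY_ROWS, the function names, and the parameter defaults are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy data: (relative abundance of taxon 0, taxon 1), label (1 = CRC, 0 = control).
TOY_ROWS = [
    ((0.30, 0.06), 1), ((0.35, 0.05), 1), ((0.40, 0.04), 1), ((0.45, 0.03), 1), ((0.50, 0.02), 1),
    ((0.01, 0.20), 0), ((0.02, 0.25), 0), ((0.03, 0.30), 0), ((0.04, 0.35), 0), ((0.05, 0.40), 0),
]

def fit_stump(rows, features):
    """Fit a depth-1 tree: pick the (feature, threshold, label) with the fewest training errors."""
    best = None
    for f in features:
        for x, _ in rows:
            thr = x[f]
            for high in (0, 1):  # label predicted when the value is >= threshold
                errors = sum((high if xi[f] >= thr else 1 - high) != yi for xi, yi in rows)
                if best is None or errors < best[0]:
                    best = (errors, f, thr, high)
    return best[1:]

def predict_stump(stump, x):
    f, thr, high = stump
    return high if x[f] >= thr else 1 - high

def fit_forest(rows, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    feature_ids = list(range(len(rows[0][0])))
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]     # sampling with replacement
        subset = rng.sample(feature_ids, n_features)  # random subset of candidate features
        forest.append(fit_stump(sample, subset))
    return forest

def predict_forest(forest, x):
    """Majority vote over all trees."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]

forest = fit_forest(TOY_ROWS)
print(predict_forest(forest, (0.90, 0.01)), predict_forest(forest, (0.00, 0.50)))
```

A full RF grows deeper trees and redraws the feature subset at every split; the stump version keeps only the bootstrap, feature-subsampling, and voting steps so that the structure is easy to follow.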

Dysbiosis of gut microbiota in CRC

The gut microbiota and its host together establish a symbiotic relationship that maintains homeostasis in the digestive system. In healthy individuals, the Firmicutes and Bacteroidetes phyla predominate in the gut microbiota, despite remarkable interindividual differences.39 At the genus level, a meta-analysis indicated that a high abundance of the genera Barnesiella, Ruminococcaceae UCG-005, Alistipes, Christensenellaceae R-7 group, and an unclassified member of the Lachnospiraceae family correlated with the healthy state in the subjects studied.40 The most abundant bacterial genera in the gut microbiota, however, include Prevotella, Bacteroides, and Ruminococcus. Based on genus-level compositional variation, the gut microbiota can be classified into three enterotypes: Prevotella-predominant, Bacteroides-predominant, and Ruminococcus-related. This research also shows that the human intestinal microbiota has commonalities across individuals.41

Bacterial populations typically differ by sampling site (tissue, colonic lumen, and feces), each of which has different cellular and physiologic features. In the proximal colon, Bacteroides, Actinomyces, Pseudomonas, and Enterobacteriaceae show differential abundance between the lumen and mucosa.42 Enterobacteriaceae, Bacteroides, and Pseudomonas are enriched in the proximal colonic mucosa, whereas Finegoldia, Murdochiella, Peptoniphilus, Porphyromonas, and Anaerococcus show increased relative abundance in the distal colon.42 The abundance of Turicibacter, Finegoldia, Peptoniphilus, and Anaerococcus was found to differ between the lumen and mucosa microbiota in the distal colon.42 Thus, microbiota composition differs between anatomical parts of the colon.42

Colorectal adenoma (CRA) confers a high risk for the development of CRC. The gut microbiota is necessary for the formation of intestinal adenoma, and healthy gut microbiota are associated with a reduced risk of advanced CRA.43 In cases of CRA, the relative abundance of two bacterial genera (Enterococcus and Streptococcus) increased, whereas that of three genera (Clostridium, Roseburia, and Eubacterium) decreased.44 Another study found that some species belonging to Ruminococcaceae, Clostridium, Pseudomonas, and Porphyromonadaceae showed increased numbers in patients with CRA, whereas other species belonging to Bacteroides, Lachnospiraceae, Clostridiales, and Clostridium decreased.20 Analyses of fecal microbiota from 95 patients with CRA revealed substantial changes in the microbiota compositions. In CRA, the Proteobacteria phylum was found to be enriched in the microbiota. These bacteria are associated with precancerous lesions.45 Lachnospiraceae, a potentially beneficial bacterial family, was depleted, and the relative abundance of this family had high accuracy in differentiating patients with CRA from normal individuals.23,46

Dysbiosis of the gut microbiota and its substantial compositional alterations are closely related to the development of CRC.47 The microbial communities in the tumor microhabitat are different from those in tumor-adjacent healthy tissue.48 Moreover, antibiotic intervention in the microbiota can significantly reduce the burden of colonic tumors.19 Virulence-associated genes in tumors may potentially depend upon the genomes of Fusobacterium and Providencia.48 A study on fecal microbiota from 120 patients with CRC showed the increased abundance of a number of species, including Porphyromonas asaccharolytica, Fusobacterium nucleatum, Parvimonas micra, Peptostreptococcus stomatis, Gemella spp., and Prevotella spp.23 Furthermore, in a mouse model of CRC, the abundance of Lactobacillus negatively correlated with the number of colonic tumors.49 Moreover, fecal microbiota from patients with CRC can promote tumorigenesis in both germ-free mice and conventional mice.50 Taking these findings together, it appears that some cancer-associated bacteria in gut microbiota can serve as biomarkers to detect CRC.

RF model for identification of microbial biomarkers for CRC

Alterations in the relative abundance of bacteria in CRC indicate that these bacteria are potential predictive or diagnostic biomarkers for CRC or CRA. Escherichia coli, Bacteroides fragilis, and Fusobacterium nucleatum have been shown to directly influence tumor development in the colon.19 A recent small-sample study that quantified Fusobacterium nucleatum in fecal samples found a great increase in the number of these bacteria in CRC; the findings of the study supported the use of Fusobacterium nucleatum as a biomarker of CRC as well as a marker of early CRC.51–53 The supernatant of a Fusobacterium nucleatum culture exhibited strong bactericidal activity against some probiotics, such as Faecalibacterium prausnitzii and Bifidobacterium strains, and this loss of probiotics may contribute to disease.54

A study found a stepwise increase in the abundance of Clostridium from normal tissues to adenoma and, finally, colonic cancer.55 The authors therefore proposed that Clostridium symbiosum can be used on its own as a biomarker for detecting CRC. In their results, Clostridium symbiosum abundance outperformed the other conventional screening markers, such as carcinoembryonic antigen (CEA) and FIT, both of which had been considered to have greater sensitivity (area under the curve (AUC) = 0.73 vs. 0.38–0.54 for other methods). In combination with FIT, the predictive accuracy of Clostridium symbiosum increased significantly, with an AUC of 0.803. Moreover, the combination of Clostridium symbiosum, Fusobacterium nucleatum, FIT (200 ng/mL), and CEA (3.3 ng/mL) achieved an AUC of 0.876.55
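Because the AUC is the figure of merit used throughout the studies discussed here, a short sketch of how it can be computed may help. The AUC equals the probability that a randomly chosen case receives a higher marker score than a randomly chosen control, with ties counted as one half. The function name and the scores below are hypothetical and are not taken from any cited study.

```python
def auc(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for h in control_scores:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical marker abundances for three cases and three controls.
print(round(auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]), 3))
```

An AUC of 0.5 corresponds to a marker that ranks cases and controls no better than chance, and an AUC of 1.0 to perfect separation.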

A meta-analysis of a publicly available dataset showed that the depletion of Faecalibacterium, Bacteroides, and Romboutsia could be a potential biomarker for CRC.52 A Chinese study identified 76 potential fecal biomarkers that distinguished patients with CRC from the healthy group; the CRC group was enriched with 18 operational taxonomic units (OTUs), and fecal metabolites also differed between the healthy and cancer groups.56 Another meta-analysis of eight studies from different countries and regions identified 29 species as biomarkers of CRC.57 Furthermore, the microbial species can predict taxonomic and functional microbiome CRC signatures as a basis for future diagnostics. (These data are summarized in Table 1.)49,52–55,57–59

Table 1

Characteristics of the bacteria species as potential biomarkers for CRC

Phylum | Species | Abundance in CRC | References
Firmicutes | Clostridiales | Increased | Xie YH (2017)55
 | Unknown Clostridiales | Increased | Wirbel J et al. (2019)57
 | Clostridium bolteae/clostridioforme | Increased | Wirbel J et al. (2019)57
 | Clostridium symbiosum | Increased | Wirbel J et al. (2019)57; Thomas AM et al. (2019)58
 | Clostridium leptum | Increased | Thomas AM et al. (2019)58
 | Clostridium hathewayi | Increased | Thomas AM et al. (2019)58
 | Unknown Clostridiales | Increased | Wirbel J et al. (2019)57
 | Subdoligranulum spp. | Decreased | Thomas AM et al. (2019)58
 | Unknown Peptostreptococcaceae | Increased | Wirbel J et al. (2019)57
 | Peptostreptococcus stomatis | Increased | Thomas AM et al. (2019)58; Ai D et al. (2019)59
 | Anaerococcus obesiensis/vaginalis | Increased | Wirbel J et al. (2019)57
 | Anaerotruncus colihominis | Increased | Thomas AM et al. (2019)58
 | Gemella morbillorum | Increased | Wirbel J et al. (2019)57; Thomas AM et al. (2019)58
 | Unknown Dialister | Increased | Wirbel J et al. (2019)57
 | Hungatella hathewayi | Increased | Wirbel J et al. (2019)57
 | Parvimonas species | Increased | Wirbel J et al. (2019)57; Ai D et al. (2019)59
 | Parvimonas spp. | Increased | Thomas AM et al. (2019)58
 | Parvimonas micra | Increased | Thomas AM et al. (2019)58
 | Ruminococcus torques | Increased | Wirbel J et al. (2019)57
 | Ruminococcus gnavus | Decreased | Thomas AM et al. (2019)58
 | Subdoligranulum species | Increased | Wirbel J et al. (2019)57
 | Lachnospiraceae 3157FAA CT1 | Increased | Thomas AM et al. (2019)58
 | Lachnospiraceae 8157FAA | Decreased | Thomas AM et al. (2019)58
 | Lachnospiraceae 5163FAA | Decreased | Thomas AM et al. (2019)58
 | Alistipes spp. | Decreased | Sze MA et al. (2018)49
 | Dialister invisus | Decreased | Thomas AM et al. (2019)58
 | Eubacterium eligens | Decreased | Thomas AM et al. (2019)58
 | Streptococcus parasanguinis | Increased | Thomas AM et al. (2019)58
 | Streptococcus salivarius | Decreased | Thomas AM et al. (2019)58
 | Streptococcus vestibularis | Increased | Ai D et al. (2019)59
Bacteroidetes | Bacteroides | Decreased | Mangifesta M et al. (2018)53
 | Unknown Porphyromonas | Increased | Wirbel J et al. (2019)57
 | Porphyromonas uenonis | Increased | Wirbel J et al. (2019)57
 | Porphyromonas somerae | Increased | Wirbel J et al. (2019)57
 | Porphyromonas asaccharolytica | Increased | Ai D et al. (2019)59; Thomas AM et al. (2019)58
 | Prevotella intermedia | Increased | Wirbel J et al. (2019)57
 | Prevotella nigrescens | Increased | Wirbel J et al. (2019)57
 | Prevotella copri | Increased | Thomas AM et al. (2019)58
 | Flavonifractor | Increased | Ai D et al. (2019)59
Proteobacteria | Faecalibacterium | Decreased | Mangifesta M et al. (2018)53
 | Escherichia coli | Increased | Thomas AM et al. (2019)58
Fusobacteria | Fusobacterium nucleatum | Increased | Tunsjø HS et al. (2019)52; Mangifesta M et al. (2018)53; Xie YH (2017)55; Bullman S et al. (2017)54; Thomas AM et al. (2019)58; Ai D et al. (2019)59
 | F. nucleatum subspecies animalis | Increased | Wirbel J et al. (2019)57
 | F. nucleatum subspecies nucleatum | Increased | Wirbel J et al. (2019)57
 | F. nucleatum subspecies vincentii | Increased | Wirbel J et al. (2019)57
 | Fusobacterium species oral taxon 370 | Increased | Wirbel J et al. (2019)57
Actinobacteria | Actinomyces graevenitzii | Increased | Thomas AM et al. (2019)58
 | Bifidobacterium longum | Decreased | Thomas AM et al. (2019)58
Tenericutes | Solobacterium moorei | Increased | Wirbel J et al. (2019)57

There have been attempts to explore whether a combination of bacterial markers could increase the AUC value in predicting CRC. Zackular et al.20 analyzed fecal microbiota from healthy subjects and patients with CRA or CRC; they found substantial alterations in the gut microbiome of patients with CRA or CRC compared to healthy controls, and their model classified CRC with an AUC of 0.798. Moreover, combining microbial markers with known clinical risk factors can significantly improve the discriminatory ability of the tests.21

By using a LASSO logistic regression classifier, a model constructed with fecal microbiota could predict CRC with an accuracy of 0.82.22 Another study from China demonstrated that a model constructed with microbiota showed better diagnostic value than FOBT.60 Sze et al.49 constructed an RF classification model using 8 taxa selected for their significant odds ratios in a meta-analysis of 14 studies from various geographical regions: Fusobacterium, Parvimonas, Porphyromonas, Peptostreptococcus, Clostridium XI, Enterobacteriaceae, Ruminococcus, and Escherichia. Their analysis included 1,737 fecal samples and 492 tissue samples. The combined model had an AUC of 0.75 based on fecal samples. Similarly, a combined model trained on Dorea, Blautia, and Weissella had an AUC of 0.77 in tissue samples.49 Their models could still classify CRC with high accuracy when trained on one data set and tested on other data sets.49

Nonetheless, a test that fully and objectively reflects the early gut changes in CRA or CRC is still needed, as are new noninvasive screening methods that increase the sensitivity and specificity of CRC detection. Baxter et al.23 established an RF classification model by using the relative abundance of gut microbiota and FIT results from stool samples of 490 patients. Their combination model, the multitarget microbiota test, incorporated hemoglobin concentration (determined by FIT) and bacterial relative abundances, and its sensitivity and specificity for CRC and CRA were better than those of FIT alone.23 Their model used 23 OTUs, including Lachnospiraceae (OTU87), Lachnospiraceae (OTU60), Lachnospiraceae (OTU32), Lachnospiraceae (OTU88), Lachnospiraceae (OTU44), Lachnospiraceae (OTU14), Bacteroides (OTU7), Bacteroides (OTU3), Bacteroides (OTU2), Ruminococcus (OTU11), Ruminococcus (OTU16), Ruminococcaceae (OTU29), Blautia (OTU13), Blautia (OTU9), Collinsella (OTU19), Firmicutes (OTU282), Enterobacteriaceae (OTU28), Parabacteroides (OTU49), Roseburia (OTU5), Clostridiales (OTU10), Faecalibacterium (OTU6), Anaerostipes (OTU8), Porphyromonas (OTU105), and FIT with a 100 ng/mL cutoff. We infer that 16 of these were members of the Firmicutes phylum.23 The multitarget microbiota test detected 91.7% of cancers and 45.5% of adenomas, compared to 75.0% and 15.7% by FIT, respectively.23 Thus, screening methods for colorectal lesions need to be continually optimized to find the optimal screening program.
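Detection rates such as those quoted above are sensitivities: the fraction of true lesions that a test flags. The sketch below shows how sensitivity and specificity are computed from paired true labels and test calls. The eight-subject labels and calls are invented for illustration and do not reproduce any study's data, and the function name is an assumption.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = lesion present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical cohort: four subjects with lesions, four without.
truth          = [1, 1, 1, 1, 0, 0, 0, 0]
single_test    = [1, 1, 0, 0, 0, 0, 0, 0]  # calls from a single-marker test
combined_model = [1, 1, 1, 0, 0, 0, 0, 1]  # calls from a multi-marker model

print(sensitivity_specificity(truth, single_test))
print(sensitivity_specificity(truth, combined_model))
```

In this toy cohort the combined model raises sensitivity at the cost of one false positive, which is the kind of trade-off a screening program has to weigh.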

A recent study analyzed a total of 969 fecal metagenomes, including 5 publicly available data sets, 2 new cohorts, and 2 validation cohorts.58 Twenty-four species with high RF feature importance were selected: Actinomyces graevenitzii, Alistipes spp., Anaerotruncus colihominis, Bifidobacterium longum, Clostridium hathewayi, Clostridium leptum, Clostridium symbiosum, Dialister invisus, Eubacterium eligens, Escherichia coli, Fusobacterium nucleatum, Gemella morbillorum, Lachnospiraceae 3157FAA CT1, Lachnospiraceae 8157FAA, Lachnospiraceae 5163FAA, Parvimonas spp., Peptostreptococcus stomatis, Porphyromonas asaccharolytica, Parvimonas micra, Prevotella copri, Ruminococcus gnavus, Subdoligranulum spp., Streptococcus parasanguinis, and Streptococcus salivarius. The predictive microbiome signatures trained on different data sets consistently showed high accuracy. Nonetheless, it appears their model has lower sensitivity and specificity values for predicting CRA.58

Despite significant advances in the study of the effects of gut microbiota on colorectal lesions, few studies have investigated the gut microbiota after the treatment of patients with colorectal lesions. The tumor-node-metastasis (TNM) international staging system has always been considered the gold standard to determine CRC prognosis. In addition, findings of aneuploidy, tumor-infiltrating lymphocytes, allelic loss in the DCC, TP53, APC, and MCC genes, TP53 gene mutations, CD44 protein expression, high levels of thymidylate synthetase, microsatellite instability, and the mutation status of both RAS and BRAF are independent, strong prognostic factors. C-reactive protein, overexpression of CEA in tumors, and circulating free DNA are also considered to be associated with the prognosis of patients with CRC.19 Ai et al.59 analyzed the composition of fecal microbiota in 124 samples from France and 99 samples from Austria. They excluded unrelated and redundant features during feature selection by mutual information, and trained an RF classifier on a large metagenomic data set of patients with CRC and healthy individuals. The data set was assembled from published reports, and the authors extracted and analyzed information from the learned decision trees. Porphyromonas asaccharolytica, Peptostreptococcus stomatis, Fusobacterium, Parvimonas spp., Streptococcus vestibularis, and Flavonifractor plautii were determined to be key microbial species associated with CRC.59
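Feature selection by mutual information can be illustrated with a small sketch. The mutual information between a discretized feature X and the class label Y is I(X;Y) = sum over (x, y) of p(x,y) log2[p(x,y) / (p(x) p(y))], and features that share little information with the label are dropped before the classifier is trained. The code below covers only this relevance-ranking step; removal of redundant features is not shown. The taxon names, the presence/absence values, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def rank_features(features, labels):
    """Sort feature names from most to least informative about the labels."""
    scores = {name: mutual_information(values, labels) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical presence/absence calls for two taxa in four samples (1 = CRC, 0 = control).
labels = [0, 0, 1, 1]
features = {
    "taxon_A": [0, 0, 1, 1],  # tracks the label exactly
    "taxon_B": [0, 1, 0, 1],  # independent of the label
}
ranked = rank_features(features, labels)
print(ranked)
```

Continuous relative abundances would need to be binned (or handled with a continuous estimator) before this discrete formula applies.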

By using an RF model based on fecal microbiota, Sze et al.61 found significant differences between the pre- and post-treatment samples of 67 individuals, including those with adenoma (n = 22), advanced adenoma (n = 19), and carcinoma (n = 26). Fusobacterium, Porphyromonas, and Parvimonas were significantly decreased in the post-treatment samples.61 Furthermore, in a mouse model, interventions of microbiota with antibiotics led to a dramatic decrease in the tumor burden in the colon.19 In addition, as an important probiotic, Bifidobacterium has been shown to enrich the gut microbiome in healthy individuals.61 Moreover, studies have shown that Bifidobacterium can inhibit the growth of intestinal carcinogenic bacteria and protect intestinal mucosa, which makes it an important probiotic for clinical application (Table 2).20,23,42,44,45,49,59,61,62

Table 2

Important studies in dysbiosis of gut microbiota in CRC, bacterial features, detection methods and models

Study | Participants | Sample sources | Models | Bacteria, genus | Bacteria features
Flynn et al. (2018)42 | Healthy (n = 20) | Mucosal, feces and luminal contents | Random forest classification models | Enterobacteriaceae, Bacteroides and Pseudomonas | Enriched in the proximal colonic mucosa
 | | | | Finegoldia, Murdochiella, Peptoniphilus, Porphyromonas and Anaerococcus | Increased in the distal colon
Chen et al. (2013)44 | Healthy (n = 344), A-CRA groups (n = 344) | Feces | NA | Enterococcus and Streptococcus | Increased in A-CRA
 | | | | Clostridium, Roseburia, and Eubacterium | Decreased in A-CRA
Zackular et al. (2014)20 | Healthy (n = 30), colonic adenoma (n = 30), and colonic adenocarcinoma (n = 30) | Feces | NA | Ruminococcaceae, Clostridium, Pseudomonas, and Porphyromonadaceae | Increased in CRA patients
 | | | | Bacteroides, Lachnospiraceae, Clostridiales, and Clostridium | Decreased in CRA patients
Goedert et al. (2015)45 | Normal patients (n = 24), CRA (n = 20), CRC (n = 2), and other conditions (n = 15) | Feces | Random forest | Proteobacteria | Enriched in CRA
Ai D et al. (2019)59 | France (n = 124); Austria (n = 99) | Feces | Random forest | Porphyromonas asaccharolytica, Eubacterium hallii, Parvimonas spp., Fusobacterium 7, Prevotella melaninogenica, Streptococcus vestibularis, Prevotella copri, Peptostreptococcus stomatis, Fusobacterium nucleatum, Parvimonas micra, Gemella morbillorum, Flavonifractor plautii, Fusobacterium, Clostridium SS2 | Enriched in CRC
Baxter et al. (2016)23 | CRC (n = 120), CRA (n = 198), no colonic lesions (n = 172) | Feces | Random forest | Porphyromonas asaccharolytica, Fusobacterium nucleatum, Parvimonas micra, Peptostreptococcus stomatis, Gemella spp. and Prevotella spp. | Increased in CRC
Sze MA et al. (2018)49 | Control (n = 1145), CRA (n = 521), CRC (n = 536) | Feces and tissue | Random forest | Fusobacterium, Parvimonas, Porphyromonas, Peptostreptococcus, Clostridium XI, Enterobacteriaceae, Ruminococcus and Escherichia, Dorea, Blautia and Weissella | Significant ORs
Sze MA et al. (2017)61 | Adenoma (n = 22), advanced adenoma (n = 19), CRC (n = 26) | Feces | Random forest | Fusobacterium, Porphyromonas, Parvimonas | Decreased post-treatment

The gut microbiota includes bacteria, viruses, and fungi. Beyond the close relation of bacteria to colorectal lesions, gut microbes can interact with each other. Through RF modeling, Hannigan et al.63 found that viruses indirectly affect cancer progression by altering bacterial host communities. Nakatsu et al.64 conducted a study on the prediction of survival in CRC from viruses. Their study found a combination of four classification markers that was associated with reduced patient survival in CRC.43 Further research on CRC-related viral group characteristics could lead to the development of new tools to identify individuals with CRC or to predict outcomes.

This review has some limitations. Despite the available data on CRC detection through the gut microbiome, there is a lack of consensus on which features are most informative. The contradictory reports from some studies could be attributed to differences inherent among study populations, procedures for fecal collection and storage, DNA extraction and amplification, sequencing, and bioinformatics processing methods. Moreover, recent studies used their models only to differentiate CRC from CRA or healthy individuals. Further studies are required to identify methods that can differentiate CRC from other colonic diseases, such as inflammatory bowel diseases.

Future research directions

Our analysis of recent studies on CRC biomarkers and the list of related genera showed the absence of an accepted biomarker for CRC. The RF model offers anti-interference, simple optimization, and efficient parallel processing, all of which imply it may be the best choice for screening biomarkers. Future research could develop a kit that applies an RF model to fecal microbiota to screen for CRA and CRC accurately, quickly, and conveniently, thereby improving early detection of these conditions.


In recent years, several studies have demonstrated that the gut microbiota in CRC patients differs substantially from that in healthy individuals. Fecal microbial markers have the potential to provide a noninvasive alternative method to diagnose CRC. RF models or other statistical models based on a collection of bacteria in the gut microbiota could help identify CRC with high accuracy. When combined with other conventional screening markers and clinical risk factors, the predictive accuracy for CRC increases dramatically. The findings in our review provide a new approach to identify powerful biomarkers in the gut microbiota. This will facilitate clinician decision-making for early intervention in CRC.



AUC: area under the curve
CRA: colorectal adenoma
CRC: colorectal cancer
CEA: carcinoembryonic antigen
FIT: fecal immunochemical test
FOBT: fecal occult blood test
OTU: operational taxonomic unit
RF: random forest





This work was supported by funding from the National Natural Science Foundation of China (Nos. 81602144 and 31870777).

Conflict of interest

The authors have no financial interests or any conflict of interests to disclose.

Authors’ contributions

Manuscript writing (WLS, QJD); critical revision of the manuscript for important intellectual content (WLS, LLW, QYZ); administrative, technical, or material support, study supervision (QJD).


  1. Jemal A, Bray F, Center MM, Ferlay J, Ward E, Forman D. Global cancer statistics. CA Cancer J Clin 2011;61:69-90 View Article
  2. Brenner H, Hoffmeister M, Stegmaier C, Brenner G, Altenhofen L, Haug U. Risk of progression of advanced adenomas to colorectal cancer by age and sex: estimates based on 840,149 screening colonoscopies. Gut 2007;56:1585-1589 View Article
  3. Kuntz KM, Lansdorp-Vogelaar I, Rutter CM, Knudsen AB, van Ballegooijen M, Savarino JE. A systematic comparison of microsimulation models of colorectal cancer: the role of assumptions about adenoma progression. Med Decis Making 2011;31:530-539 View Article
  4. O’Connell JB, Maggard MA, Ko CY. Colon cancer survival rates with the new American Joint Committee on Cancer sixth edition staging. J Natl Cancer Inst 2004;96:1420-1425 View Article
  5. Kaminski MF, Regula J, Kraszewska E, Polkowski M, Wojciechowska U, Didkowska J. Quality indicators for colonoscopy and the risk of interval cancer. N Engl J Med 2010;362:1795-1803 View Article
  6. Lebwohl B, Kastrinos F, Glick M, Rosenbaum AJ, Wang T, Neugut AI. The impact of suboptimal bowel preparation on adenoma miss rates and the factors associated with early repeat colonoscopy. Gastrointest Endosc 2011;73:1207-1214 View Article
  7. Kuipers EJ, Rösch T. Colorectal cancer screening—optimizing current strategies and new directions. Nat Rev Clin Oncol 2013;10:130-142 View Article
  8. Lozupone CA, Stombaugh JI, Gordon JI, Jansson JK, Knight R. Diversity, stability and resilience of the human gut microbiota. Nature 2012;489:220-230 View Article
  9. Morgan XC, Segata N, Huttenhower C. Biodiversity and functional genomics in the human microbiome. Trends Genet 2013;29:51-58 View Article
  10. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010;464:59-65 View Article
  11. O’Hara AM, Shanahan F. The gut flora as a forgotten organ. EMBO Rep 2006;7:688-693 View Article
  12. Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS. Metagenomic analysis of the human distal gut microbiome. Science 2006;312:1355-1359 View Article
  13. Karlsson FH, Tremaroli V, Nookaew I, Bergstrom G, Behre CJ, Fagerberg B. Gut meta genome in European women with normal, impaired and diabetic glucose control. Nature ;498:99-103 View Article
  14. Le Chatelier E, Nielsen T, Qin J, Prifti E, Hildebrand F, Falony G. Richness of human gut microbiome correlates with metabolic markers. Nature 2013;500:541-546 View Article
  15. Koeth RA, Wang Z, Levison BS, Buffa JA, Org E, Sheehy BT. Intestinal microbiota metabolism of l-carnitine, a nutrient in red meat, promote satherosclerosis. Nat Med 2013;19:576-585 View Article
  16. Qin J, Li Y, Cai Z, Li S, Zhu J, Zhang F. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 2012;490:55-60
  17. Turnbaugh PJ, Ley RE, Mahowald MA, Magrini V, Mardis ER, Gordon JI. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 2006;444:1027-1031
  18. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012;487:330-337
  19. Zackular JP, Baxter NT, Chen GY, Schloss PD. Manipulation of the gut microbiota reveals role in colon tumorigenesis. mSphere 2015;1:e00001-15
  20. Zackular JP, Rogers MA, Ruffin MT, Schloss PD. The human gut microbiome as a screening tool for colorectal cancer. Cancer Prev Res 2014;7:1112-1121
  21. Zeller G, Tap J, Voigt AY, Sunagawa S, Kultima JR, Costea PI. Potential of fecal microbiota for early-stage detection of colorectal cancer. Mol Syst Biol 2014;10:766
  22. Yu J, Feng Q, Wong SH, Zhang D, Liang QY, Qin Y. Metagenomic analysis of faecal microbiome as a tool towards targeted non-invasive biomarkers for colorectal cancer. Gut 2017;66:70-78
  23. Baxter NT, Ruffin MT, Rogers MA, Schloss PD. Microbiota-based model improves the sensitivity of fecal immunochemical test for detecting colonic lesions. Genome Med 2016;8:37
  24. Breiman L. Random forests. Mach Learn 2001;45:5-32
  25. Nassif H, Wu Y, Page D, Burnside E. Logical Differential Prediction Bayes Net, improving breast cancer diagnosis for older women. AMIA Annu Symp Proc 2012;2012:1330-1339
  26. Gevaert O, De Smet F, Timmerman D, Moreau Y, De Moor B. Predicting the prognosis of breast cancer by integrating clinical and microarray data with Bayesian networks. Bioinformatics 2006;22:e184-e190
  27. Xue X, Zeng N, Gao Z, Du MQ. Diffuse large B-cell lymphoma: sub-classification by massive parallel quantitative RT-PCR. Lab Invest 2015;95:113-120
  28. Shabbeer A, Cowan LS, Ozcaglar C, Rastogi N, Vandenberg SL, Yener B. TB-Lineage: an online tool for classification and analysis of strains of Mycobacterium tuberculosis complex. Infect Genet Evol 2012;12:789-797
  29. Habibi S, Ahmadi M, Alizadeh S. Type 2 diabetes mellitus screening and risk factors using decision tree: results of data mining. Glob J Health Sci 2015;7:304-310
  30. Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis DI. Machine learning applications in cancer prognosis and prediction. Comput Struct Biotechnol J 2014;13:8-17
  31. Lebedev AV, Westman E, Van Westen GJ, Kramberger MG, Lundervold A, Aarsland D. Random Forest ensembles for detection and prediction of Alzheimer’s disease with a good between cohort robustness. Neuroimage Clin 2014;6:115-125
  32. Takahashi N, Guo J, Nishi T. Global convergence of SMO algorithm for support vector regression. IEEE Trans Neural Netw 2008;19:971-982
  33. De Bruyn T, van Westen GJ, Ijzerman AP, Stieger B, de Witte P, Augustijns PF. Structure-based identification of OATP1B1/3 inhibitors. Mol Pharmacol 2013;83:1257-1267
  34. Menze BH, Kelm BM, Masuch R, Himmelreich U, Bachert P, Petrich W. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformatics 2009;10:213
  35. Liaw A, Wiener M. Classification and regression by randomForest. R News 2002;2:18-22
  36. Ren Z, Li A, Jiang J, Zhou L, Yu Z, Lu H. Gut microbiome analysis as a tool towards targeted non-invasive biomarkers for early hepatocellular carcinoma. Gut 2019;68:1014-1023
  37. Loomba R, Seguritan V, Li W, Long T, Klitgord N, Bhatt A. Gut microbiome-based metagenomic signature for non-invasive detection of advanced fibrosis in human nonalcoholic fatty liver disease. Cell Metab 2017;25:1054-1062.e5
  38. Luo YM, Liu FT, Chen MX, Tang WL, Yang YL. A machine learning model based on initial gut microbiome data for predicting changes of Bifidobacterium after prebiotics consumption. Nan Fang Yi Ke Da Xue Xue Bao 2018;38:251-260
  39. Lloyd-Price J, Abu-Ali G, Huttenhower C. The healthy human microbiome. Genome Med 2016;8:51
  40. Mancabelli L, Milani C, Lugli GA, Turroni F, Cocconi D, van Sinderen D. Identification of universal gut microbial biomarkers of common human intestinal diseases by meta-analysis. FEMS Microbiol Ecol 2017;93:fix153
  41. Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR. Enterotypes of the human gut microbiome. Nature 2011;473:174-180
  42. Flynn KJ, Ruffin MT 4th, Turgeon DK, Schloss PD. Spatial variation of the native colon microbiota in healthy adults. Cancer Prev Res (Phila) 2018;11:393-402
  43. Dove WF, Clipson L, Gould KA, Luongo C, Marshall DJ, Moser AR. Intestinal neoplasia in the ApcMin mouse: independence from the microbial and natural killer (beige locus) status. Cancer Res 1997;57:812-814
  44. Chen HM, Yu YN, Wang JL, Lin YW, Kong X, Yang CQ. Decreased dietary fiber intake and structural alteration of gut microbiota in patients with advanced colorectal adenoma. Am J Clin Nutr 2013;97:1044-1052
  45. Goedert JJ, Gong Y, Hua X, Zhong H, He Y, Peng P. Fecal microbiota characteristics of patients with colorectal adenoma detected by screening: a population-based study. EBioMedicine 2015;2:597-603
  46. Lepage P, Häsler R, Spehlmann ME, Rehman A, Zvirbliene A, Begun A. Twin study indicates loss of interaction between microbiota and mucosa of patients with ulcerative colitis. Gastroenterology 2011;141:227-236
  47. Arthur JC, Gharaibeh RZ, Mühlbauer M, Perez-Chanona E, Uronis JM, McCafferty J. Microbial genomic analysis reveals the essential role of inflammation in bacteria-induced colorectal cancer. Nat Commun 2014;5:4724
  48. Burns MB, Lynch J, Starr TK, Knights D, Blekhman R. Virulence genes are a signature of the microbiome in the colorectal tumor microenvironment. Genome Med 2015;7:55
  49. Sze MA, Schloss PD. Leveraging existing 16S rRNA gene surveys to identify reproducible biomarkers in individuals with colorectal tumors. mBio 2018;9:e00630-18
  50. Wong SH, Zhao L, Zhang X, Nakatsu G, Han J, Xu W. Gavage of fecal samples from patients with colorectal cancer promotes intestinal carcinogenesis in germ-free and conventional mice. Gastroenterology 2017;153:1621-1633.e6
  51. Guo SH, Li LF, Xu BL, Li MH, Zeng QY, Xiao H. A simple and novel fecal biomarker for colorectal cancer: ratio of Fusobacterium nucleatum to probiotics populations, based on their antagonistic effect. Clin Chem 2018;64:1327-1337
  52. Tunsjø HS, Gundersen G, Rangnes F, Noone JC, Endres A, Bemanian V. Detection of Fusobacterium nucleatum in stool and colonic tissues from Norwegian colorectal cancer patients. Eur J Clin Microbiol Infect Dis 2019;38:1367-1376
  53. Mangifesta M, Mancabelli L, Milani C, Gaiani F, de’Angelis N, de’Angelis GL. Mucosal microbiota of intestinal polyps reveals putative biomarkers of colorectal cancer. Sci Rep 2018;8:13974
  54. Bullman S, Pedamallu CS, Sicinska E, Clancy TE, Zhang X, Cai D. Analysis of Fusobacterium persistence and antibiotic response in colorectal cancer. Science 2017;358:1443-1448
  55. Xie YH, Gao QY, Cai GX, Sun XM, Zou TH, Chen HM. Fecal Clostridium symbiosum for noninvasive detection of early and advanced colorectal cancer: test and validation studies. EBioMedicine 2017;25:32-40
  56. Yang Y, Misra BB, Liang L, Bi D, Weng W, Wu W. Integrated microbiome and metabolome analysis reveals a novel interplay between commensal bacteria and metabolites in colorectal cancer. Theranostics 2019;9:4101-4114
  57. Wirbel J, Pyl PT, Kartal E, Zych K, Kashani A, Milanese A. Meta-analysis of fecal metagenomes reveals global microbial signatures that are specific for colorectal cancer. Nat Med 2019;25:679-689
  58. Thomas AM, Manghi P, Asnicar F, Pasolli E, Armanini F, Zolfo M. Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. Nat Med 2019;25:667-678
  59. Ai D, Pan H, Han R, Li X, Liu G, Xia LC. Using decision tree aggregation with random forest model to identify gut microbes associated with colorectal cancer. Genes (Basel) 2019;10:E112
  60. Ai L, Tian H, Chen Z, Chen H, Xu J, Fang JY. Systematic evaluation of supervised classifiers for fecal microbiota-based prediction of colorectal cancer. Oncotarget 2017;8:9546-9556
  61. Sze MA, Baxter NT, Ruffin MT 4th, Rogers MAM, Schloss PD. Normalization of the microbiota in patients after treatment for colonic lesions. Microbiome 2017;5:150
  62. Pinzone MR, Celesia BM, Di Rosa M, Cacopardo B, Nunnari G. Microbial translocation in chronic liver diseases. Int J Microbiol 2012;2012:694629
  63. Hannigan GD, Duhaime MB, Ruffin MT 4th, Koumpouras CC, Schloss PD. Diagnostic potential and interactive dynamics of the colorectal cancer virome. mBio 2018;9:e02248-18
  64. Nakatsu G, Zhou H, Wu WKK, Wong SH, Coker OO, Dai Z. Alterations in enteric virome are associated with colorectal cancer and survival outcomes. Gastroenterology 2018;155:529-541.e5
  • Exploratory Research and Hypothesis in Medicine
  • eISSN 2472-0712