A Systematic Exploration of Key Candidate Genes and Pathways in the Biogenesis of Human Gastric Cancer: A Comprehensive Bioinformatics Investigation

  • Shuhylul Hannan1,
  • Ithmam Hami1,
  • Rajib Kumar Dey2 and
  • Shipan Das Gupta1,*
Journal of Translational Gastroenterology   2024;2(1):9-20

doi: 10.14218/JTG.2023.00072


Background and objectives

Gastric cancer (GC) is a prevalent gastrointestinal malignancy, yet its early detection remains hindered due to the lack of available genetic markers. This study aimed to identify alternative genetic markers for the early prognosis and prevention of GC.


Methods

Differentially expressed genes (DEGs) were analyzed in three datasets obtained from the Gene Expression Omnibus (GEO), with the goal of identifying hub genes associated with gastric adenocarcinoma that could serve as potential biomarkers for the early detection and management of GC. Three GEO datasets (GSE172032, GSE179581, and GSE181492), consisting of 10 normal and 10 GC samples, were analyzed using the Galaxy web server. Visualizations of the DEGs, including heatmaps, volcano plots, and MD plots, were generated with the same tool. ShinyGO was used for Gene Ontology and KEGG enrichment analysis, while NetworkAnalyst was used to construct a protein-protein interaction (PPI) network and screen 10 potential hub genes. The Kaplan–Meier Plotter was used for overall survival analysis of key hub genes, and NetworkAnalyst was used to assess protein-drug interactions for the top 10 hub genes.


Results

A total of 1,079 common DEGs emerged across datasets, concentrating on significant GC-related pathways. Ten hub genes (H2BC21, H3C12, H2BC17, H3C2, H3C10, ERBB4, H2AC8, H3C8, H2BC14, and MAPT) were found to be linked to GC via PPI analysis. Survival analysis revealed that higher expression levels of ERBB4 and MAPT were associated with poor overall survival in GC patients. Furthermore, protein-drug interaction analysis revealed that the protein product of the MAPT gene exhibited a robust connection with two drug compounds, docetaxel and paclitaxel. These findings suggest that these drugs have the potential to inhibit the function of MAPT.


Conclusions

In summary, our findings offer putative candidate biomarkers, provide insights into GC treatment strategies, and highlight avenues for further research, contributing to a better understanding of the pathogenesis of GC.


Keywords

Gastric adenocarcinoma, Survival analysis, Differentially expressed gene, Biomarker


Introduction

Cancer initiation occurs when cells in the body undergo unregulated growth. Gastric cancer (GC), commonly termed stomach cancer, originates from the uncontrolled growth of cells within the stomach. Approximately 95% of cases involve the stomach lining and exhibit a gradual progression of cell mass. If left untreated, this mass can progress into a tumor, infiltrating deeper layers of the stomach wall. This tumor has the potential to metastasize to adjacent organs, including the liver and pancreas.1,2 GC is a major contributor to global cancer-related fatalities. Functionally, the stomach aids digestion by secreting enzymes, gastric acid, and the intrinsic factor essential for vitamin B12 absorption. Its lining comprises a mucous membrane housing columnar epithelial cells and glands. Unfortunately, these cells are susceptible to inflammation, known as gastritis, which can progress to peptic ulcers and, ultimately, culminate in GC.3 In recent years, stomach cancer has become a prevalent malignancy with significant morbidity and mortality rates, making it a pressing concern in global medical research.4

GC is estimated to rank as the fifth most common cancer and the third leading cause of cancer-related deaths worldwide. Each year, GC accounts for approximately 783,000 deaths, constituting about 8% of all cancer-related deaths.3,5,6 The notable frequency of late-stage diagnosis, resistance to treatment, and the tendency to metastasize contribute significantly to the low survival rate, with fewer than 20% of patients achieving 5-year survival, and to elevated recurrence rates in GC patients. Current treatment relies primarily on surgical interventions complemented by conventional chemotherapy, yet the outlook for GC patients remains discouraging.7–9 Consequently, there is an urgent need to determine the molecular intricacies and potential biomarkers associated with GC. This approach is crucial not only for diagnosing GC but also for inhibiting metastasis and advancing effective treatment strategies.10

Genetic factors, such as polymorphisms, can serve as promising biomarker candidates due to their potential contribution to GC risk. For instance, a study by Jing He et al. revealed that individuals with the rs873601A variant genotype in the nucleotide excision repair gene XPG are at an elevated risk of developing gastric adenocarcinoma.11 Another study investigated the association of eight SNPs in the mammalian target of rapamycin complex 1 gene with GC in a case-control study and revealed that one of them (rs1883965A) had a significant correlation.12 Similarly, a study in a Chinese population revealed an association between the rs2298881 CA variant in the nucleotide excision repair pathway gene ERCC1 and an elevated risk of GC.13 However, it is important to note that these studies had limitations, such as a hospital-based case-control design and limited investigation of gene variants. Therefore, further studies are needed to confirm these findings and explore other genetic variants and risk factors. Additionally, these studies did not specifically evaluate the genetic variations as candidate biomarkers.

In the modern landscape of biology, high-throughput data, including gene expression information obtained from RNA sequencing or microarrays, have gained broad utility in deciphering the underlying molecular dynamics driving tumor progression. Among these tools, mRNA expression microarray platforms stand out for their capacity to identify aberrant mRNA expression patterns and uncover differentially expressed genes (DEGs).14 Recently, many researchers have utilized gene expression microarray platforms to explore the gene expression profiles characterizing various grades of GC tissues, aiming to identify genes intricately linked to the oncogenic processes underlying GC.15 Alongside these platforms, the Gene Expression Omnibus (GEO) database offers methods for the bioinformatics mining of gene expression profiles in a variety of tumors.16 In this study, we identified DEGs between GC tissues and adjacent normal tissues by integrating three microarray datasets from the GEO database to find promising novel biomarkers. These biomarkers may provide new insights into the underlying molecular mechanisms and help understand the occurrence, progression, and pathogenesis of GC. The complete workflow followed to identify DEGs and perform in silico analysis is depicted in Figure 1.

Fig. 1  The complete workflow followed to identify DEGs and to perform their in-silico analysis.

DEG, differentially expressed gene; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; PPI, protein-protein interaction.

Materials and methods

Retrieval of microarray data

RNA-Seq data from three datasets (GSE172032, GSE179581, and GSE181492), comprising human GC and corresponding adjacent normal tissue specimens, were included in our analysis. Together, the datasets contained 20 tissue samples: 10 gastric carcinoma tissues and 10 adjacent non-tumorous tissues. All gene expression profiles were paired-end secondary data downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) of the National Center for Biotechnology Information.17,18

Expression analysis of DEGs

The Galaxy (https://usegalaxy.org/) online analysis platform was used to analyze the DEGs between the two conditions of interest: human GC and matched adjacent normal tissue specimens.19 The three datasets were uploaded to the Galaxy web server to identify the DEGs. The count table generated in Galaxy after the limma command was subsequently converted into an Excel file and used to identify DEGs between tumor tissues and adjacent non-tumorous tissue samples. A p-value of 0.05 or lower was considered to indicate statistical significance. Genes with a log2 fold change (log2FC) > 1 and a p-value of 0.05 or lower were considered upregulated, and genes with log2FC < −1 and a p-value of 0.05 or lower were considered downregulated.
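The thresholds described above can be illustrated with a minimal Python sketch. It is not the Galaxy or limma implementation; the function name `classify_deg` and the example rows are hypothetical and serve only to show how the two cutoffs combine.

```python
def classify_deg(log2fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' (not significant).

    A gene must pass BOTH the p-value cutoff and the fold-change cutoff
    to be called differentially expressed.
    """
    if p_value <= p_cutoff:
        if log2fc > fc_cutoff:
            return "up"
        if log2fc < -fc_cutoff:
            return "down"
    return "ns"


# Hypothetical rows (gene, log2FC, p-value); the values are illustrative only.
rows = [
    ("GENE_A", 2.4, 0.001),
    ("GENE_B", -1.7, 0.03),
    ("GENE_C", 0.86, 0.00002),  # significant p-value, but |log2FC| <= 1
    ("GENE_D", 3.1, 0.20),      # large change, but p-value above the cutoff
]
labels = {gene: classify_deg(fc, p) for gene, fc, p in rows}
```

The third row shows why a gene with a very small p-value can still be excluded: it fails the fold-change cutoff.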

Construction of heatmap, volcano plot, and MD plot of DEGs

Galaxy, a web-based platform, provides tools for researchers, even those lacking informatics expertise, to conduct computational analyses on extensive biomedical datasets.20 In this study, the Galaxy web server’s limma package was used for visualizing heatmaps, volcano plots, and MD plots.21,22

Gene ontology (GO) and KEGG enrichment analysis of DEGs

ShinyGO (http://bioinformatics.sdstate.edu/go/) is a web-based tool for exploring GO term enrichment in genomic datasets. It enables the comparison of uploaded data to reference sets of gene or protein annotations. The tool visualizes the results of the enrichment analysis in an interactive and user-friendly way, making it easy for researchers to identify overrepresented functional categories in their data. ShinyGO is built on the R programming language and can be run locally or accessed through a web interface. In this study, the ShinyGO online software was used for GO and KEGG enrichment analysis of DEGs.23
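Over-representation of a functional category is commonly assessed with a one-sided hypergeometric test. The sketch below shows that calculation in Python. It assumes this standard test and is not taken from ShinyGO itself; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, in_term, query, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    background: number of genes in the reference set
    in_term:    number of background genes annotated to the term
    query:      number of genes in the submitted list
    overlap:    number of submitted genes annotated to the term
    """
    upper = min(in_term, query)
    total = comb(background, query)
    tail = sum(
        comb(in_term, i) * comb(background - in_term, query - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 200 submitted genes, 10 of which fall in a
# 100-gene term, against a 20,000-gene background.
p_example = hypergeom_enrichment_p(20000, 100, 200, 10)
```

In practice, the p-values for all tested terms would also be corrected for multiple testing before terms are reported as enriched.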

Protein-protein interaction (PPI) network construction and module analysis

NetworkAnalyst (https://www.networkanalyst.ca) is a user-friendly online tool that interprets gene expression data in the context of PPI networks. NetworkAnalyst 3.0 includes features for meta-analysis, allowing users to visually compare multiple gene lists through interactive heatmaps, enrichment networks, and Venn diagrams.24,25 Its intuitive online interface enables researchers to perform PPI analysis easily.25,26 This online tool was used to construct the PPI network in our analysis.24,26

Prediction of the hub genes

PPIs play a crucial role in biological processes including gene expression, cell growth, proliferation, and apoptosis.27,28 Understanding protein interactions provides an efficient approach for screening hub genes. Hub genes pinpointed through a PPI network-based approach have been documented in various cancers, including breast cancer,29 liver cancer,30 and GC.31 Hub genes obtained from a PPI subnetwork are more meaningful than individual genes screened without network information.32 Therefore, potential hub genes of GC were identified using PPI networks. The nodes were ranked by their degree in the PPI network, and the top-ranked nodes were selected as hub genes.
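Degree-based hub selection can be sketched in a few lines of Python. This is a simplified illustration and not NetworkAnalyst's code. The edge list and protein names (`P1` to `P5`) are hypothetical, and the choice to ignore duplicate edges and self-loops is an assumption.

```python
from collections import Counter


def top_hubs(edges, top_n=10):
    """Rank nodes of an undirected network by degree and return the top_n.

    Duplicate edges (in either direction) are counted once and
    self-loops are ignored.
    """
    unique = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    # Sort by degree (descending), then by name so ties break deterministically.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical interaction list, for illustration only.
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P3", "P1"),  # ("P3", "P1") duplicates ("P1", "P3")
    ("P5", "P5"),                # self-loop, ignored
]
hubs = top_hubs(edges, top_n=2)
```

Degree is only one centrality measure; betweenness or closeness could rank the same network differently.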

Functional enrichment analysis of the hub genes

ExpressAnalyst (https://www.expressanalyst.ca) is a web-based platform that focuses on gene expression profiling and meta-analysis. Functional enrichment analysis is a commonly used approach to identify the biological functions or pathways associated with a set of genes of interest. We used ExpressAnalyst to perform functional enrichment analysis of the hub genes; the tool visualizes the enriched functional categories in a particular network.33

Overall survival (OS) analysis of key hub genes

The Kaplan Meier Plotter serves as a robust tool for evaluating the association between gene expression (mRNA, miRNA, protein) and survival across a vast dataset encompassing over 30,000 samples derived from 21 distinct tumor types, such as breast, ovarian, lung, and GCs. The information is curated from diverse sources including GEO, the European Genome-phenome Archive, and The Cancer Genome Atlas (TCGA) databases. Its primary utility lies in conducting meta-analysis-driven identification and validation of survival-related biomarkers in cancer research. Utilizing this tool, we conducted an OS analysis of genes linked to these hub genes through the Kaplan–Meier Plotter online database.34
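For readers unfamiliar with the underlying estimator, the following Python sketch computes a Kaplan–Meier (product-limit) survival curve from follow-up times and event indicators. It illustrates the standard estimator only and makes no claim about the internal implementation of the Kaplan–Meier Plotter. The follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months), for illustration only.
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
```

To compare high- and low-expression groups, one curve would be estimated per group and the difference assessed with a log-rank test.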

Identification of drug candidates based on hub genes

Understanding drug-protein binding is an essential step that is routinely investigated in the pre-clinical stages of drug discovery to determine the activity and consequences of a drug.35 NetworkAnalyst, with its intuitive online interface, enables researchers to perform protein-drug interaction analysis with ease.25 This online tool was used to construct the protein-drug interaction network in our analysis.24


Results

Exploring DEGs in GC: heatmap, volcano plot, and MD plot analysis

Galaxy web analysis identified a total of 1,079 DEGs, including 638 upregulated genes and 441 downregulated genes (Table 1). An expression heatmap, volcano plot, and MD plot (Fig. 2) were constructed to visualize the identified DEGs.

Table 1

Top 100 upregulated and top 100 downregulated genes identified in GC

Upregulated (top 100 genes): CXCL8, CXCL1, CCL20, ELF3, FCGR1A, LOC100128770, LGR5, SBSN, H2BC6, SLC26A3, GJB4, H2BC14, ZSCAN10, OVOL1, CFAP276, FUT3, SGK2, NECTIN4, TNFRSF9, TTC24, H2AC18, SLC7A4, QPCT, IL13, H3C2, OR2B6, CXCL2, LRRC25, SLC7A9, IL24, PI3, ALDOB, CILP2, CXCL3, LOC101928844, SOX30, DSG3, SP6, RAB33A, GPR25, GUCY1B2, H2AC13, H2BC7, SLC17A4, SLC43A2, VPREB3, ARMH1, ABCG8, XIRP1, SI, LAG3, PATL2, ADAMTS18, H2BU1, EREG, ZFP42, LINC00528, LUCAT1, HAPLN4, H2BC8, CYP27A1, GJB5, KRT4, TINAG, MAJIN, ASIC4, OR13H1, H2AC19, H2BC17, LINC00520, LHFPL3, H3C10, BCAR4, H3C8, MEFV, H2BC21, H2BC18, GPR84, C6orf52, FUT5, LOC105372412, PAGE2B, TULP2, H2AC17, PKP1, H2AC8, SLC3A1, LINC00628, TRIM54, BAAT, H1-6, ARL14, SLC5A2, PRKCG, H3C12, INHBA, CCL25, CST6, TNNC2, DNAJB5-DT
Fig. 2  Differential gene expression in GC.

(a) Heatmap of the top 10 differentially expressed genes. (b) Volcano plot of Treated-Control. (c) MD plot of Treated-Control. MD, Mean-Difference; LogFC, Log Fold Change.

The heatmap, volcano plot, and MD plot show the expression profiles of the GSE172032, GSE179581, and GSE181492 datasets. A heatmap of DEGs is a useful visualization tool for analyzing gene expression data. The heatmap displays gene expression values as a color-coded matrix, with each row representing a gene and each column representing a sample or experimental condition. The color of each cell in the matrix corresponds to the expression level of a gene in a particular sample or condition, with higher expression levels represented by warmer colors (e.g., red) and lower expression levels by cooler colors (e.g., blue).36 Figure 2a shows the heatmap of the top 10 DEGs in the three datasets. Gene expression levels are indicated by color, with red representing a high expression level and blue representing a low expression level. The top 10 DEGs based on log2FC and p-value obtained from the heatmap are presented in Table 2.

Table 2

The top 10 DEGs based on log2FC and p-value obtained from the heatmap

Gene ID | Gene Name | log2FC | p-value

The ENSG00000077684 gene, also known as JADE1, was excluded from the table because it did not meet the differential expression criteria: although its p-value was 2.06E-05, its log2FC of 0.862011258 fell below the |log2FC| > 1 threshold.

A volcano plot is a graphical representation commonly used to visualize the results of differential expression analysis. The x-axis of the volcano plot represents the log2FC in expression levels between two groups (such as treatment vs. control). The y-axis represents the negative logarithm of the p-value or the adjusted p-value, reflecting the statistical significance of the differential expression.

Figure 2b presents the volcano plot for the three aforementioned datasets. Each dot within the plot corresponds to a gene. Dots situated towards the positive end of the log2FC axis denote genes exhibiting elevated expression levels, while those positioned towards the negative end signify genes with reduced expression levels. Dots close to a log2FC of zero represent genes that do not meet the criteria of a p-value < 0.05 and |log2FC| > 1 and therefore show no significant differential expression.
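The axis transformation behind a volcano plot can be written out as a short Python sketch. The function names are hypothetical and the snippet is independent of the Galaxy plotting tool. It only shows how a gene's log2FC and p-value map to plot coordinates and where the significance thresholds fall.

```python
from math import log10


def volcano_point(log2fc, p_value):
    """Map a gene to volcano-plot coordinates: (log2FC, -log10(p-value))."""
    return (log2fc, -log10(p_value))


def passes_thresholds(log2fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """True if the point lies outside the vertical lines at +/- fc_cutoff
    and above the horizontal line at -log10(p_cutoff)."""
    x, y = volcano_point(log2fc, p_value)
    return abs(x) > fc_cutoff and y > -log10(p_cutoff)


# The horizontal significance line for p = 0.05 sits at about 1.30 on the y-axis.
significance_line = -log10(0.05)
```

Because the y-axis is a negative logarithm, smaller p-values plot higher, so the most significant and most strongly changed genes appear in the upper left and upper right corners.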

Figure 2c shows the MD plot of DEGs in the three datasets. A red dot indicates genes with high levels of expression, a blue dot indicates genes with low levels of expression, and a black dot indicates genes with no differential expression based on the criteria of p-value < 0.05 and |log2 FC| > 1.

Functional enrichment analysis reveals diverse biological signatures of DEGs in GC

To identify the pathways that had the most significant involvement in the genes identified, the top 100 upregulated and top 100 downregulated DEGs were submitted to ShinyGO for GO and KEGG pathway analysis. GO analysis revealed that in biological process terms, the DEGs were mainly enriched in the interleukin-7-mediated signaling pathway, innate immune response in the mucosa, DNA replication-dependent nucleosome assembly, presynaptic organization, antimicrobial humoral immune response mediated by an antimicrobial peptide, nucleosome assembly, chromatin assembly, nucleosome organization, chemokine-mediated signaling pathway, chromatin assembly or disassembly, antimicrobial humoral response, DNA packaging, negative regulation of inflammatory response to an antigenic stimulus, chromatin remodeling, protein–DNA complex assembly, DNA conformation change, and protein–DNA complex subunit organization (Fig. 3a).

Fig. 3  Functional enrichment analysis of DEGs in GC.

GO analysis revealed that DEGs were significantly enriched in (a) biological process terms (b) cellular component terms (c) molecular function terms (d) significantly enriched KEGG terms obtained from KEGG analysis. DEG, differentially expressed gene; KEGG, Kyoto Encyclopedia of Genes and Genomes; GO, Gene Ontology.

The GO analysis further unveiled that, with regard to cellular components, the DEGs exhibited prominent enrichment in various categories. These included Nucleosome, DNA packaging complex, Protein-DNA complex, Cornified envelope, Brush border membrane, GABA-ergic synapse, Integral component of postsynaptic specialization membrane, Postsynaptic specialization membrane, Ion channel complex, Receptor complex, Transmembrane transporter complex, Synaptic membrane, Transporter complex, Integral component of the plasma membrane, Plasma membrane protein complex, Chromatin, Plasma membrane region, Synapse (Fig. 3b).

The molecular functions of DEGs included l-cystine transmembrane transporter activity, 4-galactosyl-N-acetylglucosaminide 3-alpha-L-fucosyltransferase activity, fucosyltransferase activity, CXCR chemokine receptor binding, basic amino acid transmembrane transporter activity, chemokine activity, peptide hormone binding, potassium channel regulator activity, chemokine receptor binding, protein heterodimerization activity, ligand-gated ion channel activity, cytokine activity, receptor ligand activity, signaling receptor activator activity, channel activity, passive transmembrane transporter activity, protein dimerization activity, transmembrane transporter activity, and transporter activity (Fig. 3c).

KEGG pathway analysis demonstrated that DEGs were significantly enriched in systemic lupus erythematosus, glycosphingolipid biosynthesis, neutrophil extracellular trap formation, alcoholism, nicotine addiction, viral protein interaction with cytokine and cytokine receptor, legionellosis, IL-17 signaling pathway, epithelial cell signaling in Helicobacter pylori infection, GABAergic synapse, rheumatoid arthritis, pancreatic secretion, amoebiasis, insulin secretion, retrograde endocannabinoid signaling, necroptosis, chemokine signaling pathway, viral carcinogenesis, cytokine–cytokine receptor interaction, transcriptional misregulation in cancer (Fig. 3d).

PPI network construction and module analysis unveil molecular insights into DEGs

By evaluating the relationships between various DEGs, a PPI network was constructed to assess the significance of these DEGs. This strategy enables researchers to concentrate on the most pertinent interactions and pinpoint crucial functional DEG modules, illuminating the molecular mechanisms underlying the studied illness or disease. Interactions between the identified DEGs revealed a total of 664 nodes and 1,892 edges in 29 subnetworks (Fig. 4).

Fig. 4  PPI network of the top 100 upregulated and top 100 downregulated genes identified in GC.

PPI, protein-protein interaction; GC, gastric cancer.

Prediction of top hub genes through PPI network analysis

Hub gene prediction aimed to identify the hub genes based on the PPI network and uncover their clinical value. Hub genes were identified using PPI networks. According to the degree levels of PPIs, the top hub nodes were selected as hub genes. Our study identified a total of 30 hub nodes and among them, the top 10 hub nodes were predicted as hub genes for further analysis as shown in Table 3.

Table 3

The top 20 hub nodes according to degree levels


Functional enrichment analysis of predicted hub genes unveils insights into molecular mechanisms

Subsequent functional enrichment analysis, which visualizes the functional categories enriched in a network, revealed that the genes in this module were mainly enriched in systemic lupus erythematosus, alcoholism, viral carcinogenesis, necroptosis, transcriptional misregulation in cancer, gastric acid secretion, thyroid hormone synthesis, calcium signaling pathway, ERBB4 signaling pathway, and insulin and salivary secretion, among others (Fig. 5).

Fig. 5  Functional enrichment analysis of predicted hub genes.

OS analysis reveals prognostic significance of hub genes in GC patients

The outcomes from Kaplan–Meier plotting underscored the impact of two central genes (ERBB4 and MAPT) on GC prognosis. This analysis included 875 patients. Our findings indicate that higher expression levels of ERBB4 and MAPT are associated with poorer overall survival in GC patients (Fig. 6). The remaining hub genes (H2BC21, H3C12, H2BC17, H3C2, H3C10, H2AC8, H3C8, H2BC14) were not present in the Kaplan–Meier Plotter database.

Fig. 6  Overall survival analysis of GC patients.

Association between the microarray-based expression of (a) ERBB4 and (b) MAPT and the survival of patients with gastric cancer. A log-rank test was performed to evaluate the survival differences between the two curves. HR, Hazard Ratio; ERBB4, erythroblastic oncogene B; MAPT, microtubule-associated protein Tau.

Prediction of drug candidates for the top 10 hub genes

The NetworkAnalyst tool (www.networkanalyst.ca/) was employed to screen potential drug candidates for the top 10 hub genes through protein-drug interaction analysis. This analysis used the DrugBank database (version 5.0), which is specific to human data.25 The analysis showed that only two drugs interact with the protein product of the MAPT hub gene, whereas the other hub genes did not show any interaction with the drugs listed in the database. Figure 7 shows the protein-drug interaction network between the protein product of MAPT and the proposed drugs, obtained with the NetworkAnalyst tool, where the degree of interaction is represented by the area of the nodes. The tool suggested that docetaxel and paclitaxel from the DrugBank database (version 5.0), which play a role in the treatment of many cancers, including GC, are associated with the regulation of MAPT expression. Docetaxel is a taxoid antineoplastic agent used to treat various cancers, such as locally advanced or metastatic breast cancer, metastatic prostate cancer, gastric adenocarcinoma, and head and neck cancer.37,38 Similarly, paclitaxel is a taxoid chemotherapeutic agent used as a first-line and subsequent therapy for the treatment of advanced carcinoma of the ovary and various other cancers, including breast and lung cancer.39

Fig. 7  Protein-drug interaction analysis with the product of the MAPT hub gene.

MAPT, microtubule-associated protein Tau.


Discussion

The TCGA research network has devised a genetic classification system for GC, encompassing four distinct subtypes: Epstein–Barr virus-positive, microsatellite instability (MSI), genomically stable, and chromosomally unstable (CIN). This classification is rooted in the analysis of genetic alterations within GC samples, offering valuable insights into the molecular basis of the malignancy. The TCGA classification has been widely used in both preclinical and clinical studies to guide treatment approaches and assess patient prognosis. For example, it aids in identifying specific therapeutic targets for different GC subtypes; a case in point is the use of immune checkpoint inhibitors for MSI-high tumors. Furthermore, it has proven to be instrumental in creating prognostic models for patient survival and guiding personalized treatment methods.40

The PD1/PDL1 pathway plays a critical role in the immune checkpoint system in GC. The PD1 receptors on immune cells interact with PDL1 ligands, which are expressed on both tumor cells and immune cells. This interaction curbs immune activity, causing immune suppression and allowing the tumor to evade the immune system. High PDL1 expression is usually associated with poor prognosis in GC patients, indicating its potential as a prognostic factor. Moreover, the PD1/PDL1 pathway has already been a target for immunotherapy in GC, with promising results from clinical trials using the PD1/PDL1 inhibitors pembrolizumab and nivolumab in patients with advanced cancer. This pathway is important because it regulates the immune response and serves as a target for personalized treatment options. However, further research is required to identify additional predictive markers, as not all patients with increased PDL1 expression respond to its inhibitors.40

The present study employed a comprehensive bioinformatics approach to identify key candidate genes and pathways associated with human GC. Through the integration of gene expression profiling, PPI analysis, pathway enrichment, and functional annotation analysis, the study identified 10 hub genes that may serve as potential biomarkers for GC. The identified hub genes included H2BC21, H3C12, H2BC17, H3C2, H3C10, ERBB4, H2AC8, H3C8, H2BC14, and MAPT.

One of the important hub genes, ERBB4 (also known as HER4) is a member of the epidermal growth factor receptor family of receptor tyrosine kinases (RTKs). This receptor has been implicated in the development and progression of various cancers, including GC.41 Several studies have shown that ERBB4 can promote the proliferation of GC cells through the PI3K/Akt signaling pathway.42–44 This pathway is a key regulator of cell growth, survival, and metabolism, and is frequently dysregulated in cancer. Upon ligand binding, ERBB4 undergoes activation, subsequently recruiting and activating PI3K, which, in turn, triggers Akt activation. The activated Akt pathway fosters cell survival and growth by phosphorylating downstream targets involved in essential processes such as cell cycle regulation, protein synthesis, and metabolism. In GC cells, ERBB4 has been found to promote proliferation by activating the PI3K/Akt pathway. Inhibition of ERBB4 or its downstream effectors, such as PI3K or Akt, can significantly reduce cell proliferation and induce apoptosis in GC cells. Therefore, targeting the ERBB4/PI3K/Akt pathway may represent a promising strategy for the treatment of GC.42–44

Another pivotal group of hub genes, the H3 clustered histone genes (H3C2, H3C8, H3C10, H3C12), plays a crucial role in chromatin remodeling and is intricately associated with gastric adenocarcinoma.45 Numerous investigations have indicated that changes in the expression of H3 cluster histone genes could play a pivotal role in the initiation and advancement of GC. For instance, Mitani et al.46 found that a low level of H3 acetylation at the promoter of the tumor suppressor gene p21WAF1/CIP1 resulted in its downregulation in GC. Additionally, a study revealed a significant upregulation of the H3 cluster of histone genes in GC tissues.47 Furthermore, alterations in the post-translational modifications of histone proteins have also been implicated in GC. For example, the acetylation of lysine residues on histone H3 has been shown to be dysregulated in GC, and elevated levels of histone H3 acetylation have been associated with tumor progression and an unfavorable prognosis.46,48

Collectively, these studies suggest that alterations in the expression and modification of H3 cluster histone genes may play a role in the development and progression of GC. Further extensive investigations are needed to gain deeper insights into the intricate molecular mechanisms that underlie these findings and to pave the way for innovative therapeutic approaches aimed at both preventing and treating GC.

Another pivotal hub gene, MAPT, is closely linked to GC due to its expression pattern. Tau actively contributes to the stabilization and assembly of microtubules. Its primary expression is observed in neurons, where it crucially maintains axonal structure and function. However, recent studies have suggested that tau expression may also be involved in the development and progression of certain types of cancer, including GC.41 In one study, it was reported that there was a notable upregulation of tau expression in GC tissues when compared to adjacent noncancerous tissues.47 Furthermore, elevated tau expression was associated with advanced tumor stage, lymph node metastasis, and an unfavorable patient prognosis.49 The precise mechanisms that underlie the link between tau expression and GC remain partly elusive. However, it is plausible that these mechanisms encompass interactions with other proteins or modulation of signaling pathways that oversee critical cellular processes such as proliferation, survival, and migration. Overall, these studies suggest that the expression of MAPT may be associated with GC. However, further research is needed to better understand the role of tau in GC pathogenesis and to develop novel therapeutic strategies targeting tau for the prevention and treatment of this disease.

The above-mentioned hub genes have previously been reported to be involved in various cellular processes and pathways, including nucleosome and chromatin assembly, ligand-gated ion channel activity, CXCR signaling receptor activity, systemic lupus erythematosus, glycosphingolipid biosynthesis, IL-17 signaling pathway, pancreatic secretion, and viral carcinogenesis, which are recognized to be crucial in the emergence and progression of stomach cancer.50–52 The investigation additionally identified several novel genes, including H2BC21, H2BC17, H2BC14, and H2AC8, which have not previously been implicated in GC.

Through pathway enrichment analysis, a cluster of pivotal pathways correlated with GC emerged. These include gastric acid secretion, alcoholism, salivary secretion, ErbB4 signaling pathway, viral carcinogenesis, and retrograde endocannabinoid signaling pathways. These pathways, which are dysregulated across diverse cancers, including GC, play a significant role in crucial processes, such as cell proliferation and survival.

The findings of this study provide valuable insights into the molecular mechanisms underlying GC development and progression. The identified hub genes and pathways may serve as potential therapeutic targets for the development of novel therapies for GC treatment. Furthermore, the identified hub genes may serve as potential biomarkers for the early detection of GC. This study has several limitations, which also point to directions for future research. First, the stomach region from which the tumor samples were taken was not specified in the paired microarray datasets used in this analysis, and samples taken from the same disease stage would be preferable for studying each form of cancer. Second, the source of the microarray data was not mentioned, and the analysis relied on the known linear correlations between gene expression levels. Future research that incorporates nonlinear relationships more thoroughly may produce more accurate information about the interactions between proteins and possibly suggest new medicines. Third, RNA-Seq technology may quantify gene expression more accurately. However, because paired RNA-Seq data were not available for this study, paired microarray data were used instead, which matched better and might yield more reliable results. Finally, the study did not investigate the regulatory mechanisms of the identified hub genes in GC, which warrants further investigation.
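The distinction between linear and nonlinear relationships raised in the limitations can be illustrated with a small sketch. It contrasts the Pearson coefficient, which measures linear association, with the Spearman rank coefficient, which also captures monotonic nonlinear association. Rank correlation is only one possible choice and is an assumption made here for illustration, not a method used in this study; the expression values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (linear association)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented expression values: gene_b rises exponentially with gene_a.
gene_a = [1, 2, 3, 4, 5, 6]
gene_b = [2 ** v for v in gene_a]
print(round(pearson(gene_a, gene_b), 3), round(spearman(gene_a, gene_b), 3))
```

Here the rank-based coefficient is essentially 1 because the relationship is perfectly monotonic, while the linear coefficient is noticeably lower. Non-monotonic dependencies would require other measures.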


This study identified 1,079 DEGs, with 638 upregulated and 441 downregulated, between human GC tissues and matched adjacent normal tissue specimens based on the GSE172032, GSE179581, and GSE181492 datasets. Further analysis of the DEGs suggested that three types of hub genes, namely the H3 clustered histone genes (H3C2, H3C8, H3C10, and H3C12), HER4 (ERBB4), and MAPT, could play critical roles in the progression of GC. The strong association of these predicted hub genes with GC progression has been reported in multiple previous studies. In summary, the present study provides a comprehensive analysis of key candidate genes and pathways in human GC using a bioinformatics approach. The identified hub genes and pathways provide valuable insights into the molecular mechanisms underlying GC development and progression and may serve as potential therapeutic targets and biomarkers for the early detection of GC.
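As a rough illustration of how common up- and downregulated DEGs can be derived from several datasets, the sketch below classifies genes by log fold change and adjusted p-value and then intersects the per-dataset sets. The cutoffs (|logFC| of at least 1, adjusted p below 0.05), the function names, and the toy gene table are assumptions chosen for illustration; they are not the thresholds or data reported for this study.

```python
def classify_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split one dataset's results into up- and downregulated gene sets.

    `results` maps gene -> (log fold change, adjusted p-value).
    Both cutoffs are assumed values for this sketch.
    """
    up, down = set(), set()
    for gene, (log_fc, padj) in results.items():
        if padj >= padj_cutoff:
            continue  # not significant after multiple-testing correction
        if log_fc >= lfc_cutoff:
            up.add(gene)
        elif log_fc <= -lfc_cutoff:
            down.add(gene)
    return up, down


def common_degs(per_dataset):
    """Genes called up (or down) in every dataset; expects at least one dataset."""
    ups, downs = zip(*(classify_degs(r) for r in per_dataset))
    return set.intersection(*ups), set.intersection(*downs)


# Toy results for three hypothetical datasets.
toy = [
    {"GENE_A": (2.1, 0.001), "GENE_B": (-1.8, 0.01), "GENE_C": (0.2, 0.5)},
    {"GENE_A": (1.5, 0.02), "GENE_B": (-2.2, 0.003), "GENE_C": (1.4, 0.04)},
    {"GENE_A": (3.0, 0.0005), "GENE_B": (-1.1, 0.04), "GENE_C": (1.2, 0.2)},
]
up, down = common_degs(toy)
print(sorted(up), sorted(down))
```

Requiring the same direction of change in every dataset is a conservative choice: a gene such as GENE_C, which passes the cutoffs in only one dataset, is excluded from the common set.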



Abbreviations

DEG: differentially expressed gene
GC: gastric cancer
GEO: Gene Expression Omnibus
GO: gene ontology
logFC: log fold change
MSI: microsatellite instability
PPI: protein-protein interaction
TCGA: The Cancer Genome Atlas



We express our gratitude to the Bioinformatics Lab at the Department of Genetic Engineering and Biotechnology, Noakhali Science and Technology University, for permitting us to carry out the in-silico analysis.

Data sharing statement

No additional data are available.


The authors declare no funding was received during this study.

Conflict of interest

The authors declare that they have no conflicts of interest related to this publication.

Authors’ contributions

Conceptualization: SDG, RKD; data curation: SH & SDG; formal analysis: SH; methodology: SH; writing – original draft: SH, IH; review and editing: SDG, RKD.


  1. Zali H, Rezaei-Tavirani M, Azodi M. Gastric cancer: prevention, risk factors and treatment. Gastroenterol Hepatol Bed Bench 2011;4(4):175-185
  2. Nagini S. Carcinoma of the stomach: A review of epidemiology, pathogenesis, molecular genetics and chemoprevention. World J Gastrointest Oncol 2012;4(7):156-169
  3. Rawla P, Barsouk A. Epidemiology of gastric cancer: global trends, risk factors and prevention. Prz Gastroenterol 2019;14(1):26-38
  4. Peleteiro B, Severo M, La Vecchia C, Lunet N. Model-based patterns in stomach cancer mortality worldwide. Eur J Cancer Prev 2014;23(6):524-531
  5. Hou R, Mu Z, Kang W, Liu Z, Na B, Niu W. Cancer mortality in 2020 and its trend analysis in Inner Mongolia during four time periods from 1973 to 2020. Front Oncol 2023;13:1096968
  6. Garattini SK, Basile D, Cattaneo M, Fanotto V, Ongaro E, Bonotto M, et al. Molecular classifications of gastric cancers: Novel insights and possible future applications. World J Gastrointest Oncol 2017;9(5):194-208
  7. Wang FH, Shen L, Li J, Zhou ZW, Liang H, Zhang XT, et al. The Chinese Society of Clinical Oncology (CSCO): clinical guidelines for the diagnosis and treatment of gastric cancer. Cancer Commun (Lond) 2019;39(1):10
  8. Wang F, Xue Q, Xu D, Jiang Y, Tang C, Liu X. Identifying the hub gene in gastric cancer by bioinformatics analysis and in vitro experiments. Cell Cycle 2020;19(11):1326-1337
  9. Lee YC, Chiang TH, Chou CK, Tu YK, Liao WC, Wu MS, et al. Association Between Helicobacter pylori Eradication and Gastric Cancer Incidence: A Systematic Review and Meta-analysis. Gastroenterology 2016;150(5):1113-1124.e5
  10. Lei ZN, Teng QX, Tian Q, Chen W, Xie Y, Wu K, et al. Signaling pathways and therapeutic interventions in gastric cancer. Signal Transduct Target Ther 2022;7(1):358
  11. He J, Qiu LX, Wang MY, Hua RX, Zhang RX, Yu HP, et al. Polymorphisms in the XPG gene and risk of gastric cancer in Chinese populations. Hum Genet 2012;131(7):1235-1244
  12. He J, Wang MY, Qiu LX, Zhu ML, Shi TY, Zhou XY, et al. Genetic variations of mTORC1 genes and risk of gastric cancer in an Eastern Chinese population. Mol Carcinog 2013;52(Suppl 1):E70-E79
  13. He J, Zhuo ZJ, Zhang A, Zhu J, Hua RX, Xue WQ, et al. Genetic variants in the nucleotide excision repair pathway genes and gastric cancer susceptibility in a southern Chinese population. Cancer Manag Res 2018;10:765-774
  14. Li T, Gao X, Han L, Yu J, Li H. Identification of hub genes with prognostic values in gastric cancer by bioinformatics analysis. World J Surg Oncol 2018;16(1):114
  15. Wang K, Yuen ST, Xu J, Lee SP, Yan HH, Shi ST, et al. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat Genet 2014;46(6):573-582
  16. Jiang P, Liu XS. Big data mining yields novel insights on cancer. Nat Genet 2015;47(2):103-104
  17. Barrett T, Suzek TO, Troup DB, Wilhite SE, Ngau WC, Ledoux P, et al. NCBI GEO: mining millions of expression profiles—database and tools. Nucleic Acids Res 2005;33(Database issue):D562-D566
  18. Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 2002;30(1):207-210
  19. Batut B, van den Beek M, Doyle MA, Soranzo N. RNA-Seq Data Analysis in Galaxy. RNA Bioinformatics 2021;2284:367-392
  20. Afgan E, Baker D, Batut B, van den Beek M, Bouvier D, Cech M, et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res 2018;46(W1):W537-W544
  21. Vandel J, Gheeraert C, Staels B, Eeckhoute J, Lefebvre P, Dubois-Chevalier J. GIANT: galaxy-based tool for interactive analysis of transcriptomic data. Sci Rep 2020;10(1):19835
  22. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43(7):e47
  23. Ge SX, Jung D, Yao R. ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 2020;36(8):2628-2629
  24. Zhou G, Soufan O, Ewald J, Hancock REW, Basu N, Xia J. NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis. Nucleic Acids Res 2019;47(W1):W234-W241
  25. Xia J, Gill EE, Hancock RE. NetworkAnalyst for statistical, visual and network-based meta-analysis of gene expression data. Nat Protoc 2015;10(6):823-844
  26. Xia J, Benner MJ, Hancock RE. NetworkAnalyst—integrative approaches for protein-protein interaction network analysis and visual exploration. Nucleic Acids Res 2014;42(Web Server issue):W167-W174
  27. Xu J, Li Y. Discovering disease-genes by topological features in human protein-protein interaction network. Bioinformatics 2006;22(22):2800-2805
  28. Hu Y, Zhang Y, Ren J, Wang Y, Wang Z, Zhang J. Statistical Approaches for the Construction and Interpretation of Human Protein-Protein Interaction Network. Biomed Res Int 2016;2016:5313050
  29. Zhuang DY, Jiang L, He QQ, Zhou P, Yue T. Identification of hub subnetwork based on topological features of genes in breast cancer. Int J Mol Med 2015;35(3):664-674
  30. Jin B, Wang W, Du G, Huang GZ, Han LT, Tang ZY, et al. Identifying hub genes and dysregulated pathways in hepatocellular carcinoma. Eur Rev Med Pharmacol Sci 2015;19(4):592-601
  31. Chang W, Ma L, Lin L, Gu L, Liu X, Cai H, et al. Identification of novel hub genes associated with liver metastasis of gastric cancer. Int J Cancer 2009;125(12):2844-2853
  32. Langfelder P, Mischel PS, Horvath S. When is hub gene selection better than standard meta-analysis? PLoS One 2013;8(4):e61505
  33. Liu P, Ewald J, Pang Z, Legrand E, Jeon YS, Sangiovanni J, et al. ExpressAnalyst: A unified platform for RNA-sequencing analysis in non-model species. Nat Commun 2023;14(1):2995
  34. Lánczky A, Győrffy B. Web-Based Survival Analysis Tool Tailored for Medical Research (KMplot): Development and Implementation. J Med Internet Res 2021;23(7):e27633
  35. Sharma H, Navalkar A, Maji SK, Agrawal A. Analysis of drug–protein interaction in bio-inspired microwells. SN Applied Sciences 2019;1(8):819
  36. Grant GR, Manduchi E, Stoeckert CJ. Analysis and management of microarray gene expression data. Curr Protoc Mol Biol 2007;77:19.6.1-19.6.30
  37. Imran M, Saleem S, Chaudhuri A, Ali J, Baboota S. Docetaxel: An update on its molecular mechanisms, therapeutic trajectory and nanotechnology in the treatment of breast, lung and prostate cancer. J Drug Deliv Sci Tec 2020;60:101959
  38. Ilson DH. Advances in the treatment of gastric cancer: 2019. Curr Opin Gastroenterol 2019;35(6):551-554
  39. Sharifi-Rad J, Quispe C, Patra JK, Singh YD, Panda MK, Das G, et al. Paclitaxel: Application in Modern Oncology and Nanomedicine-Based Cancer Therapy. Oxid Med Cell Longev 2021;2021:3687700
  40. Zhang Y, Yang Y, Chen Y, Lin W, Chen X, Liu J, et al. PD-L1: Biological mechanism, function, and immunotherapy in gastric cancer. Front Immunol 2022;13:1060497
  41. Wu H, Huang M, Lu M, Zhu W, Shu Y, Cao P, et al. Regulation of microtubule-associated protein tau (MAPT) by miR-34c-5p determines the chemosensitivity of gastric cancer to paclitaxel. Cancer Chemother Pharmacol 2013;71(5):1159-1171
  42. Xu J, Gong L, Qian Z, Song G, Liu J. ERBB4 promotes the proliferation of gastric cancer cells via the PI3K/Akt signaling pathway. Oncol Rep 2018;39(6):2892-2898
  43. Song G, Zhang H, Chen C, Gong L, Chen B, Zhao S, et al. miR-551b regulates epithelial-mesenchymal transition and metastasis of gastric cancer by inhibiting ERBB4 expression. Oncotarget 2017;8(28):45725-45735
  44. El-Gamal MI, Mewafi NH, Abdelmotteleb NE, Emara MA, Tarazi H, Sbenati RM, et al. A Review of HER4 (ErbB4) Kinase, Its Impact on Cancer, and Its Inhibitors. Molecules 2021;26(23):7376
  45. Bilgiç F, Gerçeker E, Boyacıoğlu SÖ, Kasap E, Demirci U, Yıldırım H, et al. Potential role of chromatin remodeling factor genes in atrophic gastritis/gastric cancer risk. Turk J Gastroenterol 2018;29(4):427-435
  46. Mitani Y, Oue N, Hamai Y, Aung PP, Matsumura S, Nakayama H, et al. Histone H3 acetylation is associated with reduced p21(WAF1/CIP1) expression by gastric carcinoma. J Pathol 2005;205(1):65-73
  47. Rashid M, Shah SG, Verma T, Chaudhary N, Rauniyar S, Patel VB, et al. Tumor-specific overexpression of histone gene, H3C14 in gastric cancer is mediated through EGFR-FOXC1 axis. Biochim Biophys Acta Gene Regul Mech 2021;1864(4-5):194703
  48. Wang GG, Allis CD, Chi P. Chromatin remodeling and cancer, Part I: Covalent histone modifications. Trends Mol Med 2007;13(9):363-372
  49. Callari M, Sola M, Magrin C, Rinaldi A, Bolis M, Paganetti P, et al. Cancer-specific association between Tau (MAPT) and cellular pathways, clinical outcome, and drug response. Sci Data 2023;10(1):637
  50. Yu C, Chen J, Ma J, Zang L, Dong F, Sun J, et al. Identification of Key Genes and Signaling Pathways Associated with the Progression of Gastric Cancer. Pathol Oncol Res 2020;26(3):1903-1919
  51. Dey L, Mukhopadhyay A. A systems biology approach for identifying key genes and pathways of gastric cancer using microarray data. Gene Reports 2021;22:101011
  52. Li Z, Zhou Y, Tian G, Song M. Identification of Core Genes and Key Pathways in Gastric Cancer using Bioinformatics Analysis. Russian Journal of Genetics 2021;57(8):963-971
  • Journal of Translational Gastroenterology
  • eISSN 2994-8754