Introduction
Hepatocellular carcinoma (HCC) is a major global health challenge.1 Current diagnostic markers, such as alpha-fetoprotein (AFP), and imaging techniques, such as ultrasonography, lack sufficient accuracy for reliable screening.2 The Agile 3+ score, a simple noninvasive model that integrates liver stiffness measurement, serum biomarkers (AST, ALT, and platelet count), and clinical parameters (age, sex, and diabetes status), has been proposed for identifying advanced fibrosis in patients with suspected nonalcoholic fatty liver disease.3,4 However, its diagnostic performance in patients with HCC remains insufficiently investigated, underscoring the need for complementary biomarkers for HCC detection, as explored in this study. Large-scale prospective cohort studies with longitudinally collected blood samples stored in biobanks offer a promising approach for the discovery of HCC biomarkers.
Glycosylation is a key post-translational modification that enhances functional diversity.5 Aberrant glycosylation is a hallmark of various diseases, including cancer, and it can be used for disease diagnosis, staging, and potential therapeutic targeting.6 It plays a vital role in protein synthesis as well as in lipid and drug metabolism in humans.7 For example, monitoring the serum levels of oncofetal glycoproteins in patients with chronic liver disease is crucial for tracking progression to cirrhosis and HCC.8 The predominant forms of sialic acids in mammals are N-acetylneuraminic acid (Neu5Ac) and N-glycolylneuraminic acid (Neu5Gc).9 Humans do not synthesize Neu5Gc because of irreversible inactivation of the CMAH gene on chromosome 6p21.32, which encodes cytidine monophosphate–N-acetylneuraminic acid hydroxylase,10 and the absence of an active monooxygenase resulting from exon deletions or mutations that cause frameshifts in CMAH.11,12 Abnormal sialylation is often characterized by increased sialic acid levels on the cell surface. This not only elevates total sialic acid levels but also significantly alters the sialylation patterns of glycoproteins.13 Moreover, Neu5Gc preferentially accumulates in malignant tissues, leading to elevated levels of Neu5Gc-modified proteins in tumors.14–16
Our laboratory identified a novel lamprey immune protein (LIP) with lectin activity that specifically recognizes α2,3- or α2,6-glycosidic linkages of Neu5Gc-modified termini.17,18 Although Neu5Gc is not synthesized in healthy humans owing to the lack of a functional CMAH gene, elevated levels of Neu5Gc-modified proteins are detectable in various tumor cells and tissues.19–21 However, N-glycan structures associated with Neu5Gc modification in patients with HCC have not been systematically analyzed. This study provides a theoretical foundation for the clinical development of tumor diagnostic markers and tumor-specific therapeutic targets and sheds light on the mechanisms underlying the abnormal Neu5Gc distribution on tumor cell surfaces.
Methods
Grouping and inclusion criteria
This study was approved by the Ethics Committees of Dalian Public Health Clinical Center (No. 2021-016-002), Affiliated Zhongshan Hospital of Dalian University (No. 2021053-1), and Liaoning Normal University (No. LL2020015) and was conducted from January 2020 to June 2022 in accordance with the Declaration of Helsinki (2024 revision).22 A total of 8,791 participants were enrolled, comprising a training cohort (n = 8,043) and a validation cohort (n = 748). The training cohort included 144 patients with chronic hepatitis B (CHB), 144 with chronic hepatitis C (CHC), 192 with liver cirrhosis (LC), 634 with HCC, 6,768 healthy individuals, and 203 postoperative samples. The validation cohort included 98 patients with CHB, 103 with LC, 192 with HCC, and 355 healthy individuals. All participants provided written informed consent. Participants were required to be at least 18 years old with no prior hepatitis treatment. Exclusion criteria were pregnancy, acute HBV infection, HIV coinfection, and chronic alcohol use. Chronic HBV infection was defined as HBsAg positivity for >6 months with elevated HBeAg status, ALT, or AST.23 LC was diagnosed based on ultrasonography, hypoalbuminemia, and prolonged prothrombin time.24 Patients with liver stiffness measurements obtained by transient elastography (FibroScan; Echosens, Paris, France) ≥14.6 kPa were preferentially included. HCC was diagnosed using ultrasonography or computed tomography of the liver masses and an AFP level of 400 ng/mL.25
Study participants and recruitment
This was a multi-center, cross-sectional study. Healthy participants were consecutively enrolled during physical examinations of faculty, staff, and freshmen at Liaoning Normal University (2021–2022). Patients with HCC, LC, and chronic hepatitis were consecutively recruited from Dalian Public Health Clinical Center and Affiliated Zhongshan Hospital of Dalian University. Strict inclusion and exclusion criteria were applied to exclude individuals with other malignancies, severe systemic diseases, or incomplete clinical data.
At Dalian Public Health Clinical Center, eligible patients were identified from the institutional biobank using standardized diagnostic criteria. All serum samples were stored under uniform conditions, labeled with QR codes and unique anonymous identifiers, and dispatched to the laboratory without group information. Laboratory personnel were blinded to clinical diagnoses during all testing procedures. Group assignments were only unlocked after completion of all laboratory analyses. Study subjects were recruited from Affiliated Zhongshan Hospital of Dalian University. Samples were collected and coded before laboratory testing and data documentation. Clinical information and patient grouping status were provided monthly by clinical physicians.
The validation set was assessed in two independent blinded phases: first, samples were tested blindly using QR codes; second, random samples were re-coded and retested blindly by independent laboratory investigators, with results cross-verified against final discharge diagnoses. For healthy participants, samples were coded by the university hospital before testing; individuals with persistently abnormal Neu5Gc levels were excluded after medical record review. All data were regularly cross-verified against electronic medical records to ensure diagnostic accuracy and minimize information bias.
LIP enzyme-linked immunosorbent assay (ELISA)
A standard lectin ELISA was performed following a previously described protocol,26 with modifications for serum versus urine sample handling, requiring post-dilution testing of serum samples. Serum samples were diluted 1:100, and 40 µL was coated onto a 96-well plate and incubated overnight at 4 °C. The following day, the plate was washed once with TBST, and a 1:9 dilution of 10X Carbo-Free Blocking Solution (VECTOR, Newark, CA, USA) was added for blocking at 37 °C for 2 h. After removing the blocking solution, LIP-biotin was added and incubated at 37 °C for 3 h. The wells were then washed thrice with TBST. This was followed by incubation with streptavidin at 37 °C for 1 h and four additional washes with TBST. TMB substrate solution was added for color development (5–8 min), and a stop solution was applied. Optical density was measured using an ELISA reader. LIP-biotin was purified and labeled using an IgG calibration protein and TMB chromogenic solution (Solarbio, Beijing, China).27
Lectin blot assay
The lectin blot procedure is similar to Western blotting but involves lectin-glycan binding rather than antigen-antibody interaction.28 Signal amplification relies on biotin-lectin recognition, which is detected on a PVDF membrane using enhanced chemiluminescence. Blood samples were diluted fivefold with physiological saline, denatured, subjected to electrophoresis, and transferred to a PVDF membrane. The membrane was fixed with acetone for 5–10 min, air-dried, baked at 60 °C for 20 min, and activated with methanol to enable LIP-biotin binding, followed by standard enhanced chemiluminescence imaging.29,30
Nano-liquid chromatography chip quadrupole time-of-flight mass spectrometry (nano-LC–Q-TOF MS)
The Coomassie-stained gel band corresponding to the LIP-biotin-identified protein was trypsin-digested for nano-LC–Q-TOF MS.31,32 Nano-RPLC was performed using mobile phase A (0.1% formic acid and 5% acetonitrile in water) on a C18 pre-column (100 µm × 3 cm, 3 µm, 150 Å) at 2 µL/min. This was followed by an 8-min desalting step using an Eksigent NanoLC-Ultra™ system (AB SCIEX LLC, Hong Kong, China).
An analytical C18 reversed-phase column (75 µm × 15 cm, 3 µm, 120 Å) was used with a 10-min gradient from 5% to 40% mobile phase B (0.1% formic acid and 95% acetonitrile in water). Mass spectrometry (MS) was conducted using a Triple TOF 5600+ system (AB SCIEX) with a Nanospray III ion source. The spray voltage was set at 2.3 kV, and curtain gas, nebulizer gas, and source temperature were maintained at 30, 14, and 150 °C, respectively. Data acquisition was performed in the information-dependent acquisition mode. A full-scan TOF-MS was performed at 250 ms, followed by up to 26 MS/MS scans per cycle for precursor ions (charge states 2+ to 5+, minimum intensity 200 cps). Each MS/MS scan had an accumulation time of 80 ms, with a total cycle time of 2.5 s. A constant collision energy was used, and a 3-s dynamic exclusion was applied to avoid repeated ion sampling.33
Quantitative N-glycoproteomics using stable isotopic diethyl labeling
Serum IgG glycoproteins from patients with HCC and healthy participants were analyzed using quantitative N-glycoproteomics with stable isotopic diethyl labeling to obtain comprehensive data on N-glycosylation modifications (see Supplementary Information for details).34 Proteins were reduced with 10 mM TCEP (CAS: 51805-45-9) at 55 °C for 1 h, alkylated with 20 mM iodoacetamide (IAA; CAS: 144-48-9) at room temperature (20–25 °C) for 30 min in the dark, and digested with trypsin at a 1:50 (w/w) ratio at 37 °C for 16 h.35 The digests were desalted using C18 SPE tips and eluted with 50% and 80% acetonitrile containing 0.1% trifluoroacetic acid. The eluates were combined and dried using a SpeedVac concentrator. Intact N-glycopeptides were enriched using ZIC-HILIC SPE tips,36 labeled by reductive diethylation with acetaldehyde and acetaldehyde-¹³C₂, desalted, and dried for LC-MS analysis. For C18-RPLC-MS/MS (higher-energy collisional dissociation; HCD) analysis, a C18 analytical column (75 µm × 75 cm, 5 µm) and trap column (200 µm × 5 cm, 5 µm) were used. The flow rates were 5 µL/min for loading and 300 nL/min for analysis. The gradient was: 2% B for 12 min, 2%–40% B in 188 min, 40%–95% B in 10 min, 95% B for 5 min, and equilibration for 20 min. The ion transfer tube was maintained at 300 °C, with a spray voltage of 1.9 kV. The MS spectra were acquired from m/z 700–2,000 at a resolution of 60,000. The AGC target was 3 × 10⁶ with a 20 ms maximum injection time. MS/MS spectra were acquired at a resolution of 30,000 using the Top20 DDA method with HCD and stepped NCEs (20%, 30%, and 31%). The AGC target was 5 × 10⁵ with a 250 ms maximum injection time, an isolation window of 3.0 m/z, and a dynamic exclusion time of 20 s.37
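The gradient program stated above can be written out as a small table to check the run length and the mobile phase B fraction at any time point. The following is a minimal sketch only: the names `STEPS`, `EQUILIBRATION_MIN`, `percent_b`, and `total_run_min` are hypothetical, linear ramps within each step are assumed, and the %B during equilibration is not modeled because it is not stated in the text.

```python
from typing import List, Tuple

# (duration in min, %B at start, %B at end), taken from the gradient in the text.
STEPS: List[Tuple[float, float, float]] = [
    (12.0, 2.0, 2.0),    # 2% B for 12 min
    (188.0, 2.0, 40.0),  # 2%-40% B in 188 min
    (10.0, 40.0, 95.0),  # 40%-95% B in 10 min
    (5.0, 95.0, 95.0),   # 95% B for 5 min
]
EQUILIBRATION_MIN = 20.0  # duration stated in the text; %B not stated, so not modeled


def percent_b(t_min: float) -> float:
    """Return %B at time t_min, assuming a linear ramp within each step."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    start = 0.0
    for duration, b_start, b_end in STEPS:
        end = start + duration
        if t_min <= end:
            return b_start + (b_end - b_start) * (t_min - start) / duration
        start = end
    raise ValueError("time is past the end of the stated gradient steps")


def total_run_min() -> float:
    """Total run length: all gradient steps plus equilibration."""
    return sum(duration for duration, _, _ in STEPS) + EQUILIBRATION_MIN
```

Under these assumptions, `total_run_min()` sums the stated durations (12 + 188 + 10 + 5 + 20 min), and `percent_b(106.0)` gives the midpoint of the long analytical ramp.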
Western blot (WB) and immunohistochemical staining
CMAH antibody (sc-365023, Santa Cruz, CA, USA) was used at a concentration of 0.3 µg/mL. The other antibodies (HIF-1α (D123654), VEGF (D360788), and GLUT1 (D160433)) were purchased from Sangon Biotech (Shanghai, China), and lectins (AAL [B-1395-1] and SNA [B-1305-2]) were purchased from Vector (USA). WB and immunohistochemical staining were performed according to standard protocols,38,39 with the recommended concentrations of antibodies/lectins used in this study.
Nested polymerase chain reaction (PCR)
Two pairs of CMAH gene primers were designed to obtain full-length transcripts from the human genome using NCBI data. The first primer pair was positioned at the 5′ and 3′ ends of the gene to amplify the entire CMAH transcript, resulting in a longer product. The second primer pair was designed to flank the functional domain and yield a shorter amplification product. PCR amplification was initially performed using the first primer pair targeting the larger fragment for 40 cycles. Subsequently, the amplified products were subjected to a second round of PCR using shorter-fragment primers, following established nested PCR protocols.40
Flow cytometry and imaging flow cytometry
Cells were incubated with Alexa Fluor 488-labeled LIP (Thermo Fisher Scientific, catalog #A30006) for 10 min, followed by washing with PBS to remove unbound proteins. The cells were then stained with propidium iodide (Thermo Fisher Scientific, catalog #R37169) at room temperature for 15 min, collected, centrifuged at 1,000 rpm for 5 min, resuspended in 500 µL of PBS, and analyzed by flow cytometry.41 The LIP, labeled with Alexa Fluor 488, was incubated with cells and subsequently detected by flow cytometry and fluorescence confocal microscopy. As the detection did not involve an antigen-antibody reaction, no isotype control antibodies were used. Instead, a positive control containing another commercial lectin, SNA, was used. The results were validated when SNA recognition matched the recognition pattern of LIP.42
Immunofluorescence detection of Neu5Gc on the cell surface
Cells were seeded in a four-chamber confocal dish (NEST, catalog #801002) at approximately 500 µL of cell suspension per well. After cell adhesion, the indicated drugs were added for co-incubation at specified time points, followed by replacement of the culture medium with fresh medium. LIP (0.2 µg/µL) was added and incubated for 2 h. Subsequently, a LIP monoclonal antibody, diluted 1:500 from an initial concentration of 1 µg/µL, was added and incubated for another 2 h. After washing with PBS, fluorescent secondary antibody (Thermo Fisher Scientific, catalog #A-11029, RRID: AB_2534088) was applied for 30 min. Fluorescence was detected using confocal microscopy following standard immunofluorescence protocols.43
Induction of isotopic glycans and sample collection for metabolomic analysis
Culture media were prepared according to the specific requirements of the cell lines. To eliminate Neu5Gc, human serum was used instead of fetal bovine serum, and cells were cultured in this medium for two to three passages. After more than 10 days of culture, Neu5Gc expression was undetectable by flow cytometry. Subsequently, 30 mM ¹³C-labeled UDP-GlcNAc was added, and the cells were incubated. Samples were collected at various time points by flash-freezing the cells in liquid nitrogen, scraping them into microtubes, flash-freezing again, and storing at −80 °C.44
Non-targeted metabolomics detection
Initially, 500 µL of the extract was added to the sample, followed by two cycles of freezing in liquid nitrogen for 1 min, thawing, and vortexing for 30 s. The samples were then ultrasonicated in an ice water bath for 10 min and kept at −4 °C for 1 h. Subsequently, they were centrifuged at 13,800 × g at 4 °C for 15 min, and the supernatant was collected for analysis.45 Because the substrate was isotope-labeled, metabolites carrying different carbon isotopes could be identified across pathways. The analysis focused on ¹³C-labeled intermediate metabolites, with predictions based on time-series data and the sequence of metabolic steps. The experiment could be refined further through metabolic flux analysis.46
Quantitative reverse-transcription PCR (qRT-PCR)
As described in previous studies, cells were lysed in RNAex Pro Reagent (AG21102, Accurate Biotechnology, Changsha, China) to extract RNA. The extracted RNA was then reverse-transcribed into cDNA using the Evo M-MLV RT Kit with gDNA Clean for qPCR II (AG11705, Accurate Biotechnology, Changsha, China). qRT-PCR was performed in 20 µL reaction volumes using TB Green® Fast qPCR Mix (AG11701, Accurate Biotechnology, Changsha, China), following standard qRT-PCR protocols.47 The details of the primers used are provided in Supplementary Table 1.
Study design and statistical analysis
This analysis was conducted in accordance with the reporting recommendations for tumor marker prognostic studies.48 Statistical analysis was performed using GraphPad Prism 8.0 (GraphPad). The diagnostic values of N-glycans and AFP were evaluated using receiver operating characteristic (ROC) curves. The ROC cut-off (also called MID ROC) was taken as the optimal threshold identified by ROC analysis. Diagnostic performance in the case-control comparisons was summarized by the area under the ROC curve (AUC). Statistical significance was set at P < 0.05.49,50 Multivariate linear regression for the combined analysis of AFP and Neu5Gc was performed using R Software (Version 3.5.3).51
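For readers who want to see how an ROC curve and its AUC arise from raw marker values, the sketch below builds the curve by sweeping a threshold over the observed values and integrates it with the trapezoidal rule. It is an illustration only and not the GraphPad Prism procedure used in this study; the function names `roc_points` and `auc_trapezoid` and the example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(values: Sequence[float], is_case: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs.

    A sample is called positive when its value is >= the threshold;
    thresholds are swept from the highest observed value downwards.
    Assumes at least one case (1) and one control (0).
    """
    n_case = sum(1 for d in is_case if d == 1)
    n_ctrl = len(is_case) - n_case
    points = [(0.0, 0.0)]  # threshold above every value: nothing is positive
    for threshold in sorted(set(values), reverse=True):
        tp = sum(1 for v, d in zip(values, is_case) if v >= threshold and d == 1)
        fp = sum(1 for v, d in zip(values, is_case) if v >= threshold and d == 0)
        points.append((fp / n_ctrl, tp / n_case))
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical marker values: four controls (0) followed by four cases (1).
example_values = [5.0, 8.0, 12.0, 30.0, 9.0, 25.0, 40.0, 55.0]
example_labels = [0, 0, 0, 0, 1, 1, 1, 1]
example_auc = auc_trapezoid(roc_points(example_values, example_labels))
```

With untied values, the trapezoidal AUC equals the fraction of case-control pairs in which the case has the higher value, which is a convenient sanity check on any ROC software output.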
Results
Diagnostic value of detecting Neu5Gc in serum by LIP-ELISA for HCC
Previous studies demonstrated that LIP could recognize the N-glycosylated terminal Neu5Gc modification of the uromodulin protein in urine, aiding in the early diagnosis of bladder cancer.21 To explore the potential of Neu5Gc as a serum tumor biomarker detected by LIP, a prospective training cohort study was conducted using residual serum samples from routine physical examinations for faculty, staff, and incoming freshmen at Liaoning Normal University, with approval from the institutional ethics committee and broad informed consent obtained from all participants prior to sample collection. Serum Neu5Gc levels in healthy individuals aged 18–105 years were analyzed according to age and sex. Given the higher incidence of HCC in individuals over 50 years of age, stratification was performed based on age (Fig. 1A), with further comparisons made at 10-year intervals for men (Supplementary Fig. 1A) and women (Supplementary Fig. 1B). Our data showed no statistically significant differences in Neu5Gc levels across the different age groups, with a mean serum Neu5Gc level of 10.81 ng/µL in healthy subjects.
From 2015 to 2022, blood samples were collected from patients diagnosed with CHB (n = 144), CHC (n = 144), LC (n = 192), and HCC (n = 634) and analyzed using LIP-ELISA (Fig. 1B). The mean Neu5Gc levels (CNeu5Gc) in each group were: CHB, 14.46 ng/µL; CHC, 14.87 ng/µL; LC, 18.26 ng/µL; and HCC, 46.74 ng/µL. Diagnostic performance was systematically evaluated by ROC curve analysis for each disease group versus healthy controls, with complete ROC parameters as follows (Fig. 1C): HCC vs. healthy controls: AUC = 0.8551, standard error (SE) = 0.01894, 95% confidence interval (CI) = 0.8188–0.8915, P < 0.0001; LC vs. healthy controls: AUC = 0.7543, SE = 0.01953, 95% CI = 0.7160–0.7926, P < 0.0001; CHB vs. healthy controls: AUC = 0.6373, SE = 0.02873, 95% CI = 0.5810–0.6930, P < 0.0001; CHC vs. healthy controls: AUC = 0.6583, SE = 0.02671, 95% CI = 0.6059–0.7106, P < 0.0001. As the maximum likelihood ratio (Youden’s index)52 method failed to yield a cut-off with concurrent sensitivity and specificity >80%, the optimal serum Neu5Gc cut-off was determined as 22.85 ng/µL using the top-left closest point method (minimizing Euclidean distance to the perfect diagnostic point (0,1)), a gold-standard strategy for balancing clinical sensitivity and specificity.53,54 At this cut-off value, the key diagnostic indices for HCC identification were: sensitivity = 80.21%, specificity = 96.01%, positive predictive value = 89.74%, negative predictive value = 93.18%, and overall diagnostic accuracy = 92.46%. The previously reported specificity of 99.94% corresponded to a high-specificity subthreshold of the ROC curve, which was unsuitable for general HCC screening.
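The cut-off selection and the diagnostic indices described above can be illustrated with a short sketch. It assumes that a sample is called positive when its value exceeds the cut-off and that candidate cut-offs are the observed values; the function names and the example data are hypothetical and do not reproduce the study data or the software actually used.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(values: Sequence[float], has_hcc: Sequence[bool],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) when 'value > cutoff' is called positive."""
    tp = fp = tn = fn = 0
    for value, diseased in zip(values, has_hcc):
        positive = value > cutoff
        if positive and diseased:
            tp += 1
        elif positive:
            fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def diagnostic_indices(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard indices; assumes every denominator is non-zero."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def closest_to_top_left(values: Sequence[float],
                        has_hcc: Sequence[bool]) -> Tuple[float, float]:
    """Pick the cut-off whose ROC point is nearest to (0, 1).

    Distance is sqrt((1 - specificity)^2 + (1 - sensitivity)^2).
    Assumes at least one diseased and one non-diseased sample.
    """
    best_cutoff, best_distance = math.nan, math.inf
    for cutoff in sorted(set(values)):
        tp, fp, tn, fn = confusion_counts(values, has_hcc, cutoff)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        distance = math.hypot(1.0 - specificity, 1.0 - sensitivity)
        if distance < best_distance:
            best_cutoff, best_distance = cutoff, distance
    return best_cutoff, best_distance


# Hypothetical values: four controls followed by four HCC cases.
example_values = [5.0, 8.0, 12.0, 30.0, 9.0, 25.0, 40.0, 55.0]
example_status = [False, False, False, False, True, True, True, True]
example_cutoff, example_distance = closest_to_top_left(example_values, example_status)
```

Youden's index (sensitivity + specificity − 1) can be maximized over the same candidate cut-offs; the two criteria may select different thresholds when the ROC curve is asymmetric.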
Further analysis of 588 samples according to tumor, node, and metastasis staging revealed that most cases were in the T0–T2 stages, which is consistent with the typical distribution pattern of tumor markers in early cancer screening.55,56 Serum Neu5Gc levels in patients with HCC were significantly higher than those in healthy participants across different stages (Fig. 1D). Our method of using LIP to detect Neu5Gc effectively identified HCC samples with abnormally elevated Neu5Gc levels. To assess the therapeutic implications of serum Neu5Gc levels, we analyzed samples from untreated patients (Fig. 1E and Supplementary Fig. 1C), those who had undergone fewer than three cycles of chemotherapy post-HCC surgery, and those who had completed more than three cycles. We found that Neu5Gc levels correlated with treatment status, suggesting its potential use as an indicator of therapeutic efficacy. Therefore, our detection method offers a novel biomarker for the clinical diagnosis of HCC and serves as a surrogate marker for evaluating the effectiveness of HCC treatment.
Identification of Neu5Gc-modified glycoprotein in serum and its combination with AFP for diagnosis
The primary Neu5Gc-modified glycoproteins in serum recognized by LIP were analyzed using the LIP-biotin-based lectin-Western blot (LIP-WB) method. The corresponding bands were subjected to enzymatic digestion on Coomassie-stained gels, followed by MS identification (Supplementary Fig. 2A). Proteomic analysis was performed using nano-LC–Q-TOF MS, and MS/MS spectra were searched using Sequest within Proteome Discoverer (Thermo Fisher) against the UniProt Homo sapiens IgG database (Supplementary Fig. 2B). The main Neu5Gc-modified molecule identified was IgG glycoprotein.
To validate the accuracy of the LIP-identified targets in serum, samples were collected from newly admitted patients between 2020 and 2022 to establish a validation cohort. A total of 748 samples were collected: 355 from healthy subjects and 192, 103, and 98 from patients with HCC, LC, and CHB, respectively. Notably, the full cohort of 634 HCC samples was simultaneously assigned for subsequent AFP diagnostic performance analysis. However, 42 samples were excluded due to the lack of both definite tumor staging and liver fibrosis data. For healthy controls, 7,211 samples were tested. Among them, 443 samples with extremely abnormal indices were excluded, and 3,994 of the remaining 6,768 samples with available AFP detection values were screened for subsequent combined statistical analysis. Changes in serum IgG glycosylated proteins were assessed using sandwich LIP-ELISA (Supplementary Fig. 2C), in which plates were pre-coated with an IgG antibody. The results showed that the content of Neu5Gc-modified proteins was significantly higher in the HCC group than in the other groups (Fig. 2A), with serum Neu5Gc levels in HCC patients approximately sevenfold higher than those in healthy individuals, and an AUC of 0.8559 in the validation cohort (Fig. 2B), consistent with direct serum Neu5Gc detection. The IgG protein was digested with N-glucosidase to remove all N-glycan modifications. As shown in Supplementary Figure 2D, the molecular weight of the sample decreased after enzymatic digestion. Before enzyme digestion, the sample could be recognized by LIP, but after enzyme digestion, it could not be recognized by LIP-biotin, indicating that the recognition site of LIP was directly related to N-glycosylation. The correlation with O-glycosylation modification was verified by an O-glycosidase digestion experiment (Supplementary Fig. 2E), indicating that relatively few sugar chains are present on O-glycosylation-modified IgG, which has little influence on molecular weight. 
After N-glucosidase digestion, the samples from each group were titrated using ITC with LIP (Supplementary Fig. 2F and G). The results showed no binding between LIP and IgG after N-glucosidase digestion in any group. These findings confirmed that LIP primarily recognizes N-glycosylated terminal Neu5Gc modifications of IgG proteins in serum.
To identify additional biomarkers for HCC diagnosis in combination with Neu5Gc, we analyzed AFP levels (Fig. 2C) using the clinical standard of CAFP >20 ng/mL. Among the 592 samples in the retrospective cohort (with clinical AFP measurements), 152 of the HCC samples met the diagnostic cut-off, resulting in a sensitivity of 25.67%. For a higher AFP threshold (CAFP >400 ng/mL),57,58 only 20 samples met the criteria, with a sensitivity of 3.3%. Interestingly, 36 AFP-positive samples exhibited low levels of Neu5Gc-modified proteins. Of the 440 AFP-negative HCC samples (AFP ≤ 20 ng/mL), 332 had Neu5Gc levels > 22.85 ng/µL. Parallel combined detection of AFP and Neu5Gc (defined as positive if either marker was above its cut-off) was further applied, achieving a sensitivity of 81.69%, specificity of 96.02%, and overall diagnostic accuracy of 94.64% for HCC. Multivariate linear regression analysis was additionally performed using R Software (Version 3.5.3) to quantify the combined effect: the regression equation was y = 1.8926 + 0.0024X₁ (AFP) + 43.1077X₂ (Neu5Gc), with an R² of 0.7016 (70.2% explanatory power). AFP showed no significant independent effect (P = 0.0822), while Neu5Gc presented a significant positive independent association with the diagnostic outcome (P < 0.05). Thus, the combined use of LIP-detected Neu5Gc and AFP assays offers an effective dual-index diagnostic method (Fig. 2D and Table 1), which is particularly useful for screening HCC patients with low AFP levels. This dual approach significantly enhanced the sensitivity of HCC diagnosis.
Table 1. Statistics of double-index combined diagnosis of HCC
| Marker (cut-off) | Sensitivity (%) | Specificity (%) | AUC |
|---|---|---|---|
| AFP | 36.11 | 96.3 | 0.6271 |
| AFP (20 µg/L) | 25 | 99.85 | |
| AFP (400 µg/L) | 3.28 | 100 | |
| Neu5Gc | 57.77 | 99.91 | 0.8441 |
| Neu5Gc (22.85 ng/µL) | 73.99 | 99.94 | |
| Fucose | 37.5 | 97.92 | 0.724 |
| Fucose (0.6084 OD) | 83.33 | 66.67 | |
| Neu5Gc+AFP | 81.93 | 99.94 | |
| Neu5Gc+Fucose | 91.67 | 66.67 | |
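The parallel combined detection rule described above (positive if either marker exceeds its cut-off) can be sketched as follows. The cut-offs are those given in the text (AFP > 20 ng/mL, Neu5Gc > 22.85 ng/µL); the function names and the example patients are hypothetical and serve only to show how the rule rescues AFP-negative cases.

```python
from typing import Sequence, Tuple

AFP_CUTOFF_NG_PER_ML = 20.0      # clinical AFP cut-off stated in the text
NEU5GC_CUTOFF_NG_PER_UL = 22.85  # serum Neu5Gc cut-off stated in the text


def parallel_positive(afp: float, neu5gc: float) -> bool:
    """Parallel (OR) rule: positive when either marker is above its cut-off."""
    return afp > AFP_CUTOFF_NG_PER_ML or neu5gc > NEU5GC_CUTOFF_NG_PER_UL


def sensitivity(flags: Sequence[bool]) -> float:
    """Fraction of confirmed cases flagged positive (input: cases only)."""
    return sum(flags) / len(flags)


# Hypothetical (AFP ng/mL, Neu5Gc ng/uL) pairs for five confirmed HCC cases.
hcc_cases: Sequence[Tuple[float, float]] = [
    (450.0, 50.1),  # both markers elevated
    (35.0, 12.3),   # AFP-positive, Neu5Gc-low
    (8.0, 41.7),    # AFP-negative, rescued by Neu5Gc
    (15.0, 30.2),   # AFP-negative, rescued by Neu5Gc
    (5.0, 10.0),    # missed by both markers
]

afp_only = sensitivity([afp > AFP_CUTOFF_NG_PER_ML for afp, _ in hcc_cases])
combined = sensitivity([parallel_positive(afp, neu) for afp, neu in hcc_cases])
```

Because the OR rule can only add positives, its sensitivity is never lower than that of either marker alone, while its specificity is never higher; this trade-off is why specificity must be reported alongside the combined sensitivity.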
Characteristics of N-glycosylation and differences in glycan composition detection and expression alterations of glycan-related genes between healthy subjects and HCC
To further elucidate the differences in N-linked glycosylation of IgG between healthy subjects and those with HCC, we employed C18-reversed-phase liquid chromatography coupled with tandem mass spectrometry (C18-RPLC-MS/MS) using HCD to comprehensively characterize the N-glycan structures. This approach enabled the detection of differences in glycosylation modifications and potential biomarkers. Glycomic mass spectrometric analysis was conducted on the purified IgG protein, and the main workflow is depicted in Supplementary Figure 3A. Examples of molecular weight prediction and B/Y ion matching for specific monosaccharide and glycan structures are shown in Supplementary Figure 3B.
Subsequent analysis of glycopeptides meeting the criteria of a fold change > 1.5 and P < 0.05 identified a total of 244 glycopeptides. A volcano plot (Fig. 3A) illustrates the upregulation and downregulation of these glycopeptides, revealing that 220 glycopeptides were significantly elevated (indicated in red), whereas 24 were decreased (indicated in blue). Overall, the N-glycan modifications in the HCC group showed a ninefold increase compared to those in healthy subjects. The distribution of N-glycan types, including high-mannose (4%), hybrid (11%), and complex (85%), is presented in Figure 3B. As depicted in Supplementary Figure 3C, the cleavage sites of the basic B/Y ions and the corresponding mass spectrum peaks matched the complete glycan structures. This analysis identified 13 characteristic glycan structures shared by healthy subjects and HCC patients, while two unique glycan chains were identified in healthy subjects and six were specific to the HCC group. Terminal Neu5Gc-modified glycan structures are illustrated in Figure 3C. The observed differences in glycan structures could be attributed to enhanced glycosylation, driven by an increased abundance of multi-branched glycan chains. We classified and compared 220 glycan chain structures based on variations in their terminal monosaccharides (Fig. 3D). For Neu5Gc-terminated glycan structures, notable changes included a 2.7-fold increase in afucosylated forms, a 3.3-fold increase in core-fucosylated forms, and a 2.9-fold increase in terminal-fucosylated forms. Additionally, glycan chain profiling was performed for structures terminated by other monosaccharides, including mannose, Neu5Ac, galactose, core fucose, sialyl Lewis A (SLeA), and sialyl Lewis X (SLeX). Serum IgG-derived Neu5Gc-terminated N-glycoproteins in HCC patients were significantly elevated, and levels of other monosaccharide-modified glycans were also substantially increased in the HCC group.
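The selection criteria used for the volcano plot (fold change > 1.5 and P < 0.05) can be expressed as a simple filter. The sketch below assumes that the fold change is the HCC/healthy abundance ratio and that downregulation corresponds to a ratio below 1/1.5; this symmetric treatment, the function name `classify_glycopeptide`, and the example entries are assumptions, not details taken from the study pipeline.

```python
import math
from typing import List, Tuple

FOLD_CHANGE_THRESHOLD = 1.5  # from the text
P_VALUE_THRESHOLD = 0.05     # from the text


def classify_glycopeptide(ratio: float, p_value: float) -> str:
    """Label one glycopeptide as 'up', 'down', or 'ns' (not significant).

    ratio is the HCC/healthy abundance ratio (assumed definition).
    """
    if p_value >= P_VALUE_THRESHOLD:
        return "ns"
    if ratio > FOLD_CHANGE_THRESHOLD:
        return "up"
    if ratio < 1.0 / FOLD_CHANGE_THRESHOLD:
        return "down"
    return "ns"


def volcano_coordinates(ratio: float, p_value: float) -> Tuple[float, float]:
    """Return the usual volcano plot axes: (log2 fold change, -log10 P)."""
    return math.log2(ratio), -math.log10(p_value)


# Hypothetical (identifier, ratio, P value) entries.
example: List[Tuple[str, float, float]] = [
    ("gp1", 3.2, 0.001),   # elevated in HCC
    ("gp2", 0.4, 0.010),   # decreased in HCC
    ("gp3", 1.2, 0.001),   # significant P but small change
    ("gp4", 4.0, 0.200),   # large change but not significant
]
labels = [classify_glycopeptide(ratio, p) for _, ratio, p in example]
```

Counting the "up" and "down" labels over a full dataset gives the red and blue point totals of a volcano plot such as Figure 3A.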
We further analyzed public databases for enzymes involved in Neu5Gc biosynthesis (Fig. 4A) and glycan modification (Fig. 4B). Using data from The Cancer Genome Atlas and the UALCAN database (Supplementary Fig. 3D), 25 genes with significant expression changes were identified and subsequently validated by qRT-PCR using RNA extracted from peripheral blood leukocytes of clinical patients. qRT-PCR analysis demonstrated significantly higher expression levels of NEU1, CMAH, B3GALNT2, ALG2, FUT8, and ST3GAL2 in HCC than in the LC group (Fig. 4C), indicating their potential utility in the differential diagnosis of HCC. Conversely, SLC35A1, B3GALT4, and MGAT2 were most highly expressed in the LC group (Fig. 4D), while NANS exhibited the highest expression in the CHB group relative to other groups.
These findings suggest that a combination of these differentially expressed genes could serve as effective diagnostic markers to distinguish HCC, LC, and CHB. To validate this, a heat map for the 10 key differentially expressed genes was generated (Fig. 4E), and AUC analysis was performed for each individual gene (Fig. 4F). The differential expression profiles of these genes across groups effectively distinguished HCC from other liver diseases. Further investigation using the Human Protein Atlas database (Supplementary Fig. 3E) revealed that the expression levels of the corresponding proteins in HCC tissues also exhibited an identical upregulation trend. Therefore, this study identified a panel of N-glycosylation-related genes that could serve as valuable markers for differential diagnosis of HCC.
Analysis of CMAH gene expression and mutations in HCC patients
The elevated levels of Neu5Gc in patients with HCC may suggest alterations in the enzyme responsible for the hydroxylation of Neu5Ac and its subsequent conversion to Neu5Gc in vivo. To investigate this, we analyzed the expression of CMAH in HCC patients. Western blotting (Fig. 5A and B) and immunohistochemistry (IHC) (Fig. 5C and D) demonstrated that CMAH protein expression was significantly upregulated in both diethylnitrosamine-induced HCC and human HCC tissues. Peritumoral non-tumor liver tissues, which often exhibited pronounced cirrhosis, also showed moderate positivity for CMAH. In contrast, tissues with normal liver morphology at the tumor margin were negative for CMAH. Notably, nearly all human HCC tissue sections displayed intense, diffuse CMAH immunostaining, indicating selective upregulation of CMAH in HCC tumor tissues relative to normal hepatic tissues.
To further characterize CMAH genetic alterations in HCC, nested PCR targeting the human CMAH locus was performed on gDNA and cDNA reverse-transcribed from RNA isolated from peripheral blood leukocytes of HCC patients. As shown in Figure 5E1, agarose gel electrophoresis confirmed the high quality of the extracted RNA and DNA, supporting the credibility of the experimental results. Two rounds of nested PCR were conducted, generating shorter cDNA-derived amplicons and longer gDNA-derived fragments (Fig. 5E2). The amplified DNA was excised from the gel, purified (Fig. 5E3), and concentrated (Fig. 5E4) for subsequent cloning. The purified target fragment was ligated into a T-vector for Sanger sequencing. The ligation product was transformed into Escherichia coli competent cells, which were plated on selective solid medium, and single colonies were picked for propagation. Colony PCR was performed to screen for positive clones with successful ligation (Fig. 5E5). Qualified single colonies were expanded, and plasmids were extracted after reaching logarithmic growth (Fig. 5E6). PCR amplification with T-vector primers yielded plasmid samples suitable for sequencing (Fig. 5E7). Sanger sequencing revealed multiple alterations in the amplified sequences from the HCC samples, including single nucleotide additions or deletions (Supplementary Fig. 4A). BLAST analysis against the NCBI database confirmed that the amplified gene was the human CMAH gene (Supplementary Fig. 4B). Conversion of the nucleotide sequence to an amino acid sequence using ORF Finder showed a high degree of sequence homology (Supplementary Fig. 4C), confirming that the amplified product contained a complete CDS region with a continuous coding sequence. Motif analysis revealed substantial changes in the motif sequences of the HCC samples, including numerous gene mutations (Supplementary Fig. 4D). 
Whereas motifs 1–5, which constitute the ULAG superfamily domain, were conserved in normal genomic sequences, the sequencing results showed that only motif 1 remained conserved in the HCC samples, and motifs 4–5 were replaced by motifs 6–9 (Supplementary Fig. 4E). This indicates substantial alterations in the amino acid sequence in the HCC group, although the corresponding domain was still formed. Analysis of CMAH using the TSVdb gene splicing tool (Supplementary Fig. 4F) revealed multiple SNP mutations across various exons, which may contribute to aberrant transcription in tumors. All detected mutations in CMAH were cataloged (Fig. 5G) with reference to the Ensembl database (Fig. 5F), which supports studies on comparative genomics, sequence variation, transcriptional regulation, and gene annotation and thus provides a basis for understanding gene function and associated diseases. A diagram of the alternative splicing patterns of CMAH in HCC was constructed based on the sequencing results (Fig. 5H). Our experimental validation and data analysis suggest that CMAH may undergo mutations under tumor conditions, leading to a high likelihood of alternative splicing and potential gene reactivation. Additionally, the elevated expression of CMAH protein detected in the peripheral blood leukocytes and tissues of HCC patients indicates the presence of a CMAH-dependent Neu5Gc synthesis pathway in tumors, which is potentially detectable by LIP.
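Why single-nucleotide additions or deletions matter for the reading frame can be illustrated with a minimal Python sketch. It is not the ORF Finder algorithm and does not use the CMAH sequence: the function name `translate_from_start` and the toy sequence are hypothetical, chosen only to show how one deleted base changes every downstream codon.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_from_start(seq):
    """Translate from the first ATG until a stop codon or the end of the sequence.

    Codons containing characters other than A, C, G, T are written as 'X'.
    """
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon ends the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy sequence (NOT CMAH): ATG GCT GAA ACC TGG TAA
toy_cds = "ATGGCTGAAACCTGGTAA"
# Delete one base (index 4) to mimic a single-nucleotide deletion.
toy_deletion = toy_cds[:4] + toy_cds[5:]

print(translate_from_start(toy_cds))       # MAETW
print(translate_from_start(toy_deletion))  # MVKPG
```

In the toy example, the deletion alters every residue after the start codon and removes the in-frame stop codon, which is why a continuous coding sequence has to be confirmed after sequencing.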
Enhancement of sialic acid synthesis in HCC cells via exogenous glycan intake in vitro
After a cellular hypoxia model was established in CoCl2-treated SMMC-7721 cells, the cells were incubated with 30 mM Neu5Gc (end product), CMP-Neu5Ac (intermediate product), or UDP-GlcNAc (HBP product and a substrate for polysaccharide synthesis) for 48 h. Glycan substrates were replenished every 12 h, and the treated cells were collected for high-content detection (Supplementary Fig. 5A). The levels of Neu5Gc on the cell surface were analyzed using flow cytometry (Supplementary Fig. 5B) and confocal imaging (Supplementary Fig. 5C) with LIP. After the cells had been cultured with human serum for 10 days, Neu5Gc was barely detectable on the cell surface. However, once hypoxia-inducible factor (HIF) expression stabilized, exogenous glycan substrates (30 mM) were added to supplement glycosylation precursors. After 48 h, Neu5Gc levels were restored to, or exceeded, the basal levels typically found in tumor cells. The Neu5Gc content of the cell membrane increased significantly, particularly in hypoxia-treated cells. Additionally, metabolomic analysis was performed on samples collected at various time points (0, 1, 12, 24, 36, and 48 h), identifying a total of 2,966 metabolites, including differentially expressed metabolites (Supplementary Fig. 5D and E).
To elucidate the regulatory mechanism of sialic acid synthesis in HCC cells, we tracked the levels of key intermediates at different time points to infer their metabolic sequence. Temporal changes in intermediate metabolites along the sialic acid synthesis pathway were observed (Fig. 6A and Supplementary Fig. 5F). This non-targeted metabolomics approach effectively uncovered the metabolic pathways involved. As shown in Figure 6B, isotope labels were detected for three metabolites, including glucose-6-phosphate, which is involved in glycolysis and gluconeogenesis. Additionally, five intermediate metabolites in the tricarboxylic acid cycle, acetyl-CoA, and malic acid were identified. Amino acid metabolism provided the carboxyl groups essential for sialic acid synthesis. The two metabolic pathways most directly linked to sialic acid synthesis were the hexosamine biosynthetic pathway (HBP) and the polysaccharide synthesis pathway.
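One crude way to infer a metabolic sequence from time-course data is to rank intermediates by the time at which their signal peaks. The Python sketch below illustrates that heuristic only and is not the analysis pipeline used in this study. The function names are hypothetical, and the intensity values are made-up numbers, not measured data.

```python
# Sampling times (h) as reported in the text.
TIME_POINTS_H = [0, 1, 12, 24, 36, 48]


def peak_time(intensities):
    """Return the sampling time (h) at which a metabolite's signal is highest."""
    if len(intensities) != len(TIME_POINTS_H):
        raise ValueError("expected one intensity per sampling time")
    peak_index = max(range(len(intensities)), key=lambda i: intensities[i])
    return TIME_POINTS_H[peak_index]


def order_by_peak(profiles):
    """Sort metabolite names by the time of their peak signal, earliest first."""
    return sorted(profiles, key=lambda name: peak_time(profiles[name]))


# Hypothetical relative intensities, used only to exercise the functions.
toy_profiles = {
    "Neu5Gc": [1.0, 1.0, 1.2, 1.8, 2.6, 3.1],
    "glucose-6-phosphate": [1.0, 2.4, 1.9, 1.3, 1.1, 1.0],
    "UDP-GlcNAc": [1.0, 1.2, 2.2, 2.9, 2.0, 1.5],
}

ordered = order_by_peak(toy_profiles)
print(ordered)  # ['glucose-6-phosphate', 'UDP-GlcNAc', 'Neu5Gc']
```

Peak-time ordering suggests, but does not prove, a precursor-product relationship, so isotope labeling remains necessary to confirm the route.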
To validate the proposed sialic acid synthesis pathway inferred from the metabolomics data, we determined the main pathways based on intermediate metabolites and screened for key target inhibitors. N-acetyl-2,3-dehydro-2-deoxyneuraminic acid (DANA) is an effective neuraminidase inhibitor. Acriflavine hydrochloride, an antiseptic with antitumor activity, effectively inhibits HIF-1.59 WZB117, an inhibitor of glucose transporter 1 (Glut 1), downregulates glycolysis, induces cell cycle arrest, and inhibits cancer cell growth both in vitro and in vivo.60 Talniflumate, a prodrug of niflumic acid, is converted by esterases in vivo and reduces the synthesis and release of mucin in animal models and cell cultures.61 The expression of genes related to the HIF pathway (HIF-1, ARNT/HIF-1β), glucose transport (SLC2A1/Glut 1), and sialic acid metabolism (NANS, NANP, CMAS, CMAH, and NEU1) was assessed by qRT-PCR (Fig. 6C). The inhibitor experiments showed that each inhibitor effectively targeted its primary pathway. Acriflavine, an upstream pathway inhibitor, also inhibits downstream targets such as Glut 1 and sialic acid-related synthetases.62 To verify the synergistic effects of these inhibitors, various combinations were tested to inhibit multiple pathways. As shown in Figure 6D, pairwise combinations excluding sialidase inhibitors were evaluated because sialidase inhibition alone did not affect sialic acid synthesis. The combined use of inhibitors confirmed their synergistic effects on the relevant pathways. Upregulation of the hypoxia and glucose transport pathways enhanced sialic acid synthesis, whereas sialic acid degradation downregulated its synthesis. Flow cytometry was subsequently used to measure the Neu5Gc content on the cell surface. As shown in Figures 6E and F, DANA alone effectively inhibited sialic acid degradation. The results obtained with other inhibitors, whether used alone or in combination, were largely consistent with the qRT-PCR results (Fig. 6C and D).
As illustrated in Supplementary Figures 5G and H, KEGG enrichment analysis demonstrated that the differential metabolites were mainly involved in amino acid metabolism and linked energy/carbohydrate metabolic pathways, with concomitant alterations in lipid, vitamin, and transport-related biological processes. Metabolite expression statistics at different time points revealed increased consumption of glucose-6-phosphate and 3-phospho-D-glycerophosphate during glycolysis and gluconeogenesis. In addition, within the tricarboxylic acid cycle, the consumption of aspartic acid and N-acetylserine, which are related to succinic acid and amino acid metabolism, peaked at 24 h, after which output increased, leading to the synthesis of sialic acid in cells. This synthesis is associated with the HBP and the polysaccharide synthesis pathway.
Effect of an exogenous high-sialic acid diet on Neu5Gc content in peripheral blood and liver tissue in vivo
Bioinformatic database analysis highlighted the critical role of the CMAH gene in its activated state, as it promotes the hydroxylation of Neu5Ac to Neu5Gc. To investigate this further, a CMAH gene knockout mouse model (CMAH−/−) was constructed (Supplementary Fig. 6A–D), in which sialic acid metabolism resembles that of humans. Because humans lack a functional CMAH gene, Neu5Gc in humans comes exclusively from dietary intake and absorption. In the experimental model, commercial Neu5Ac monosaccharide and mucin (extracted from porcine submandibular glands) were added to the diet at a concentration of 100 µg/g. As shown in Figure 7A, serum Neu5Gc levels in CMAH−/− mice increased progressively with intake of the supplemented diet, eventually reaching levels comparable to those in wild-type mice. This indicates that the loss of endogenous Neu5Gc synthesis can be fully compensated for by an exogenous source, which maintains Neu5Gc levels in the animal (Fig. 7B).
Time-series analysis and metabolomic data demonstrated that sialic acid synthesis is influenced by both exogenous intake and endogenous production. As shown in Figure 7C, mice fed a high-fat diet (HFD) exhibited a significant increase in blood lipid levels and Neu5Gc content. This prompted further examination of changes in the hypoxia pathway in mice stimulated by an exogenous diet, as depicted in Figures 7D and E. IHC for HIF-1α, GLUT1, VEGF, CMAH, LIP, and SNA lectin revealed that fatty lesions developed in the liver tissue of mice fed either HFD or HFD supplemented with sialic acid. In CMAH knockout mice, exogenous feeding upregulated HIF-1α expression, which was primarily localized in hepatocytes. It also increased the expression of GLUT1, enhancing glucose utilization without clear tissue specificity. VEGF expression was also upregulated, especially in regions with increased angiogenesis and fibroblasts. As expected, however, CMAH protein expression was barely detectable in the liver tissues of CMAH−/− mice, confirming the success of the knockout model.
When the HFD was combined with high sialic acid, LIP and SNA staining increased significantly, indicating synergistic crosstalk between the hypoxia pathway and the sialic acid synthesis pathway rather than two merely parallel effects. Collectively, these findings demonstrate that both exogenous dietary sialic acid intake and altered endogenous synthesis can modulate cellular metabolic pathways, forming a feedback loop that regulates Neu5Gc accumulation (Fig. 7F).
Discussion
In this study, a comprehensive platform for detecting blood Neu5Gc levels was established for diagnosing HCC, and the possible mechanisms underlying elevated Neu5Gc levels were evaluated. We also explored the changes in glycan-related genes during Neu5Gc anabolism and glycosylation, shedding light on the endogenous Neu5Gc synthesis pathway. Prior studies indicate that exogenous Neu5Gc, primarily from dietary red meat and dairy, is incorporated into cell-surface glycans and preferentially accumulates in cancer cells.63 Notably, the tumor-specific antigen GM3 (Neu5Gc) is present in various human cancers.64 In addition, Neu5Gc promotes colorectal cancer (CRC) cell proliferation through Wnt/β-catenin activation.65 These reports support its oncogenic significance and its potential as a pan-tumor biomarker for HCC and other malignancies. Additionally, we observed that Sprague–Dawley rats possess the CMAH gene with slight variations compared to humans. However, analysis of the expression of the human CMAH gene revealed its research value, warranting further investigation into its in vivo regulation. Humans carry a non-functional CMAH pseudogene due to an ancient Alu-mediated deletion,9,10 and human Neu5Gc is primarily derived from exogenous dietary sources such as red meat,66,67 providing a critical evolutionary context for our mechanistic exploration.
To simulate the human environment, we constructed a CMAH knockout mouse using the CRISPR-Cas9 platform. This allowed us to explore the compensatory mechanism of exogenous intake following the inactivation of the endogenous synthesis pathway. CMAH-deficient mouse models have been used to study Duchenne muscular dystrophy,68,69 metabolic disorders,70 atherosclerosis,11 and cancers71; our application to Neu5Gc metabolism and tumor immunology is a key innovation. We then employed isotope-labeled glycosaminoglycan substrates and tracked isotope incorporation at different time points to construct a non-targeted isotope-labeled metabolic pathway. This approach revealed potential metabolic routes for sialic acid substrates in cells, offering insights into the possible mechanisms of abnormally high Neu5Gc levels in HCC. Isotope tracing has been widely applied to map sialic acid metabolic fluxes in cancer cells, enabling unbiased identification of key pathway intermediates.72–74 Key metabolomic findings were validated in vivo by feeding mice a high-sialic acid diet and examining their liver tissue. We conducted experiments using gene knockout mice and added exogenous glycan substrates to HCC cell lines to explore the mechanisms responsible for the high Neu5Gc levels on the surface of tumor cells. This revealed endogenous synthesis via the HBP. In this pathway, the intermediate metabolite UDP-GlcNAc is converted by an epimerase, leading to sialic acid production.
As a central hub linking glucose metabolism to protein glycosylation, the HBP modulates glycosylation via its rate-limiting enzymes and metabolites, facilitating HCC proliferation and metastasis.75–77 Notably, tumor cells may express CMAH protein or CMAH isoforms that retain some hydroxylase activity.78 Recent studies suggest that CMAH pseudogene-derived transcripts or truncated proteins may retain partial biological activity in certain cancers, possibly including HCC,79,80 which aligns with our observation of potential residual CMAH function.
However, no additional information is currently available regarding this pathway, necessitating further research. Generally, there are two main reasons for the high surface levels of Neu5Gc in tumor cells: endogenous synthesis and exogenous intake. Endogenous synthesis includes reactivation of the CMAH-dependent pathway and compensation by a CMAH-independent pathway. Nevertheless, exogenous intake remains dominant and may depend on many factors, such as regional and individual differences in dietary habits that lead to excessive red meat intake. Exogenous Neu5Gc enters cells via pinocytosis, which is enhanced in tumors by macropinocytosis and altered transporters.71 Hypoxia caused by the rapid proliferation of tumor cells significantly increases intracellular glucose metabolism. Because tumor cells are highly metabolically active, they take up exogenous glycan substrates more readily than normal cells do, and Neu5Gc is highly expressed across different patients. Hypoxia further promotes Neu5Gc accumulation by upregulating sialyltransferases and enhancing exogenous glycan uptake.81,82
According to the theory of “aerobic glycolysis” in tumor cell metabolism, reprogramming of glucose metabolism is driven by continuous activation of oncogenes or inactivation of tumor suppressor genes.83 The hypoxic microenvironment plays a crucial role in cancer development and therapeutic response, leading to continuous interactions between cancer cells and stromal cells. The Warburg effect and tumor hypoxia collectively drive metabolic reprogramming that favors sialic acid synthesis and exogenous Neu5Gc uptake.84 In this study, we observed an increase in Neu5Gc accumulation on the tumor cell surface under hypoxic conditions, shedding light on the degree of glycosylation and its potential mechanisms. Our findings extend previous reports by linking hypoxia-induced metabolic reprogramming to aberrant Neu5Gc glycosylation in HCC, highlighting a novel regulatory axis.
However, as a pioneering effort, our study has certain limitations in terms of the experimental design and detection technology. First, glycan analysis was performed on a single serum protein. Although IgG is a highly abundant protein, it cannot fully represent the glycosylation profile of the total serum proteins. Second, the follow-up cohort was a dynamic cohort with a duration of more than 10 years, but it was analyzed cross-sectionally here, which fails to exploit the longitudinal advantages of the cohort. Third, further stratified analysis was not performed to establish a clinical prediction model. Finally, the mechanistic research was not sufficiently deep, and the mouse and cell experiments were not well integrated. The evidence presented here is largely indirect, emphasizing the need for further validation of this pathway. To address this, we aim to establish a metabolic flux detection platform focusing on the sialic acids Neu5Ac and Neu5Gc to enhance the clarity of our findings. Future studies will integrate multi-omics data and longitudinal cohort analysis to develop a clinically applicable Neu5Gc-based diagnostic model for HCC, addressing current limitations and translating our findings into clinical practice.