Introduction
Wilson disease (WD) is an autosomal recessive disorder of copper metabolism caused by pathogenic variants in ATP7B (ATPase copper transporting beta).1 Owing to its marked clinical and biochemical heterogeneity, timely and accurate diagnosis requires integrated interpretation of clinical, biochemical, ophthalmologic, and genetic findings, most commonly within the Leipzig framework.1–5 Serum ceruloplasmin (Cp) and 24-hour urinary copper excretion (24-h UCE) remain central components of routine diagnostic assessment.1,3,6–8
In contemporary referral practice, however, 24-h UCE has a dual role: it is both a diagnostic biomarker for WD and a trigger for referral to specialized centers. When the test under evaluation also functions as a referral trigger, its apparent diagnostic performance becomes susceptible to referral bias and spectrum effects.9–11 Moreover, hypercupriuria is not specific to WD and may also occur in non-WD liver diseases, particularly in cholestatic disorders, severe acute liver injury, and states of hepatic synthetic dysfunction.12–14
We therefore hypothesized that the performance of 24-h UCE would be materially altered in hypercupriuric referrals, whereas Cp might retain stronger diagnostic value. To test this hypothesis, we compared Cp and 24-h UCE in a consecutive referral cohort and evaluated a Leipzig-aligned Cp three-zone framework for clinically pragmatic, probability-based triage.
Methods
Study design and patients
This was a retrospective, single-center observational study conducted at Nanjing Second Hospital. Consecutive patients undergoing evaluation for suspected WD between February 2017 and February 2025 were screened. Eligible patients had persistent hypercupriuria, defined as at least two consecutive 24-h UCE measurements above the upper limit of normal (ULN, 60 µg/24 h), obtained before final diagnostic adjudication and before initiation of any copper-directed therapy. The two qualifying measurements were required to be obtained during the same diagnostic episode, usually within the same hospitalization or outpatient diagnostic evaluation period and generally within 14 days. For diagnostic performance analyses, the first qualifying 24-h UCE measurement was used as the index urinary copper excretion (UCE) value, whereas the second consecutive elevated measurement was used to confirm persistent hypercupriuria for eligibility. Patients with discordant repeated 24-h UCE results were not considered to have persistent hypercupriuria unless an additional 24-h UCE measurement confirmed elevation above the ULN. Acute or subacute liver injury alone was not an exclusion criterion.
Prespecified exclusions included missing key diagnostic data, prior copper-directed therapy before the index 24-h UCE measurement, failure to confirm persistent hypercupriuria, and insufficient information for final diagnostic adjudication.
Diagnostic criteria and clinical grouping
Final WD status was adjudicated using a prespecified Leipzig-based diagnostic workflow before biomarker performance analyses.1,3,5,15 Patients with a Leipzig score ≥4 were classified as having WD. Patients with a Leipzig score of 3 or persistent diagnostic uncertainty underwent genetic testing whenever feasible.
Genetic evaluation included targeted ATP7B sequencing, ATP7B multiplex ligation-dependent probe amplification for exon-level deletions or duplications, or whole-exome sequencing when it provided ATP7B variant information.16–18 Patients with biallelic pathogenic or likely pathogenic ATP7B variants were classified as genetically confirmed WD.1,3,5 In patients without biallelic ATP7B confirmation (those without ATP7B testing, those with only monoallelic pathogenic or likely pathogenic ATP7B variants, and those without detected pathogenic ATP7B variants), WD classification was not based on clinical impression alone. Classification required fulfillment of the Leipzig diagnostic threshold using available non-genetic criteria, an integrated clinical assessment compatible with WD, and absence of a more plausible alternative diagnosis explaining the copper abnormalities. Missing or unavailable diagnostic items were conservatively assigned 0 points.
Patients were classified as non-WD when comprehensive evaluation did not provide sufficient Leipzig-based evidence for WD and an alternative liver disease diagnosis was established. In genetically tested patients, absence of biallelic pathogenic or likely pathogenic ATP7B variants was considered supportive evidence against WD but was not independently exclusionary.
Prespecified threshold framework and subgroup definitions
Cp was evaluated using a prespecified Leipzig-aligned three-zone framework: <0.10 g/L as the high-probability threshold, 0.10–0.20 g/L as the indeterminate zone, and >0.20 g/L as the low-probability threshold supporting de-prioritization of WD rather than definitive exclusion. Patients in the indeterminate zone underwent integrated Leipzig-based assessment rather than classification by Cp alone. This assessment included reassessment of all available Leipzig components, ophthalmologic evaluation for Kayser–Fleischer rings, neurologic assessment and brain MRI when clinically indicated, repeat biochemical testing when appropriate, evaluation for alternative liver diseases, and ATP7B testing when the Leipzig score remained equivocal, diagnostic information was incomplete, clinical suspicion persisted, or no alternative liver disease adequately explained the copper abnormalities.1,3,5,19
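The three-zone rule can be written as a small lookup. The following is a minimal sketch, not the study's code: the function name `cp_zone` and the zone labels are illustrative, and the boundary values 0.10 and 0.20 g/L are assumed to belong to the indeterminate zone, as the stated ranges imply.

```python
def cp_zone(cp_g_per_l: float) -> str:
    """Assign a serum ceruloplasmin value (g/L) to one of the three Cp zones.

    Assumption: the boundaries 0.10 and 0.20 g/L fall in the indeterminate zone.
    """
    if cp_g_per_l < 0:
        raise ValueError("ceruloplasmin cannot be negative")
    if cp_g_per_l < 0.10:
        return "high-probability"   # prioritize WD work-up
    if cp_g_per_l <= 0.20:
        return "indeterminate"      # integrated Leipzig-based assessment
    return "low-probability"        # de-prioritize WD, not a definitive exclusion
```

The returned label is a triage category only; under the framework described above it never replaces Leipzig-based adjudication.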
To explore referral-spectrum effects, non-WD controls were stratified into predefined severity subgroups: severe/coagulopathic non-WD [international normalized ratio (INR) ≥ 1.5],20,21 jaundiced non-WD (INR < 1.5 and total bilirubin ≥ 5 mg/dL), and mild non-WD (INR < 1.5 and total bilirubin < 5 mg/dL). Additional analyses were performed in referral subsets with higher urinary copper burden, defined as 24-h UCE ≥ 100 µg/24 h and ≥ 200 µg/24 h.1,19
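The severity strata are mutually exclusive and can be applied in a fixed order, with INR checked first. A minimal sketch, with a hypothetical function name and labels, follows.

```python
def non_wd_severity(inr: float, total_bilirubin_mg_dl: float) -> str:
    """Classify a non-WD control into the predefined severity subgroups."""
    if inr >= 1.5:
        return "severe/coagulopathic"   # INR >= 1.5, regardless of bilirubin
    if total_bilirubin_mg_dl >= 5.0:
        return "jaundiced"              # INR < 1.5 and bilirubin >= 5 mg/dL
    return "mild"                       # INR < 1.5 and bilirubin < 5 mg/dL
```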
Data collection and laboratory measurements
Baseline demographic, biochemical, and hematologic data were retrospectively extracted from hospital and laboratory information systems. All analyzed measurements were obtained before copper-directed therapy. For 24-h UCE, the first qualifying measurement obtained during the diagnostic episode was used as the index value for diagnostic performance analyses. For other laboratory parameters, when multiple pretreatment results were available, the value closest to the index UCE measurement or baseline diagnostic assessment was used. 24-h UCE was measured by inductively coupled plasma mass spectrometry, and serum Cp was measured by immunoturbidimetry.
Statistical analysis
Continuous variables are presented as median (IQR) or mean ± standard deviation, as appropriate, and categorical variables as counts (percentages). Between-group comparisons were performed using the Mann–Whitney U test or Student's t-test for continuous variables, as appropriate, and the chi-square test or Fisher's exact test for categorical variables. Diagnostic performance was assessed using receiver operating characteristic analysis with areas under the receiver operating characteristic curve (AUROCs) and 95% confidence intervals (CIs). AUROC 95% CIs were estimated using DeLong's method. Exploratory single-cutoff analyses used the Youden index. Threshold-based diagnostic indices included sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic accuracy. CIs for sensitivity, specificity, PPV, NPV, and diagnostic accuracy were calculated using the exact binomial method. CIs for PLR and NLR were calculated using the log method.
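For readers who wish to reproduce the threshold-based indices, the sketch below computes the point estimates and the log-method CIs for the likelihood ratios from a 2×2 table. It is an illustration under stated assumptions rather than the study's analysis code (which used R): the function name and the example counts are hypothetical, and the exact binomial intervals for the proportions are not implemented here.

```python
import math

def diagnostic_indices(tp: int, fp: int, fn: int, tn: int, z: float = 1.96) -> dict:
    """Threshold-based indices from a 2x2 table, with log-method CIs for the LRs."""
    if min(tp, fp, fn, tn) <= 0:
        raise ValueError("all four cells must be positive for the log method")
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    plr = sens / (1 - spec)
    nlr = (1 - sens) / spec

    # Standard errors of ln(LR) used by the log method
    se_plr = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_nlr = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))

    def log_ci(lr: float, se: float) -> tuple:
        # Back-transform the symmetric interval on the log scale
        return (lr * math.exp(-z * se), lr * math.exp(z * se))

    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "youden_j": sens + spec - 1,   # criterion maximized for the exploratory cutoff
        "plr": plr, "plr_ci": log_ci(plr, se_plr),
        "nlr": nlr, "nlr_ci": log_ci(nlr, se_nlr),
    }

# Hypothetical counts, for illustration only (not taken from the study)
result = diagnostic_indices(tp=90, fp=20, fn=10, tn=180)
```

The guard against empty cells is a design choice for this sketch; the log method is undefined when any cell is zero, and a continuity correction would be a separate assumption.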
Among non-WD controls, determinants of 24-h UCE were examined using multivariable linear regression with log10-transformed 24-h UCE and heteroscedasticity-consistent robust standard errors using the HC3 estimator. To account for potential confounding by liver injury severity, multivariable-adjusted ROC analyses were performed for Cp and 24-h UCE. Logistic regression models were constructed with final WD status as the dependent variable. Cp or 24-h UCE was entered as the primary diagnostic marker, together with age, total bilirubin, INR, and albumin as covariates. Adjusted AUROCs with 95% CIs were generated from the predicted probabilities of each model. In an extended liver-injury–adjusted model, age, alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (ALP), gamma-glutamyl transferase (GGT), total bilirubin, INR, and albumin were included as covariates.
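Because the outcome in the linear regression is log10-transformed, coefficients are conventionally back-transformed to percentage changes in 24-h UCE. The source does not spell out this conversion; the sketch below assumes the standard back-transformation, and the function and parameter names are hypothetical.

```python
def percent_change(beta_log10: float, increment: float = 1.0) -> float:
    """Percent change in the outcome for a given predictor increment.

    beta_log10: coefficient from a model with a log10-transformed outcome,
                expressed per one unit of the predictor.
    increment:  size of the predictor change of interest (for example 0.5 for
                INR, 5 for albumin in g/L, or 1 on a log2 scale for a doubling).
    """
    return (10 ** (beta_log10 * increment) - 1) * 100
```

The same transformation applied to the lower and upper confidence limits of the coefficient gives the interval on the percentage scale.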
As sensitivity analyses, diagnostic performance was re-evaluated after restricting the WD group to patients with biallelic pathogenic/likely pathogenic ATP7B variants and to those with at least one pathogenic/likely pathogenic ATP7B variant. These analyses were intended to examine whether the main findings were robust when WD classification was less dependent on Cp-containing Leipzig scoring. All analyses were performed using R version 4.3.0 and GraphPad Prism version 10.0.1. A two-sided P-value < 0.05 was considered statistically significant. This study is reported in accordance with the STROBE statement for observational studies.
Results
Study population
A total of 686 patients with hypercupriuria were screened between February 2017 and February 2025. After prespecified exclusions, 541 patients met the eligibility criteria and were included in the primary analytic cohort, comprising 65 untreated patients with WD and 476 adjudicated non-WD liver disease controls. Details of patient selection and final diagnostic classification are shown in Figure 1.
Among the 65 patients finally classified as WD, 9 did not undergo ATP7B testing because testing was declined. All 9 patients fulfilled the Leipzig diagnostic threshold based on integrated non-genetic evidence, with final Leipzig scores ranging from 5 to 7 and a median score of 6. The individual non-genetic diagnostic components supporting WD classification in these patients are shown in Supplementary Table 1.
The non-WD control group represented a broad tertiary referral liver disease spectrum, with drug-induced or toxic liver injury, autoimmune or cholestatic liver disease, viral hepatitis–related liver disease, biliary obstruction or cholangitis, and alcohol-related or metabolic fatty liver disease being the main diagnostic categories (Supplementary Table 2).
Baseline characteristics
Baseline characteristics are shown in Table 1. Patients with WD were younger than non-WD controls [28 (14–46) vs. 52 (40–60.25) years, P < 0.001], whereas sex distribution was similar [male sex: 49.23% vs. 48.95%, P = 0.966]. WD patients had higher 24-h UCE [201.1 (109.2–415.4) vs. 94.05 (72.3–148.18) µg/24 h, P < 0.001] and markedly lower Cp [0.08 (0.05–0.10) vs. 0.27 (0.22–0.35) g/L, P < 0.001]. In contrast, non-WD controls had higher total bilirubin, ALT, AST, ALP, and GGT, indicating more pronounced cholestatic and hepatocellular injury at baseline.
Table 1. Baseline characteristics of the hypercupriuric study cohort (24-h UCE ≥ 60 µg/24 h)
| Characteristic | Non-WD (n = 476) | WD (n = 65) | P-value |
|---|---|---|---|
| Demographics | | | |
| Age at diagnosis, years | 52 (40, 60.25) | 28 (14, 46) | <0.001 |
| Male sex, n (%) | 233 (48.95) | 32 (49.23) | 0.966 |
| Copper indices | | | |
| 24-h UCE, µg/24 h | 94.05 (72.3, 148.18) | 201.1 (109.2, 415.4) | <0.001 |
| Cp, g/L | 0.27 (0.22, 0.35) | 0.08 (0.05, 0.10) | <0.001 |
| Liver biochemistry and synthetic function | | | |
| Total bilirubin, mg/dL | 4.52 (1.37, 12.84) | 1 (0.65, 1.82) | <0.001 |
| ALT, U/L | 140.2 (51.33, 452.2) | 50.8 (26.7, 123.9) | <0.001 |
| AST, U/L | 115.8 (52.75, 329.15) | 63.5 (32.7, 91.5) | <0.001 |
| ALP, U/L | 150.1 (106, 237.5) | 123.4 (90, 199) | 0.020 |
| GGT, U/L | 153.2 (78.95, 300) | 102 (53, 199) | 0.001 |
| Total protein, g/L | 62.85 (57.5, 69.3) | 66 (61.8, 71.1) | 0.014 |
| Albumin, g/L | 35.95 (31.5, 39.6) | 40 (30.4, 44.1) | 0.049 |
| Platelet count, ×10⁹/L | 150 (103, 206) | 122 (71, 228) | 0.049 |
| Prothrombin time, s | 12.6 (11.38, 14.61) | 13.9 (11.98, 16) | 0.014 |
| INR | 1.12 (1.01, 1.31) | 1.25 (1.05, 1.46) | 0.002 |
Diagnostic performance of Cp and 24-h UCE before and after covariate adjustment
In the primary hypercupriuric cohort, Cp showed a higher AUROC than 24-h UCE for diagnosing WD (0.988 vs. 0.762; Table 2 and Fig. 2). After adjustment for age, total bilirubin, INR, and albumin, the AUROCs were 0.989 for Cp and 0.930 for 24-h UCE. In the extended liver-injury–adjusted model including age, ALT, AST, ALP, GGT, total bilirubin, INR, and albumin, the AUROCs were 0.992 for Cp and 0.950 for 24-h UCE (Supplementary Table 3).
Table 2. Diagnostic performance of Cp and 24-h UCE in hypercupriuric referrals for suspected WD
| Marker | N, WD/non-WD | AUROC (95% CI) | Optimal cutoff | Sensitivity, % (95% CI) | Specificity, % (95% CI) | PPV, % (95% CI) | NPV, % (95% CI) | PLR (95% CI) | NLR (95% CI) | Diagnostic accuracy, % (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|
| 24-h UCE, µg/24 h | 65/476 | 0.762 (0.693–0.828) | 149.70 | 63.1 (50.2–74.7) | 75.6 (71.5–79.4) | 26.1 (19.4–33.7) | 93.8 (90.8–96.0) | 2.59 (2.03–3.30) | 0.49 (0.35–0.67) | 74.1 (70.2–77.8) |
| Cp, g/L | 65/476 | 0.988 (0.976–0.997) | 0.15 | 90.8 (81.0–96.5) | 97.7 (95.9–98.8) | 84.3 (73.6–91.9) | 98.7 (97.2–99.5) | 39.28 (21.79–70.80) | 0.09 (0.04–0.20) | 96.9 (95.0–98.2) |
The Youden-derived optimal cutoff for Cp was 0.15 g/L, which yielded a sensitivity of 90.8% and a specificity of 97.7%. By comparison, the optimal cutoff for 24-h UCE was 149.7 µg/24 h, corresponding to a sensitivity of 63.1% and a specificity of 75.6% (Table 2).
In sensitivity analyses according to ATP7B variant status, Cp consistently outperformed 24-h UCE among WD patients with biallelic pathogenic/likely pathogenic ATP7B variants and among those with at least one pathogenic/likely pathogenic ATP7B variant (AUROCs: 0.992 vs. 0.696 and 0.987 vs. 0.746, respectively; Supplementary Table 4).
Cp also outperformed 24-h UCE in referral subsets with higher urinary copper burden, including 24-h UCE ≥ 100 µg/24 h and ≥ 200 µg/24 h (AUROCs: 0.987 vs. 0.710 and 0.978 vs. 0.763, respectively; Supplementary Table 5).
Performance of the prespecified Cp three-zone framework
We evaluated a prespecified Leipzig-aligned Cp three-zone framework using thresholds of <0.10 g/L, 0.10–0.20 g/L, and >0.20 g/L (Fig. 3). In the overall cohort, Cp < 0.10 g/L identified 48 of 65 WD patients and only 1 of 476 non-WD controls, corresponding to a WD prevalence of 98.0% (48/49) in this high-probability zone. Cp 0.10–0.20 g/L represented an indeterminate zone, in which 15 of 90 patients had WD, corresponding to a WD prevalence of 16.7%. Cp > 0.20 g/L represented a low-probability zone, including 2 of 65 WD patients and 400 of 476 non-WD controls, with a WD prevalence of 0.5% (2/402) (Table 3).
Table 3. Distribution of WD and non-WD patients across the three prespecified Cp probability zones
| Cp zone | Clinical interpretation | WD, n/N (%) | Non-WD, n/N (%) | Total, n | WD prevalence in zone, n/N (%) |
|---|---|---|---|---|---|
| <0.10 g/L | High-probability zone | 48/65 (73.8) | 1/476 (0.2) | 49 | 48/49 (98.0) |
| 0.10–0.20 g/L | Indeterminate zone | 15/65 (23.1) | 75/476 (15.8) | 90 | 15/90 (16.7) |
| >0.20 g/L | Low-probability zone | 2/65 (3.1) | 400/476 (84.0) | 402 | 2/402 (0.5) |
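The percentages in Table 3 follow directly from the zone counts. As a check, the sketch below recomputes them; the counts are copied from Table 3, whereas the variable and function names are illustrative.

```python
# Counts copied from Table 3: zone -> (WD, non-WD)
zone_counts = {
    "<0.10 g/L": (48, 1),
    "0.10-0.20 g/L": (15, 75),
    ">0.20 g/L": (2, 400),
}

def zone_summary(counts: dict) -> dict:
    """Recompute column shares and within-zone WD prevalence (all in %)."""
    total_wd = sum(wd for wd, _ in counts.values())
    total_non_wd = sum(non for _, non in counts.values())
    summary = {}
    for zone, (wd, non) in counts.items():
        summary[zone] = {
            "total": wd + non,
            "wd_share_pct": round(100 * wd / total_wd, 1),
            "non_wd_share_pct": round(100 * non / total_non_wd, 1),
            "wd_prevalence_pct": round(100 * wd / (wd + non), 1),
        }
    return summary

summary = zone_summary(zone_counts)
```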
Supplementary binary threshold analyses showed that Cp < 0.10 g/L had high specificity and PPV, whereas Cp > 0.20 g/L had high NPV (Supplementary Table 6). Similar patterns were observed across non-WD severity strata, although WD prevalence among patients with Cp > 0.20 g/L was higher in the severe/coagulopathic subgroup.
Three discordant Cp cases were identified (Supplementary Table 7). Two WD patients had Cp > 0.20 g/L and therefore fell into the low-probability zone; one had biallelic pathogenic ATP7B variants, whereas the other had a monoallelic pathogenic ATP7B variant but fulfilled the Leipzig diagnostic threshold based on integrated non-genetic evidence. Conversely, one non-WD patient with severe HBV-related liver injury had Cp < 0.10 g/L and fell into the high-probability zone.
Determinants of hypercupriuria among non-WD controls
To characterize factors associated with urinary copper excretion outside WD, we examined determinants of 24-h UCE among non-WD controls. In multivariable linear regression adjusted for age and sex, higher 24-h UCE was associated with higher INR, higher total bilirubin, and lower albumin (Table 4). Each 0.5-unit increase in INR was associated with a 20.51% higher 24-h UCE (95% CI, 10.34% to 31.61%; P < 0.001), and each doubling of total bilirubin with a 5.91% higher 24-h UCE (95% CI, 3.21% to 8.67%; P < 0.001). Each 5 g/L increase in albumin was associated with a 6.17% lower 24-h UCE (95% CI for the change, −9.84% to −2.36%; P = 0.002).
Table 4. Multivariable determinants of 24-h UCE among hypercupriuric non-WD controls
| Predictor | Scaling | % change in 24-h UCE (95% CI) | P-value |
|---|---|---|---|
| INR | per 0.5-unit increase | 20.51% (10.34 to 31.61) | <0.001 |
| Total bilirubin | per doubling (log2) | 5.91% (3.21 to 8.67) | <0.001 |
| Albumin | per 5 g/L increase | −6.17% (−9.84 to −2.36) | 0.002 |
Discussion
In this retrospective cohort of untreated patients referred for suspected WD with persistent hypercupriuria, Cp showed substantially better diagnostic performance than 24-h UCE. In routine referral practice, elevated urinary copper may prompt further evaluation for suspected WD; consequently, the present cohort was already enriched for hypercupriuria before the diagnostic performance of 24-h UCE was assessed. This referral-spectrum effect is consistent with the principle that diagnostic accuracy varies according to patient spectrum and referral pathway.9–11 In this setting, 24-h UCE showed only modest standalone discrimination, whereas Cp remained highly informative.
The attenuated discrimination of 24-h UCE was closely related to the distribution of urinary copper among non-WD controls. Although WD patients were more frequently represented in higher 24-h UCE strata, non-WD controls were also present across all elevated UCE categories (Supplementary Fig. 1). This distributional overlap also motivated analyses in referral subsets with higher urinary copper burden, including 24-h UCE ≥ 100 µg/24 h and ≥ 200 µg/24 h (Supplementary Table 5). In non-WD controls, higher 24-h UCE was independently associated with higher total bilirubin, prolonged INR, and lower albumin (Table 4). Because biliary excretion is the major route of copper elimination, cholestasis and advanced liver injury may impair hepatic copper transport and excretion, thereby contributing to increased urinary copper excretion.22–26 Thus, in a referral cohort already selected for hypercupriuria, urinary copper may reflect hepatic excretory and synthetic dysfunction as well as WD-specific copper dysregulation. This interpretation is consistent with the covariate-adjusted ROC analyses, in which adjustment for age, total bilirubin, INR, and albumin substantially increased the AUROC of 24-h UCE, whereas the AUROC of Cp changed little (Supplementary Table 3).
By contrast, Cp showed a more stable diagnostic profile. Despite the broad non-WD liver disease spectrum, Cp retained excellent discrimination. The Youden-derived cutoff of 0.15 g/L was close to values reported in previous studies.14,27,28 However, a single ROC-derived cutoff is not necessarily the most clinically useful way to interpret Cp. Cp is a continuous biomarker, and dichotomizing it into a single positive or negative result may overinterpret borderline values.29 Within the Leipzig framework, Cp < 0.10 g/L carries greater diagnostic weight, whereas Cp > 0.20 g/L generally makes WD less likely.1,3,5 Therefore, the ROC-derived cutoff is best viewed as a summary of overall discrimination rather than as the preferred clinical decision threshold.
The prespecified Cp three-zone framework may be more clinically interpretable than a single dichotomous cutoff. In our cohort, Cp < 0.10 g/L defined a high-probability zone in which WD should be prioritized; Cp > 0.20 g/L defined a low-probability zone rather than a definitive exclusion zone; and Cp 0.10–0.20 g/L remained an indeterminate zone requiring integrated assessment. Patients in the indeterminate zone should be assessed within the full Leipzig-based framework rather than classified by Cp alone, because WD diagnosis requires integrated interpretation of clinical, biochemical, ophthalmologic, histologic, imaging, and genetic evidence.1,3,5,15 ATP7B testing is not mandatory for every patient in the Cp indeterminate zone, particularly when integrated Leipzig-based assessment is not suggestive of WD and an alternative liver disease diagnosis is established. However, ATP7B testing should be strongly considered when the Leipzig score remains equivocal (especially an initial score of 3), diagnostic information is incomplete, clinical suspicion persists, or no alternative liver disease adequately explains the copper abnormalities. Thus, the main value of the three-zone framework is to guide diagnostic prioritization and targeted genetic testing, not to replace comprehensive adjudication.
The predictive values observed in this study should be interpreted in relation to disease prevalence. Because PPV and NPV are prevalence-dependent, the values reported here primarily reflect the probability of WD within a high-risk hypercupriuric referral population rather than in an unselected screening population.30–32 Sensitivity and specificity are less directly prevalence-dependent but may still vary across clinical spectra. In particular, the low-probability Cp threshold appeared more context-dependent in patients with severe hepatic dysfunction, whereas the high-probability threshold retained very high specificity even in this subgroup. Because Cp is synthesized in the liver and is also influenced by inflammatory and hepatic synthetic status, Cp levels may be altered in severe non-WD liver disease independently of WD.12,19,33 This distinction supports the use of Cp thresholds as probability-based triage tools rather than absolute diagnostic rules.
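The prevalence dependence of PPV and NPV can be made concrete with Bayes' theorem. The sketch below uses the Cp sensitivity and specificity at the Youden-derived cutoff (Table 2) and the cohort prevalence (65/541). The 1% prevalence is a purely hypothetical value chosen for illustration and is not an estimate from this or any other study. The function name is likewise illustrative.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """Return (PPV, NPV) for a given prevalence via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

# Cp at the Youden-derived cutoff (Table 2), at the cohort prevalence
ppv_cohort, npv_cohort = predictive_values(0.908, 0.977, 65 / 541)

# Same test characteristics at a hypothetical 1% prevalence (illustrative assumption)
ppv_low, npv_low = predictive_values(0.908, 0.977, 0.01)
```

With sensitivity and specificity held fixed, PPV falls and NPV rises as prevalence decreases, which is why the reported predictive values should not be transferred to unselected populations. The calculation also assumes that sensitivity and specificity themselves do not change with the clinical spectrum, which, as noted above, may not hold.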
The discordant Cp cases further support this cautious interpretation. Two WD patients had Cp > 0.20 g/L and would have been placed in the low-probability zone if Cp had been interpreted in isolation. One had biallelic pathogenic ATP7B variants and was genetically confirmed, whereas the other had a monoallelic pathogenic ATP7B variant but fulfilled the Leipzig diagnostic threshold based on integrated non-genetic evidence. Inflammatory activation, acute hepatic injury, or decompensation may increase measured Cp because Cp is an acute-phase reactant, potentially masking WD-associated hypoceruloplasminemia.6,19,34,35 Immunoreactive Cp assays may also fail to fully reflect functional copper incorporation.6 Conversely, one non-WD patient with severe HBV-related liver injury had Cp < 0.10 g/L, and ATP7B testing detected no pathogenic or likely pathogenic variants, consistent with secondary hypoceruloplasminemia in severe hepatic synthetic dysfunction. These discordant cases reinforce that Cp thresholds should guide diagnostic prioritization, but discordant Cp results should prompt integrated reassessment rather than automatic confirmation or exclusion of WD.
Several limitations should be acknowledged. First, this was a retrospective, single-center study and therefore remains susceptible to selection bias, although consecutive inclusion may have partly mitigated this concern. Second, patients with acute or severe non-WD liver injury were retained if they met the requirement for repeated 24-h UCE elevation and underwent final diagnostic adjudication. This was consistent with the referral-spectrum focus of the study, because such patients may be evaluated for suspected WD precisely because hepatic injury can be accompanied by increased urinary copper excretion.3,34 Nevertheless, this design also increased clinical heterogeneity. Third, repeated 24-h UCE measurements were not obtained according to a prospectively fixed interval, even though repeated elevation during the same diagnostic episode was required before copper-directed therapy. Fourth, incorporation bias cannot be fully excluded because Cp and 24-h UCE are components of the Leipzig score. We addressed this by using an integrated diagnostic workflow and by performing sensitivity analyses restricted to WD patients with biallelic pathogenic/likely pathogenic ATP7B variants and to those with at least one pathogenic/likely pathogenic ATP7B variant. Fifth, not all patients in the indeterminate Cp zone underwent ATP7B testing, reflecting real-world practice. Sixth, the WD cohort was relatively small compared with the non-WD referral cohort, reflecting both the rarity of WD and the restriction to treatment-naïve patients undergoing initial diagnostic evaluation. Finally, the study population consisted exclusively of Chinese patients, and the proposed three-zone Cp strategy requires external validation in ethnically diverse multicenter cohorts.