HCMMD: systematic evaluation of metabolites in body fluids as liquid biopsy biomarker for human cancers

Metabolomics is a rapidly expanding field in systems biology used to measure alterations of metabolites and identify metabolic biomarkers in response to disease processes. The discovery of metabolic biomarkers can improve early diagnosis, prognostic prediction, and therapeutic intervention for cancers. However, there are currently no databases that provide a comprehensive evaluation of the relationship between metabolites and cancer processes. In this review, we summarize reported metabolites in body fluids across cancer types and characterize their clinical applications in liquid biopsy. We conducted a search for metabolic biomarkers using the keywords (“metabolomics” OR “metabolite”) AND “cancer” in PubMed. Of the 22,254 articles retrieved, 792 were deemed potentially relevant for further review. Ultimately, we included data from 573,300 samples and 17,083 metabolic biomarkers. We collected information on cancer types, sample size, the Human Metabolome Database (HMDB) ID, metabolic pathway, area under the curve (AUC), sensitivity and specificity of metabolites, sample source, detection method, and clinical features. Finally, we developed a user-friendly online database, the Human Cancer Metabolic Markers Database (HCMMD), which allows users to query, browse, and download metabolite information. In conclusion, HCMMD provides an important resource to assist researchers in reviewing metabolic biomarkers for diagnosis and progression of cancers.


INTRODUCTION
Cancer is a major global public health problem, with a significant impact on mortality worldwide. In 2020, an estimated 10 million people died of cancer globally [1]. Early diagnosis is crucial for effective cancer management and better prognosis. However, approximately 50% of cancers are diagnosed at advanced stages [2,3]. Effective treatment of advanced cancer often involves the use of modern systemic and targeted drugs, which can be costly and may have limited efficacy [4]. Early cancer detection has been shown to provide substantial health benefits, including increased survival rates and reduced morbidity [2]. Although several blood-based biomarkers, such as carcinoembryonic antigen (CEA) and prostate-specific antigen (PSA), have been used for cancer screening in the past few decades, their sensitivity and specificity have been found to be unsatisfactory, limiting their effectiveness [5]. Therefore, there is a pressing need to identify biomarkers that exhibit high sensitivity and specificity for the early detection of cancer.
Metabolomics, which involves the comprehensive analysis of small molecule metabolites in cells, tissues, or whole organisms, has undergone rapid technological evolution in the past two decades [6-8]. By measuring downstream chemical phenotypes of genomic, transcriptomic, and proteomic variability, metabolomics can provide a more comprehensive understanding of the biological system [6,9,10]. Research has shown that metabolites play a crucial role in various diseases such as obesity, diabetes, cardiovascular disease, respiratory conditions, and cancer [6,11]. Metabolomics has emerged as an accurate and non-invasive diagnostic tool, accompanied by the development of novel and sensitive measurement techniques [12,13]. The uncontrolled proliferation of tumor cells requires metabolic regulation [14-16], and metabolic reprogramming is a hallmark of malignancy [17]. In recent years, several highly sensitive and specific metabolic biomarkers have been identified in liquid biopsy studies. For instance, Sreekumar et al. reported that sarcosine had a diagnostic value with an AUC of 0.69 (95% CI: 0.55, 0.84) for prostate cancer [18]. Soga et al. discovered that serum γ-glutamyl dipeptides had an AUC of 0.76 for hepatocellular carcinoma [19]. Tyrosine and glutamine-leucine in serum had an AUC of 0.98 for the diagnosis of colorectal cancer [20]. N1,N12-diacetylspermine in serum had an AUC of 0.65 (95% CI: 0.59, 0.72) for the diagnosis of non-small-cell lung cancer [21]. The AUC value of creatine nucleoside in urine was 0.79 for differential diagnosis between adrenocortical carcinoma and benign adrenal tumors [22].
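The diagnostic metrics quoted above (AUC, sensitivity, specificity) can be illustrated with a minimal sketch. It assumes that higher metabolite levels indicate disease and computes the AUC as the probability that a randomly chosen case has a higher level than a randomly chosen control (the Mann-Whitney formulation). The function names and the sample values are hypothetical and are not taken from any cited study.

```python
def auc(cases, controls):
    """AUC as P(case level > control level); tied pairs count as 0.5."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def sensitivity_specificity(cases, controls, cutoff):
    """Call a sample positive when its level is >= cutoff."""
    true_pos = sum(1 for x in cases if x >= cutoff)
    true_neg = sum(1 for y in controls if y < cutoff)
    return true_pos / len(cases), true_neg / len(controls)


# Hypothetical metabolite levels in arbitrary units (illustration only).
cases = [2.1, 3.4, 2.9, 4.0, 1.8]
controls = [1.2, 2.0, 1.5, 2.5, 1.1]

print(auc(cases, controls))                             # 22 of 25 pairs favor cases: 0.88
print(sensitivity_specificity(cases, controls, 2.05))   # (0.8, 0.8)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which is why values such as 0.65 and 0.98 in the studies above indicate very different diagnostic performance.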
This study aimed to comprehensively evaluate the role of metabolites in cancers. Cancer-related metabolites were searched from the PubMed database. The collected information includes cancer types, sample size, HMDB ID, metabolic pathway, area under the curve (AUC), sensitivity, specificity, sample source, detection method, and clinical features. Importantly, a user-friendly online database was developed, named Human Cancer Metabolic Markers Database (HCMMD), to assist users in querying, browsing, and downloading information about the cancer-related metabolites.

Advances in metabolomics
Metabolomic analysis is a technique used to analyze the type and content of small molecule metabolites in biological samples [23]. Four major technologies are commonly used for metabolomics: gas chromatography mass spectrometry (GC-MS) [24], liquid chromatography mass spectrometry (LC-MS) [25], capillary electrophoresis mass spectrometry (CE-MS) [26], and nuclear magnetic resonance spectroscopy (NMR) [27]. These techniques can assess changes in metabolic processes and provide a summary of alterations at the DNA, RNA, and protein levels [28]. Metabolomics has been used to reveal the mechanisms of basal metabolic processes in diseases [29,30], and in some cases, it may be the most sensitive method for identifying the pathological state of cancer patients, as even small changes in gene or protein expression can lead to remarkable changes in protein activity and metabolite levels [31-33]. Metabolomics can provide an effective method for screening for cancer, guiding treatment strategies, assessing efficacy, and tracking cancer progression [34-37]. Additionally, it can help to identify therapeutic targets and promote drug discovery [38,39].

Database establishment
To search PubMed for cancer-related metabolites, we used the query ("metabolomics" OR "metabolite") AND "cancer". The search retrieved 22,254 articles published before July 30, 2022. We selected literature that focused on human tumors, samples from liquid biopsies, and individual biomarkers with diagnostic, prognostic, or predictive value. After filtering the results, we identified 792 studies with a total of 573,300 samples and 17,083 metabolic biomarkers. The included studies covered 24 cancer types comprising 92 subtypes. We recorded information such as cancer type, sample size, HMDB ID, metabolic pathway, AUC, sensitivity, specificity, sample source, detection method, and clinical features. Figure 1 summarizes the basic information and diagnostic value of metabolic biomarkers in different cancers. Importantly, we created a new online database, called the HCMMD, which allows users to explore and analyze cancer-related metabolic biomarkers (Figure 2).
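As an illustration of how the recorded items could be organized and queried, the following is a minimal sketch. The record layout, field names, and filter function are assumptions made for this example and do not describe the actual HCMMD schema or interface. The example entries use only values quoted in this article, and fields that the text does not report are left empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BiomarkerRecord:
    # Hypothetical field names mirroring the items listed in the text.
    metabolite: str
    cancer_type: str
    sample_source: str
    auc: Optional[float] = None
    sensitivity: Optional[float] = None
    specificity: Optional[float] = None
    sample_size: Optional[int] = None
    hmdb_id: Optional[str] = None
    pathway: Optional[str] = None
    detection_method: Optional[str] = None
    clinical_feature: Optional[str] = None


def filter_records(records, min_auc=0.80, sample_source=None):
    """Keep records with a reported AUC >= min_auc, optionally from one sample source."""
    kept = []
    for record in records:
        if record.auc is None or record.auc < min_auc:
            continue
        if sample_source is not None and record.sample_source != sample_source:
            continue
        kept.append(record)
    return kept


# Values quoted in this article; unreported fields stay None.
records = [
    BiomarkerRecord("N1,N12-diacetylspermine", "non-small-cell lung cancer", "serum", auc=0.65),
    BiomarkerRecord("hypoxanthine", "lung adenocarcinoma", "serum", auc=0.99),
    BiomarkerRecord("beta-sitosterol", "pancreatic cancer", "plasma", auc=0.99),
    BiomarkerRecord("hexanoic acid", "breast cancer", "plasma", auc=1.00),
]

high_value = filter_records(records)                         # AUC >= 0.80
plasma_only = filter_records(records, sample_source="plasma")
print([r.metabolite for r in high_value])
print([r.metabolite for r in plasma_only])
```

The 0.80 default matches the threshold used in the per-system summaries; a stricter cutoff such as 0.95 can be passed through `min_auc`.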

Digestive system tumors
Digestive system cancer encompasses oral cancer, esophageal cancer, gastric cancer, colorectal cancer, liver cancer, pancreatic cancer, and gallbladder cancer. The five-year survival rates for liver cancer and pancreatic cancer are only 20% and 11%, respectively [40]. A total of 315 articles were collected, covering 7 cancer types and 23 subtypes, with sample sizes ranging from 9 to 3,109 [20,41]. The reports included 6,731 diagnostic biomarkers, 64 progression biomarkers, and 163 prognostic biomarkers. Of these biomarkers, 3,609 were from serum samples, 1,619 from plasma samples, 974 from urine samples, and 460 from saliva samples. Among the biomarkers, 1,763 were amino acids and 246 were bile acids. Several studies have shown that bile acids are closely related to digestive system tumors [23,42,43]. Of the biomarkers, 79 had an AUC ≥0.95, and 420 had an AUC ≥0.80.
To explore the underlying pathogenesis of digestive system cancer, metabolite pathway enrichment analysis was conducted (Supplementary Figure 1). As shown in Figure 3, the four most closely related pathways for digestive system cancer were glycine, serine and threonine metabolism; arginine biosynthesis; alanine, aspartate and glutamate metabolism; and valine, leucine and isoleucine biosynthesis.
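The text does not state which enrichment method was used. A common choice for this kind of over-representation analysis is the hypergeometric test, sketched below as an assumption rather than as the authors' procedure. The function name, parameter names, and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(total, in_pathway, selected, overlap):
    """Hypergeometric tail probability P(X >= overlap).

    total      -- number of metabolites in the reference set
    in_pathway -- how many of those belong to the pathway
    selected   -- number of biomarker metabolites submitted
    overlap    -- how many submitted metabolites fall in the pathway
    """
    upper = min(in_pathway, selected)
    tail = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(total, selected)


# Hypothetical counts: 5 of 20 submitted metabolites hit a 10-member pathway
# in a reference set of 100 metabolites.
p = enrichment_p_value(total=100, in_pathway=10, selected=20, overlap=5)
print(f"p = {p:.4f}")
```

A small p-value indicates that the pathway contains more of the submitted metabolites than expected by chance. When many pathways are tested, the p-values would normally be adjusted for multiple comparisons before ranking.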

Reproductive system tumors
Reproductive system cancer comprises breast cancer, prostate cancer, endometrial cancer, cervical cancer, and ovarian cancer. Among all cancers, new cases of reproductive system cancer rank first [40]. In this review, we collected 238 articles covering 5 cancer types and 14 subtypes. The minimum sample size was 12 [44], and the maximum sample size was 6,114 [45]. These studies included 3,701 diagnostic biomarkers, 73 progression biomarkers, and 429 prognostic biomarkers. Of these metabolic markers, 1,462 were from serum samples, 1,715 were from plasma samples, and 750 were from urine samples. Among the metabolic markers, 1,044 were amino acids. A total of 213 diagnostic markers had an AUC ≥0.80.
Metabolite pathway enrichment analysis of diagnostic metabolites was performed for reproductive system cancers (Supplementary Figure 2). The four most closely related pathways for reproductive system cancer were glycine, serine and threonine metabolism; alanine, aspartate and glutamate metabolism; arginine and proline metabolism; and valine, leucine and isoleucine biosynthesis. Figure 4 shows the three most significant pathways involved in reproductive system cancer.

Respiratory system tumors
Respiratory system cancer includes lung cancer and throat cancer. Lung cancer is the leading cause of cancer-related morbidity and mortality worldwide, with a five-year relative survival rate of only 22% [1,40]. In this review, we collected 102 publications on respiratory system tumors, including 2 cancer types and 8 cancer subtypes. The sample size ranged from 14 to 1,196 [46,47]. These publications included 1,976 diagnostic biomarkers and 204 prognostic biomarkers. Among these biomarkers, 1,335 came from serum samples, and 467 came from plasma samples. In addition, metabolites from breath samples were used to diagnose lung cancer in seven studies [46,48-53]. These studies identified 162 diagnostic markers with an AUC greater than or equal to 0.80. We also performed metabolic pathway enrichment analysis of diagnostic metabolites in respiratory system cancers (Supplementary Figure 3). The main relevant pathways were aminoacyl-tRNA biosynthesis; arginine biosynthesis; glycine, serine and threonine metabolism; and glyoxylate and dicarboxylate metabolism.

Nervous system tumors
Nervous system cancer, including brain cancer and other types of nervous system cancer, is one of the deadliest cancers, with 18,600 people in the United States dying from the disease in 2021 [40,56]. Six cancer subtypes were reviewed in 16 articles, with sample sizes ranging from 17 to 220 [57,58]. The studies included 223 diagnostic biomarkers, 31 progression biomarkers, and 13 prognostic biomarkers. Biomarkers were derived from serum samples (92 biomarkers), plasma samples (118 biomarkers), and cerebrospinal fluid samples (38 biomarkers). Of these studies, 24 diagnostic markers had an AUC of 0.80 or higher. A metabolic pathway enrichment analysis of diagnostic metabolites in nervous system cancers was conducted, and the four most relevant pathways were aminoacyl-tRNA biosynthesis; arginine biosynthesis; alanine, aspartate and glutamate metabolism; and glyoxylate and dicarboxylate metabolism (Supplementary Figure 5).

Other tumors
In addition to the aforementioned common cancers, abnormal metabolites have also been observed in other types of cancer. We collected 65 articles for the diagnosis of other cancers, including six cancer types (thyroid cancer, myeloma, leukemia, lymphoma, skin cancer, and osteosarcoma) and 24 cancer subtypes. The sample sizes ranged from 10 to 846 [59,60]. These studies included 1,393 diagnostic biomarkers, 27 progression biomarkers, and 97 prognostic biomarkers. Biomarkers were mainly derived from three types of samples: 725 biomarkers in serum samples, 586 biomarkers in plasma samples, and 105 biomarkers in urine samples. These publications contained a total of 66 diagnostic markers with an AUC of 0.80 or higher.

DISCUSSION
Cancer is one of the major threats to human health because of its high morbidity and mortality rates [61]. Highly specific and sensitive diagnostic or prognostic biomarkers can improve the efficiency of treatment and prolong the survival of patients [2]. Metabolomics offers many exciting opportunities to advance the treatment of cancer [62]. For example, metabolomics combined with other "omics" can uncover valuable drug targets [63-65]. Metabolomics also has the potential to influence cancer screening and diagnosis, since many studies have identified biomarkers in body fluids with high diagnostic value for human cancers [66,67]. Zhou et al. reported that 4-dodecylbenzenesulfonic acid, PC (30:1) and PC (44:5) were downregulated in the serum of colorectal adenoma patients compared to healthy subjects, with an AUC of 1.00 [68]. Plasma levels of beta-sitosterol were upregulated in pancreatic cancer patients compared to healthy individuals, with an AUC value of 0.99 [69]. Plasma hexadecasphinganine had an AUC value of 0.99 for the diagnosis of gastric cancer [70]. Serum levels of hypoxanthine were upregulated in patients with lung adenocarcinoma compared to normal controls, with an AUC value of 0.99 [71]. Jové et al. found that hexanoic acid in plasma had an AUC value of 1.00 for breast cancer diagnosis [72]. Metabolite pathway enrichment analysis is a useful method for exploring the potential pathogenesis of cancers of different systems. Glycine, serine and threonine metabolism and arginine biosynthesis were enriched in cancers of each system. These two metabolic pathways may provide inspiration for future cancer research.
Although many liquid biopsy biomarkers with high diagnostic value in human cancers have been reported, there are still difficulties and challenges in the clinical application of these metabolites. First, there is a lack of multi-center, large-scale studies to validate the clinical feasibility and reproducibility of metabolic markers [73]. Second, in order to incorporate biomarker assays into the clinical workflow, supporting assay resources, staff logistics, and technical education are needed, which can be costly in the clinic [73,74]. Third, there are huge fluctuations in the concentration of metabolites in vivo, as well as a fragmented distribution of specialized small molecules in the body [75]. In addition, the metabolome is diverse and chemically complex, and varies between tumor types. For example, L-alanine is significantly downregulated in pancreatic cancer but significantly upregulated in colorectal cancer, which adds great difficulty for tumor screening [7,47,76]. This article has several novel aspects that merit clarification. First, previous studies mainly focused on biopsy markers of a single tumor and lacked a summary of diagnostic data on multiple tumor biopsies. This article summarizes diagnostic markers covering 24 tumor types. Second, our database contains more detailed information, such as AUC, accuracy, specificity, HMDB ID, metabolic pathway, and sample source. In addition, pathway analysis demonstrated that the glycine, serine and threonine metabolism and arginine biosynthesis pathways were enriched in multiple cancer systems, suggesting that these two metabolic pathways play an important role in cancer diagnosis and treatment.

CONCLUSION
With the development of standardized protocols, metabolomic measurement has become cheaper and more convenient. Metabolomics plays an increasingly important role in cancers, alongside other diagnostic and prognostic tests in the clinic. To provide an important resource for users to query, browse, and download information on cancer-related metabolites, we have established a user-friendly website.

Figure 1. Basic information on metabolic biomarkers in cancers of different systems.

Figure 2. The web interface of the HCMMD database.

Figure 4. Three highly important metabolic pathways in digestive system tumors. The metabolic process of metabolites and the