Background: Triple-negative breast cancer (TNBC) remains difficult to treat, largely because of its heterogeneity, and the source of this heterogeneity is unclear.

Methods: Single-cell RNA sequencing (scRNA) and transcriptomic data of TNBC and normal breast samples were retrieved from the Gene Expression Omnibus (GEO) and TCGA-BRCA databases. Cells were clustered and visualized using the t-SNE and UMAP methods, and marker genes were identified for each cluster. Clusters were annotated using the published literature, the CellMarker database and the “SingleR” R package.
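
As an illustration of the marker-gene step described above, the following is a minimal sketch that ranks genes per cluster by the log2 fold change of mean expression inside the cluster versus all other cells. It is not the pipeline used in this study: the function name `rank_marker_genes`, the pseudocount, and the toy expression values are assumptions made for illustration only, and a real analysis would normally add a statistical test and multiple-testing correction.

```python
import math


def rank_marker_genes(expr, labels, gene_names, top_n=3, pseudocount=1.0):
    """Rank candidate marker genes for each cluster.

    expr       : list of per-cell expression rows (one value per gene)
    labels     : cluster label for each cell, same order as expr
    gene_names : gene symbol for each column of expr
    Returns {cluster: [(gene, log2 fold change), ...]} sorted from highest
    to lowest fold change. Assumes at least two clusters are present.
    """
    markers = {}
    for cluster in sorted(set(labels)):
        inside = [row for row, lab in zip(expr, labels) if lab == cluster]
        outside = [row for row, lab in zip(expr, labels) if lab != cluster]
        scores = []
        for g, gene in enumerate(gene_names):
            mean_in = sum(row[g] for row in inside) / len(inside)
            mean_out = sum(row[g] for row in outside) / len(outside)
            # The pseudocount avoids taking the log of zero for unexpressed genes.
            log2fc = math.log2((mean_in + pseudocount) / (mean_out + pseudocount))
            scores.append((gene, log2fc))
        scores.sort(key=lambda item: item[1], reverse=True)
        markers[cluster] = scores[:top_n]
    return markers


# Hypothetical toy data: six cells, four genes. The values are invented
# and are not taken from the study's datasets.
toy_genes = ["DCN", "FAP", "RGS5", "ACTB"]
toy_expr = [
    [9, 1, 0, 5],
    [8, 0, 1, 5],
    [1, 9, 0, 5],
    [0, 8, 1, 5],
    [0, 1, 9, 5],
    [1, 0, 8, 5],
]
toy_labels = ["prCAF", "prCAF", "myCAF", "myCAF", "emCAF", "emCAF"]

toy_markers = rank_marker_genes(toy_expr, toy_labels, toy_genes, top_n=1)
for cluster, genes in toy_markers.items():
    print(cluster, genes[0][0])
```

In this toy example the top-ranked gene for each cluster is the one that was deliberately set high in that cluster's cells, while a uniformly expressed gene such as ACTB receives a fold change of zero and is never ranked first.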

Results: A total of 1535 cells and 21785 genes from 6 TNBC patients and 2068 cells and 15868 genes from 3 normal breast tissues were used for downstream analyses. The scRNA data were divided into 14 clusters, which were assigned to 8 cell types, including epithelial cells, immunocytes and CAFs/fibroblasts, among others. In the TNBC samples, CAFs were divided into three clusters labeled prCAFs, myCAFs and emCAFs, with DCN, FAP and RGS5 as their respective marker genes. The prCAF subgroup is functionally characterized by the promotion of proliferation and multidrug resistance; the myCAF subgroup is involved in extracellular matrix composition and collagen production; and the emCAF subgroup is functionally characterized by energy metabolism.

Conclusions: TNBC exhibits inter- and intra-tumor heterogeneity, and CAFs are one of the sources of this heterogeneity. CD74, SASH3, CD2, TAGAP and CCR7 served as significant marker genes with prognostic and therapeutic value.