The tsRNAs (tRFdb-3013a/b) serve as novel biomarkers for colon adenocarcinomas

tsRNAs (tRNA-derived small RNAs) are a novel class of small non-coding RNAs derived from transfer RNAs. Colon adenocarcinoma (COAD) is the most malignant intestinal tumor. This study focused on the identification and characterization of tsRNA biomarkers in colon adenocarcinomas. Data processing and bioinformatic analyses were performed with R and Python packages. Cell proliferation, migration and invasion abilities were determined by CCK-8 and transwell assays. A luciferase reporter assay was used to test the binding of the tsRNA to its target genes. With computational methods, we identified the tRNA fragment profiles within COAD datasets and distinguished forty-two differentially expressed tsRNAs between paired colon adenocarcinomas and non-tumor controls. Among the fragments derived from the 3′ end of tRNA-His-GUG (a histidyl transfer RNA), tRFdb-3013a and tRFdb-3013b (tRFdb-3013a/b) were notably decreased in colon and rectum adenocarcinomas; in particular, tRFdb-3013a/b tended to be down-regulated in patients with lymphatic or vascular invasion. The clinical survival of colorectal adenocarcinoma patients with low tRFdb-3013a/b expression was significantly worse than that of patients with high expression. In colon adenocarcinoma cells, tRFdb-3013a could inhibit cell proliferation and reduce cell migration and invasion abilities. Enrichment analyses showed that most tRFdb-3013a-correlated genes were enriched in extracellular matrix-associated GO terms, the phagosome pathway, and a GSEA molecular signature pathway. Additionally, the 3′UTR of ST3GAL1 mRNA was predicted to contain a binding site for tRFdb-3013a/b, so tRFdb-3013a/b might directly target and regulate ST3GAL1 expression in colon adenocarcinomas. These results suggest that tRFdb-3013a/b might serve as novel biomarkers for the diagnosis and prognosis of colon adenocarcinomas and act as key players in their progression.

A conservative pipeline was implemented to obtain an accurate estimate of tsRNA expression, as demonstrated previously [2]; the data processing steps and the flow chart for tsRNA identification are shown in Supplementary Figure 1A. Raw sncRNA-seq files were quality-checked and filtered using Cutadapt and FastQC. Clean small RNA sequencing reads were then re-mapped to the human reference genome (GRCh37/hg19) and to the sequences in the tsRNA annotation file using Bowtie [6]. After alignment, only mapped reads were quantified, and the number of reads belonging to each candidate tsRNA was counted with HTSeq [7]. Finally, the expression value of each tsRNA was calculated and normalized as transcripts per million reads (TPM) of total raw counts [8], and tsRNAs with an average expression value below one log2TPM were filtered out to eliminate random degradation sequences.
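
The final normalization and filtering step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that TPM here means raw counts scaled to one million total counts per sample (no length correction) and that the log2 transform uses a pseudocount of 1, neither of which is specified in the text. The function names and the toy tsRNA identifiers are hypothetical.

```python
import math

def tpm_normalize(counts):
    """Scale raw counts so that each sample (column) sums to one million.

    counts: dict mapping a tsRNA name to a list of raw read counts,
            one entry per sample (all lists the same length).
    """
    n_samples = len(next(iter(counts.values())))
    totals = [sum(v[i] for v in counts.values()) for i in range(n_samples)]
    return {
        name: [c / totals[i] * 1e6 if totals[i] else 0.0
               for i, c in enumerate(values)]
        for name, values in counts.items()
    }

def filter_low_expression(tpm, min_mean_log2=1.0):
    """Keep tsRNAs whose mean log2(TPM + 1) across samples is >= the cutoff.

    The +1 pseudocount is an assumption made to avoid log2(0).
    """
    kept = {}
    for name, values in tpm.items():
        mean_log2 = sum(math.log2(v + 1) for v in values) / len(values)
        if mean_log2 >= min_mean_log2:
            kept[name] = values
    return kept

# Toy counts for two samples; names and numbers are illustrative only.
raw_counts = {
    "tsRNA_high": [999999, 999999],
    "tsRNA_low": [0, 1],
}
tpm = tpm_normalize(raw_counts)
expressed = filter_low_expression(tpm)
print(sorted(expressed))
```

In this toy input the low-abundance entry falls below the cutoff and is dropped, which mirrors how the filter removes fragments that are likely to be random degradation products.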

The clinical specimens
Fifteen carcinoma tissues and five para-carcinoma tissues were obtained from patients diagnosed with colon adenocarcinoma who underwent surgical resection at the Department of Gastrointestinal Surgery, the First Affiliated Hospital of Chongqing Medical University, from February 2022 to October 2023. After excision, tissues were immediately frozen in liquid nitrogen for subsequent use. This study was approved by the Ethics Committees of Chongqing Medical University, and the patients provided written informed consent.

Northern blotting
Total RNA was isolated from the cells using TRIzol reagent (Invitrogen, USA). Before loading, the RNA samples were denatured at 65 °C for 5 min and chilled on ice immediately. The samples were separated on a 15% denaturing polyacrylamide gel and electrophoretically transferred to a charged nylon membrane (Labselect, China). Following cross-linking and pre-hybridization, the RNA was hybridized with the corresponding 3′-digoxigenin (DIG)-labeled DNA probe at 42 °C overnight [17]. After washing and blocking, the membrane was incubated with anti-DIG antibody solution (Servicebio, China). The chemiluminescence signal was captured and analyzed with a ChemiScopeS6 imaging system (Clinx, China). The DIG-labeled tRFdb-3013a/b probe (5′-TGGTGCCGTGACTCGGA-3′), the DIG-labeled tRNA-His-GUG probe (5′-CGGCCACAACGCAGAGTACT-3′) and the 5S rRNA probe (as an internal control) were synthesized by Sangon Biotech (Shanghai, China).
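
As a quick consistency check, the sequence that the tRFdb-3013a/b probe would hybridize to can be derived by taking its reverse complement. The sketch below assumes the probe is a perfect antisense match to its target (the text does not state this explicitly) and works in the DNA alphabet; the function and variable names are hypothetical. It also compares the result with the two sequences that the stem-loop qRT-PCR primers are described as matching in the Limitations section.

```python
# Complement table for the DNA alphabet (probes are DNA oligonucleotides).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

# Probe sequence as given in the Northern blotting section.
PROBE_3013AB = "TGGTGCCGTGACTCGGA"

# Sequences matched by the stem-loop qRT-PCR primers (Limitations section).
PRIMER_TARGET_SHORT = "TCCGAGTCACGGCA"
PRIMER_TARGET_LONG = "TCGAATCCGAGTCACGGCA"

probe_target = reverse_complement(PROBE_3013AB)
print(probe_target)  # TCCGAGTCACGGCACCA

# The short primer-matched sequence is the 5' part of the probe target,
# and it is also the 3' part of the long primer-matched sequence.
print(probe_target.startswith(PRIMER_TARGET_SHORT))        # True
print(PRIMER_TARGET_LONG.endswith(PRIMER_TARGET_SHORT))    # True
```

Under these assumptions, the probe and both primer designs share the same 14-nt core, which is consistent with the statement that the assays detect tRFdb-3013a/b as a group rather than a single fragment isoform.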

Limitations
In this study, tsRNAs refer to fragments derived from tRNAs that were identified with a conservative pipeline based on the GtRNAdb, tRFdb and tRFexplorer databases. With computational approaches, we determined the tsRNA expression profile within the TCGA-COAD dataset. In particular, our bioinformatic analysis focused on the roles of tRFdb-3013a and tRFdb-3013b, two tsRNA fragments derived from the tRNA-His-GTG gene. A conservative pipeline is only one of several possible approaches to mining tRNA-derived fragments. Recently, a repository of tRNA-derived fragments (tRFs), MINTbase v2.0, was released, in which more than ten thousand tRFs were mined from the TCGA datasets with its own deterministic and exhaustive pipeline. The tsRNA profiles identified within the COAD dataset by tDRnamer, by MINTbase and by our pipeline differ, partly because of differences between the tsRNA naming systems. Take tRFdb-3013a and tRFdb-3013b as an example: as shown in Supplementary Figure 8A and 8B, tDRnamer lists a total of forty-eight fragments derived from the 3′ end of mature tRNA-His-GTG, of which twenty fragment isoforms can be precisely aligned to the sequence of tRFdb-3013a and twenty-eight to the sequence of tRFdb-3013b. Two tRF isoforms (tDR-60:76-His-GTG-1-M2, MINTbase ID tRF-17-8US5652; and tDR-55:76-His-GTG-1-M2, MINTbase ID tRF-22-WB8US5652) were among the most abundant fragments identified in the TCGA datasets and may be regarded as typical representatives of tRFdb-3013a and tRFdb-3013b. Moreover, the primers for detecting tRFdb-3013a/b by stem-loop qRT-PCR were designed to match the sequence TCCGAGTCACGGCA or TCGAATCCGAGTCACGGCA; hence the measured expression levels refer to the fold-change of tRFdb-3013a/b as a group, because more than one fragment isoform may be detected in the qRT-PCR experiments. Accordingly, tsRNAs here refer generally to the small RNAs derived from tRNAs, rather than to individual tRFs or tRNA isoforms with highly similar sequences, to avoid the overestimation and artifacts that are present in tDRnamer and MINTbase.
According to GtRNAdb, there are about 64 classes of tRNAs in the human genome, corresponding to twenty-two kinds of amino acids, because each amino acid can have several anticodons. Histidine, for example, has two specific anticodons, which give rise to two tRNA isotypes (tRNA-His-ATG and tRNA-His-GTG). Hence it is conservatively estimated that no more than four tsRNAs are derived from the 3′ end of the two mature tRNA isotypes for histidine. The human transcriptional system is certainly more complex: for the tRNA-His-GTG isotype, twenty tRNA isoforms (including tRNA-His-GTG-1-1, tRNA-His-GTG-1-2, tRNA-His-GTG-1-6, tRNA-His-GTG-1-7 and tRNA-His-GTG-1-8) were identified with nearly identical sequences, and sometimes only individual bases differ between these twenty sequences (as shown in Figure 2A). Therefore, although tDRnamer and MINTbase contain thousands of tRFs in the TCGA datasets, it is not surprising that only two hundred tsRNAs were identified in COAD samples with our conservative pipeline based on the GtRNAdb, tRFdb and tRFexplorer databases.