Research Paper Volume 15, Issue 18 pp 9293–9309

Biomedical generative pre-trained based transformer language model for age-related disease target discovery


Figure 5. Study of the top 200 age-related genes selected with the BioGPT-G model. (A, B) Venn diagrams of the overlap between GenAge database genes and the age-related genes obtained with an established approach based on BioGPT-G (A) or PubMed (B). Hypergeometric p-values are shown. (C) GO enrichment analysis for the top 50 genes ranked by BioGPT-G as age-related. (D) The proposed positions of the graph nodes corresponding to proteins that appear in the different age-related lists. (E) Box plot of the shortest path length between the nodes of proteins selected by BioGPT, or random nodes, and the nodes corresponding to proteins both selected by BioGPT and most frequently co-mentioned with “aging”. For the random nodes, one iteration out of 1000 is shown. Asterisks indicate the permutation test p-value: ****, p < 0.00001.
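As an illustration of how a hypergeometric p-value for a gene-list overlap such as those in panels A and B can be computed, the following is a minimal sketch using only the Python standard library. It assumes a one-sided test for enrichment (probability of an overlap at least as large as the observed one); the function name `hypergeom_overlap_pvalue` and all numbers in the usage example are hypothetical and are not taken from the paper, which does not state the background size or the exact test configuration here.

```python
from math import comb


def hypergeom_overlap_pvalue(population, reference_size, selected_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population     -- size of the background gene universe (assumed value)
    reference_size -- genes in the reference list (e.g. a curated database)
    selected_size  -- genes in the model-selected list (e.g. a top-200 list)
    overlap        -- genes observed in both lists
    """
    max_overlap = min(reference_size, selected_size)
    # Number of ways to draw the selected list from the background.
    total = comb(population, selected_size)
    # Sum the upper tail: draws sharing k >= overlap genes with the reference.
    tail = sum(
        comb(reference_size, k) * comb(population - reference_size, selected_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative numbers only (hypothetical, not the paper's data).
    p = hypergeom_overlap_pvalue(
        population=20000, reference_size=300, selected_size=200, overlap=25
    )
    print(f"hypergeometric p-value: {p:.3e}")
```

The choice of background universe strongly affects the result, so the `population` argument should match the set of genes that either list could in principle have drawn from.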