Research Paper Volume 15, Issue 18 pp 9293-9309

Biomedical generative pre-trained based transformer language model for age-related disease target discovery


Figure 3. Variations of token probability normalization. (A) Strategies for probability normalization at two steps: retrieval of the individual token probabilities (1) and the final calculation of the total gene probability from the tokens within the gene name (2). (B) Distribution of token lengths for protein-coding genes for which therapeutics are available (“is known target”) and for which they are not (“not known target”). (C) Validation metrics for the gene token normalization approaches in the target identification task.
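
The two normalization steps named in panel (A) can be illustrated with a minimal sketch. The caption does not give the exact formulas, so everything below is an assumption made for illustration: the function names `renormalize` and `gene_score` are hypothetical, step (1) is shown as rescaling token probabilities over a restricted candidate set, and step (2) is shown as either a plain product of token probabilities or its length-normalized form (the geometric mean).

```python
import math
from typing import Dict, Iterable, List


def renormalize(token_probs: Dict[str, float], candidates: Iterable[str]) -> Dict[str, float]:
    """Step (1), assumed form: rescale probabilities to sum to 1 over a candidate token set."""
    kept = {tok: token_probs.get(tok, 0.0) for tok in candidates}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("candidate tokens carry no probability mass")
    return {tok: p / total for tok, p in kept.items()}


def gene_score(probs: List[float], length_normalize: bool = False) -> float:
    """Step (2), assumed form: combine the token probabilities of one gene name.

    length_normalize=False returns the product of the token probabilities.
    length_normalize=True returns their geometric mean, which removes the
    penalty for gene names that split into more tokens.
    """
    if not probs or any(p <= 0 for p in probs):
        raise ValueError("need at least one strictly positive token probability")
    # Work in log space to avoid underflow for long token sequences.
    log_sum = sum(math.log(p) for p in probs)
    if length_normalize:
        log_sum /= len(probs)
    return math.exp(log_sum)


# Hypothetical example: a gene name tokenized into two pieces, each with probability 0.5.
raw = gene_score([0.5, 0.5])                                # product of token probabilities
normalized = gene_score([0.5, 0.5], length_normalize=True)  # geometric mean
```

Under the plain product, a gene name that splits into more tokens receives a lower score than an equally likely name that splits into fewer. That is one plausible reason the token-length distribution in panel (B) matters when comparing known and unknown targets.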