Abstract

Aging research has advanced significantly over the past century, moving from early studies in animal models to a current emphasis on clinical and translational applications. As the research literature expands exponentially, traditional narrative reviews can no longer capture the field’s complexity, highlighting the need for new, unbiased synthesis tools. Here, we apply natural language processing (NLP) and machine learning (ML) techniques to analyze 461,789 aging-related abstracts published between 1925 and 2023. By integrating Latent Dirichlet Allocation (LDA), term frequency-inverse document frequency (TF-IDF) analysis, dimensionality reduction, and clustering, we delineate a comprehensive thematic landscape of aging research. Our results show a clear shift: early decades focused on cellular and molecular mechanisms, whereas recent years emphasize clinical studies, especially of neurodegenerative disorders. Notably, we find a persistent divide between biology of aging (BoA) research and clinical research, with minimal conceptual overlap between the two. We also identify distinct clusters representing key biological processes, some of which may previously have been overlooked as cohesive research domains. Finally, we highlight both established and underexplored interconnections that could guide future research. This study outlines shifting priorities and translational gaps in aging research and offers a scalable, data-driven alternative to conventional reviews.
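
To make the TF-IDF step named above concrete, the following is a minimal illustrative sketch and not the pipeline used in this study. The toy corpus, the function names `tfidf` and `top_terms`, and the specific weighting variant (term count divided by document length, multiplied by the natural log of N over document frequency) are all assumptions chosen for illustration; real analyses typically add tokenization, stop-word removal, and smoothing.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for real abstracts (not data from the study).
abstracts = [
    "telomere shortening drives cellular senescence",
    "cellular senescence and mitochondrial dysfunction in aging",
    "clinical trial of dementia care in older adults",
    "dementia risk and cognitive decline in older adults",
]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(weight_dict, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(weight_dict.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


weights = tfidf(abstracts)
for i, w in enumerate(weights):
    print(i, top_terms(w))
```

In this sketch, terms that occur in only one abstract receive higher weights than terms shared across abstracts, which is the property that lets TF-IDF surface vocabulary characteristic of a given topic or cluster.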