GCAT introduces a new open tool to facilitate the reuse of genomic data

The GCAT|Genomes for Life team, a strategic project of the Germans Trias i Pujol Research Institute (IGTP), has developed PolyGenie, a new tool designed to facilitate the exploration and reuse of genomic data by the research community. This initiative represents a further step in the project's strategy to promote FAIR data, reusable resources and open infrastructures for biomedical research. The platform and its application to the GCAT cohort are described in an article published in the journal NAR Genomics and Bioinformatics.

Generating data is only the first step in the scientific process. Their value increases when they can be reused, combined with other sources of information and transformed into new knowledge. With this vision in mind, the GCAT team has contributed to the development of PolyGenie, which is intended to make genomic data easier to analyse, explore and reuse.

The platform was created to support so-called phenome-wide association studies (PheWAS), an approach that enables researchers to analyse how genetic predisposition to a particular disease or trait is associated with hundreds or thousands of other phenotypes, including diseases, lifestyle factors and molecular data. To do so, PolyGenie uses polygenic risk scores (PRS), which integrate the effects of thousands of genetic variants to estimate susceptibility to different traits and diseases.
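
As a rough illustration of the idea, a polygenic risk score can be written as a weighted sum of an individual's risk-allele counts, and a PheWAS-style scan then compares that score against many phenotypes in turn. The sketch below is not taken from PolyGenie: the variant names, weights and phenotype labels are invented, and a plain Pearson correlation stands in for the regression models a real analysis would normally use.

```python
from math import sqrt


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2) for one individual."""
    return sum(weight * dosages.get(variant, 0) for variant, weight in weights.items())


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def phewas_scan(scores, phenotypes):
    """Correlate one score with every phenotype, strongest association first."""
    results = {name: pearson(scores, values) for name, values in phenotypes.items()}
    return sorted(results.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-variant effect sizes (illustrative values only).
weights = {"variant_a": 0.30, "variant_b": -0.10, "variant_c": 0.20}

# Hypothetical genotypes for four individuals.
genotypes = [
    {"variant_a": 0, "variant_b": 2, "variant_c": 0},
    {"variant_a": 1, "variant_b": 1, "variant_c": 0},
    {"variant_a": 1, "variant_b": 0, "variant_c": 1},
    {"variant_a": 2, "variant_b": 0, "variant_c": 2},
]
scores = [polygenic_risk_score(g, weights) for g in genotypes]

# Hypothetical binary phenotypes (1 = present, 0 = absent).
phenotypes = {"trait_x": [0, 0, 1, 1], "trait_y": [1, 0, 1, 0]}
ranked = phewas_scan(scores, phenotypes)
```

In this toy data set `trait_x` ranks first because it tracks the score more closely than `trait_y`. A real scan would also adjust for covariates and correct for the number of tests performed.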

More than 200,000 associations analysed using data from the GCAT cohort

To demonstrate its capabilities, the researchers applied PolyGenie to data from the GCAT cohort, a population-based study involving nearly 20,000 individuals aged between 40 and 65 in Catalonia. For this implementation, almost 5,000 genotyped participants were analysed, combining 135 polygenic risk scores with 1,483 different phenotypes, including diseases, lifestyle variables and metabolomic data. Pairing every score with every phenotype gives 135 × 1,483 = 200,205 combinations, which is how the analysis reaches more than 200,000 potential associations. The result demonstrates the platform's ability to systematically explore how genetic risk relates to a wide range of health-related characteristics.
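
The figure of more than 200,000 associations follows from pairing every score with every phenotype. A minimal sketch of that enumeration is shown below; the identifiers are placeholders rather than the real GCAT score or phenotype names, and only the two counts come from the article.

```python
from itertools import product

N_SCORES = 135        # polygenic risk scores reported in the article
N_PHENOTYPES = 1483   # phenotypes reported in the article

# Placeholder identifiers; the real names are not given in the article.
score_ids = [f"PRS_{i:03d}" for i in range(1, N_SCORES + 1)]
phenotype_ids = [f"PHENO_{j:04d}" for j in range(1, N_PHENOTYPES + 1)]


def association_grid(scores, phenotypes):
    """Lazily yield every (score, phenotype) pair to be tested."""
    return product(scores, phenotypes)


n_tests = sum(1 for _ in association_grid(score_ids, phenotype_ids))
print(n_tests)  # 200205
```

Generating the pairs lazily keeps memory use flat, which matters once each pair carries a model fit rather than just two labels.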

As an example, the team explored the relationships between a risk score associated with frailty and different clinical outcomes. The analyses showed that the prevalence of obesity increased progressively as the genetic risk of frailty increased. An association was also observed between this genetic risk and major depressive disorder, with a higher prevalence among women. This type of analysis illustrates PolyGenie's ability to identify shared patterns between diseases and biological factors, facilitating the generation of new research hypotheses.
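
A trend such as prevalence rising with genetic risk is typically read off by ranking individuals by score, splitting them into equally sized groups and computing the share of cases in each group. The following sketch illustrates that calculation on invented data; it is not PolyGenie's implementation, and the function name, the number of groups and the toy values are all assumptions.

```python
def prevalence_by_risk_group(scores, cases, n_groups=5):
    """Rank individuals by score, split into n_groups, return the case fraction per group."""
    if len(scores) != len(cases):
        raise ValueError("scores and cases must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    prevalences = []
    for group in range(n_groups):
        # Integer boundaries keep group sizes within one individual of each other.
        members = order[group * n // n_groups:(group + 1) * n // n_groups]
        if not members:
            raise ValueError("more groups than individuals")
        prevalences.append(sum(cases[i] for i in members) / len(members))
    return prevalences


# Invented scores and case labels (1 = condition present) for ten individuals.
toy_scores = [0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6, 0.5, 1.0]
toy_cases = [0, 1, 0, 1, 0, 1, 0, 0, 1, 1]

print(prevalence_by_risk_group(toy_scores, toy_cases))
# [0.0, 0.0, 0.5, 1.0, 1.0]
```

Calling the same function separately on subsets of the data, for example women and men, would give the kind of stratified comparison mentioned above.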

Reusing genomic data to generate new knowledge

"Although tools already exist to calculate polygenic risk scores and other platforms are available to visualise results, there has so far been a lack of resources that facilitate the systematic application of this type of analysis across different cohorts. PolyGenie fills this gap through an open-source pipeline developed with Nextflow, designed to analyse polygenic risk scores regardless of the method used to generate them, making it easier to apply across different research settings", explains Natàlia Blay, GCAT researcher and co-author of the study.

Another strength of the platform is that it incorporates interactive visualisation tools that facilitate the exploration of results. In addition, it can be easily adapted to new cohorts through configuration files and metadata, without the need to modify the code.

For GCAT, this initiative represents another step forward in building open resources for research. Over recent years, the cohort has evolved from a population resource into a scientific platform that promotes the responsible reuse of data, collaboration between institutions and the development of new resources for the research community. PolyGenie exemplifies this evolution, transforming complex genomic information into a more accessible resource for researchers working in areas such as precision medicine, population genetics and the study of the biological determinants of health.

As a resource integrated into ELIXIR Spain, the Spanish node of ELIXIR, and connected to European infrastructures such as the European Genome-phenome Archive (EGA), GCAT is fully aligned with the principles of Open Science and FAIR data (Findable, Accessible, Interoperable and Reusable). "Open Science is not only about sharing data. It is also about creating the tools and infrastructures that enable those data to be transformed into knowledge that benefits society", says Xavier Farré, GCAT researcher and co-first author of the study. He adds that "initiatives such as PolyGenie demonstrate how public investment not only enables the generation of highly valuable scientific data, but also supports the development of the digital infrastructures needed to ensure that these data are accessible, reusable and useful".

The next step: incorporating data from the entire GCAT cohort

This advance has been made possible by support from the Resilience Funds through the GEPETO project (Genome Profiling in the GCAT, an Electronic Health Record Population-Based Cohort Study to Improve Prevention, Diagnosis and Treatment of Common Diseases Using Polygenic Risk Scores; TED2021-130626B-I00), which the Spanish Ministry of Science, Innovation and Universities has funded since 2023. The main objective of this strategic project is to complete the genotyping of the entire GCAT cohort and make these data available to the scientific community as an open, interoperable and high-value resource for biomedical research.

The current study and demonstration of the PolyGenie tool were developed using data from the first genotyped participants in the cohort, almost 5,000 individuals. Data from nearly 20,000 participants, generated within the framework of the GEPETO project, will be incorporated in the coming months, completing the population resource originally envisaged. This expansion will significantly increase the cohort's potential for genomic, epidemiological and precision medicine studies.

Reference

Farré X, Gasco M, Blay N, de Cid R. PolyGenie: a reproducible Nextflow pipeline for phenome-wide association studies using polygenic risk scores. NAR Genom Bioinform. 2026 Jun 9;8(2):lqag056. DOI: 10.1093/nargab/lqag056.