In the coming years, tens of millions of human genomes will be sequenced, largely in the healthcare setting. This clinical data is an extremely valuable and ever-growing resource. To date, however, most clinical data has been inaccessible to research scientists. The Canadian Genomic Data Commons (CGDC) is a digital infrastructure that aims to use innovative methods to facilitate the transfer of genomic data from the healthcare setting to a high-performance computing infrastructure that will support a national data-sharing research ecosystem, while ensuring privacy and security. The need for more open and inclusive research collaboration is particularly urgent in the fields of rare diseases (RDs) and cancer, which together afflict millions of Canadians.
The CGDC infrastructure comprises two novel Canadian databases and a suite of software tools that will, for the first time, put Canadian labs on a common platform for sharing and accessing genomic data. These are: 1) the Canadian Open Genetics Repository (COGR), a database of clinically curated genetic data from Canadian laboratories; 2) the Canadian Genomic Aggregation Database (gnomAD-Canada), which will hold aggregated variant frequency data from tens of thousands of genomes; and 3) innovative open-source software tools for RD and cancer research.
