DNA sequence data from breast cancer patients and controls

Large genetic epidemiological studies have found associations between single nucleotide polymorphisms (SNPs) in the regions of hormone metabolism genes and breast cancer risk. To find the causal genetic variants in these regions, we are performing whole-genome and targeted high-throughput DNA sequencing in and around selected candidate genes in women with early-onset breast cancer. Samples for this project come from the Australian Breast Cancer Family Study (approximately 450), the POSH study (England) and Asan Hospital, South Korea (approximately 300, including whole-genome sequencing data). The bioinformatics analysis is currently performed at VLSCI in collaboration with Dr K Mahmood. So far we have identified a few hundred coding and non-coding variants with putative deleterious effects. Work to identify causal variants is ongoing, using selected UK10K whole genomes as a reference group. Given the complexity of the bioinformatics pipeline and the planned expansion of analyses beyond SNPs, the primary raw data need to be stored as FASTQ files. We plan to make this data repository, alongside our software pipeline, available to other breast cancer researchers in the future.