Butler enables rapid cloud-based analysis of thousands of human genomes, Nature Biotechnology (2020)

Sergei Yakneen [1,2,6], Sebastian M. Waszak [1], PCAWG Technical Working Group [3], Michael Gertz [2], Jan O. Korbel [1,4] and PCAWG Consortium [5]


[1] European Molecular Biology Laboratory (EMBL), et al.

[3] As a consortium member, Jongwhi Hong, Jongsun Jung, Genome Data Integration Centre, Syntekabio Inc., 187 Techno 2-ro, B512, Yuseong-gu, Daejeon, 34025, South Korea.

[5] PCAWG Consortium members



The use of whole-genome sequencing has increased with the rapid progress of next-generation sequencing (NGS) technologies. However, storing raw sequence reads for large-scale genome analysis poses hardware challenges. Despite advances in genome analysis platforms, efficient approaches remain relevant, especially for the human genome. In this study, an Integrated Genome Sizing (IGS) approach is adopted to speed up the analysis of multiple whole genomes in a high-performance computing (HPC) environment. The approach splits a genome (GRCh37) into 630 chunks (fragments), so that multiple chunks can be analyzed in parallel across cohorts.
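
The chunk-and-parallelize idea can be illustrated with a minimal sketch. It is not the IGS implementation: the fixed-size windowing rule, the function names (`split_into_chunks`, `analyze_chunk`, `run_parallel`), the toy contig lengths, and the use of a local thread pool in place of an HPC scheduler are all assumptions made for illustration. The text does not state how the 630 chunk boundaries are chosen.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(contig_lengths, chunk_size):
    """Return (contig, start, end) half-open intervals that tile each contig.

    Fixed-size windows are an assumption; the last window of a contig
    is shortened so that it ends at the contig length.
    """
    chunks = []
    for contig, length in contig_lengths.items():
        for start in range(0, length, chunk_size):
            chunks.append((contig, start, min(start + chunk_size, length)))
    return chunks


def analyze_chunk(chunk):
    """Hypothetical placeholder for per-chunk sequence analysis.

    Here it only reports how many bases the chunk covers.
    """
    contig, start, end = chunk
    return contig, end - start


def run_parallel(chunks, workers=4):
    """Process chunks concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_chunk, chunks))


# Toy contig lengths (hypothetical, not GRCh37 values).
toy_lengths = {"chrA": 1000, "chrB": 250}
chunks = split_into_chunks(toy_lengths, chunk_size=300)
results = run_parallel(chunks)
```

Because each chunk is an independent interval, the same list could be handed to a cluster scheduler as separate jobs. For CPU-bound analysis on a single machine, a process pool would likely be a better fit than threads.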