Use of NCTC 3000 whole genome sequence data to evaluate the bioinformatics tool HINGE



NCTC 3000 is a large-scale, collaborative whole genome sequencing project undertaken jointly by the Culture Collections of Public Health England (CCPHE), the Wellcome Trust Sanger Institute (WTSI) and Pacific Biosciences. The project aims to sequence the genomes of 3000 type and reference strains of bacteria from the National Collection of Type Cultures (NCTC). Long-read whole genome sequence data from the project is available to the scientific community via the European Nucleotide Archive (ENA).


Recently, Kamath et al. (2016) used data from the project to evaluate HINGE, a new long-read assembly tool designed for repeat resolution. HINGE was evaluated on 997 bacterial genomes from the NCTC 3000 project. It presents repeat structure as an assembly graph rather than in the traditional contig-based representation.

HINGE’s main feature is its “ability to produce an assembly graph with maximum levels of repeat resolution” (Kamath et al. 2016). It is an assembler that handles repeats by adding “hinges” to reads containing repeat sequences that cannot otherwise be resolved. HINGE combines the strengths of two classes of assembler: the repeat resolution of de Bruijn graph assemblers and the error resilience of overlap-based assemblers.

