Document Type


Degree Name

Master of Science (MSc)



Program Name/Specialization

Integrative Biology


Faculty of Science

First Advisor

Gabriel Moreno-Hagelsieb

Advisor Role



The 16S rRNA gene is present in all bacteria and contains nine variable regions interspersed among conserved regions. While the conserved regions remain mostly constant over time, the variable regions can be used for taxonomic identification. Current methodologies for characterizing microbial communities, such as those used to study the human microbiome, involve sequencing short fragments of this ubiquitous gene and comparing them to reference sequences in databases to identify the microbes present. Traditionally, whole 16S rRNA sequences with more than 97% sequence identity (id) are assigned to a single operational taxonomic unit (OTU), with each OTU serving as a proxy for a single species. However, because of the short read lengths produced by next-generation sequencing, a recent trend has been to sequence small fragments spanning one or more of the gene’s variable regions instead, and still cluster them into OTUs at 97% id.

This work evaluated how effectively short fragments generate OTUs at different id thresholds compared with the complete 16S rRNA gene. Whole-gene analysis may be effective for measuring diversity; however, depending on which variable regions they span, these small fragments may require higher or lower id thresholds. How precisely should the pieces of this ‘genomic jigsaw’ be characterized and distinguished? Two algorithms, UCLUST and CD-HIT-EST, were used to cluster complete 16S rRNA sequences, as well as fragments spanning the V1-3 and V3-5 regions, which were chosen for their widespread use in human microbiome research. These sequences were obtained from SILVA’s Living-Tree-Project (LTP) database. Clusters were produced at several id thresholds to evaluate how closely the fragment clusters resemble those obtained using complete genes. Both the use of small fragments and the position of the fragment within the gene affected OTU generation. However, the results suggest more appropriate id thresholds for these fragments, which may help to better assemble this microbial jigsaw puzzle: clustering at 94% and 96% id for the V1-3 and V3-5 regions, respectively, generates results similar to whole-gene clustering at 97%.
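To make the role of the id threshold concrete, the following is a minimal sketch of greedy centroid-based clustering in Python. It is an illustration only, not the UCLUST or CD-HIT-EST implementation used in the thesis: the function names `identity` and `greedy_cluster` are hypothetical, the sequences are toy strings rather than LTP data, and identity is computed as the fraction of matching positions between pre-aligned, equal-length sequences, which is a simplifying assumption.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences.

    Assumption: both sequences are already aligned to the same length.
    Real tools compute identity from pairwise alignments instead.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs, threshold):
    """Greedy centroid clustering (hypothetical helper for illustration).

    Each sequence joins the first existing centroid it matches at an
    identity >= threshold; otherwise it becomes a new centroid.
    Returns a list of clusters, each a list of sequences.
    """
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            # No centroid was similar enough: start a new cluster (OTU).
            centroids.append(s)
            clusters.append([s])
    return clusters


# Toy 20-base sequences (not real 16S data).
toy_seqs = [
    "ACGTACGTACGTACGTACGT",  # reference
    "ACGTACGTACGTACGTACGA",  # 1 mismatch to the reference (95% id)
    "ACGTACGTACGTACGTACAA",  # 2 mismatches to the reference (90% id)
]

for t in (0.97, 0.94, 0.85):
    print(f"threshold {t:.2f}: {len(greedy_cluster(toy_seqs, t))} cluster(s)")
```

On this toy input, lowering the threshold merges sequences into fewer clusters, which is the effect the thesis measures when comparing fragment clusters with whole-gene clusters. Because each sequence is compared only with the cluster centroids, the outcome of a greedy scheme like this one also depends on input order.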

Convocation Year


Convocation Season