How "simple" methodological decisions affect interpretation of population structure based on reduced representation library DNA sequencing: A case study using the lake whitefish

January 24, 2020

Reduced representation library (RRL) sequencing approaches (e.g., RADSeq, genotyping by
sequencing) require decisions about how much to invest in genome coverage and sequencing
depth, as well as choices of values for adjustable bioinformatics parameters. To empirically
explore the importance of these “simple” methodological decisions, we generated two
independent sequencing libraries for the same 142 individual lake whitefish (Coregonus clupeaformis)
using a nextRAD RRL approach: (1) a larger number of loci at low sequencing
depth based on a 9mer (library A); and (2) fewer loci at higher sequencing depth based on a
10mer (library B). The fish were selected from populations with different levels of expected
genetic subdivision. Each library was analyzed using the STACKS pipeline followed by
three types of population structure assessment (FST, DAPC and ADMIXTURE) with iterative
increases in the stringency of sequencing depth and missing data requirements, as well as
more specific a priori population maps. Library B was always able to resolve strong population
differentiation in all three types of assessment regardless of the selected parameters,
largely due to retention of more loci in analyses. In contrast, library A produced more variable
results; increasing the minimum sequencing depth threshold (-m) reduced the
number of retained loci, and resolution was therefore lost at high -m values for FST and
ADMIXTURE, but not DAPC. When detecting fine population differentiation, the population map
influenced the number of loci and missing data, which generated artefacts in all downstream
analyses tested. Similarly, when examining fine-scale population subdivision, library B was
robust to changing parameters but library A lost resolution depending on the parameter set.
We used library B to examine actual subdivision in our study populations. All three types of
analysis found complete subdivision among populations in Lake Huron, ON and Dore Lake,
SK, Canada using 10,640 SNP loci. Weak population subdivision was detected in Lake
Huron, with fish from sites in the north-west, Search Bay, North Point and Hammond Bay,
showing slight differentiation. Overall, we show that apparently simple decisions about
library construction and bioinformatics parameters can have important impacts on the interpretation of population subdivision. Although potentially more costly on a per-locus basis,
early investment in striking a balance between the number of loci and sequencing effort is
well worth the reduced genomic coverage for population genetics studies. More conservative
stringency settings on STACKS parameters led to a final dataset that was more consistent
and robust when examining both weak and strong population differentiation. Overall,
we recommend that researchers approach “simple” methodological decisions with caution,
especially when working on non-model species for the first time.
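
As an illustrative aside, the effect described above, where stricter depth and missing-data requirements shrink the set of retained loci, can be sketched on toy data. This is a minimal Python sketch and not the STACKS pipeline: the function `retained_loci`, the parameters `min_depth` and `max_missing`, and all depth values are hypothetical, and `min_depth` is only loosely analogous to the -m threshold mentioned in the abstract.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: read depth per individual at each locus.
# None means the locus was not recovered in that individual.
DepthTable = Dict[str, List[Optional[int]]]


def retained_loci(depths: DepthTable, min_depth: int, max_missing: float) -> List[str]:
    """Keep loci where the fraction of missing genotypes is at most max_missing.

    A genotype is treated as missing when the locus is absent (None) or
    its read depth falls below min_depth. This is an assumed, simplified
    rule for illustration only.
    """
    kept = []
    for locus, per_individual in depths.items():
        missing = sum(1 for d in per_individual if d is None or d < min_depth)
        if missing / len(per_individual) <= max_missing:
            kept.append(locus)
    return kept


# Four made-up loci scored in four made-up individuals.
toy = {
    "locus_1": [12, 15, 9, 14],
    "locus_2": [3, 4, None, 5],
    "locus_3": [6, 7, 8, 2],
    "locus_4": [20, 18, 25, 22],
}

# Tighten the depth threshold step by step and watch loci drop out.
for m in (3, 5, 10):
    print(m, retained_loci(toy, min_depth=m, max_missing=0.25))
```

On this toy table, raising `min_depth` from 3 to 5 to 10 reduces the retained set from four loci to three to two, mirroring in miniature how a low-depth library can lose loci as thresholds tighten.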

Publication Year: 2020
Title: How "simple" methodological decisions affect interpretation of population structure based on reduced representation library DNA sequencing: A case study using the lake whitefish
DOI: 10.1371/journal.pone.0226608
Authors: Carly F. Graham, Douglas R. Boreham, Richard G. Manzon, Wendylee Stott, Joanna Y. Wilson, Christopher M. Somers
Publication Type: Article
Publication Subtype: Journal Article
Series Title: PLoS ONE
Index ID: 70208065
Record Source: USGS Publications Warehouse
USGS Organization: Great Lakes Science Center