

DNA metabarcoding, the PCR-based profiling of natural communities, is becoming the method of choice for biodiversity monitoring because it circumvents some of the limitations inherent to traditional ecological surveys. However, potential sources of bias that can affect the reproducibility of this method remain to be quantified. The interpretation of differences in patterns of sequence abundance and the ecological relevance of rare sequences remain particularly uncertain. Here we used one artificial mock community to explore the significance of abundance patterns and disentangle the effects of two potential biases on data reproducibility: indexed PCR primers and random sampling during Illumina MiSeq sequencing. We amplified a short fragment of the mitochondrial Cytochrome c Oxidase Subunit I (COI) for a single mock sample containing equimolar amounts of total genomic DNA from 34 marine invertebrates belonging to six phyla. We used seven indexed broad-range primers and sequenced the resulting library on two consecutive Illumina MiSeq runs. The total number of Operational Taxonomic Units (OTUs) was ∼4 times higher than expected based on the composition of the mock sample. Moreover, the total number of reads for the 34 components of the mock sample differed by up to three orders of magnitude. However, 79 out of 86 of the unexpected OTUs were represented by <10 sequences that did not appear consistently across replicates. Our data suggest that random sampling of rare OTUs (e.g., small associated fauna such as parasites) accounted for most of the variation in OTU presence–absence, whereas biases associated with indexed PCRs accounted for a larger amount of variation in relative abundance patterns. These results suggest that random sampling during sequencing leads to the low reproducibility of rare OTUs.

Figure S3: Phylogenetic relationships between representative COI sequences (313 bp) of 120 OTUs detected in the mock sample, estimated using a Maximum Likelihood approach. We used a general time reversible nucleotide model with a proportion of invariant sites and among-site rate heterogeneity modeled with a discrete gamma distribution (GTR + I + G), together with GARLI default settings, including stepwise-addition starting trees. Branch tips indicate the means of identification of each OTU. Mock: OTU that matched the COI sequence of a species included in the mock sample; NCBI/BOLD: OTU that did not match a target OTU but had >98% similarity with a reference COI barcode in NCBI or BOLD; SAP: OTU that did not match a target OTU nor a reference COI barcode in NCBI or BOLD but could be confidently assigned to a higher taxonomic level using a Bayesian phylogenetic approach implemented in the Statistical Assignment Package (SAP); Unknown: OTU that could not be confidently identified to any taxonomic group using the three approaches detailed above.
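
The claim that OTUs represented by <10 sequences appear inconsistently across replicates can be illustrated with a simple sampling calculation. The sketch below is not the analysis used in the study. It assumes that each read is an independent draw from the library, so that an OTU with relative abundance p is seen at least once among N reads with probability 1 - (1 - p)^N. The read depth per replicate and the count of 14 replicates (taken here as seven indexed primers times two runs) are illustrative assumptions, as are the function names.

```python
def detection_probability(rel_abundance: float, reads: int) -> float:
    """Probability that an OTU is observed at least once in one replicate.

    Assumes reads are independent draws (binomial sampling), which is a
    simplification of real sequencing.
    """
    if not 0.0 <= rel_abundance <= 1.0:
        raise ValueError("rel_abundance must be between 0 and 1")
    if reads < 0:
        raise ValueError("reads must be non-negative")
    return 1.0 - (1.0 - rel_abundance) ** reads


def probability_in_all_replicates(rel_abundance: float, reads: int,
                                  replicates: int) -> float:
    """Probability that the OTU is observed in every one of the replicates,
    assuming replicates are independent and equally deep."""
    return detection_probability(rel_abundance, reads) ** replicates


# Hypothetical values chosen for illustration only (not from the study).
READS_PER_REPLICATE = 100_000
REPLICATES = 14  # assumption: 7 indexed primers x 2 sequencing runs

for expected_reads in (0.5, 2, 10, 100):
    p = expected_reads / READS_PER_REPLICATE
    once = detection_probability(p, READS_PER_REPLICATE)
    everywhere = probability_in_all_replicates(p, READS_PER_REPLICATE, REPLICATES)
    print(f"expected reads per replicate = {expected_reads:>5}: "
          f"P(seen in one) = {once:.3f}, P(seen in all {REPLICATES}) = {everywhere:.3f}")
```

Under these assumptions, an OTU expected to contribute only a few reads per replicate is frequently missed in at least one replicate, whereas an OTU expected to contribute on the order of a hundred reads is detected essentially every time. This is consistent with random sampling affecting presence–absence mainly for rare OTUs.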
