How to merge data from multiple sequencing runs

This document illustrates how to integrate data for a single study that was generated on more than one sequencing run. Importantly, these steps require that the same PCR primers were used for all sequencing runs.[1] Our general recommendation is to merge at the feature table stage, rather than prior to quality control. With some quality control methods (notably DADA2, which models sequencing errors separately for each run) this is required, while with others it is optional.

The data used in this guide were sequenced on two Illumina MiSeq sequencing runs, and were originally published in Meilander et al. (2024). The data used here are subsampled to 10% of the original input sequences so the commands can be run quickly. You can find the full dataset in the study’s Artifact Repository.

Overview of the process

At a high level, merging data from two or more sequencing runs works as follows:

  1. Demultiplex each sequencing run, resulting in one “demux artifact” (e.g., SampleData[PairedEndSequencesWithQuality]) per sequencing run.

  2. Perform quality control on each “demux artifact” resulting in one FeatureTable and FeatureData[Sequence] per sequencing run.

  3. Merge all per-run FeatureTable artifacts into a single FeatureTable artifact.

  4. Merge all FeatureData[Sequence] artifacts into a single FeatureData[Sequence] artifact.

  5. Merge metadata, if necessary.

Obtain sample metadata and two “demux artifacts”

The following commands will download the sample metadata as tab-separated text, and two demultiplexed sequence artifacts representing two different sequencing runs. Note that in this example, all sample metadata from the full study is contained in a single sample metadata file. If that were not the case, you could merge your per-run sample metadata files (see How to merge metadata).

wget -O 'sample-metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/sample-metadata.tsv'
wget -O 'demux-nano1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/demux-nano1.qza'
wget -O 'demux-nano2.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/demux-nano2.qza'
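
Before moving on, it can be worth confirming that all three downloads are present and non-empty. The sketch below is one way to do that. `check_inputs` is a hypothetical helper name, not part of QIIME 2. For a deeper check of each artifact, `qiime tools validate` can be run on the `.qza` files.

```shell
# check_inputs is a hypothetical helper (not a QIIME 2 command).
# It reports every expected file that is missing or empty and
# returns a non-zero status if at least one file fails the check.
check_inputs() {
  rc=0
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For the files in this guide, the call would be `check_inputs sample-metadata.tsv demux-nano1.qza demux-nano2.qza`.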

Sequence quality control

We’ll begin by performing quality control on the demultiplexed sequence artifacts independently, by calling DADA2’s denoise-paired command on each set of demultiplexed sequences. The trim and truncation parameters should be chosen based on your data. You should use the same parameters for each run.

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-nano1.qza \
  --p-trim-left-f 0 \
  --p-trunc-len-f 250 \
  --p-trim-left-r 0 \
  --p-trunc-len-r 250 \
  --o-representative-sequences asv-seqs-nano1.qza \
  --o-table asv-table-nano1.qza \
  --o-denoising-stats denoising-stats-nano1.qza \
  --o-base-transition-stats base-transition-stats-nano1.qza
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-nano2.qza \
  --p-trim-left-f 0 \
  --p-trunc-len-f 250 \
  --p-trim-left-r 0 \
  --p-trunc-len-r 250 \
  --o-representative-sequences asv-seqs-nano2.qza \
  --o-table asv-table-nano2.qza \
  --o-denoising-stats denoising-stats-nano2.qza \
  --o-base-transition-stats base-transition-stats-nano2.qza
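
Because the same trim and truncation parameters should be applied to every run, one option is to define them once and loop over the runs. The sketch below assumes your demux artifacts follow the `demux-<run>.qza` naming used in this guide. `denoise_runs` and the `RUN` variable are hypothetical names introduced for illustration. By default the sketch only prints each command so you can review it before running anything.

```shell
# Shared parameters, defined once so every run is denoised identically.
TRIM_LEFT_F=0
TRUNC_LEN_F=250
TRIM_LEFT_R=0
TRUNC_LEN_R=250

# RUN is a hypothetical switch: it defaults to "echo", so commands are only
# printed (dry run). Set RUN to an empty string to execute them for real.
RUN=${RUN-echo}

# denoise_runs is a hypothetical helper: one denoise-paired call per run name.
denoise_runs() {
  for run in "$@"; do
    $RUN qiime dada2 denoise-paired \
      --i-demultiplexed-seqs "demux-${run}.qza" \
      --p-trim-left-f "$TRIM_LEFT_F" \
      --p-trunc-len-f "$TRUNC_LEN_F" \
      --p-trim-left-r "$TRIM_LEFT_R" \
      --p-trunc-len-r "$TRUNC_LEN_R" \
      --o-representative-sequences "asv-seqs-${run}.qza" \
      --o-table "asv-table-${run}.qza" \
      --o-denoising-stats "denoising-stats-${run}.qza" \
      --o-base-transition-stats "base-transition-stats-${run}.qza"
  done
}

denoise_runs nano1 nano2
```

Adding a third run then only requires another name in the final call, and the parameters cannot drift apart between runs.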

Merging data

After quality control is complete, you’ll have two FeatureTable artifacts and two FeatureData[Sequence] artifacts. Merging at this stage simplifies downstream work. You can do this using the following two commands.

Merging FeatureTable artifacts

qiime feature-table merge \
  --i-tables asv-table-nano1.qza asv-table-nano2.qza \
  --o-merged-table asv-table.qza
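
To our understanding, `feature-table merge` by default raises an error if the same sample ID appears in more than one input table (behavior controlled by its `--p-overlap-method` parameter), because that usually signals an ID collision across runs. If you keep one metadata file per run, a quick pre-check could look like the sketch below. The helper `shared_sample_ids` is hypothetical, and the sketch assumes tab-separated files with sample IDs in the first column and a header on the first line.

```shell
# shared_sample_ids is a hypothetical helper (not a QIIME 2 command).
# It prints every sample ID that occurs in both metadata files.
shared_sample_ids() {
  # Take column 1, skipping the header line and any "#" directive lines.
  ids1=$(awk -F'\t' 'NR > 1 && $1 !~ /^#/ && $1 != "" { print $1 }' "$1" | sort -u)
  ids2=$(awk -F'\t' 'NR > 1 && $1 !~ /^#/ && $1 != "" { print $1 }' "$2" | sort -u)
  # After per-file de-duplication, an ID seen twice must occur in both files.
  printf '%s\n%s\n' "$ids1" "$ids2" | sort | uniq -d | sed '/^$/d'
}
```

An empty result means no sample ID is shared between the two files. Any IDs printed should be resolved before merging.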

Merging FeatureData[Sequence] artifacts

qiime feature-table merge-seqs \
  --i-data asv-seqs-nano1.qza asv-seqs-nano2.qza \
  --o-merged-data asv-seqs.qza
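
Merging per-run results works cleanly because, to our understanding, DADA2 as wrapped in QIIME 2 labels each ASV by default with the MD5 hash of its sequence. An identical sequence observed in two runs therefore receives the same feature ID and is treated as one feature after merging, while sequences that differ by even one base (for example because of different trimming) receive different IDs. The sketch below illustrates the idea with made-up sequences. `seq_id` is a hypothetical helper, and the sketch assumes GNU `md5sum` is available.

```shell
# seq_id is a hypothetical helper: MD5 hex digest of a sequence string.
seq_id() {
  printf '%s' "$1" | md5sum | cut -d' ' -f1
}

# Made-up sequences for illustration only.
run1_asv="ACGTACGTACGT"
run2_asv="ACGTACGTACGT"   # same sequence, observed in a second run
trimmed="ACGTACGTAC"      # two bases shorter, as different trimming could produce

# Identical sequences map to the same ID, so they line up when merged.
[ "$(seq_id "$run1_asv")" = "$(seq_id "$run2_asv")" ] && echo "same ID across runs"

# A sequence of different length maps to a different ID.
[ "$(seq_id "$run1_asv")" != "$(seq_id "$trimmed")" ] && echo "different ID after different trimming"
```

This is one more reason to use the same trim and truncation parameters for every run before merging.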

Downstream analysis

At this stage, you can continue as if you had generated your data on a single sequencing run. For example, you can generate a summary of the merged feature table as follows.

qiime feature-table summarize \
  --i-table asv-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-summary asv-table.qzv \
  --o-sample-frequencies sample-frequencies.qza \
  --o-feature-frequencies asv-frequencies.qza

Footnotes
  1. If you’re interested in combining sequencing runs that used different PCR primers, the process is more challenging because you must decide how to manage the known bias this introduces. See Li et al. (2025) for a discussion of this topic. Table 1 in particular outlines different approaches; the paper describes the methods and how commonly they have been used, but does not explicitly evaluate them.

References
  1. Meilander, J., Herman, C., Manley, A., Augustine, G., Birdsell, D., Bolyen, E., Celona, K. R., Coffey, H., Cocking, J., Donoghue, T., Draves, A., Erickson, D., Foley, M., Gehret, L., Hagen, J., Hepp, C., Ingram, P., John, D., Kadar, K., … Caporaso, J. G. (2024). Upcycling Human Excrement: The Gut Microbiome to Soil Microbiome Axis. arXiv. 10.48550/ARXIV.2411.04148
  2. Caporaso, J. G., & Meilander, J. (2025). Upcycling Human Excrement: The Gut Microbiome to Soil Microbiome Axis (supporting data). Zenodo. 10.5281/ZENODO.13887456
  3. Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A., & Holmes, S. P. (2016). DADA2: high-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581. 10.1038/nmeth.3869
  4. Li, Z., Chen, Y., Sun, Y., McArthur, K., Carnes, M., Liu, T., Mueller, N. T., Page, G. P., Rossman, L., Smirnova, E., White, J. D., Kress, A. M., & Debelius, J. W. (2025). In harmony? A scoping review of methods to combine multiple 16S amplicon data sets. 10.1101/2025.02.11.637740