How to merge data from multiple sequencing runs

This document illustrates how to integrate data for a single study that was generated on more than one sequencing run. Importantly, these steps require that the same PCR primers were used for all sequencing runs.^[1] Our general recommendation is that this be done at the feature table stage, rather than prior to quality control. With some quality control methods (notably DADA2) this is required, while with others it’s optional.

The data used in this guide were sequenced on two Illumina MiSeq sequencing runs, and were originally published in Meilander et al. (2024). The data used here are subsampled to 10% of the original input sequences so the commands can be run quickly. You can find the full dataset in the study’s Artifact Repository.

Overview of the process¶

At a high level, this process works as follows if merging data from two or more sequencing runs:

Demultiplex each sequencing run, resulting in one “demux artifact” (e.g., SampleData[PairedEndSequencesWithQuality]) per sequencing run.
Perform quality control on each “demux artifact” resulting in one FeatureTable and FeatureData[Sequence] per sequencing run.
Merge all per-run FeatureTable artifacts into a single FeatureTable artifact.
Merge all FeatureData[Sequence] artifacts into a single FeatureData[Sequence] artifact.
Merge metadata, if necessary.

Obtain sample metadata and two “demux artifacts”¶

The following commands will download the sample metadata as tab-separated text, and two demultiplexed sequence artifacts representing two different sequencing runs. Note that in this example, all sample metadata from the full study is contained in a single sample metadata file. If that were not the case, you can merge your sample metadata (see How to merge metadata).

[Command Line]

[Python API]

[R API]

[View Source]

wget -O 'sample-metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/sample-metadata.tsv'

from qiime2 import Metadata
from urllib import request

url = 'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/sample-metadata.tsv'
fn = 'sample-metadata.tsv'
request.urlretrieve(url, fn)
sample_metadata_md = Metadata.load(fn)

library(reticulate)

Metadata <- import("qiime2")$Metadata
request <- import("urllib")$request

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/sample-metadata.tsv'
fn <- 'sample-metadata.tsv'
request$urlretrieve(url, fn)
sample_metadata_md <- Metadata$load(fn)

sample-metadata.tsv | download

[Command Line]

[Python API]

[R API]

[View Source]

wget -O 'demux-nano1.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/demux-nano1.qza'

Artifact <- import("qiime2")$Artifact

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/demux-nano1.qza'
fn <- 'demux-nano1.qza'
request$urlretrieve(url, fn)
demux_nano1 <- Artifact$load(fn)

demux-nano1.qza | download | view

[Command Line]

[Python API]

[R API]

[View Source]

wget -O 'demux-nano2.qza' \
  'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/demux-nano2.qza'

demux-nano2.qza | download | view

Sequence quality control¶

We’ll begin by performing quality control on the demultiplexed sequence artifacts independently, by calling DADA2’s denoise-paired command on each set of demultiplexed sequences. The trim and truncation parameters should be chosen based on your data. You should use the same parameters for each run.

[Command Line]

[Python API]

[R API]

[View Source]

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-nano1.qza \
  --p-trim-left-f 0 \
  --p-trunc-len-f 250 \
  --p-trim-left-r 0 \
  --p-trunc-len-r 250 \
  --o-representative-sequences asv-seqs-nano1.qza \
  --o-table asv-table-nano1.qza \
  --o-denoising-stats denoising-stats-nano1.qza \
  --o-base-transition-stats base-transition-stats-nano1.qza
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-nano2.qza \
  --p-trim-left-f 0 \
  --p-trunc-len-f 250 \
  --p-trim-left-r 0 \
  --p-trunc-len-r 250 \
  --o-representative-sequences asv-seqs-nano2.qza \
  --o-table asv-table-nano2.qza \
  --o-denoising-stats denoising-stats-nano2.qza \
  --o-base-transition-stats base-transition-stats-nano2.qza

import rachis.plugins.dada2.actions as dada2_actions

asv_table_nano1, asv_seqs_nano1, denoising_stats_nano1, base_transition_stats_nano1 = dada2_actions.denoise_paired(
    demultiplexed_seqs=demux_nano1,
    trim_left_f=0,
    trunc_len_f=250,
    trim_left_r=0,
    trunc_len_r=250,
)
asv_table_nano2, asv_seqs_nano2, denoising_stats_nano2, base_transition_stats_nano2 = dada2_actions.denoise_paired(
    demultiplexed_seqs=demux_nano2,
    trim_left_f=0,
    trunc_len_f=250,
    trim_left_r=0,
    trunc_len_r=250,
)

dada2_actions <- import("rachis.plugins.dada2.actions")

action_results <- dada2_actions$denoise_paired(
    demultiplexed_seqs=demux_nano1,
    trim_left_f=0L,
    trunc_len_f=250L,
    trim_left_r=0L,
    trunc_len_r=250L,
)
asv_seqs_nano1 <- action_results$representative_sequences
asv_table_nano1 <- action_results$table
denoising_stats_nano1 <- action_results$denoising_stats
base_transition_stats_nano1 <- action_results$base_transition_stats
action_results <- dada2_actions$denoise_paired(
    demultiplexed_seqs=demux_nano2,
    trim_left_f=0L,
    trunc_len_f=250L,
    trim_left_r=0L,
    trunc_len_r=250L,
)
asv_seqs_nano2 <- action_results$representative_sequences
asv_table_nano2 <- action_results$table
denoising_stats_nano2 <- action_results$denoising_stats
base_transition_stats_nano2 <- action_results$base_transition_stats

asv_seqs_nano1, asv_table_nano1, denoising_stats_nano1, base_transition_stats_nano1 = use.action(
    use.UsageAction(plugin_id='dada2',
                    action_id='denoise_paired'),
    use.UsageInputs(demultiplexed_seqs=demux_nano1,
                    trim_left_f=0,
                    trunc_len_f=250,
                    trim_left_r=0,
                    trunc_len_r=250),
    use.UsageOutputNames(representative_sequences='asv_seqs_nano1',
                         table='asv_table_nano1',
                         denoising_stats='denoising_stats_nano1',
                         base_transition_stats='base_transition_stats_nano1'))

asv_seqs_nano2, asv_table_nano2, denoising_stats_nano2, base_transition_stats_nano2 = use.action(
    use.UsageAction(plugin_id='dada2',
                    action_id='denoise_paired'),
    use.UsageInputs(demultiplexed_seqs=demux_nano2,
                    trim_left_f=0,
                    trunc_len_f=250,
                    trim_left_r=0,
                    trunc_len_r=250),
    use.UsageOutputNames(representative_sequences='asv_seqs_nano2',
                         table='asv_table_nano2',
                         denoising_stats='denoising_stats_nano2',
                         base_transition_stats='base_transition_stats_nano2'))

asv-seqs-nano1.qza | download | view
asv-table-nano1.qza | download | view
denoising-stats-nano1.qza | download | view
base-transition-stats-nano1.qza | download | view
asv-seqs-nano2.qza | download | view
asv-table-nano2.qza | download | view
denoising-stats-nano2.qza | download | view
base-transition-stats-nano2.qza | download | view

Merging data¶

After quality control is complete, you’ll have two FeatureTable artifacts and two FeatureData[Sequence] artifacts. Merging at this stage simplifies downstream work. You can do this using the following two commands.

Merging `FeatureTable` artifacts¶

[Command Line]

[Python API]

[R API]

[View Source]

qiime feature-table merge \
  --i-tables asv-table-nano1.qza asv-table-nano2.qza \
  --o-merged-table asv-table.qza

feature_table_actions <- import("rachis.plugins.feature_table.actions")

action_results <- feature_table_actions$merge(
    tables=list(asv_table_nano1, asv_table_nano2),
)
asv_table <- action_results$merged_table

asv_table, = use.action(
    use.UsageAction(plugin_id='feature_table',
                    action_id='merge'),
    use.UsageInputs(tables=[asv_table_nano1, asv_table_nano2]),
    use.UsageOutputNames(merged_table='asv_table'))

asv-table.qza | download | view

Merging `FeatureData[Sequence]` artifacts¶

[Command Line]

[Python API]

[R API]

[View Source]

qiime feature-table merge-seqs \
  --i-data asv-seqs-nano1.qza asv-seqs-nano2.qza \
  --o-merged-data asv-seqs.qza

asv_seqs, = use.action(
    use.UsageAction(plugin_id='feature_table',
                    action_id='merge_seqs'),
    use.UsageInputs(data=[asv_seqs_nano1, asv_seqs_nano2]),
    use.UsageOutputNames(merged_data='asv_seqs'))

asv-seqs.qza | download | view

Downstream analysis¶

At this stage, you can continue on as if you had generated your data on a single sequencing run. For example, you can generate a summary of the merged feature table as follows.

[Command Line]

[Python API]

[R API]

[View Source]

qiime feature-table summarize \
  --i-table asv-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-summary asv-table.qzv \
  --o-sample-frequencies sample-frequencies.qza \
  --o-feature-frequencies asv-frequencies.qza

action_results <- feature_table_actions$summarize(
    table=asv_table,
    metadata=sample_metadata_md,
)
asv_table_viz <- action_results$summary
sample_frequencies <- action_results$sample_frequencies
asv_frequencies <- action_results$feature_frequencies

use.action(
    use.UsageAction(plugin_id='feature_table',
                    action_id='summarize'),
    use.UsageInputs(table=asv_table,
                    metadata=sample_metadata),
    use.UsageOutputNames(summary='asv_table',
                         sample_frequencies='sample_frequencies',
                         feature_frequencies='asv_frequencies'))

asv-table.qzv | download | view
sample-frequencies.qza | download | view
asv-frequencies.qza | download | view

Footnotes¶

If you’re interested in combining sequencing runs where different PCR primers were used, the process is more challenging as you must be strategic about managing the known bias that this introduces. See Li et al (2025) for a discussion of this topic. Table 1 in particular outlines different approaches (this paper does not explicitly evaluate the methods but describes them and how commonly they have been used).
↩

References¶

Meilander, J., Herman, C., Manley, A., Augustine, G., Birdsell, D., Bolyen, E., Celona, K. R., Coffey, H., Cocking, J., Donoghue, T., Draves, A., Erickson, D., Foley, M., Gehret, L., Hagen, J., Hepp, C., Ingram, P., John, D., Kadar, K., … Caporaso, J. G. (2024). Upcycling Human Excrement: The Gut Microbiome to Soil Microbiome Axis. arXiv. 10.48550/ARXIV.2411.04148
Caporaso, J. G., & Meilander, J. (2025). Upcycling Human Excrement: The Gut Microbiome to Soil Microbiome Axis (supporting data). Zenodo. 10.5281/ZENODO.13887456
Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A., & Holmes, S. P. (2016). DADA2: high-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581. 10.1038/nmeth.3869
Li, Z., Chen, Y., Sun, Y., McArthur, K., Carnes, M., Liu, T., Mueller, N. T., Page, G. P., Rossman, L., Smirnova, E., White, J. D., Kress, A. M., & Debelius, J. W. (2025). In harmony? A scoping review of methods to combine multiple 16S amplicon data sets. 10.1101/2025.02.11.637740

Overview of the process¶

Obtain sample metadata and two “demux artifacts”¶

Sequence quality control¶

Merging data¶

Merging FeatureTable artifacts¶

Merging FeatureData[Sequence] artifacts¶

Downstream analysis¶

Merging `FeatureTable` artifacts¶

Merging `FeatureData[Sequence]` artifacts¶