This document illustrates how to integrate data for a single study that was generated on more than one sequencing run. Importantly, these steps require that the same PCR primers were used for all sequencing runs.[1] Our general recommendation is that this be done at the feature table stage, rather than prior to quality control. With some quality control methods (notably DADA2) this is required, while with others it’s optional.
The data used in this guide were sequenced on two Illumina MiSeq sequencing runs, and were originally published in Meilander et al. (2024). The data used here are subsampled to 10% of the original input sequences so the commands can be run quickly. You can find the full dataset in the study’s Artifact Repository.
Overview of the process¶
At a high level, this process works as follows if merging data from two or more sequencing runs:
Demultiplex each sequencing run, resulting in one “demux artifact” (e.g.,
SampleData[PairedEndSequencesWithQuality]) per sequencing run.Perform quality control on each “demux artifact” resulting in one
FeatureTableandFeatureData[Sequence]per sequencing run.Merge all per-run
FeatureTableartifacts into a singleFeatureTableartifact.Merge all
FeatureData[Sequence]artifacts into a singleFeatureData[Sequence]artifact.Merge metadata, if necessary.
Obtain sample metadata and two “demux artifacts”¶
The following commands will download the sample metadata as tab-separated text, and two demultiplexed sequence artifacts representing two different sequencing runs. Note that in this example, all sample metadata from the full study is contained in a single sample metadata file. If that were not the case, you can merge your sample metadata (see How to merge metadata).
wget -O 'sample-metadata.tsv' \
'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/sample-metadata.tsv'
from qiime2 import Metadata
from urllib import request
url = 'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/sample-metadata.tsv'
fn = 'sample-metadata.tsv'
request.urlretrieve(url, fn)
sample_metadata_md = Metadata.load(fn)
library(reticulate)
Metadata <- import("qiime2")$Metadata
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/sample-metadata.tsv'
fn <- 'sample-metadata.tsv'
request$urlretrieve(url, fn)
sample_metadata_md <- Metadata$load(fn)
sample_metadata = use.init_metadata_from_url(
'sample-metadata',
'https://www.dropbox.com/scl/fi/irosimbb1aud1aa7frzxf/sample-metadata.tsv?rlkey=f45jpxzajjz9xx9vpvfnf1zjx&st=nahafuvy&dl=1')sample-metadata.tsv| download
wget -O 'demux-nano1.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/demux-nano1.qza'
from qiime2 import Artifact
url = 'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/demux-nano1.qza'
fn = 'demux-nano1.qza'
request.urlretrieve(url, fn)
demux_nano1 = Artifact.load(fn)
Artifact <- import("qiime2")$Artifact
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/demux-nano1.qza'
fn <- 'demux-nano1.qza'
request$urlretrieve(url, fn)
demux_nano1 <- Artifact$load(fn)
demux_nano1 = use.init_artifact_from_url(
'demux_nano1',
'https://www.dropbox.com/scl/fi/hpsl1hxa0kj3njhes7p64/demux-10p.qza?rlkey=e5brlu9xn4qcrqaan11z2oi7d&st=r9or2kur&dl=1')wget -O 'demux-nano2.qza' \
'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/demux-nano2.qza'
url = 'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/demux-nano2.qza'
fn = 'demux-nano2.qza'
request.urlretrieve(url, fn)
demux_nano2 = Artifact.load(fn)
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/merge-runs/demux-nano2.qza'
fn <- 'demux-nano2.qza'
request$urlretrieve(url, fn)
demux_nano2 <- Artifact$load(fn)
demux_nano2 = use.init_artifact_from_url(
'demux_nano2',
'https://www.dropbox.com/scl/fi/73f6rmcq7lelug5qbuhl6/demux-10p.qza?rlkey=s0hoh326fes3z2usvgs62tv3c&st=peakjay3&dl=1')Sequence quality control¶
We’ll begin by performing quality control on the demultiplexed sequence artifacts independently, by calling DADA2’s denoise-paired command on each set of demultiplexed sequences.
The trim and truncation parameters should be chosen based on your data.
You should use the same parameters for each run.
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-nano1.qza \
--p-trim-left-f 0 \
--p-trunc-len-f 250 \
--p-trim-left-r 0 \
--p-trunc-len-r 250 \
--o-representative-sequences asv-seqs-nano1.qza \
--o-table asv-table-nano1.qza \
--o-denoising-stats denoising-stats-nano1.qza \
--o-base-transition-stats base-transition-stats-nano1.qza
qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-nano2.qza \
--p-trim-left-f 0 \
--p-trunc-len-f 250 \
--p-trim-left-r 0 \
--p-trunc-len-r 250 \
--o-representative-sequences asv-seqs-nano2.qza \
--o-table asv-table-nano2.qza \
--o-denoising-stats denoising-stats-nano2.qza \
--o-base-transition-stats base-transition-stats-nano2.qzaimport rachis.plugins.dada2.actions as dada2_actions
asv_table_nano1, asv_seqs_nano1, denoising_stats_nano1, base_transition_stats_nano1 = dada2_actions.denoise_paired(
demultiplexed_seqs=demux_nano1,
trim_left_f=0,
trunc_len_f=250,
trim_left_r=0,
trunc_len_r=250,
)
asv_table_nano2, asv_seqs_nano2, denoising_stats_nano2, base_transition_stats_nano2 = dada2_actions.denoise_paired(
demultiplexed_seqs=demux_nano2,
trim_left_f=0,
trunc_len_f=250,
trim_left_r=0,
trunc_len_r=250,
)dada2_actions <- import("rachis.plugins.dada2.actions")
action_results <- dada2_actions$denoise_paired(
demultiplexed_seqs=demux_nano1,
trim_left_f=0L,
trunc_len_f=250L,
trim_left_r=0L,
trunc_len_r=250L,
)
asv_seqs_nano1 <- action_results$representative_sequences
asv_table_nano1 <- action_results$table
denoising_stats_nano1 <- action_results$denoising_stats
base_transition_stats_nano1 <- action_results$base_transition_stats
action_results <- dada2_actions$denoise_paired(
demultiplexed_seqs=demux_nano2,
trim_left_f=0L,
trunc_len_f=250L,
trim_left_r=0L,
trunc_len_r=250L,
)
asv_seqs_nano2 <- action_results$representative_sequences
asv_table_nano2 <- action_results$table
denoising_stats_nano2 <- action_results$denoising_stats
base_transition_stats_nano2 <- action_results$base_transition_statsasv_seqs_nano1, asv_table_nano1, denoising_stats_nano1, base_transition_stats_nano1 = use.action(
use.UsageAction(plugin_id='dada2',
action_id='denoise_paired'),
use.UsageInputs(demultiplexed_seqs=demux_nano1,
trim_left_f=0,
trunc_len_f=250,
trim_left_r=0,
trunc_len_r=250),
use.UsageOutputNames(representative_sequences='asv_seqs_nano1',
table='asv_table_nano1',
denoising_stats='denoising_stats_nano1',
base_transition_stats='base_transition_stats_nano1'))
asv_seqs_nano2, asv_table_nano2, denoising_stats_nano2, base_transition_stats_nano2 = use.action(
use.UsageAction(plugin_id='dada2',
action_id='denoise_paired'),
use.UsageInputs(demultiplexed_seqs=demux_nano2,
trim_left_f=0,
trunc_len_f=250,
trim_left_r=0,
trunc_len_r=250),
use.UsageOutputNames(representative_sequences='asv_seqs_nano2',
table='asv_table_nano2',
denoising_stats='denoising_stats_nano2',
base_transition_stats='base_transition_stats_nano2'))asv-seqs-nano1.qza| download | viewasv-table-nano1.qza| download | viewdenoising-stats-nano1.qza| download | viewbase-transition-stats-nano1.qza| download | viewasv-seqs-nano2.qza| download | viewasv-table-nano2.qza| download | viewdenoising-stats-nano2.qza| download | viewbase-transition-stats-nano2.qza| download | view
Merging data¶
After quality control is complete, you’ll have two FeatureTable artifacts and two FeatureData[Sequence] artifacts.
Merging at this stage simplifies downstream work.
You can do this using the following two commands.
Merging FeatureTable artifacts¶
qiime feature-table merge \
--i-tables asv-table-nano1.qza asv-table-nano2.qza \
--o-merged-table asv-table.qzaimport rachis.plugins.feature_table.actions as feature_table_actions
asv_table, = feature_table_actions.merge(
tables=[asv_table_nano1, asv_table_nano2],
)feature_table_actions <- import("rachis.plugins.feature_table.actions")
action_results <- feature_table_actions$merge(
tables=list(asv_table_nano1, asv_table_nano2),
)
asv_table <- action_results$merged_tableasv_table, = use.action(
use.UsageAction(plugin_id='feature_table',
action_id='merge'),
use.UsageInputs(tables=[asv_table_nano1, asv_table_nano2]),
use.UsageOutputNames(merged_table='asv_table'))Merging FeatureData[Sequence] artifacts¶
qiime feature-table merge-seqs \
--i-data asv-seqs-nano1.qza asv-seqs-nano2.qza \
--o-merged-data asv-seqs.qzaasv_seqs, = feature_table_actions.merge_seqs(
data=[asv_seqs_nano1, asv_seqs_nano2],
)action_results <- feature_table_actions$merge_seqs(
data=list(asv_seqs_nano1, asv_seqs_nano2),
)
asv_seqs <- action_results$merged_dataasv_seqs, = use.action(
use.UsageAction(plugin_id='feature_table',
action_id='merge_seqs'),
use.UsageInputs(data=[asv_seqs_nano1, asv_seqs_nano2]),
use.UsageOutputNames(merged_data='asv_seqs'))Downstream analysis¶
At this stage, you can continue on as if you had generated your data on a single sequencing run. For example, you can generate a summary of the merged feature table as follows.
qiime feature-table summarize \
--i-table asv-table.qza \
--m-metadata-file sample-metadata.tsv \
--o-summary asv-table.qzv \
--o-sample-frequencies sample-frequencies.qza \
--o-feature-frequencies asv-frequencies.qzaasv_frequencies, sample_frequencies, asv_table_viz = feature_table_actions.summarize(
table=asv_table,
metadata=sample_metadata_md,
)action_results <- feature_table_actions$summarize(
table=asv_table,
metadata=sample_metadata_md,
)
asv_table_viz <- action_results$summary
sample_frequencies <- action_results$sample_frequencies
asv_frequencies <- action_results$feature_frequenciesuse.action(
use.UsageAction(plugin_id='feature_table',
action_id='summarize'),
use.UsageInputs(table=asv_table,
metadata=sample_metadata),
use.UsageOutputNames(summary='asv_table',
sample_frequencies='sample_frequencies',
feature_frequencies='asv_frequencies'))asv-table.qzv| download | viewsample-frequencies.qza| download | viewasv-frequencies.qza| download | view
If you’re interested in combining sequencing runs where different PCR primers were used, the process is more challenging as you must be strategic about managing the known bias that this introduces. See Li et al (2025) for a discussion of this topic. Table 1 in particular outlines different approaches (this paper does not explicitly evaluate the methods but describes them and how commonly they have been used).
- Meilander, J., Herman, C., Manley, A., Augustine, G., Birdsell, D., Bolyen, E., Celona, K. R., Coffey, H., Cocking, J., Donoghue, T., Draves, A., Erickson, D., Foley, M., Gehret, L., Hagen, J., Hepp, C., Ingram, P., John, D., Kadar, K., … Caporaso, J. G. (2024). Upcycling Human Excrement: The Gut Microbiome to Soil Microbiome Axis. arXiv. 10.48550/ARXIV.2411.04148
- Caporaso, J. G., & Meilander, J. (2025). Upcycling Human Excrement: The Gut Microbiome to Soil Microbiome Axis (supporting data). Zenodo. 10.5281/ZENODO.13887456
- Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A., & Holmes, S. P. (2016). DADA2: high-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581. 10.1038/nmeth.3869
- Li, Z., Chen, Y., Sun, Y., McArthur, K., Carnes, M., Liu, T., Mueller, N. T., Page, G. P., Rossman, L., Smirnova, E., White, J. D., Kress, A. M., & Debelius, J. W. (2025). In harmony? A scoping review of methods to combine multiple 16S amplicon data sets. 10.1101/2025.02.11.637740