How to merge data from multiple sequencing runs

This document illustrates how to integrate data for a single study that were generated on more than one sequencing run. We generally recommend doing this at the feature table stage, after quality control, rather than before it. With some quality control methods (notably DADA2, which estimates its error model from the reads of a run) merging after quality control is required, while with others it’s optional.

The data used in this guide were sequenced on two Illumina MiSeq sequencing runs, and were originally published in Meilander et al. (2024). The data used here are subsampled to 10% of the original input sequences so the commands can be run quickly. You can find the full dataset in the study’s Artifact Repository.

Overview of the process

At a high level, merging data from two or more sequencing runs works as follows:

1. Demultiplex each sequencing run, resulting in one “demux artifact” (e.g., SampleData[PairedEndSequencesWithQuality]) per sequencing run.
2. Perform quality control on each “demux artifact”, resulting in one FeatureTable and one FeatureData[Sequence] per sequencing run.
3. Merge all per-run FeatureTable artifacts into a single FeatureTable artifact.
4. Merge all per-run FeatureData[Sequence] artifacts into a single FeatureData[Sequence] artifact.
5. Merge metadata, if necessary.

Obtain sample metadata and two “demux artifacts”

The following commands download the sample metadata as tab-separated text, and two demultiplexed sequence artifacts representing two different sequencing runs. Note that in this example, all sample metadata from the full study is contained in a single sample metadata file. If your metadata is instead split across multiple files, you can merge it (see How to merge metadata).

[Command Line]

wget -O 'sample-metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/2025.4/data/merge-runs/sample-metadata.tsv'

[Python API]

from qiime2 import Metadata
from urllib import request

url = 'https://amplicon-docs.qiime2.org/en/2025.4/data/merge-runs/sample-metadata.tsv'
fn = 'sample-metadata.tsv'
request.urlretrieve(url, fn)
sample_metadata_md = Metadata.load(fn)

[R API]

library(reticulate)

Metadata <- import("qiime2")$Metadata
request <- import("urllib")$request

url <- 'https://amplicon-docs.qiime2.org/en/2025.4/data/merge-runs/sample-metadata.tsv'
fn <- 'sample-metadata.tsv'
request$urlretrieve(url, fn)
sample_metadata_md <- Metadata$load(fn)

sample-metadata.tsv | download
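If your study's metadata were split into one file per run, the idea behind combining them can be sketched with the Python standard library alone. This is a minimal illustration, not QIIME 2's own metadata merging: the `merge_metadata` helper and the `run` column are hypothetical, the first column is assumed to hold the sample ID, and QIIME 2-specific rows such as type directives are not handled.

```python
import csv
import io


def merge_metadata(tsv_texts):
    """Concatenate rows of TSV metadata tables that share the same columns.

    Hypothetical helper: assumes the first column holds the sample ID.
    """
    header = None
    rows = {}
    for text in tsv_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("Metadata files have different columns")
        for row in reader:
            sample_id = row[header[0]]
            # A sample should be described by exactly one row.
            if sample_id in rows:
                raise ValueError(f"Duplicate sample ID: {sample_id}")
            rows[sample_id] = row

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows.values())
    return out.getvalue()


# Made-up per-run metadata, one sample each.
run1_metadata = "sample-id\trun\ns1\trun-1\n"
run2_metadata = "sample-id\trun\ns2\trun-2\n"

merged_metadata = merge_metadata([run1_metadata, run2_metadata])
print(merged_metadata)
```

Raising on duplicate sample IDs is a deliberate choice in this sketch: a sample that appears in two per-run files usually signals an ID clash that is better resolved by hand.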

[Command Line]

wget -O 'demux-nano1.qza' \
  'https://amplicon-docs.qiime2.org/en/2025.4/data/merge-runs/demux-nano1.qza'

[R API]

Artifact <- import("qiime2")$Artifact

url <- 'https://amplicon-docs.qiime2.org/en/2025.4/data/merge-runs/demux-nano1.qza'
fn <- 'demux-nano1.qza'
request$urlretrieve(url, fn)
demux_nano1 <- Artifact$load(fn)

demux-nano1.qza | download | view

[Command Line]

wget -O 'demux-nano2.qza' \
  'https://amplicon-docs.qiime2.org/en/2025.4/data/merge-runs/demux-nano2.qza'

demux-nano2.qza | download | view

Sequence quality control

We’ll begin by performing quality control on each demultiplexed sequence artifact independently, by calling DADA2’s denoise-paired command on each set of demultiplexed sequences. The trim and truncation parameters should be chosen based on your data. Use the same parameters for each run, so that an ASV observed in more than one run is represented by an identical sequence and can be combined when the tables are merged.

[Command Line]

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-nano1.qza \
  --p-trim-left-f 0 \
  --p-trunc-len-f 250 \
  --p-trim-left-r 0 \
  --p-trunc-len-r 250 \
  --o-representative-sequences asv-seqs-nano1.qza \
  --o-table asv-table-nano1.qza \
  --o-denoising-stats stats-nano1.qza
qiime dada2 denoise-paired \
  --i-demultiplexed-seqs demux-nano2.qza \
  --p-trim-left-f 0 \
  --p-trunc-len-f 250 \
  --p-trim-left-r 0 \
  --p-trunc-len-r 250 \
  --o-representative-sequences asv-seqs-nano2.qza \
  --o-table asv-table-nano2.qza \
  --o-denoising-stats stats-nano2.qza

[Python API]

import qiime2.plugins.dada2.actions as dada2_actions

asv_table_nano1, asv_seqs_nano1, stats_nano1 = dada2_actions.denoise_paired(
    demultiplexed_seqs=demux_nano1,
    trim_left_f=0,
    trunc_len_f=250,
    trim_left_r=0,
    trunc_len_r=250,
)
asv_table_nano2, asv_seqs_nano2, stats_nano2 = dada2_actions.denoise_paired(
    demultiplexed_seqs=demux_nano2,
    trim_left_f=0,
    trunc_len_f=250,
    trim_left_r=0,
    trunc_len_r=250,
)

[Galaxy]

Using the qiime2 dada2 denoise-paired tool:

Set "demultiplexed_seqs" to #: demux-nano1.qza
Set "trunc_len_f" to 250
Set "trunc_len_r" to 250
Expand the additional options section
1. Leave "trim_left_f" as its default value of 0
2. Leave "trim_left_r" as its default value of 0
Press the Execute button.

Once completed, for each new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	"Name" to set (be sure to press [Save])
`#: qiime2 dada2 denoise-paired [...] : table.qza`	`asv-table-nano1.qza`
`#: qiime2 dada2 denoise-paired [...] : representative_sequences.qza`	`asv-seqs-nano1.qza`
`#: qiime2 dada2 denoise-paired [...] : denoising_stats.qza`	`stats-nano1.qza`

Using the qiime2 dada2 denoise-paired tool:

Set "demultiplexed_seqs" to #: demux-nano2.qza
Set "trunc_len_f" to 250
Set "trunc_len_r" to 250
Expand the additional options section
1. Leave "trim_left_f" as its default value of 0
2. Leave "trim_left_r" as its default value of 0
Press the Execute button.

Once completed, for each new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	"Name" to set (be sure to press [Save])
`#: qiime2 dada2 denoise-paired [...] : table.qza`	`asv-table-nano2.qza`
`#: qiime2 dada2 denoise-paired [...] : representative_sequences.qza`	`asv-seqs-nano2.qza`
`#: qiime2 dada2 denoise-paired [...] : denoising_stats.qza`	`stats-nano2.qza`

[R API]

dada2_actions <- import("qiime2.plugins.dada2.actions")

action_results <- dada2_actions$denoise_paired(
    demultiplexed_seqs=demux_nano1,
    trim_left_f=0L,
    trunc_len_f=250L,
    trim_left_r=0L,
    trunc_len_r=250L,
)
asv_seqs_nano1 <- action_results$representative_sequences
asv_table_nano1 <- action_results$table
stats_nano1 <- action_results$denoising_stats
action_results <- dada2_actions$denoise_paired(
    demultiplexed_seqs=demux_nano2,
    trim_left_f=0L,
    trunc_len_f=250L,
    trim_left_r=0L,
    trunc_len_r=250L,
)
asv_seqs_nano2 <- action_results$representative_sequences
asv_table_nano2 <- action_results$table
stats_nano2 <- action_results$denoising_stats

[View Source]

asv_seqs_nano1, asv_table_nano1, stats_nano1 = use.action(
    use.UsageAction(plugin_id='dada2',
                    action_id='denoise_paired'),
    use.UsageInputs(demultiplexed_seqs=demux_nano1,
                    trim_left_f=0,
                    trunc_len_f=250,
                    trim_left_r=0,
                    trunc_len_r=250),
    use.UsageOutputNames(representative_sequences='asv_seqs_nano1',
                         table='asv_table_nano1',
                         denoising_stats='stats_nano1'))

asv_seqs_nano2, asv_table_nano2, stats_nano2 = use.action(
    use.UsageAction(plugin_id='dada2',
                    action_id='denoise_paired'),
    use.UsageInputs(demultiplexed_seqs=demux_nano2,
                    trim_left_f=0,
                    trunc_len_f=250,
                    trim_left_r=0,
                    trunc_len_r=250),
    use.UsageOutputNames(representative_sequences='asv_seqs_nano2',
                         table='asv_table_nano2',
                         denoising_stats='stats_nano2'))

asv-seqs-nano1.qza | download | view
asv-table-nano1.qza | download | view
stats-nano1.qza | download | view
asv-seqs-nano2.qza | download | view
asv-table-nano2.qza | download | view
stats-nano2.qza | download | view
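One reason to keep the denoising parameters consistent across runs can be sketched in a few lines. The sketch assumes DADA2 was run with QIIME 2's default hashed feature IDs, where each ASV's ID is the MD5 hash of its sequence. The sequence and the `feature_id` helper are made up for illustration and are not part of any QIIME 2 API.

```python
import hashlib


def feature_id(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence (hypothetical helper)."""
    return hashlib.md5(sequence.encode("utf-8")).hexdigest()


# Made-up ASV sequence, as it might be recovered from run 1.
run1_asv = "TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAG"

# Same organism in run 2, denoised with the same settings:
# the sequence is identical, so the ID is identical.
run2_same_params = run1_asv

# Same organism in run 2, but with 10 more bases trimmed from the 5' end:
# the sequence differs, so it hashes to a different ID.
run2_more_trimming = run1_asv[10:]

same_id = feature_id(run1_asv) == feature_id(run2_same_params)
different_id = feature_id(run1_asv) != feature_id(run2_more_trimming)
print(same_id, different_id)
```

With matching IDs, counts for the same ASV line up in one row of the merged table. With mismatched IDs, the same organism would appear as two unrelated features, one per run.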

Merging data

After quality control is complete, you’ll have two FeatureTable artifacts and two FeatureData[Sequence] artifacts. Merging each pair at this stage simplifies downstream work. You can do this using the following two commands.

Merging `FeatureTable` artifacts

[Command Line]

qiime feature-table merge \
  --i-tables asv-table-nano1.qza asv-table-nano2.qza \
  --o-merged-table asv-table.qza

[Galaxy]

Using the qiime2 feature-table merge tool:

For "tables", use ctrl-(or command)-click to select the following inputs:
1. #: asv-table-nano1.qza
2. #: asv-table-nano2.qza
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	"Name" to set (be sure to press [Save])
`#: qiime2 feature-table merge [...] : merged_table.qza`	`asv-table.qza`

[R API]

feature_table_actions <- import("qiime2.plugins.feature_table.actions")

action_results <- feature_table_actions$merge(
    tables=list(asv_table_nano1, asv_table_nano2),
)
asv_table <- action_results$merged_table

[View Source]

asv_table, = use.action(
    use.UsageAction(plugin_id='feature_table',
                    action_id='merge'),
    use.UsageInputs(tables=[asv_table_nano1, asv_table_nano2]),
    use.UsageOutputNames(merged_table='asv_table'))

asv-table.qza | download | view
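Conceptually, merging feature tables from runs that contain different samples yields the union of the samples and the union of the features. The sketch below shows this with plain dictionaries rather than QIIME 2 artifacts. `merge_tables` is a hypothetical helper, and raising an error on overlapping sample IDs is a choice made here for illustration, not a description of how `feature-table merge` is implemented.

```python
def merge_tables(tables):
    """Merge per-run tables of the form {sample_id: {feature_id: count}}.

    Raises ValueError if a sample ID appears in more than one table.
    """
    merged = {}
    for table in tables:
        overlap = merged.keys() & table.keys()
        if overlap:
            raise ValueError(
                f"Sample IDs found in more than one run: {sorted(overlap)}")
        for sample_id, counts in table.items():
            merged[sample_id] = dict(counts)
    return merged


def all_feature_ids(table):
    """Return the set of feature IDs observed in any sample."""
    return {fid for counts in table.values() for fid in counts}


# Made-up per-run tables: one sample per run, sharing feature "asvA".
run1_table = {"s1": {"asvA": 10, "asvB": 3}}
run2_table = {"s2": {"asvA": 4, "asvC": 7}}

merged_table = merge_tables([run1_table, run2_table])
print(sorted(merged_table))                   # sample IDs from both runs
print(sorted(all_feature_ids(merged_table)))  # features from both runs
```

A feature that was not observed in a sample is simply absent from that sample's dictionary here, which corresponds to a count of zero in a full table.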

Merging `FeatureData[Sequence]` artifacts

[Command Line]

qiime feature-table merge-seqs \
  --i-data asv-seqs-nano1.qza asv-seqs-nano2.qza \
  --o-merged-data asv-seqs.qza

[Galaxy]

Using the qiime2 feature-table merge-seqs tool:

For "data", use ctrl-(or command)-click to select the following inputs:
1. #: asv-seqs-nano1.qza
2. #: asv-seqs-nano2.qza
Press the Execute button.

Once completed, for the new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	"Name" to set (be sure to press [Save])
`#: qiime2 feature-table merge-seqs [...] : merged_data.qza`	`asv-seqs.qza`

[View Source]

asv_seqs, = use.action(
    use.UsageAction(plugin_id='feature_table',
                    action_id='merge_seqs'),
    use.UsageInputs(data=[asv_seqs_nano1, asv_seqs_nano2]),
    use.UsageOutputNames(merged_data='asv_seqs'))

asv-seqs.qza | download | view
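The merged sequences should cover every feature in the merged table. The following sketch illustrates the idea with plain dictionaries rather than QIIME 2 artifacts. Both helpers and all of the data are hypothetical, and keeping the first sequence seen for a repeated feature ID is an assumption of this sketch.

```python
def merge_sequences(seq_maps):
    """Union of {feature_id: sequence} mappings; the first occurrence wins."""
    merged = {}
    for seqs in seq_maps:
        for fid, seq in seqs.items():
            merged.setdefault(fid, seq)
    return merged


def missing_sequences(table, seqs):
    """Feature IDs present in a {sample: {feature: count}} table
    but absent from the {feature_id: sequence} mapping."""
    observed = {fid for counts in table.values() for fid in counts}
    return observed - seqs.keys()


# Made-up per-run sequences; "asvA" was observed in both runs.
run1_seqs = {"asvA": "ACGTACGT", "asvB": "TTGACCAA"}
run2_seqs = {"asvA": "ACGTACGT", "asvC": "GGCATTCA"}
merged_seqs = merge_sequences([run1_seqs, run2_seqs])

# Made-up merged table, used to check that every feature has a sequence.
example_table = {"s1": {"asvA": 10, "asvB": 3}, "s2": {"asvA": 4, "asvC": 7}}
unmatched = missing_sequences(example_table, merged_seqs)
print(sorted(merged_seqs), sorted(unmatched))
```

An empty `unmatched` set indicates that the table and the sequences describe the same features, which is what downstream steps such as taxonomic classification rely on.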

Downstream analysis

At this stage, you can continue as if you had generated your data on a single sequencing run. For example, you can generate a summary of the merged feature table as follows.

[Command Line]

qiime feature-table summarize-plus \
  --i-table asv-table.qza \
  --m-metadata-file sample-metadata.tsv \
  --o-summary asv-table.qzv \
  --o-sample-frequencies sample-frequencies.qza \
  --o-feature-frequencies asv-frequencies.qza

[Galaxy]

Using the qiime2 feature-table summarize-plus tool:

Set "table" to #: asv-table.qza
Expand the additional options section
- For "metadata":
  - Press the + Insert metadata button to set up the next steps.
    1. Leave as Metadata from TSV
    2. Set "Metadata Source" to sample-metadata.tsv
Press the Execute button.

Once completed, for each new entry in your history, use the Edit button to set the name as follows:

(Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name	"Name" to set (be sure to press [Save])
`#: qiime2 feature-table summarize-plus [...] : feature_frequencies.qza`	`asv-frequencies.qza`
`#: qiime2 feature-table summarize-plus [...] : sample_frequencies.qza`	`sample-frequencies.qza`
`#: qiime2 feature-table summarize-plus [...] : summary.qzv`	`asv-table.qzv`

[R API]

action_results <- feature_table_actions$summarize_plus(
    table=asv_table,
    metadata=sample_metadata_md,
)
asv_table_viz <- action_results$summary
sample_frequencies <- action_results$sample_frequencies
asv_frequencies <- action_results$feature_frequencies

[View Source]

use.action(
    use.UsageAction(plugin_id='feature_table',
                    action_id='summarize_plus'),
    use.UsageInputs(table=asv_table,
                    metadata=sample_metadata),
    use.UsageOutputNames(summary='asv_table',
                         sample_frequencies='sample_frequencies',
                         feature_frequencies='asv_frequencies'))

asv-table.qzv | download | view
sample-frequencies.qza | download | view
asv-frequencies.qza | download | view
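As their names suggest, the sample and feature frequency outputs can be thought of as totals over the merged table: one total per sample and one per feature. The sketch below computes such totals from a plain dictionary. The helper functions and the table are hypothetical, and this is an illustration of the idea rather than the `summarize-plus` implementation.

```python
from collections import Counter


def sample_frequencies(table):
    """Total count per sample in a {sample_id: {feature_id: count}} table."""
    return {sample_id: sum(counts.values())
            for sample_id, counts in table.items()}


def feature_frequencies(table):
    """Total count per feature, summed across all samples."""
    totals = Counter()
    for counts in table.values():
        totals.update(counts)
    return dict(totals)


# Made-up merged table with one sample from each of two runs.
summary_table = {"s1": {"asvA": 10, "asvB": 3}, "s2": {"asvA": 4, "asvC": 7}}

per_sample = sample_frequencies(summary_table)
per_feature = feature_frequencies(summary_table)
print(per_sample)
print(per_feature)
```

Comparing per-sample totals between runs is a quick way to spot a run that yielded noticeably fewer reads after quality control.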

References

Meilander, J., Herman, C., Manley, A., Augustine, G., Birdsell, D., Bolyen, E., Celona, K. R., Coffey, H., Cocking, J., Donoghue, T., Draves, A., Erickson, D., Foley, M., Gehret, L., Hagen, J., Hepp, C., Ingram, P., John, D., Kadar, K., … Caporaso, J. G. (2024). Upcycling Human Excrement: The Gut Microbiome to Soil Microbiome Axis. arXiv. 10.48550/ARXIV.2411.04148
Caporaso, J. G., & Meilander, J. (2024). Upcycling Human Excrement: The Gut Microbiome to Soil Microbiome Axis (supporting data). Zenodo. 10.5281/ZENODO.13887456
Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J. A., & Holmes, S. P. (2016). DADA2: high-resolution sample inference from Illumina amplicon data. Nature Methods, 13(7), 581. 10.1038/nmeth.3869
