This chapter will briefly introduce a few concepts that should help you learn QIIME 2 quickly.
What is QIIME 2?¶
To date, most people think of QIIME 2 as a microbiome marker gene (i.e., amplicon) analysis tool. That is where the project started, and what its predecessor QIIME 1 was. QIIME 2 began as a complete rewrite of QIIME 1, in which we set out to address common feature requests from our users and reduce the challenges that we saw them encountering. This led us to develop unique functionality, including our data provenance tracking system, a decentralized plugin-based ecosystem of tools, and the ability to use those tools through interfaces designed to support users with different computational backgrounds (Figure 1). Much of this functionality is not specific to microbiome marker gene analysis, but rather general to biological data science, and as a result the scope of QIIME 2 is now broader than when we started. So, what is QIIME 2?
What most people think of as “QIIME 2” is what we refer to in our documentation as the amplicon distribution of QIIME 2, or simply the amplicon distribution. This is the microbiome marker gene analysis toolkit. This documentation site is for the amplicon distribution specifically, and we’re going to get to that very shortly.
The amplicon distribution is built on what we call the QIIME 2 Framework (or the framework). The framework is where the general-purpose functionality exists, including data provenance tracking, the plugin manager, and more. As an end user, you don’t need to know the details of the framework, but knowing that it exists and is distinct from the amplicon distribution will help you understand the ecosystem of tools. The amplicon distribution, and other tools such as MOSHPIT (formerly referred to as the metagenome distribution) and genome-sampler, are all built on top of the QIIME 2 Framework.
The amplicon distribution of QIIME 2 includes a suite of plugins that support microbiome marker gene analysis from raw sequencing data through publication-quality visualizations and statistics. There is not a single QIIME 2 workflow or command. Rather, an analysis is a series of steps, and you choose which ones to apply. We provide general guidance through tutorials, like the Moving Pictures tutorial 🎥, and can provide more specific guidance on the QIIME 2 Forum. Any amplicon is supported - not just the 16S rRNA gene. The plugins that come with the amplicon distribution are listed in Available plugins. Other plugins can also be installed independently - your main source for discovery and installation instructions for these is the QIIME 2 Library[1].
Important concepts¶
The following sections briefly present some important concepts for understanding QIIME 2 tools. You don’t need to fully understand these to start using QIIME 2, but we think it will help you learn and build your bioinformatics skills if you have some brief exposure to these ideas. Links to where you can learn more are provided[2].
Interfaces¶
All of QIIME 2’s analytic functionality can be accessed through multiple interfaces, and you can choose to work with the one (or more) that you think you’ll be most efficient with (Figure 1). For example, domain scientists without advanced computing backgrounds can use QIIME 2 through graphical interfaces such as Galaxy. Power users can work with QIIME 2 on the command line, enabling straightforward access on high-performance compute clusters and cloud resources. Research software engineers and data scientists can use QIIME 2 through its Python 3 API, facilitating development of automated workflows and integration of QIIME 2 tools as a component in other systems. The ability to use the same analysis tools through different interfaces is a big part of what makes QIIME 2 accessible.

Figure 1: Examples of four QIIME 2 interfaces: QIIME 2 View, Galaxy, q2cli, and the Python 3 API.
You’re also free to use different interfaces for different steps - QIIME 2 won’t care. For example, a fairly common workflow is to use the command line (q2cli) for long-running jobs on a high-performance computing system, and then download the results and work with them in a Jupyter Notebook using the Python 3 API for the more exploratory iterative steps of an analysis.
Different interface options in tutorials¶
When you’re working with QIIME 2 tutorials, we’ll generally provide instructions that enable you to work in different interfaces[3]. This will look like the following, and you can choose to follow the instructions for the interface that you’re currently working with.
# Command line (q2cli): download the sample metadata, then tabulate it.
wget -O 'sample-metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/getting-started/sample-metadata.tsv'

qiime metadata tabulate \
  --m-input-file sample-metadata.tsv \
  --o-visualization sample-metadata-viz.qzv
# Python 3 API: download the sample metadata, then tabulate it.
from urllib import request

from qiime2 import Metadata
import qiime2.plugins.metadata.actions as metadata_actions

url = 'https://amplicon-docs.qiime2.org/en/latest/data/getting-started/sample-metadata.tsv'
fn = 'sample-metadata.tsv'
request.urlretrieve(url, fn)

sample_metadata_md = Metadata.load(fn)
# The action returns its results as a tuple; unpack the single visualization.
sample_metadata_viz_viz, = metadata_actions.tabulate(
    input=sample_metadata_md,
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: sample-metadata.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/getting-started/sample-metadata.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 metadata tabulate tool:
  - For "input":
    - Perform the following steps.
      - Leave as Metadata from TSV
      - Set "Metadata Source" to sample-metadata.tsv
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows:
  - (Renaming is optional, but it will make any subsequent steps easier to complete.)
  History Name | "Name" to set (be sure to press [Save])
  #: qiime2 metadata tabulate [...] : visualization.qzv | sample-metadata-viz.qzv
# R (via reticulate): download the sample metadata, then tabulate it.
library(reticulate)

Metadata <- import("qiime2")$Metadata
metadata_actions <- import("qiime2.plugins.metadata.actions")
request <- import("urllib")$request

url <- 'https://amplicon-docs.qiime2.org/en/latest/data/getting-started/sample-metadata.tsv'
fn <- 'sample-metadata.tsv'
request$urlretrieve(url, fn)

sample_metadata_md <- Metadata$load(fn)
action_results <- metadata_actions$tabulate(
  input=sample_metadata_md
)
sample_metadata_viz_viz <- action_results$visualization
# Usage API: `use` is the usage driver object supplied by QIIME 2.
sample_metadata = use.init_metadata_from_url(
    'sample-metadata',
    'https://data.qiime2.org/2025.4/tutorials/moving-pictures/sample_metadata.tsv')
sample_metadata_viz, = use.action(
    use.UsageAction(plugin_id='metadata',
                    action_id='tabulate'),
    use.UsageInputs(input=sample_metadata),
    use.UsageOutputNames(visualization='sample_metadata_viz')
)
Artifacts and visualizations¶
One of the first things that new QIIME 2 users often notice is the .qza and .qzv files that QIIME 2 uses.
All files generated by QIIME 2 are either .qza or .qzv files, and these are simply zip files that store your data alongside some QIIME 2-specific metadata.
Here is what you need to know about these:
- .qza files store QIIME 2 Artifacts on disk. QIIME 2 Artifacts represent data that are generated by QIIME 2 and intended to be used by QIIME 2, such as intermediary files in an analysis workflow. .qza stands for QIIME Zipped Artifact.
- .qzv files store QIIME 2 Visualizations on disk. QIIME 2 Visualizations represent data that are generated by QIIME 2 and intended to be viewed by humans, such as an interactive visualization. .qzv stands for QIIME Zipped Visualization.
- .qza and .qzv files can be loaded with QIIME 2 View[4].
- Because .qza and .qzv files are simple zip files, you can open them with any unzip utility, such as WinZip, 7Zip, or unzip. You don’t need to have QIIME 2 installed to access the information in these files. For an example of how to get data out of these files using unzip, see Understanding QIIME 2 Archives, Artifacts, and Visualizations.
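Because these files are ordinary zip archives, they can also be inspected with nothing but the Python standard library. The following is a minimal sketch and not part of QIIME 2: the helper names `list_archive` and `extract_archive`, and the file name `table.qza` in the usage note, are hypothetical.

```python
import zipfile


def list_archive(path):
    """Return the sorted member names of a .qza or .qzv file.

    Only the standard-library zipfile module is used, so QIIME 2
    does not need to be installed.
    """
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip archive")
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())


def extract_archive(path, destination):
    """Unpack every member of the archive into the destination directory."""
    with zipfile.ZipFile(path) as archive:
        archive.extractall(destination)
```

For example, `list_archive('table.qza')` would return the paths stored inside that file, and `extract_archive('table.qza', 'table-contents')` would write them to a directory that you can browse like any other.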
Confused by the term “artifact”?
The term “artifact” has multiple different meanings, so our usage is sometimes confusing for new QIIME 2 users. In the context discussed here, “artifact” means an object made or shaped by some agent or intelligence. This is common usage in data science and software engineering. In biology, it is also used to mean a finding or structure in an experiment or investigation that is not a true feature of the object under observation, but is a result of external action, the test arrangement, or an experimental error.
The definitions quoted here were obtained from Wiktionary (accessed 26 Feb 2025), and are used in accordance with the CC BY-SA 4.0 license.
Data provenance¶
QIIME 2 was designed to automatically document analysis workflows for users, ensuring that their bioinformatics work is reproducible. This allows you, or consumers of your research, to discover exactly how any QIIME 2 result (i.e., Artifact or Visualization) was produced.
To achieve this, each QIIME 2 command is recorded when it is run, and that information is stored in all Artifacts and Visualizations that are created[5].
This means that unless you remove files from a .qza or .qzv file, your result’s data provenance is always stored alongside the data.
Whenever you or someone else needs that information, it’s there.
In addition to supporting reproducible bioinformatics, data provenance helps others provide you with technical support. If you’re running into an error or an odd result and request help, someone may ask you to share an Artifact or Visualization so they can view your data provenance. This will let them review in exact detail what you did to generate the result, and we’ve found that this makes the process of providing technical support much more efficient.
You can view your data provenance using QIIME 2 View (click the Provenance tab after loading your file), or by using Provenance Replay.
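As a rough illustration of provenance travelling with the data, the sketch below checks whether a result file still carries provenance records. It rests on an assumption that is not stated in this chapter: that provenance is stored in a `provenance/` directory directly inside the archive’s single top-level folder. The helper names are hypothetical, and this is not a replacement for QIIME 2 View or Provenance Replay.

```python
import zipfile
from pathlib import PurePosixPath


def provenance_members(path):
    """List archive members stored under <top-level>/provenance/.

    Assumption: provenance lives in a directory named 'provenance'
    directly inside the archive's top-level folder.
    """
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    members = []
    for name in names:
        parts = PurePosixPath(name).parts
        # Require a file below the provenance directory, not the directory itself.
        if len(parts) > 2 and parts[1] == "provenance":
            members.append(name)
    return sorted(members)


def has_provenance(path):
    """Return True if the archive contains at least one provenance file."""
    return bool(provenance_members(path))
```

A result for which `has_provenance` returns False has most likely had files removed from it after QIIME 2 wrote it.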
Plugins and actions¶
There is no microbiome-specific functionality (or even bioinformatics-specific functionality) in the QIIME 2 Framework. Rather all of the analysis functionality comes in the form of plugins to the framework.
Plugins define actions, which are the individual commands that you’ll run in an analysis workflow.
For example, the q2-feature-table plugin, which is included in the amplicon distribution, defines the actions listed here.
If you didn’t have the q2-feature-table plugin installed, you wouldn’t have access to those actions.
Three types of actions can be defined by plugins: methods, visualizers, and pipelines. Methods are usually thought of as intermediary analysis steps: they take artifacts and/or metadata as input, and they generate one or more artifacts as output. Visualizers represent terminal steps in an analysis: they take artifacts and/or metadata as input, and they generate one or more visualizations as output. Pipelines combine calls to other actions in a single action, and are often used to make repetitive sub-workflows easier to run. They take artifacts and/or metadata as input, and they generate one or more artifacts and/or visualizations as output.
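The difference between the three action types comes down to what goes in and what comes out. The toy model below illustrates that idea in plain Python. None of it is QIIME 2 code: the `Artifact` and `Visualization` classes and the three functions are hypothetical stand-ins invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Stand-in for data meant to be consumed by later steps."""
    data: tuple


@dataclass(frozen=True)
class Visualization:
    """Stand-in for a result meant to be viewed by a person."""
    text: str


def drop_zeros(table: Artifact) -> Artifact:
    """Method-like step: artifact in, artifact out."""
    return Artifact(tuple(v for v in table.data if v != 0))


def summarize(table: Artifact) -> Visualization:
    """Visualizer-like step: artifact in, visualization out."""
    return Visualization(f"{len(table.data)} values, total {sum(table.data)}")


def drop_zeros_and_summarize(table: Artifact):
    """Pipeline-like step: chains other actions and returns both kinds of output."""
    filtered = drop_zeros(table)
    return filtered, summarize(filtered)
```

Calling `drop_zeros_and_summarize(Artifact((3, 0, 2)))` returns a filtered artifact that could feed further steps, together with a visualization that ends that branch of the analysis.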
Another thing to know about plugins is that anyone can create and distribute them. This is what makes QIIME 2 extensible. For example, if a student develops new analysis functionality that they want to use with QIIME 2, they can create their own QIIME 2 plugin[1]. If they want others to be able to use it, they can distribute that plugin on the QIIME 2 Library, or through any other means that they choose.
Artifact classes¶
All QIIME 2 artifacts are assigned exactly one artifact class, which indicates the semantics of the data (its semantic type) and the file format that is used to store it inside of the .qza file.
When you see artifacts (or inputs or outputs to an action) described with terms that look like Phylogeny[Rooted] or Phylogeny[Unrooted], that is the Artifact Class.
Artifact classes were developed to help users avoid misusing actions, and to help them discover new methods.
For example, if an action should only be applied to a rooted phylogenetic tree, the developer of that action should annotate its input as Phylogeny[Rooted].
This will ensure that if a user mistakenly tries to provide an unrooted phylogenetic tree, QIIME 2 can raise an error to help the user avoid making a mistake that might waste time or create a misleading result.
If another action can take a rooted or an unrooted phylogenetic tree, that input would be annotated as Phylogeny[Rooted | Unrooted].
In graphical QIIME 2 interfaces, it’s possible to view the available actions based on what Artifact Class(es) they accept as input. This can allow a user to query the system with questions like “What actions are available to apply to a rooted phylogenetic tree?”, or “What actions are available to create a rooted phylogenetic tree from an unrooted phylogenetic tree?”.
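To make the idea of annotations concrete, here is a toy sketch of how an annotation such as Phylogeny[Rooted | Unrooted] can be used both to reject a mismatched input and to look up which actions accept a given Artifact Class. This is not how QIIME 2 implements its type system: the functions, the simplified string parsing (which does not handle nested brackets), and the two action names in `ACTIONS` are all hypothetical.

```python
def parse_annotation(annotation):
    """Expand 'Phylogeny[Rooted | Unrooted]' into a set of concrete class names."""
    base, _, rest = annotation.partition("[")
    if not rest:
        return {annotation.strip()}
    variants = rest.rstrip("]").split("|")
    return {f"{base.strip()}[{variant.strip()}]" for variant in variants}


def accepts(annotation, artifact_class):
    """Return True if an input annotated this way accepts the given class."""
    return artifact_class in parse_annotation(annotation)


def check_input(annotation, artifact_class):
    """Raise an error before any work is done on a mismatched input."""
    if not accepts(annotation, artifact_class):
        raise TypeError(f"expected {annotation}, got {artifact_class}")


# Hypothetical registry mapping action names to their input annotations.
ACTIONS = {
    "needs-rooted-tree": "Phylogeny[Rooted]",
    "accepts-any-tree": "Phylogeny[Rooted | Unrooted]",
}


def actions_accepting(artifact_class, registry=ACTIONS):
    """Answer the question: what actions can be applied to this class?"""
    return sorted(
        name for name, annotation in registry.items()
        if accepts(annotation, artifact_class)
    )
```

With this registry, `actions_accepting("Phylogeny[Unrooted]")` finds only the action annotated with both variants, while `check_input("Phylogeny[Rooted]", "Phylogeny[Unrooted]")` raises a `TypeError` instead of silently running on the wrong kind of tree.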
If you’d like to learn more about Artifact Classes, see Semantic types, data types, file formats, and artifact classes in Developing with QIIME 2.
Next steps¶
Ok, that’s enough discussion about QIIME 2 for now: it’s time to start using it. Don’t worry if you feel like you don’t fully understand some of the details that were covered in this chapter right now. The goal of this chapter was to introduce these ideas, and they’ll be revisited throughout the documentation.
Deploying QIIME 2¶
You may now be wondering where and how you’ll deploy QIIME 2. QIIME 2 can be deployed on your personal computer (e.g., your laptop or desktop computer), a cluster computer such as one owned and maintained by your university or company, or on cloud computing resources. How to deploy QIIME 2 describes these options and links to the relevant installation instructions. We recommend having a working deployment of QIIME 2 before you start working through tutorials, so you can follow along on your own.
Learning with the tutorials¶
After you have a working deployment of QIIME 2, you can read and work through the Moving Pictures tutorial 🎥. This is the resource that most new users start with to learn. In this tutorial, you’ll carry out a full microbiome analysis, from raw sequence data through visualizations and statistics. This is a fairly typical amplicon analysis workflow, so after you understand it you can adapt it for your own analysis.
If you’d like to get more of a feel for what QIIME 2 can do before you invest in installing it, we also recommend the Moving Pictures tutorial 🎥. That document has all of the results pre-generated and linked from the document, so as you read you can interact with the results that would be generated by each step.
Getting help¶
The QIIME 2 Forum is where you can get free technical support and connect with other microbiome researchers. We look forward to seeing you there!
If you become interested in building and distributing your own QIIME 2 plugins, for marker gene or any other type of analysis, you can refer to our developer manual, Developing with QIIME 2.
When you’re ready to learn more about how the QIIME 2 Framework works, and how you can leverage it to become a QIIME 2 power user, you can refer to our book on that topic, Using QIIME 2. Using QIIME 2 provides information that is relevant across all QIIME 2 distributions and plugins, not just the amplicon distribution.
Transitioning all of our tutorials to provide instructions for different interfaces is a work in progress (as of 26 February 2025). In the meantime, some may include only command line instructions.
Data provenance is some of the metadata that is stored alongside your data in .qza and .qzv files. Retaining provenance information without a centralized database is one of the reasons why QIIME 2 produces .qza and .qzv files, as opposed to just outputting data on its own (e.g., in .fasta or .biom files).
- Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Fierer, N., Peña, A. G., Goodrich, J. K., Gordon, J. I., Huttley, G. A., Kelley, S. T., Knights, D., Koenig, J. E., Ley, R. E., Lozupone, C. A., McDonald, D., Muegge, B. D., Pirrung, M., … Knight, R. (2010). QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7(5), 335–336. 10.1038/nmeth.f.303