
Getting started with QIIME 2

This chapter will briefly introduce a few concepts that should help you learn QIIME 2 quickly.

What is QIIME 2?

QIIME 2 is a microbiome marker gene (i.e., amplicon) analysis tool. It provides a suite of analytic tools that support microbiome marker gene analysis from raw sequencing data through publication-quality visualizations and statistics. Any amplicon is supported, not just the 16S rRNA gene.

There is not a single QIIME 2 workflow or command - rather it is a series of steps, and you choose which ones to apply[1]. We provide general guidance through tutorials, like the Gut-to-soil axis tutorial 💩🌱, and can provide more specific guidance on the QIIME 2 Forum.

The tools that come with QIIME 2 are listed in Available plugins. Plugins that provide additional analytic functionality can be created, distributed, and installed independently[2]. Your main source for discovery and installation instructions for plugins not included in QIIME 2 is the QIIME 2 Library; for this reason we often refer to these as Library plugins.

Important concepts

The following sections briefly present some important concepts for understanding QIIME 2 tools. You don’t need to fully understand these to start using QIIME 2, but we think it will help you learn and build your bioinformatics skills if you have some brief exposure to these ideas. Links to where you can learn more are provided.

Interfaces

All of QIIME 2’s analytic functionality can be accessed through multiple interfaces, and you can choose to work with the one (or more) that you think you’ll be most efficient with (Figure 1). For example, domain scientists without advanced computing backgrounds can use QIIME 2 through graphical interfaces such as Galaxy. Power users can work with QIIME 2 on the command line, enabling straightforward access on high-performance compute clusters and cloud resources. Research software engineers and data scientists can use QIIME 2 through its Python 3 API, facilitating development of automated workflows and integration of QIIME 2 tools as a component in other systems. The ability to use the same analysis tools through different interfaces is a big part of what makes QIIME 2 accessible.

Screenshots of different types of interfaces that can be used to interact with QIIME 2.

Figure 1: Examples of four QIIME 2 interfaces: QIIME 2 View, Galaxy, q2cli, and the Python 3 API.

You’re also free to use different interfaces for different steps - QIIME 2 won’t care. For example, a fairly common workflow is to use the command line (q2cli) for long-running jobs on a high-performance computing system, and then download the results and work with them in a Jupyter Notebook using the Python 3 API for the more exploratory iterative steps of an analysis.

Different interface options in tutorials

When you’re working with QIIME 2 tutorials, we’ll generally provide instructions that enable you to work in different interfaces[4]. This will look like the following, and you can choose to follow the instructions for the interface that you’re currently working with.

[Command Line]
[Python API]
[R API]
[View Source]
wget -O 'sample-metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/2026.4/data/getting-started/sample-metadata.tsv'

qiime metadata tabulate \
  --m-input-file sample-metadata.tsv \
  --o-visualization sample-metadata-viz.qzv

Artifacts and visualizations

One of the first things that new QIIME 2 users often notice is the .qza and .qzv files that QIIME 2 uses. All files generated by QIIME 2 are either .qza or .qzv files, and these are simply zip files that store your data alongside some QIIME 2-specific metadata. Here is what you need to know about these:

Data provenance

QIIME 2 was designed to automatically document analysis workflows for users, ensuring that their bioinformatics work is reproducible. This allows you, or consumers of your research, to discover exactly how any QIIME 2 result (i.e., Artifact or Visualization) was produced.

To achieve this, each QIIME 2 command is recorded when it is run, and that information is stored in all Artifacts and Visualizations that are created[5]. This means that unless you remove files from a .qza or .qzv file, your result’s data provenance is always stored alongside the data. Whenever you or someone else needs that information, it’s there.
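Because .qza and .qzv files are zip files, the provenance stored in them can be located with any zip tool. The following is a minimal Python sketch, not QIIME 2's own tooling. It builds a small stand-in archive in memory and lists the members that sit under a provenance directory. The top-level ID, the member names, and the file contents are illustrative assumptions, not a guaranteed description of a real archive's layout. To try it on a real result, you would pass the path of your own .qza or .qzv file to provenance_members instead of the toy archive.

```python
import io
import zipfile

# Toy archive contents. This layout is an assumption made for illustration;
# inspect a real .qza/.qzv file to see what it actually contains.
ROOT = "00000000-0000-0000-0000-000000000000"  # hypothetical result ID
TOY_MEMBERS = {
    f"{ROOT}/metadata.yaml": "type: ExampleType\n",
    f"{ROOT}/data/example.tsv": "id\tvalue\nA\t1\n",
    f"{ROOT}/provenance/action/action.yaml": "action: example\n",
}


def build_toy_archive() -> io.BytesIO:
    """Write the toy members into an in-memory zip file and return it."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in TOY_MEMBERS.items():
            archive.writestr(name, text)
    buffer.seek(0)
    return buffer


def provenance_members(archive_file) -> list[str]:
    """Return sorted member names located inside a 'provenance' directory."""
    with zipfile.ZipFile(archive_file) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if "provenance" in name.split("/")[:-1]
        )


if __name__ == "__main__":
    for member in provenance_members(build_toy_archive()):
        print(member)
```

For everyday use, QIIME 2 View and Provenance Replay remain the intended ways to read provenance. This sketch only shows that the information travels inside the file itself.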

In addition to supporting reproducible bioinformatics, data provenance helps others provide you with technical support. If you’re running into an error or an odd result and request help, someone may ask you to share an Artifact or Visualization so they can view your data provenance. This will let them review in exact detail what you did to generate the result, and we’ve found that this makes the process of providing technical support much more efficient.

You can view your data provenance using QIIME 2 View (click the Provenance tab after loading your file), or by using Provenance Replay.

Plugins and actions

All of the analysis functionality in QIIME 2 comes in the form of plugins.

Plugins define actions, which are the individual commands that you’ll run in an analysis workflow. For example, the q2-feature-table plugin defines the actions listed here. If you didn’t have the q2-feature-table plugin installed, you wouldn’t have access to those actions.

Three types of actions can be defined by plugins: methods, visualizers, and pipelines. Methods are usually thought of as intermediate analysis steps: they take artifacts and/or metadata as input, and they generate one or more artifacts as output. Visualizers represent terminal steps in an analysis: they take artifacts and/or metadata as input, and they generate one or more visualizations as output. Pipelines combine calls to other actions in a single action, and are often used to make repetitive sub-workflows easier to run. They take artifacts and/or metadata as input, and they generate one or more artifacts and/or visualizations as output.
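As a rough illustration of these three definitions, the Python sketch below classifies a toy description of an action by what it outputs and whether it calls other actions. This is not QIIME 2's plugin API: the ActionSketch class, the classify function, and the action names are all hypothetical and exist only to restate the rules above in code.

```python
from dataclasses import dataclass

ARTIFACT = "artifact"
VISUALIZATION = "visualization"


@dataclass(frozen=True)
class ActionSketch:
    """Toy description of an action (hypothetical, not a QIIME 2 class)."""

    name: str
    output_kinds: tuple[str, ...]
    calls_other_actions: bool = False


def classify(action: ActionSketch) -> str:
    """Label an action as a method, visualizer, or pipeline."""
    kinds = set(action.output_kinds)
    if not kinds or not kinds <= {ARTIFACT, VISUALIZATION}:
        raise ValueError(f"{action.name}: unexpected outputs {action.output_kinds}")
    if action.calls_other_actions:
        # Pipelines combine other actions and may output either kind.
        return "pipeline"
    if kinds == {ARTIFACT}:
        return "method"
    if kinds == {VISUALIZATION}:
        return "visualizer"
    raise ValueError(f"{action.name}: mixed outputs require a pipeline")


if __name__ == "__main__":
    examples = [
        ActionSketch("filter-example", (ARTIFACT,)),
        ActionSketch("summarize-example", (VISUALIZATION,)),
        ActionSketch("filter-and-summarize-example", (ARTIFACT, VISUALIZATION), True),
    ]
    for action in examples:
        print(action.name, "is a", classify(action))
```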

Another thing to know about plugins is that anyone can create and distribute them. This is what makes QIIME 2 extensible. For example, if a student develops new analysis functionality that they want to use with QIIME 2, they can create their own plugin[2]. If they want others to be able to use it, they can distribute that plugin on the QIIME 2 Library, or through any other means that they choose.

Artifact classes

All QIIME 2 artifacts are assigned exactly one artifact class, which indicates the semantics of the data (its semantic type) and the file format that is used to store it inside of the .qza file. When you see artifacts (or inputs or outputs to an action) described with terms that look like Phylogeny[Rooted] or Phylogeny[Unrooted], that is the Artifact Class.

Artifact classes were developed to help users avoid misusing actions, and to help them discover new methods. For example, if an action should only be applied to a rooted phylogenetic tree, the developer of that action should annotate its input as Phylogeny[Rooted]. Then, if a user mistakenly tries to provide an unrooted phylogenetic tree, QIIME 2 can raise an error, helping the user avoid a mistake that might waste time or create a misleading result. If another action can take a rooted or an unrooted phylogenetic tree, that input would be annotated as Phylogeny[Rooted | Unrooted].
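The idea of annotating an input and rejecting a mismatch can be sketched in a few lines of Python. This is a simplified stand-in that compares plain strings, not QIIME 2's actual type system. The check_input function, the ArtifactClassError exception, and the action names are hypothetical.

```python
ROOTED = "Phylogeny[Rooted]"
UNROOTED = "Phylogeny[Unrooted]"


class ArtifactClassError(TypeError):
    """Raised when an input's artifact class is not accepted (hypothetical)."""


def check_input(action_name: str, accepted: frozenset, provided: str) -> None:
    """Raise an error if `provided` is not one of the accepted artifact classes."""
    if provided not in accepted:
        expected = " | ".join(sorted(accepted))
        raise ArtifactClassError(
            f"{action_name} expects {expected}, but received {provided}"
        )


# An input annotated as Phylogeny[Rooted] accepts only rooted trees.
ROOTED_ONLY = frozenset({ROOTED})
# An input annotated as Phylogeny[Rooted | Unrooted] accepts either.
ANY_TREE = frozenset({ROOTED, UNROOTED})

if __name__ == "__main__":
    check_input("any-tree-example", ANY_TREE, UNROOTED)  # accepted
    try:
        check_input("rooted-only-example", ROOTED_ONLY, UNROOTED)
    except ArtifactClassError as error:
        print("Rejected:", error)
```

Failing at this point, before any computation starts, is what saves the user from a long-running job that would end in a misleading result.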

In graphical QIIME 2 interfaces, it’s possible to view the available actions based on what Artifact Class(es) they accept as input. This can allow a user to query the system with questions like “What actions are available to apply to a rooted phylogenetic tree?”, or “What actions are available to create a rooted phylogenetic tree from an unrooted phylogenetic tree?”.
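Those two questions amount to filtering a list of action signatures by input and output artifact class. The Python sketch below assumes a tiny hand-written registry. The ActionSignature type, the registry contents, and every action name in it are hypothetical, and a real interface would obtain this information from the installed plugins rather than from a hard-coded list.

```python
from typing import NamedTuple

ROOTED = "Phylogeny[Rooted]"
UNROOTED = "Phylogeny[Unrooted]"


class ActionSignature(NamedTuple):
    """Toy record of what an action accepts and produces (hypothetical)."""

    name: str
    inputs: frozenset
    outputs: frozenset


# Hypothetical registry used only for this illustration.
ACTIONS = [
    ActionSignature("root-tree-example", frozenset({UNROOTED}), frozenset({ROOTED})),
    ActionSignature("tree-metric-example", frozenset({ROOTED}), frozenset({"ExampleResult"})),
    ActionSignature("describe-tree-example", frozenset({ROOTED, UNROOTED}), frozenset({"Visualization"})),
]


def actions_accepting(artifact_class: str) -> list[str]:
    """Names of actions that can take `artifact_class` as input."""
    return [a.name for a in ACTIONS if artifact_class in a.inputs]


def actions_converting(source: str, target: str) -> list[str]:
    """Names of actions that take `source` as input and produce `target`."""
    return [a.name for a in ACTIONS if source in a.inputs and target in a.outputs]


if __name__ == "__main__":
    print("Accept a rooted tree:", actions_accepting(ROOTED))
    print("Unrooted to rooted:", actions_converting(UNROOTED, ROOTED))
```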

If you’d like to learn more about Artifact Classes, see Semantic types, data types, file formats, and artifact classes in Developing with QIIME 2.

Next steps

Ok, that’s enough discussion about QIIME 2 for now: it’s time to start using it. Don’t worry if you feel like you don’t fully understand some of the details that were covered in this chapter right now. The goal of this chapter was to introduce these ideas, and they’ll be revisited throughout the documentation.

Where to install QIIME 2

You may now be wondering where you’ll install QIIME 2. QIIME 2 can be deployed on your personal computer (e.g., your laptop or desktop computer), a cluster computer such as one owned and maintained by your university or company, or on cloud computing resources. These options are described in How to deploy QIIME 2.

Installing QIIME 2

To install the amplicon distribution of QIIME 2, refer to the instructions on the QIIME 2 Library.

We recommend having a working deployment of QIIME 2 when you’re ready to start working through tutorials, so you can follow along on your own.

Learning with the tutorials

After you have a working deployment of QIIME 2, you can read and work through the Gut-to-soil axis tutorial 💩🌱. This is the resource that most new users start with to learn. In this tutorial, you’ll carry out a full microbiome analysis, from raw sequence data through visualizations and statistics. This is a fairly typical amplicon analysis workflow, so after you understand it you can adapt it for your own analysis.

If you’d like to get more of a feel for what QIIME 2 can do before you invest in installing it, we also recommend the Gut-to-soil axis tutorial 💩🌱. That document has all of the results pre-generated and linked from the document, so as you read you can interact with the results that would be generated by each step.

Getting help

The QIIME 2 Forum is where you can get free technical support and connect with other microbiome researchers. We look forward to seeing you there!

Footnotes
  1. It’s like a Choose Your Own Adventure novel. 🏔️

  2. If you become interested in building and distributing your own plugins, for marker gene or any other type of analysis, you can refer to our developer manual, Developing with QIIME 2.

  3. For clarity, this will be renamed to something like Using rachis.

  4. Transitioning all of our tutorials to provide instructions for different interfaces is a work in progress (as of 26 February 2025). In the meantime, some may include only command line instructions.

  5. Data provenance is some of the metadata that is stored alongside your data in .qza and .qzv files. Retaining provenance information without a centralized database is one of the reasons why QIIME 2 produces .qza and .qzv files, as opposed to just outputting data on its own (e.g., in .fasta or .biom files).
