
Understanding QIIME 2 Archives, Artifacts, and Visualizations

All files generated by QIIME 2 are either .qza or .qzv files, and these are simply zip files that store your data alongside some QIIME 2-specific metadata. We refer to these files collectively as QIIME 2 Archives.

You can unzip .qza or .qzv files with any unzip utility, such as WinZip, 7-Zip, or unzip, and you don’t need to have QIIME 2 installed to do that. Allowing users to access their data without QIIME 2 was one of the earliest design goals of the system. This ensures that if QIIME 2 isn’t available to you for some reason, you can still access any data that you generated with QIIME 2.

Accessing data in a QIIME 2 Archive without using QIIME 2

It’s easy to open a .qza or .qzv file to access the contents. Here’s an example of how you’d do this with unzip, which is built in on macOS and some Linux distributions. First, download a .qza file (or just try this on one you have on your computer already).

curl -sL \
  "https://docs.qiime2.org/2020.11/data/tutorials/moving-pictures/rep-seqs.qza" > \
  "sequences.qza"

Then, run unzip:

unzip sequences.qza

Notice that we haven’t used any QIIME 2 commands so far. We downloaded a file, and then unzipped it as we would any zip file.

If you look through the list of files that were created by the unzip command above, you’ll see there is a single top-level directory in the output with a long, random-looking name (this is the archive’s unique identifier, discussed below). This directory contains copies of all of the files and directories from sequences.qza. Within it, the data directory contains a single file, sequences.fasta, which contains (you guessed it!) sequence data in FASTA format. If, for example, you’re interested in getting your sequence data out of QIIME 2 to analyze it with another program, you can unzip your .qza file and use the sequences.fasta file for whatever you need to do with it.
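If you would rather do this from a script than from the shell, the same idea can be sketched with Python's standard-library zipfile module. This is only an illustration under the layout described above (a single top-level directory containing a data directory); the function name read_archive_data is made up for this example and is not part of QIIME 2.

```python
import zipfile


def read_archive_data(archive_path):
    """Return {relative file name: bytes} for every file under <uuid>/data/.

    Hypothetical helper for illustration; it uses only the zip layout
    described in the text and needs no QIIME 2 installation.
    """
    contents = {}
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            # Archive members look like "<uuid>/data/sequences.fasta".
            parts = name.split("/", 2)
            is_data_file = len(parts) == 3 and parts[1] == "data" and parts[2]
            if is_data_file and not name.endswith("/"):
                contents[parts[2]] = zf.read(name)
    return contents
```

For the archive downloaded above, read_archive_data("sequences.qza") should return a dictionary with a "sequences.fasta" key whose value is the raw FASTA bytes.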

Accessing data in a QIIME 2 Archive using QIIME 2

QIIME 2 also provides some of its own utilities for getting data out of .qza and .qzv files. If you’re working with the QIIME 2 command line interface, the most relevant command is qiime tools export. You could run this on the .qza file we downloaded above as follows:

qiime tools export --input-path sequences.qza --output-path exported-sequences/

This command unzips the archive and places all of the files from the data directory in exported-sequences. Thus, if you do have QIIME 2 installed, you can use this command to get your data out of a QIIME 2 archive without any of the QIIME 2-specific metadata.
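To make the effect of that command concrete, here is a rough standard-library Python sketch that copies only the files under the data directory into an output directory. It is not how qiime tools export is implemented and it does no validation or format conversion; the function name copy_data_files is hypothetical.

```python
import pathlib
import zipfile


def copy_data_files(archive_path, output_dir):
    """Copy files under <uuid>/data/ into output_dir; return the written paths.

    Illustrative stand-in only, not QIIME 2's own export code.
    """
    out_root = pathlib.Path(output_dir)
    written = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            parts = pathlib.PurePosixPath(info.filename).parts
            # Keep only regular files that live under "<uuid>/data/".
            if info.is_dir() or len(parts) < 3 or parts[1] != "data":
                continue
            if ".." in parts:
                continue  # skip any path that tries to escape output_dir
            target = out_root.joinpath(*parts[2:])
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
            written.append(target)
    return written
```

Calling copy_data_files("sequences.qza", "exported-sequences") would leave just the data files in exported-sequences, without the provenance or metadata files.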

Why does QIIME 2 create .qza and .qzv files?

You might wonder why we bothered with having QIIME 2 create these zip files in the first place, rather than just have it use the typical file formats like fasta, newick, biom, and so on. That has to do with the other files stored in the zip file. These other files are not intended to be viewed by a human, and you don’t need any of them to work with the file (or files) in the data directory. QIIME 2 uses the information in the provenance directory to record data provenance, helping you to ensure that your bioinformatics work will be reproducible. You can learn more about QIIME 2’s data provenance tracking functionality in Facilitating bioinformatics reproducibility with QIIME 2 Provenance Replay.

QIIME 2 also stores a unique identifier (a UUID) for every archive it produces in the metadata.yaml file inside of the archive, which facilitates data management. If at some point you begin generating a lot of QIIME 2 results, and want to store them in a database or otherwise uniquely identify them, these identifiers can be used for that purpose. An archive’s unique identifier is also the name of the directory that contains all of the files from the .qza when you unzip it. The metadata.yaml file also identifies the Artifact Class of its data.
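As a small illustration of using these identifiers, the following Python sketch reads the top-level metadata.yaml straight from the zip file. It assumes the file holds simple one-line "key: value" entries (such as uuid and type), so it splits lines by hand rather than using a YAML parser; a real YAML library would be more robust. The function name read_archive_metadata and the root_directory key are inventions of this example.

```python
import zipfile


def read_archive_metadata(archive_path):
    """Return the top-level metadata.yaml of an archive as a dict of strings.

    Assumes flat "key: value" lines; this is a sketch, not a YAML parser.
    """
    with zipfile.ZipFile(archive_path) as zf:
        # The root metadata file sits one level down: "<uuid>/metadata.yaml".
        candidates = [
            name for name in zf.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if len(candidates) != 1:
            raise ValueError("expected exactly one top-level metadata.yaml")
        root_directory = candidates[0].split("/")[0]
        text = zf.read(candidates[0]).decode("utf-8")

    # Record the directory name too, so it can be compared with the UUID.
    metadata = {"root_directory": root_directory}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata
```

If the description above holds for your archive, the "uuid" entry returned by read_archive_metadata("sequences.qza") should equal its "root_directory" entry, and the "type" entry should name the Artifact Class.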

What’s the difference between .qza files and .qzv files?

The .qza file extension is an abbreviation for QIIME Zipped Artifact, and the .qzv file extension is an abbreviation for QIIME Zipped Visualization. .qza files (which are often simply referred to as artifacts) are intermediary files in a QIIME 2 analysis, usually containing raw data of some sort. These files are generated by QIIME 2 and are intended to be consumed by QIIME 2. .qzv files (which are often simply referred to as visualizations) are terminal results in a QIIME 2 analysis, such as an interactive figure or the results of a statistical test. These files are generated by QIIME 2 and are intended to be consumed by a human.

References
  1. Keefe, C. R., Dillon, M. R., Gehret, E., Herman, C., Jewell, M., Wood, C. V., Bolyen, E., & Caporaso, J. G. (2023). Facilitating bioinformatics reproducibility with QIIME 2 Provenance Replay. PLOS Computational Biology, 19(11), e1011676. 10.1371/journal.pcbi.1011676