All files generated by QIIME 2 are either .qza or .qzv files, and these are simply zip files that store your data alongside some QIIME 2-specific metadata.
We refer to these files collectively as QIIME 2 Archives.
You can unzip .qza or .qzv files with any unzip utility, such as WinZip, 7-Zip, or unzip, and you don’t need to have QIIME 2 installed to do that.
Allowing users to access their data without QIIME 2 was one of the earliest design goals of the system.
This ensures that if QIIME 2 isn’t available to you for some reason, you can still access any data that you generated with QIIME 2.
Accessing data in a QIIME 2 Archive without using QIIME 2
It’s easy to open a .qza or .qzv file to access the contents.
Here’s an example of how you’d do this with unzip, which is built in on macOS and some Linux distributions.
First, download a .qza file (or just try this on one you have on your computer already).
curl -sL \
"https://docs.qiime2.org/2020.11/data/tutorials/moving-pictures/rep-seqs.qza" > \
"sequences.qza"
Then, run unzip:
unzip sequences.qza
Notice that we haven’t used any QIIME 2 commands so far. We downloaded a file, and then unzipped it as we would any zip file.
If you look through the list of files that were created by the unzip command above, you’ll see there is a top-level directory in the output with a crazy-looking name.
This directory contains copies of all of the files and directories from sequences.qza.
Within this crazy-named directory, the data directory contains a single file, sequences.fasta, which contains (you guessed it!) sequence data in FASTA format.
If, for example, you’re interested in getting your sequence data out of QIIME 2 to analyze it with another program, you can unzip your .qza file and use the sequences.fasta file for whatever you need to do with it.
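Because a QIIME 2 Archive is an ordinary zip file, the same steps can also be scripted without QIIME 2. The following is a minimal sketch using only Python's standard-library zipfile module. The helper names list_data_files and read_data_file are hypothetical, and the code assumes the layout described above: one top-level directory that contains a data directory.

```python
import zipfile
from pathlib import PurePosixPath


def list_data_files(archive_path):
    """Return member names found under <top-level-dir>/data/ in the archive."""
    members = []
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            parts = PurePosixPath(name).parts
            # Expect <top-level-dir>/data/<file...>; skip directory entries.
            if len(parts) >= 3 and parts[1] == "data" and not name.endswith("/"):
                members.append(name)
    return members


def read_data_file(archive_path, filename):
    """Return the text of data/<filename> without extracting it to disk."""
    wanted = PurePosixPath(filename).parts
    with zipfile.ZipFile(archive_path) as zf:
        for name in list_data_files(archive_path):
            # Compare the path below <top-level-dir>/data/ with the request.
            if PurePosixPath(name).parts[2:] == wanted:
                return zf.read(name).decode("utf-8")
    raise FileNotFoundError(f"{filename} not found in the data directory")
```

For the file downloaded above, calling read_data_file("sequences.qza", "sequences.fasta") would return the FASTA text as a string.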
Accessing data in a QIIME 2 Archive using QIIME 2
QIIME 2 also provides some of its own utilities for getting data out of .qza and .qzv files.
If you’re working with the QIIME 2 command line interface, the most relevant command is qiime tools export.
You could run this on the .qza file we downloaded above as follows:
qiime tools export --input-path sequences.qza --output-path exported-sequences/
This command will unzip the archive, take all of the files from the data directory, and place them in exported-sequences.
Thus, if you do have QIIME 2 installed, you can use this command to get your data out of a QIIME 2 archive without all of the QIIME 2-specific metadata.
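To make the effect of this step concrete, here is a rough Python sketch of the behavior described above: copy only the contents of the data directory into an output directory. It is an illustration built on the standard-library zipfile module, not QIIME 2's implementation, and the function name export_data_dir is hypothetical.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def export_data_dir(archive_path, output_dir):
    """Copy every file under <top-level-dir>/data/ into output_dir."""
    output_dir = Path(output_dir)
    written = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            parts = PurePosixPath(info.filename).parts
            # Keep only regular files below <top-level-dir>/data/.
            if info.is_dir() or len(parts) < 3 or parts[1] != "data":
                continue
            # Refuse paths that try to climb out of the output directory.
            if ".." in parts:
                continue
            # Drop the leading <top-level-dir>/data/ prefix.
            target = output_dir.joinpath(*parts[2:])
            target.parent.mkdir(parents=True, exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

With the archive from the earlier example, export_data_dir("sequences.qza", "exported-sequences") would leave sequences.fasta in exported-sequences and nothing else.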
Why does QIIME 2 create .qza and .qzv files?
You might wonder why we bothered with having QIIME 2 create these zip files in the first place, rather than just have it use the typical file formats like fasta, newick, biom, and so on.
That has to do with the other files stored in the zip file.
These other files are not intended to be viewed by a human, and you don’t need any of them to work with the file (or files) in the data directory.
QIIME 2 uses the information in the provenance directory to record data provenance, helping you to ensure that your bioinformatics work will be reproducible.
You can learn more about QIIME 2’s data provenance tracking functionality in Facilitating bioinformatics reproducibility with QIIME 2 Provenance Replay.
QIIME 2 also stores a unique identifier (a UUID) for every archive it produces in the metadata.yaml file inside the archive, which facilitates data management.
If at some point you begin generating a lot of QIIME 2 results, and want to store them in a database or otherwise uniquely identify them, these identifiers can be used for that purpose.
An archive’s unique identifier is also the name of the directory that contains all of the files from the .qza when you unzip it.
The metadata.yaml file also identifies the Artifact Class of its data.
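If you want to catalog archives by their identifiers, the metadata.yaml file can be read straight from the zip file. The sketch below is a minimal example, with assumptions stated plainly: the function name read_archive_metadata is hypothetical, the file is assumed to hold simple top-level "key: value" lines (for example keys named uuid and type), and the line-based parsing shown is not a full YAML parser.

```python
import zipfile
from pathlib import PurePosixPath


def read_archive_metadata(archive_path):
    """Return top-level 'key: value' pairs from <top-level-dir>/metadata.yaml."""
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            parts = PurePosixPath(name).parts
            # Only the metadata.yaml directly inside the top-level directory,
            # not any file of the same name nested deeper in the archive.
            if len(parts) == 2 and parts[1] == "metadata.yaml":
                text = zf.read(name).decode("utf-8")
                break
        else:
            raise FileNotFoundError("no top-level metadata.yaml in archive")

    # Record the directory name so it can be compared with the stored UUID.
    metadata = {"top_level_dir": parts[0]}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Assumption: flat, unindented 'key: value' lines only.
        if sep and not line.startswith((" ", "#")):
            metadata[key.strip()] = value.strip().strip("'\"")
    return metadata
```

Under these assumptions, the returned uuid value should match top_level_dir, which is a cheap consistency check before using the identifier as a database key.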
What’s the difference between .qza files and .qzv files?
The .qza file extension is an abbreviation for QIIME Zipped Artifact, and the .qzv file extension is an abbreviation for QIIME Zipped Visualization.
.qza files (which are often simply referred to as artifacts) are intermediary files in a QIIME 2 analysis, usually containing raw data of some sort.
These files are generated by QIIME 2 and are intended to be consumed by QIIME 2.
.qzv files (which are often simply referred to as visualizations) are terminal results in a QIIME 2 analysis, such as an interactive figure or the results of a statistical test.
These files are generated by QIIME 2 and are intended to be consumed by a human.
- Keefe, C. R., Dillon, M. R., Gehret, E., Herman, C., Jewell, M., Wood, C. V., Bolyen, E., & Caporaso, J. G. (2023). Facilitating bioinformatics reproducibility with QIIME 2 Provenance Replay. PLOS Computational Biology, 19(11), e1011676. 10.1371/journal.pcbi.1011676