Configuration

Tourmaline 2 uses three config files, one per step. Example names below reflect defaults; any filename is acceptable. Templates for each file are in the Tourmaline folder.

1. QA/QC configuration (Template: config_01_qaqc.yaml)

The qaqc config defines how Tourmaline imports, trims, and summarizes raw reads.

Core run settings (required)

run_name: my_run                 # unique identifier; used in output folders
output_dir: /path/to/results     # absolute or relative directory for outputs
paired_end: true                 # true for paired-end, false for single-end data
to_trim: true                    # enable primer trimming with Cutadapt
to_merge: false                  # enable vsearch merge (paired-end only)
to_filter: false                 # enable feature filtering steps
assay_name: Bacteria-16S-V4V5-Parada  # for metadata reporting

Select the assay name from the NOAA Omics metabarcoding assays controlled vocabulary. If your assay is not listed, please create an issue.

Input data sources (choose one)

raw_fastq_path: /abs/path/to/raw_fastqs       # directory of raw FASTQ(.gz)
trimmed_fastq_path: /abs/path/to/trimmed      # directory of already trimmed FASTQs
sample_manifest_file: 00-data/manifest.tsv    # QIIME 2 manifest (TSV or CSV)
preexisting_fastq_qza: 00-data/demux.qza      # existing demuxed artifact

Provide only the fields relevant to your data source; leave others blank. Manifest formats are detailed in steps/qaqc.md.
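The "choose one" rule can be checked before a run. The following is a minimal sketch, assuming the config has already been loaded into a Python dict and that "choose one" means exactly one of the four fields is set. The helper name `pick_input_source` is hypothetical and not part of Tourmaline.

```python
# Hypothetical helper, not part of Tourmaline: checks the "choose one" rule.
SOURCE_KEYS = (
    "raw_fastq_path",
    "trimmed_fastq_path",
    "sample_manifest_file",
    "preexisting_fastq_qza",
)


def pick_input_source(config):
    """Return (key, value) for the single input source that is set."""
    # Blank YAML values load as None; treat None and "" as "not provided".
    provided = {k: config[k] for k in SOURCE_KEYS if config.get(k) not in (None, "")}
    if len(provided) != 1:
        raise ValueError(
            f"expected exactly one input source, found {len(provided)}: {sorted(provided)}"
        )
    return next(iter(provided.items()))


qaqc_config = {
    "raw_fastq_path": "/abs/path/to/raw_fastqs",
    "trimmed_fastq_path": None,  # left blank
    "sample_manifest_file": "",  # left blank
}
print(pick_input_source(qaqc_config))
```

Raising an error when zero or several sources are set surfaces an ambiguous config early instead of silently picking one.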

Primer trimming parameters (required when to_trim: true)

fwd_primer: GTGYCAGCMGCCGCGGTAA      # IUPAC supported
rev_primer: GGACTACNVGGGTWTCTAAT
discard_untrimmed: false             # discard reads without primer match
minimum_length: 50                   # post-trimming minimum length (bp)
trimming_threads: 5                  # threads for Cutadapt
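Because primers may contain IUPAC ambiguity codes, a quick sanity check can catch typos before Cutadapt runs. This is a sketch only; the function name `invalid_primer_bases` is hypothetical, and the check covers the standard IUPAC DNA alphabet without attempting to reproduce Cutadapt's own parsing.

```python
# IUPAC nucleotide codes for DNA, including ambiguity codes.
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def invalid_primer_bases(primer):
    """Return the sorted characters in `primer` that are not IUPAC DNA codes."""
    return sorted(set(primer.upper()) - IUPAC_DNA)


for name, primer in [
    ("fwd_primer", "GTGYCAGCMGCCGCGGTAA"),
    ("rev_primer", "GGACTACNVGGGTWTCTAAT"),
]:
    bad = invalid_primer_bases(primer)
    print(name, "ok" if not bad else f"invalid characters: {bad}")
```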

Merging + compute options (required when to_merge: true)

maxdiffs: 20                         # vsearch merge mismatches
merge_stagger: --p-allowmergestagger # optional vsearch flag

2. Repseqs configuration (Template: config_02_repseqs.yaml)

The repseqs config controls ASV generation, filtering, and optional diversity plots.

Core run settings (required)

run_name: my_run
output_dir: /path/to/results
asv_method: dada2pe          # one of: dada2pe | dada2se | deblur
asv_threads: 5               # threads passed to denoisers

Input data sources (choose one; if neither is set, defaults to [output_dir]/[my_run-qaqc])

qaqc_run_name: my_qaqc_run         # reuse QA/QC outputs from another run
fastq_qza_file: /abs/path/demux.qza  # external demultiplexed sequences

If neither is supplied, the workflow expects demultiplexed reads from the QA/QC step with the same run_name.
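The fallback described above can be sketched as a small resolver. Several details here are assumptions rather than documented behavior: that `fastq_qza_file` is checked first, that a named QA/QC run uses the same `<name>-qaqc` directory layout as the default, and the function name `resolve_repseqs_input` itself.

```python
from pathlib import PurePosixPath


def resolve_repseqs_input(config):
    """Describe where the Repseqs step would look for demultiplexed reads."""
    if config.get("fastq_qza_file"):
        # An external artifact is used as given.
        return ("fastq_qza_file", config["fastq_qza_file"])
    # Assumption: a named QA/QC run follows the same "<name>-qaqc" layout
    # as the default; with neither field set, fall back to run_name.
    qaqc_name = config.get("qaqc_run_name") or config["run_name"]
    qaqc_dir = PurePosixPath(config["output_dir"]) / f"{qaqc_name}-qaqc"
    return ("qaqc_dir", str(qaqc_dir))


print(resolve_repseqs_input({"run_name": "my_run", "output_dir": "/path/to/results"}))
```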

Metadata + diversity options

sample_metadata_file: 00-data/metadata.tsv   # optional metadata for summaries/diversity
plot_diversity: true                         # produce alpha/core metrics outputs
alpha_max_depth: 500                         # required when plot_diversity is true
core_sampling_depth: 500                     # required when plot_diversity is true
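The dependency between `plot_diversity` and the two depth settings can be checked up front. A minimal sketch, assuming the config is already a Python dict; `missing_diversity_settings` is a hypothetical name.

```python
def missing_diversity_settings(config):
    """Return the depth settings that plot_diversity needs but are not set."""
    if not config.get("plot_diversity"):
        return []  # nothing is required when diversity plots are off
    required = ("alpha_max_depth", "core_sampling_depth")
    return [key for key in required if config.get(key) is None]


print(missing_diversity_settings({"plot_diversity": True, "alpha_max_depth": 500}))
```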

DADA2 parameters (required when asv_method starts with dada2)

dada2_trunc_len_f: 245        # forward truncation length
dada2pe_trunc_len_r: 190      # reverse truncation (paired-end only)
dada2_trim_left_f: 0          # forward trim from left
dada2pe_trim_left_r: 0        # reverse trim from left
dada2_max_ee_f: 2             # forward max expected errors
dada2pe_max_ee_r: 2           # reverse max expected errors
dada2_trunc_q: 2              # truncate at quality score
dada2_pooling_method: pseudo  # independent | pseudo | pooled
dada2_chimera_method: consensus
dada2_min_fold_parent_over_abundance: 1
dada2_n_reads_learn: 1000000
dada2_hashed_feature_ids: --p-hashed-feature-ids  # optional

Deblur parameters (required when asv_method: deblur)

deblur_trim_length: 150       # final sequence length (bp)
deblur_trim_left: 0
deblur_mean_error: 0.005
deblur_min_reads: 2
deblur_min_size: 2
deblur_indel_max: 3
reference_seqs: 00-data/ref.qza  # required reference set
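Which parameters matter depends on `asv_method`. The sketch below groups the keys listed above by method and reports any that are unset. It assumes every key not marked optional is required, and that `dada2se` needs only the keys without the `dada2pe_` prefix; the helper name is hypothetical.

```python
DADA2_COMMON = (
    "dada2_trunc_len_f", "dada2_trim_left_f", "dada2_max_ee_f", "dada2_trunc_q",
    "dada2_pooling_method", "dada2_chimera_method",
    "dada2_min_fold_parent_over_abundance", "dada2_n_reads_learn",
)
DADA2_PAIRED_ONLY = ("dada2pe_trunc_len_r", "dada2pe_trim_left_r", "dada2pe_max_ee_r")
DEBLUR = (
    "deblur_trim_length", "deblur_trim_left", "deblur_mean_error",
    "deblur_min_reads", "deblur_min_size", "deblur_indel_max", "reference_seqs",
)

# Assumed mapping from asv_method to the keys it needs.
REQUIRED_BY_METHOD = {
    "dada2pe": DADA2_COMMON + DADA2_PAIRED_ONLY,
    "dada2se": DADA2_COMMON,
    "deblur": DEBLUR,
}


def missing_denoiser_settings(config):
    """Return the keys required by config['asv_method'] that are unset."""
    method = config.get("asv_method")
    if method not in REQUIRED_BY_METHOD:
        raise ValueError(
            f"asv_method must be one of {sorted(REQUIRED_BY_METHOD)}, got {method!r}"
        )
    # Compare against None so that legitimate zero values count as set.
    return [k for k in REQUIRED_BY_METHOD[method] if config.get(k) is None]


print(missing_denoiser_settings({"asv_method": "deblur", "deblur_trim_length": 150}))
```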

Post-denoising filtering (required when to_filter: true)

repseq_min_length: 0
repseq_max_length: 0
repseq_min_abundance: 0
repseq_min_prevalence: 0
repseq_min_frequency: 0
repseq_min_samples: 0

3. Taxonomy configuration (Template: config_03_taxonomy.yaml)

The taxonomy config defines how representative sequences are assigned taxonomy and summarized.

Core run settings (required)

run_name: my_run
output_dir: /path/to/results
classify_method: naive-bayes      # options: naive-bayes | consensus-blast | consensus-vsearch | bt2-blca
taxa_ranks: kingdom,phylum,class,order,family,genus,species
collapse_taxalevel: 7             # taxonomy level for collapsed table
classify_threads: 10
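To see which rank a given `collapse_taxalevel` refers to, the level can be read as a 1-based position in `taxa_ranks`. That interpretation is an assumption, though it is consistent with the example above, where level 7 lines up with the seventh rank. The function name `rank_for_level` is hypothetical.

```python
def rank_for_level(taxa_ranks, collapse_taxalevel):
    """Map a 1-based collapse level onto the comma-separated rank list."""
    ranks = [r.strip() for r in taxa_ranks.split(",") if r.strip()]
    if not 1 <= collapse_taxalevel <= len(ranks):
        raise ValueError(
            f"collapse_taxalevel must be between 1 and {len(ranks)}, got {collapse_taxalevel}"
        )
    return ranks[collapse_taxalevel - 1]


ranks = "kingdom,phylum,class,order,family,genus,species"
print(rank_for_level(ranks, 7))  # prints "species"
```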

Input data sources (choose one)

repseqs_run_name: my_repseqs_run          # reuse outputs from another run
repseqs_qza_file: /abs/path/repseqs.qza   # external representative sequences
table_qza_file: /abs/path/table.qza       # external feature table

If no external inputs are supplied, the workflow uses artifacts produced by the Repseqs step with matching run_name.

Reference database parameters

database_name: silva-138_1
refseqs_file: 00-data/silva-seqs.qza    # required unless using pretrained classifier
taxa_file: 00-data/silva-tax.qza        # required unless using pretrained classifier
sample_metadata_file: 00-data/metadata.tsv  # optional for barplots

Naive Bayes options

pretrained_classifier: /abs/path/classifier.qza  # optional, overrides refseqs/taxa files
skl_confidence: 0.7                              # confidence threshold
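The interaction between `pretrained_classifier` and the reference files can be sketched as a pre-run check. It assumes the pretrained classifier only applies when `classify_method` is `naive-bayes`, since the option is listed under the Naive Bayes settings; `missing_reference_inputs` is a hypothetical name.

```python
def missing_reference_inputs(config):
    """Return the reference files still needed for the chosen classifier."""
    uses_pretrained = (
        config.get("classify_method") == "naive-bayes"
        and bool(config.get("pretrained_classifier"))
    )
    if uses_pretrained:
        # A pretrained classifier overrides refseqs_file and taxa_file.
        return []
    return [k for k in ("refseqs_file", "taxa_file") if not config.get(k)]


print(missing_reference_inputs({
    "classify_method": "naive-bayes",
    "refseqs_file": "00-data/silva-seqs.qza",
}))
```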

Consensus BLAST/VSEARCH options

perc_identity: 0.8
query_cov: 0.8
min_consensus: 0.51
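All three consensus thresholds are fractions, so a range check is a cheap guard against entering percentages by mistake. The bounds below are assumptions modeled on the QIIME 2 consensus classifiers (identity and coverage in [0, 1], consensus above 0.5 and at most 1); confirm them against the classifier version you run. The helper name is hypothetical.

```python
def check_consensus_thresholds(config):
    """Return a list of out-of-range messages (empty when all values pass)."""
    problems = []
    for key in ("perc_identity", "query_cov"):
        if not 0.0 <= config[key] <= 1.0:
            problems.append(f"{key} must be between 0 and 1, got {config[key]}")
    # Assumed bound: a consensus needs strictly more than half of the hits.
    if not 0.5 < config["min_consensus"] <= 1.0:
        problems.append(
            f"min_consensus must be above 0.5 and at most 1, got {config['min_consensus']}"
        )
    return problems


print(check_consensus_thresholds(
    {"perc_identity": 0.8, "query_cov": 0.8, "min_consensus": 0.51}
))
```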

BT2-BLCA options

bowtie_database: /abs/path/bowtie2_index/   # optional; auto-built if omitted
confidence_thres: 0.8

Additional classifier flags

classify_params: --verbose   # appended to the chosen classifier command

See Running for multi-step invocation and External Data for conversions and artifact preparation tips.