Bracken

Bracken is a companion tool to the Kraken family (Kraken, Kraken2, KrakenUniq) that turns Kraken classification reports into abundance estimates at a chosen taxonomic level (for example, species). Kraken assigns reads to nodes in the taxonomic tree based on k-mer matches, but reads whose k-mers are shared among several genomes end up at higher taxonomic ranks (for example, genus) rather than at a species. Bracken uses the expected k-mer distribution of the genomes in a Kraken database, together with the Kraken report for a sample, to redistribute those reads probabilistically down the taxonomy and estimate how many reads belong to each taxon at a single level. The result is a corrected abundance estimate per taxon plus an optional Kraken-style report whose counts and percentages reflect the Bracken-adjusted numbers.

The Bracken workflow intentionally mirrors the Kraken ecosystem. First, a Kraken-compatible database must be built or already available; the database directory must contain the library FASTA files and the Kraken database files. Next, Bracken requires a database-specific k-mer distribution file (databaseXmers.kmer_distrib, where X is the read length), which bracken-build generates; the multi-step scripts offer a more granular alternative. Typical bracken-build usage is:

    bracken-build -d ${KRAKEN_DB} -t ${THREADS} -k ${KMER_LEN} -l ${READ_LEN}

The key parameters are the k-mer length used to build the Kraken database (default 31 for Kraken 1 and KrakenUniq, 35 for Kraken2) and the read length of your sequencing data (e.g., 100). bracken-build is multi-threaded, and the developers recommend running it with 10–20 threads; later releases significantly improved build performance (v2.5 reported roughly 30x faster builds).

After the k-mer distribution is prepared, run Kraken, Kraken2, or KrakenUniq to produce a Kraken report. Kraken2 and KrakenUniq write the report directly via --report; Kraken 1 requires kraken-report as a second step. Abundance estimation is then performed with the bracken command or the Python estimator (estimate_abundance.py / est_abundance.py).
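The preparation steps described above (building the k-mer distribution, then producing a Kraken report) can be scripted. The following is a minimal Python sketch that only assembles the command lines and does not run them. The helper names, paths, and sample name are hypothetical. The bracken-build flags are the ones quoted in the text; the Kraken2 flags other than --report (--db, --threads, --output) are taken from Kraken2's own usage and are not described on this page, so check them against your installed version.

```python
import shlex


def bracken_build_cmd(kraken_db, threads=10, kmer_len=35, read_len=100):
    """Argument list for bracken-build (flags -d, -t, -k, -l as quoted in the text).

    kmer_len=35 matches the Kraken2 default mentioned above; use 31 for
    Kraken 1 or KrakenUniq databases.
    """
    return [
        "bracken-build",
        "-d", kraken_db,
        "-t", str(threads),
        "-k", str(kmer_len),
        "-l", str(read_len),
    ]


def kraken2_report_cmd(kraken_db, reads, sample, threads=10):
    """Argument list for a Kraken2 run that writes <sample>.kreport.

    Assumption: --db, --threads and --output are standard Kraken2 options;
    only --report is mentioned in the text.
    """
    return [
        "kraken2",
        "--db", kraken_db,
        "--threads", str(threads),
        "--report", f"{sample}.kreport",
        "--output", f"{sample}.kraken",
        reads,
    ]


# Hypothetical paths, for illustration only.
build = bracken_build_cmd("/data/kraken2_db", threads=16, read_len=150)
classify = kraken2_report_cmd("/data/kraken2_db", "sample1.fq", "sample1", threads=16)

# Print shell-safe command strings; pass the lists to subprocess.run(...) to execute.
print(shlex.join(build))
print(shlex.join(classify))
```

Keeping the commands as argument lists avoids shell-quoting problems when database paths contain spaces, and makes it easy to repeat the build for each read length you plan to analyze.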
A typical bracken command is:

    bracken -d ${KRAKEN_DB} -i ${SAMPLE}.kreport -o ${SAMPLE}.bracken -r ${READ_LEN} -l ${LEVEL} -t ${THRESHOLD}

Important options include -l, which sets the classification level (default 'S' for species; other levels include genus, family, etc.), and -t, the read-count threshold (default 10), which prevents very low-count taxa from receiving redistributed reads. The algorithm reconstructs the expected number of reads per taxon from the precomputed k-mer-to-read distributions and the Kraken report counts, then writes a .bracken abundance file. By default, Bracken also emits a new Kraken-style report that includes the redistributed reads: levels below the estimate level are omitted, taxa under the threshold are excluded, percentages are recalculated, and unclassified reads are not included. For users who prefer the more granular, manual workflow, the project includes utility scripts that generate the k-mer distribution from a full sweep of the database (kmer2read_distr and generate_kmer_distribution.py).

Bracken is widely used in metagenomics pipelines where accurate taxon abundance matters: microbiome profiling, environmental sequencing surveys, pathogen detection in clinical metagenomics, spike-in quantification, and benchmarking or validation of classification databases. It accepts Kraken, Kraken2, or KrakenUniq reports, but it requires the default Kraken report format and is not compatible with mpa-style reports.

Practical considerations: build Bracken files for each read length you plan to analyze (the script skips files that have already been generated); avoid placing multiple Bracken databases in the same folder, because files can be overwritten; and if the Kraken executables are not on PATH, point bracken-build to the installation with -x or specify the Kraken type with -y.
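To make the redistribution step and the -t threshold more concrete, here is a toy Python sketch of the idea for a single genus. It is a simplified illustration, not Bracken's actual implementation: the function name, the input numbers, and the use of directly assigned read counts as a stand-in for each species' prior abundance are all assumptions made for this example. Each eligible species receives a share of the genus-level reads proportional to (fraction of its reads expected to be classified at the genus) x (its directly assigned reads).

```python
def redistribute_genus_reads(genus_reads, species_reads, frac_at_genus, threshold=10):
    """Toy redistribution of reads stuck at one genus node down to its species.

    genus_reads   -- reads Kraken assigned to the genus node itself
    species_reads -- {species: reads Kraken assigned directly to that species}
    frac_at_genus -- {species: expected fraction of that genome's reads that
                      would be classified at the genus level}; in Bracken this
                      kind of value comes from the k-mer distribution file,
                      here it is hypothetical input
    threshold     -- species with fewer direct reads get no redistributed reads
                     (and, in this sketch, are dropped from the result)
    """
    # Apply the read-count threshold before any redistribution.
    eligible = {s: n for s, n in species_reads.items() if n >= threshold}

    # Bayes-style weight: likelihood of landing at the genus x abundance proxy.
    weights = {s: frac_at_genus[s] * n for s, n in eligible.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {}

    # Each species keeps its own reads and gains its share of the genus reads.
    return {
        s: n + genus_reads * weights[s] / total_weight
        for s, n in eligible.items()
    }


# Hypothetical numbers: species C falls below the threshold of 10 reads.
estimates = redistribute_genus_reads(
    genus_reads=100,
    species_reads={"A": 300, "B": 100, "C": 5},
    frac_at_genus={"A": 0.25, "B": 0.75, "C": 0.5},
)
for species, reads in sorted(estimates.items()):
    print(species, reads)
```

In this example species B has fewer direct reads than A, but a larger fraction of its genome is ambiguous at the genus level, so both end up with equal weights and each receives half of the 100 genus-level reads.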
The package is open-source (GPL) and maintained by Jennifer Lu and colleagues at Johns Hopkins; release notes document fixes (e.g., KrakenUniq compatibility and redistribution bug fixes) and performance improvements across versions.