BEDTools

BEDTools is a mature command-line toolkit for “genome arithmetic”: set-theory style operations on genomic intervals. It provides a large collection of small, composable utilities for intersecting, merging, counting, complementing, shuffling, and otherwise comparing and manipulating interval data in standard genomics formats (BED, GFF/GTF, VCF, BAM and CRAM). Because each tool does one relatively simple job, complex analyses are normally built by chaining several bedtools commands in UNIX pipelines or scripts.

Core capabilities include intersecting interval sets (finding overlaps, optionally requiring reciprocal overlap), computing coverage and depth across regions, merging overlapping or adjacent intervals, finding the closest features, extracting sequences or flanking regions, calculating per-region statistics (e.g., GC content), shuffling regions to build empirical null models, and converting between formats. The commands you will see most often are intersect, closest, coverage (or genomecov), merge, complement and shuffle.

For example, peaks present in two ChIP-seq experiments (at least 50% reciprocal overlap) can be identified and written to both.bed with:

$ bedtools intersect -a exp1.bed -b exp2.bed -f 0.50 -r > both.bed

and the nearest non-overlapping gene for each of those peaks can then be found with:

$ bedtools closest -a both.bed -b genes.bed -io > both.nearest.genes.txt

BEDTools is built for integration and reproducibility. It runs as a standard UNIX command-line program, so it is easy to embed in shell pipelines, Snakemake or Nextflow workflows, and HPC job scripts. A Python wrapper, pybedtools, provides programmatic access for scripting and interactive use. For reproducible deployment, community-maintained container images (BioContainers/Docker) let you run the same bedtools version on workstations, clusters, and cloud environments.
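The overlap rule behind the ChIP-seq example (-f 0.50 -r) can be illustrated in plain Python. This is a minimal sketch of the documented semantics, not bedtools' own code: -f requires the overlap to cover at least the given fraction of the A interval, and -r additionally requires the same fraction of B. It assumes BED-style 0-based, half-open coordinates; the helper names overlap_bp and passes_overlap are hypothetical.

```python
from typing import Tuple

# A BED-style interval: (chrom, start, end), 0-based half-open [start, end)
Interval = Tuple[str, int, int]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals (0 if none)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def passes_overlap(a: Interval, b: Interval,
                   min_frac: float = 0.5, reciprocal: bool = False) -> bool:
    """Check an -f style minimum-overlap rule, optionally made reciprocal (-r)."""
    shared = overlap_bp(a, b)
    if shared == 0:
        return False
    # -f: the overlap must cover at least min_frac of A
    if shared / (a[2] - a[1]) < min_frac:
        return False
    # -r: ...and also at least min_frac of B
    if reciprocal and shared / (b[2] - b[1]) < min_frac:
        return False
    return True
```

For a peak at chr1:100-200 and another at chr1:150-400, the 50 bp overlap covers 50% of the first peak but only 20% of the second, so the pair passes -f 0.50 on its own but fails once -r is added.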
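Merging overlapping or adjacent intervals, the job of bedtools merge, can be sketched as a single pass over sorted intervals. This Python version is an illustration, not the tool's implementation: it follows the documented default that overlapping and book-ended intervals (one ends exactly where the next starts) are merged, and the max_distance parameter is modeled on merge's -d option, although exact boundary handling in bedtools is not guaranteed to match. The function name merge_intervals is hypothetical.

```python
from typing import Iterable, List, Tuple

# A BED-style interval: (chrom, start, end), 0-based half-open [start, end)
Interval = Tuple[str, int, int]


def merge_intervals(intervals: Iterable[Interval], max_distance: int = 0) -> List[Interval]:
    """Merge overlapping, book-ended, or nearby intervals on the same chromosome.

    With the default max_distance of 0, overlapping and book-ended
    intervals (end == next start) are merged; larger values also merge
    intervals separated by gaps of up to max_distance bases.
    """
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):  # sort by chrom, then start
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_distance:
            # Extend the current merged interval, keeping the larger end
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```

The explicit sort stands in for the pre-sorted input that bedtools merge expects; on already sorted data the same loop runs in one streaming pass.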
Performance and input expectations: bedtools scales especially well when input files are pre-sorted by chromosome and then by start position; many operations offer a -sorted option that substantially reduces memory use and runtime on such input. There are also practical constraints to be aware of. Text inputs must be TAB-delimited with UNIX line endings (BAM and CRAM are binary formats). Chromosome names must be consistent between files ("chr1" and "1" will not match). Historically, some operations have imposed a chromosome-size limit (the documentation notes an upper bound of around 512 Mb). CRAM input is supported in recent releases, but you must set the CRAM_REFERENCE environment variable so bedtools can locate the associated reference sequence.

Bedtools is widely used in genomics. Common use cases include coverage analysis for targeted capture, identifying regions with missing coverage, comparing DNase/ATAC/ChIP-seq peak sets across cell types, extracting promoter or exon sequences, calculating GC content across exons, and preparing combined annotation tables (e.g., ChromHMM track matrices). The project is community maintained, has many contributors, and is well cited in the literature; the online documentation and tutorials cover advanced examples and recommended workflows.

By design, BEDTools is modular and portable, and its small, composable utilities chain readily with other tools that consume common genomics formats. This makes it a dependable backbone for interval-oriented analyses in variant calling, epigenomics, transcriptomics and other sequencing-driven workflows.
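Why pre-sorted input helps can be illustrated with a sweep over two sorted interval streams. The Python sketch below shows the general idea only and is not bedtools' actual -sorted implementation. It assumes both inputs are sorted by chromosome name in plain string order (as produced by sort -k1,1 -k2,2n under a C locale) and then by start. Because both streams advance together, only the B intervals that can still overlap the current A interval are held in memory, rather than all of B. The function name sweep_intersect is hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# A BED-style interval: (chrom, start, end), 0-based half-open [start, end)
Interval = Tuple[str, int, int]


def sweep_intersect(a_sorted: Iterable[Interval],
                    b_sorted: Iterable[Interval]) -> Iterator[Tuple[Interval, Interval]]:
    """Yield every overlapping (a, b) pair from two sorted interval streams.

    Assumes both inputs are sorted by chrom (Python string order) and then
    by start. Only B intervals that may still overlap the current A interval
    are kept in memory.
    """
    b_iter = iter(b_sorted)
    next_b = next(b_iter, None)
    cache: List[Interval] = []  # B intervals still "open" near the sweep position
    for a in a_sorted:
        chrom, start, end = a
        # Later A intervals start at or after this one, so a cached B interval
        # that ends at or before `start` (or lies on another chromosome) can
        # never overlap again and is discarded.
        cache = [b for b in cache if b[0] == chrom and b[2] > start]
        # Pull in B intervals that begin before this A interval ends; any that
        # are already behind the sweep position are dropped immediately.
        while next_b is not None and (next_b[0], next_b[1]) < (chrom, end):
            if next_b[0] == chrom and next_b[2] > start:
                cache.append(next_b)
            next_b = next(b_iter, None)
        # A cached B may start beyond this A's end if an earlier A was longer.
        for b in cache:
            if b[1] < end:
                yield a, b
```

Since the generator consumes both inputs lazily, the same code works on in-memory lists or on iterators that parse BED lines from files one at a time.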
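Inconsistent chromosome names ("1" versus "chr1") and Windows line endings are common reasons two files fail to produce the overlaps you expect. A small pre-processing step can normalize both before running bedtools. This Python sketch assumes the target is UCSC-style naming: it prefixes "chr" to bare names, maps "MT" to "chrM" (an illustrative assumption about the target convention), and strips carriage returns. The function name normalize_bed_line is hypothetical.

```python
def normalize_bed_line(line: str) -> str:
    """Return a BED line with a UNIX line ending and a 'chr'-prefixed chromosome name."""
    text = line.rstrip("\r\n")  # drop trailing \r and/or \n
    # Leave blank, comment, track and browser lines untouched apart from the line ending
    if not text or text.startswith(("#", "track", "browser")):
        return text + "\n"
    fields = text.split("\t")  # BED is TAB-delimited
    chrom = fields[0]
    if chrom == "MT":
        chrom = "chrM"  # assumption: UCSC-style name for the mitochondrial genome
    elif not chrom.startswith("chr"):
        chrom = "chr" + chrom
    fields[0] = chrom
    return "\t".join(fields) + "\n"
```

Applied line by line to a file, this brings an Ensembl-style BED file (names such as "1" and "MT") in line with UCSC-style annotation; the reverse mapping would be needed if the other files use Ensembl-style names.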

Links