DeepVariant

Background and approach

DeepVariant is an open-source variant-calling pipeline developed by Google that applies deep learning to the detection of small germline variants. Instead of rule-based heuristics, DeepVariant converts aligned sequencing reads into localized “pileup” image-like tensors, classifies each candidate site with a convolutional neural network implemented in TensorFlow, and reports genotypes and variant confidence in standard VCF or gVCF files. The project also provides DeepTrio (trio-aware calling) and somatic adaptations. The core software relies on the Nucleus library (Python/C++) for reading and writing genomics file formats and for integration with TensorFlow models.

Capabilities and supported data

DeepVariant performs single-sample germline calling for diploid organisms and maintains high SNP and indel accuracy across multiple sequencing technologies and library preparation methods. The official models are trained on human data. Supported model types, each with a documented case study, cover whole genome (WGS), whole exome (WES), PacBio HiFi, Oxford Nanopore (ONT_R104), and hybrid PacBio+Illumina inputs. The VCF/gVCF outputs are ready for downstream annotation and joint genotyping; for cohort-scale, multi-sample workflows the project documents best practices using GLnexus. DeepTrio extends DeepVariant to trio or duo calling by using parental data to improve the child's genotype calls. An adapted somatic caller is also available in some branches.

Typical workflows and example use cases

A typical DeepVariant run follows three logical stages: make_examples (converts aligned BAM/CRAM reads into candidate loci and pileup tensors), call_variants (runs CNN inference to classify each tensor), and postprocess_variants (translates the predictions into VCF/gVCF). The project recommends the official Docker image for reproducible deployments; prebuilt binaries are also available.
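As a rough illustration of how the three stages are launched in practice, the sketch below assembles a Docker command line for run_deepvariant, the wrapper script in the official image that runs make_examples, call_variants, and postprocess_variants in sequence. The helper name build_run_deepvariant_cmd, the host directories, the file names, and the image tag are hypothetical placeholders. The flag names and the /opt/deepvariant/bin path are given from memory of the official Quick Start and should be checked against the documentation for the release in use. The sketch only builds the command; it does not run Docker.

```python
import shlex

# Model types named in the DeepVariant documentation summarized in this text.
MODEL_TYPES = {"WGS", "WES", "PACBIO", "ONT_R104", "HYBRID_PACBIO_ILLUMINA"}


def build_run_deepvariant_cmd(version, model_type, input_dir, output_dir,
                              ref_name, reads_name, num_shards=1):
    """Return an argv list for a one-step DeepVariant run inside Docker.

    All arguments are illustrative: `input_dir` and `output_dir` are host
    directories mounted into the container, `ref_name` and `reads_name`
    are file names inside `input_dir`.
    """
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unsupported model type: {model_type}")
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return [
        "docker", "run",
        "-v", f"{input_dir}:/input",    # reference and reads
        "-v", f"{output_dir}:/output",  # VCF/gVCF results
        f"google/deepvariant:{version}",
        "/opt/deepvariant/bin/run_deepvariant",
        f"--model_type={model_type}",
        f"--ref=/input/{ref_name}",
        f"--reads=/input/{reads_name}",
        "--output_vcf=/output/output.vcf.gz",
        "--output_gvcf=/output/output.g.vcf.gz",
        f"--num_shards={num_shards}",   # parallelizes make_examples
    ]


if __name__ == "__main__":
    # Example tag and paths only; substitute a real release and real files.
    cmd = build_run_deepvariant_cmd(
        version="1.6.1",
        model_type="WGS",
        input_dir="/data/input",
        output_dir="/data/output",
        ref_name="reference.fasta",
        reads_name="sample.bam",
        num_shards=8,
    )
    print(shlex.join(cmd))
```

Returning an argv list rather than a shell string keeps quoting explicit; the list can be passed to subprocess.run as is once the paths and version have been verified.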
Example use cases include high-accuracy single-sample germline calling for research WGS/WES, trio analyses using DeepTrio, hybrid workflows that combine short- and long-read data for difficult regions, and cohort-level joint genotyping via DeepVariant + GLnexus. The pipeline runs on CPUs, GPUs, and TPUs; an experimental mode runs make_examples on CPU while call_variants runs on GPU, which increases throughput and reduces wall-clock time.

Deployment, integrations and practical notes

DeepVariant is distributed via Docker (recommended) and as prebuilt binaries (gs://deepvariant), and it can be built from source on Unix-like systems (Ubuntu build scripts are provided). It reads standard genomics inputs (BAM/CRAM) and writes standard outputs (VCF/gVCF), so it fits into established pipelines and annotation tools. Key runtime flags include --model_type (WGS, WES, PACBIO, ONT_R104, HYBRID_PACBIO_ILLUMINA), --num_shards to parallelize make_examples, --vcf_stats_report for HTML reports, and haploid-related flags for X/Y handling and PAR BED files. For cohort calling, follow the DeepVariant + GLnexus best-practices guide. The maintainers document approximate cloud costs (for example, calling a 30× WGS on a single non-preemptible n1-standard-16 on Google Cloud costs roughly $11.80; preemptible VMs reduce that cost substantially) and provide runtime and accuracy metrics across the supported data types.

Limitations and best practices

The models included with DeepVariant are trained primarily on human data, and genotype calls assume a diploid genome (copy number two); non-human or polyploid use requires caution or retraining. The software is intended for research use and is explicitly not a clinical or medical device. Binaries are compiled with SSE4/AVX optimizations, so building or running on unsupported CPUs may fail. For reproducible results, use the recommended Docker image or the documented prebuilt binaries and follow the official Quick Start and detailed usage guides.
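The SSE4/AVX requirement noted above can be checked before deployment. The following is a minimal sketch, assuming an x86 Linux host where /proc/cpuinfo exposes a "flags" line. The helper names and the exact set of required flags (sse4_1, sse4_2, avx) are assumptions chosen for illustration, not a list published by the DeepVariant maintainers.

```python
# Assumed flag names as they appear in /proc/cpuinfo on x86 Linux.
# The precise instruction sets a given DeepVariant build needs may differ.
REQUIRED_FLAGS = {"sse4_1", "sse4_2", "avx"}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return a sorted list of required flags that the CPU does not report."""
    return sorted(set(required) - parse_cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        print("Could not read /proc/cpuinfo; this check only applies to Linux.")
    else:
        missing = missing_flags(text)
        if missing:
            print("CPU does not report:", ", ".join(missing))
        else:
            print("CPU reports all assumed flags.")
```

A host that reports a missing flag is not guaranteed to fail, and a host that passes is not guaranteed to work; the check is only a quick screen before pulling the image or binaries.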
Community support is available through the GitHub repository, issues, and external forums; the project encourages contributions but merges are coordinated by the maintainers. If you need to understand DeepVariant’s internal representations, refer to the “Looking through DeepVariant’s Eyes” blog post and the project documentation for best practices, model choices, and tuning tips.

Links